
How to Compare Two DataFrames (StructType) in Python

Essentially, this is about comparing two dataframes. I am able to compare their names with: def diff(first, second): second = set(second) return [item for item in first if item n

Solution 1:

OK, so the answer turns out to be very straightforward. Here it is for future readers' reference:

import pprint

def diff(first, second):
    # Return the items of first that are missing from second, keeping first's order.
    second = set(second)
    return [item for item in first if item not in second]

# pDF1 and pDF2 are the two PySpark DataFrames being compared.
dl1_fields = list(pDF1.schema.fields)
dl2_fields = list(pDF2.schema.fields)

print("=========================================================")
print("schema comparison result:")
print("=========================================================")
# Fields of the first schema that have no equal field in the second.
dl1Notdl2 = diff(dl1_fields, dl2_fields)
print(str(len(dl1Notdl2)) + " columns in first df but not in second")
pprint.pprint(dl1Notdl2)
print("=========================================================")
# Same check in the other direction.
dl2Notdl1 = diff(dl2_fields, dl1_fields)
print(str(len(dl2Notdl1)) + " columns in second df but not in first")
pprint.pprint(dl2Notdl1)
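
One thing to keep in mind: the comparison above works on whole StructField objects, so, as far as I can tell, a column whose type or nullability changed is reported as missing on both sides rather than as a mismatch. If you want to tell the two cases apart, a minimal sketch is to compare (name, type) pairs instead, which is the shape that DataFrame.dtypes returns. The function name schema_diff and the sample schemas below are hypothetical and only there for illustration; the code runs on plain lists, so no Spark session is needed to try it.

```python
def schema_diff(first, second):
    """Compare two schemas given as lists of (column_name, type_string) pairs."""
    first_types = dict(first)
    second_types = dict(second)
    # Column names that exist on one side only.
    only_in_first = [name for name in first_types if name not in second_types]
    only_in_second = [name for name in second_types if name not in first_types]
    # Columns present on both sides, but with different types.
    type_mismatches = {
        name: (first_types[name], second_types[name])
        for name in first_types
        if name in second_types and first_types[name] != second_types[name]
    }
    return only_in_first, only_in_second, type_mismatches


# Hypothetical schemas, written in the shape returned by DataFrame.dtypes.
left = [("id", "bigint"), ("name", "string"), ("price", "double")]
right = [("id", "int"), ("name", "string"), ("created", "timestamp")]

only_left, only_right, mismatches = schema_diff(left, right)
print(only_left)    # ['price']
print(only_right)   # ['created']
print(mismatches)   # {'id': ('bigint', 'int')}
```

With real dataframes you would presumably call it as schema_diff(pDF1.dtypes, pDF2.dtypes). Note that this sketch ignores nullability and metadata, which the StructField comparison does take into account.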
