How To Compare Two DataFrames (StructType) In Python
Essentially, I want to compare the schemas of two DataFrames. I am able to compare their column names with: def diff(first, second): second = set(second) return [item for item in first if item n
Solution 1:
OK, so the answer turns out to be very straightforward. Here it is for future readers' reference:
import pprint


def diff(first, second):
    # Return the items of first that are not in second, keeping first's order.
    second = set(second)
    return [item for item in first if item not in second]


# pDF1 and pDF2 are the two DataFrames being compared.
dl1_fields = list(pDF1.schema.fields)
dl2_fields = list(pDF2.schema.fields)

print("=========================================================")
print("schema comparison result:")
print("=========================================================")

# Fields that appear in the first schema only.
dl1Notdl2 = diff(dl1_fields, dl2_fields)
print(str(len(dl1Notdl2)) + " columns in first df but not in second")
pprint.pprint(dl1Notdl2)

print("=========================================================")

# Fields that appear in the second schema only.
dl2Notdl1 = diff(dl2_fields, dl1_fields)
print(str(len(dl2Notdl1)) + " columns in second df but not in first")
pprint.pprint(dl2Notdl1)
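One thing to watch for: because diff compares whole field objects, a column that exists in both DataFrames but with a different type should show up in both lists rather than being reported as a mismatch. Below is a minimal sketch of a name-keyed comparison that separates missing columns from type mismatches. It assumes only that each field exposes name and dataType attributes, as the objects in schema.fields do. The Field namedtuple and the compare_fields helper are hypothetical names introduced here so the example runs without a Spark session.

```python
from collections import namedtuple

# Hypothetical stand-in for a schema field; only the `name` and
# `dataType` attributes are used by compare_fields below.
Field = namedtuple("Field", ["name", "dataType"])


def compare_fields(first, second):
    """Split schema differences into missing columns and type mismatches."""
    first_types = {f.name: f.dataType for f in first}
    second_types = {f.name: f.dataType for f in second}

    # Column names present on one side only, in their original order.
    only_first = [n for n in first_types if n not in second_types]
    only_second = [n for n in second_types if n not in first_types]

    # Column names present on both sides whose types differ.
    mismatched = {
        n: (first_types[n], second_types[n])
        for n in first_types
        if n in second_types and first_types[n] != second_types[n]
    }
    return only_first, only_second, mismatched


# Made-up sample schemas, for illustration only.
fields1 = [Field("id", "int"), Field("name", "string"), Field("age", "int")]
fields2 = [Field("id", "bigint"), Field("name", "string"), Field("city", "string")]

only_first, only_second, mismatched = compare_fields(fields1, fields2)
print(only_first)   # ['age']
print(only_second)  # ['city']
print(mismatched)   # {'id': ('int', 'bigint')}
```

With real DataFrames, the call would presumably be compare_fields(pDF1.schema.fields, pDF2.schema.fields), and the values in mismatched would then be type objects rather than strings.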