
Pandas: Best Way To Join Two Dataframes Based On A Common Column

I know this is a basic question, but please hear me out. I have the following DataFrames:

In [722]: m1
Out[722]:
   Person_id  Evidence_14  Feature_14
0        100         90.0        True

Solution 1:

If the column names match and you need to match rows by their Person_id values, use:

m = m1.set_index('Person_id').combine_first(m2.set_index('Person_id')).reset_index()
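A minimal sketch of the Person_id-based approach, assuming two small hypothetical frames. The question's data is cut off, so the values below are made up and only the column names follow it. Rows are matched by Person_id even though m2 lists them in a different order, and m1's values are kept wherever they are not NaN:

```python
import pandas as pd

# Hypothetical data: only the column names come from the question
m1 = pd.DataFrame({'Person_id': [100, 200],
                   'Evidence_14': [90.0, None],   # None becomes NaN
                   'Feature_14': [True, False]})
m2 = pd.DataFrame({'Person_id': [200, 100],       # different row order on purpose
                   'Evidence_14': [75.0, 10.0],
                   'Feature_14': [False, True]})

# Align both frames on Person_id, keep m1's values,
# and fill m1's missing values from m2
m = (m1.set_index('Person_id')
       .combine_first(m2.set_index('Person_id'))
       .reset_index())
print(m)
```

Here Person_id 200 gets Evidence_14 = 75.0 from m2 because m1 has NaN there, while Person_id 100 keeps m1's 90.0 and both keep m1's Feature_14.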

If the index values are the same and the Person_id values are also the same in both DataFrames, the solution can be simplified to matching on the original index values:

m = m1.combine_first(m2)
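When both frames already hold the same Person_id in the same row, the plain call gives the same result as the set_index round trip. A small check with hypothetical data (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical frames whose rows line up: row 0 is Person_id 100
# and row 1 is Person_id 200 in both frames
m1 = pd.DataFrame({'Person_id': [100, 200],
                   'Evidence_14': [90.0, None]})
m2 = pd.DataFrame({'Person_id': [100, 200],
                   'Evidence_14': [10.0, 75.0]})

# Align on the default integer index; no set_index/reset_index needed
m = m1.combine_first(m2)

# Same result as aligning on Person_id explicitly
expected = (m1.set_index('Person_id')
              .combine_first(m2.set_index('Person_id'))
              .reset_index())
print(m.equals(expected))  # True
```

This shortcut only holds while the row order really matches; if the frames might be sorted differently, the set_index version is the safer choice.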

Solution 2:

Since Person_id uniquely defines related rows in m1 and m3, you have to use set_index. Look at this:

import pandas as pd

df1 = pd.DataFrame({'id':[11, 22, 33,44],'A': [None, 0, 17, None], 'B': [None, 4, 19,None]})
df2 = pd.DataFrame({'id':[111, 222], 'A': [9999, 9999], 'B': [7777, 7777]})

# df1 = df1.set_index('id')
# df2 = df2.set_index('id')

df1.combine_first(df2)


Out[32]: 
   id       A       B
0  11  9999.0  7777.0
1  22     0.0     4.0
2  33    17.0    19.0
3  44     NaN     NaN

If you don't use set_index, the first values of A and B are changed even though that row's id is 11 in df1 and 111 in df2 (different ids).
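For comparison, here is the same df1/df2 example run both ways, once as-is and once with the two set_index lines uncommented (the names by_position, df1_by_id and by_id exist only in this sketch). Aligned on id, the NaN values of id 11 stay NaN, and ids 111 and 222 from df2 show up as extra rows:

```python
import pandas as pd

df1 = pd.DataFrame({'id': [11, 22, 33, 44], 'A': [None, 0, 17, None], 'B': [None, 4, 19, None]})
df2 = pd.DataFrame({'id': [111, 222], 'A': [9999, 9999], 'B': [7777, 7777]})

# Without set_index: rows are paired by position, so row 0 (id 11)
# receives the values of df2's row 0 (id 111)
by_position = df1.combine_first(df2)

# With set_index: rows are paired by id, so id 11 is left untouched
df1_by_id = df1.set_index('id')
df2_by_id = df2.set_index('id')
by_id = df1_by_id.combine_first(df2_by_id)

print(by_position)
print(by_id)
```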

Also note that if you use set_index, any id that does not exist in m1 will be added to the result.
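If only the ids present in the first frame should survive, one option is to reindex the combined result back to that frame's ids. A sketch using the df1 from the answer and a hypothetical df2 in which id 44 also appears in df1 (its values are made up):

```python
import pandas as pd

df1 = pd.DataFrame({'id': [11, 22, 33, 44], 'A': [None, 0, 17, None], 'B': [None, 4, 19, None]})
# Hypothetical df2: id 44 also exists in df1, id 111 does not
df2 = pd.DataFrame({'id': [44, 111], 'A': [1, 9999], 'B': [2, 7777]})

df1_by_id = df1.set_index('id')
df2_by_id = df2.set_index('id')

# combine_first keeps the union of ids, so 111 is added as a new row
combined = df1_by_id.combine_first(df2_by_id)

# Reindexing to df1's ids drops rows that exist only in df2;
# id 44 keeps the values that were filled in from df2
only_df1_ids = combined.reindex(df1_by_id.index)
print(only_df1_ids)
```

The idea is to fill first and trim afterwards; it assumes each id appears at most once in each frame.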
