Pandas: Best Way To Join Two Dataframes Based On A Common Column
I know this is a basic question, but please hear me out. I have the following dataframes:
In [722]: m1
Out[722]:
   Person_id  Evidence_14  Feature_14
0        100         90.0        True
Solution 1:
If the column names match and the rows need to be matched by Person_id values, use:
m = m1.set_index('Person_id').combine_first(m2.set_index('Person_id')).reset_index()
If the index values and the Person_id values are the same in both DataFrames, the solution can be simplified by aligning on the original index values:
m = m1.combine_first(m2)
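A minimal sketch of the first variant, to show what aligning on Person_id does. The values below are made up for illustration (only the column names Person_id and Evidence_14 come from the question), and m2 is deliberately stored in a different row order than m1:

```python
import pandas as pd

# Hypothetical data: the values are assumptions, not taken from the question.
m1 = pd.DataFrame({
    "Person_id": [100, 101],
    "Evidence_14": [90.0, None],   # missing value for person 101
})
m2 = pd.DataFrame({
    "Person_id": [101, 100],       # same people, different row order
    "Evidence_14": [55.0, 10.0],
})

# Align on Person_id: gaps in m1 are filled from the row in m2
# that has the same Person_id, regardless of row position.
m = m1.set_index("Person_id").combine_first(m2.set_index("Person_id")).reset_index()
print(m)
```

Here the gap for person 101 is filled with 55.0, the value m2 holds for that same person. Values already present in m1 (90.0 for person 100) take priority and are not overwritten.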
Solution 2:
As Person_id uniquely defines related rows in m1 and m3, you have to use set_index. Look at this:
import pandas as pd
df1 = pd.DataFrame({'id':[11, 22, 33,44],'A': [None, 0, 17, None], 'B': [None, 4, 19,None]})
df2 = pd.DataFrame({'id':[111, 222], 'A': [9999, 9999], 'B': [7777, 7777]})
# df1 = df1.set_index('id')
# df2 = df2.set_index('id')
df1.combine_first(df2)
Out[32]:
   id       A       B
0  11  9999.0  7777.0
1  22     0.0     4.0
2  33    17.0    19.0
3  44     NaN     NaN
If you don't use set_index, the first value of A is changed even though its id is 11 in df1 and 111 in df2 (different ids), because combine_first then aligns the rows on the default integer index rather than on id.
Also note that if you use set_index, an id that does not exist in m1 will be added to the result.
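For comparison, here is a sketch of the same df1 and df2 with set_index applied. The variable name aligned is just an illustrative choice:

```python
import pandas as pd

df1 = pd.DataFrame({'id': [11, 22, 33, 44], 'A': [None, 0, 17, None], 'B': [None, 4, 19, None]})
df2 = pd.DataFrame({'id': [111, 222], 'A': [9999, 9999], 'B': [7777, 7777]})

# Align on id instead of on the default 0..n-1 index.
aligned = df1.set_index('id').combine_first(df2.set_index('id'))
print(aligned)

# id 11 keeps its missing values, because df2 has no row with id 11.
print(aligned.loc[11])

# ids 111 and 222 exist only in df2, so they show up as extra rows.
print(aligned.loc[[111, 222]])
```

If the extra rows are not wanted, one option is to restrict the result to the ids of the first frame afterwards, for example with aligned.loc[df1['id']].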