Python Pandas Dataframe Fill NaN With Other Series
I want to fill the NaN values in column var4 of a DataFrame (df) from a control table (fillna_mean), using its mean column and var1 as the index. In the dataframe I want them to match on var
Solution 1:
You can use boolean indexing in conjunction with the .map() method:
In [178]: fillna.set_index('var1', inplace=True)
In [179]: df.loc[df.var4.isnull(), 'var4'] = df.loc[df.var4.isnull(), 'var1'].map(fillna['mean'])
In [180]: df
Out[180]:
   var1  var2  var3  var4
0     a     0    40   1.0
1     a     1    97   2.0
2     a     2    34   1.0
3     b     3     6   3.0
4     b     4    19   2.0
5     c     5    47   6.5
6     c     6    65   1.0
7     c     7    29  34.0
8     c     8    48   6.5
9     d     9    88  10.0
10    d    10    40  12.0
11    d    11    23  12.0
Explanation:
In [184]: df.loc[df.var4.isnull()]
Out[184]:
  var1  var2  var3  var4
2    a     2    75   NaN
5    c     5    75   NaN
8    c     8    44   NaN
9    d     9    34   NaN
In [185]: df.loc[df.var4.isnull(), 'var1']
Out[185]:
2 a
5 c
8 c
9 d
Name: var1, dtype: object
In [186]: df.loc[df.var4.isnull(), 'var1'].map(fillna['mean'])
Out[186]:
2 1.0
5 6.5
8 6.5
9 10.0
Name: var1, dtype: float64
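For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The sample values are hypothetical, chosen only to resemble the output shown earlier (the mean for "b" in particular is made up), and the names lookup and mask are introduced here for readability.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, loosely modelled on the output shown above
df = pd.DataFrame({
    "var1": list("aaabbccccddd"),
    "var4": [1, 2, np.nan, 3, 2, np.nan, 1, 34, np.nan, np.nan, 12, 12],
})

# Hypothetical control table: one mean per var1 value
fillna = pd.DataFrame({
    "var1": ["a", "b", "c", "d"],
    "mean": [1.0, 2.5, 6.5, 10.0],
})

# Series indexed by var1, so .map() can use it as a lookup table
lookup = fillna.set_index("var1")["mean"]

# Select only the rows where var4 is missing
mask = df["var4"].isnull()

# Translate each missing row's var1 into its mean and write it back
df.loc[mask, "var4"] = df.loc[mask, "var1"].map(lookup)

print(df)
```

Unlike the session above, this version does not call set_index with inplace=True, so the control table itself is left unchanged. Any var1 value that is missing from the control table maps to NaN, so those rows simply stay unfilled.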
UPDATE: starting from pandas 0.20.1, the .ix indexer is deprecated in favor of the stricter .iloc and .loc indexers.
Solution 2:
You can get faster results with combine_first, and you don't need to filter out the non-null rows first:
fillna.set_index('var1', inplace=True)
df.var4 = df.var4.combine_first(df.var1.map(fillna['mean']))
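As a rough illustration of why no filtering is needed: combine_first keeps every existing value and only takes the mapped value where var4 is NaN. The sketch below uses a small hypothetical dataset and also shows Series.fillna with a Series argument, which should give the same result here; the names candidates, filled_a and filled_b are introduced only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "var1": ["a", "a", "c", "c", "d"],
    "var4": [1.0, np.nan, np.nan, 34.0, np.nan],
})
fillna = pd.DataFrame({"var1": ["a", "c", "d"], "mean": [1.0, 6.5, 10.0]})

# Mean for every row, aligned on df's index (not just the NaN rows)
candidates = df["var1"].map(fillna.set_index("var1")["mean"])

# Existing values win; candidates are used only where var4 is NaN
filled_a = df["var4"].combine_first(candidates)

# Equivalent alternative: fillna accepts a Series aligned on the index
filled_b = df["var4"].fillna(candidates)

print(filled_a.tolist())
print(filled_b.tolist())
```

Both variants compute the mapped mean for every row and discard it where var4 already has a value, which trades a little extra work for shorter code.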