Python Pandas Dataframe Fill NaN With Other Series

I want to fill NaN values in a DataFrame (df) column (var4) based on a control table (fillna_mean), using its mean column and var1 as the index. In the dataframe I want them to match on var

Solution 1:

You can use boolean indexing in conjunction with the .map() method (the control table is called fillna in the code below):

In [178]: fillna.set_index('var1', inplace=True)

In [179]: df.loc[df.var4.isnull(), 'var4'] = df.loc[df.var4.isnull(), 'var1'].map(fillna['mean'])

In [180]: df
Out[180]:
   var1  var2  var3  var4
0     a     0    40   1.0
1     a     1    97   2.0
2     a     2    34   1.0
3     b     3     6   3.0
4     b     4    19   2.0
5     c     5    47   6.5
6     c     6    65   1.0
7     c     7    29  34.0
8     c     8    48   6.5
9     d     9    88  10.0
10    d    10    40  12.0
11    d    11    23  12.0

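Explanation, step by step, on df before the fill (the var3 values differ from the output above; only var1 and var4 matter here):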

In [184]: df.loc[df.var4.isnull()]
Out[184]:
  var1  var2  var3  var4
2    a     2    75   NaN
5    c     5    75   NaN
8    c     8    44   NaN
9    d     9    34   NaN

In [185]: df.loc[df.var4.isnull(), 'var1']
Out[185]:
2    a
5    c
8    c
9    d
Name: var1, dtype: object

In [186]: df.loc[df.var4.isnull(), 'var1'].map(fillna['mean'])
Out[186]:
2     1.0
5     6.5
8     6.5
9    10.0
Name: var1, dtype: float64
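To try the steps above end to end, here is a minimal self-contained sketch. The original df and control table are not shown in the post, so the sample values below are assumptions chosen to resemble the output in Out[180] (the mean for group b in particular is made up), and the names lookup and mask are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: the real df and control table are not shown in the post.
df = pd.DataFrame({
    "var1": list("aaabbccccddd"),
    "var4": [1, 2, np.nan, 3, 2, np.nan, 1, 34, np.nan, np.nan, 12, 12],
})
fillna = pd.DataFrame({"var1": list("abcd"), "mean": [1.0, 2.5, 6.5, 10.0]})

# Series of fill values indexed by var1, usable as a lookup table in .map()
lookup = fillna.set_index("var1")["mean"]

# Compute the NaN mask once and reuse it on both sides of the assignment
mask = df["var4"].isnull()
df.loc[mask, "var4"] = df.loc[mask, "var1"].map(lookup)

print(df)
```

Only the rows selected by the mask are touched, so existing var4 values stay as they are. If a var1 key is missing from the control table, .map() returns NaN for it and that row stays unfilled.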

UPDATE: starting from Pandas 0.20.1 the .ix indexer is deprecated in favor of the stricter .iloc and .loc indexers (it was later removed in pandas 1.0), so the code above uses .loc.


Solution 2:

You can get faster results with combine_first, and you don't need to filter out the non-null rows yourself:

fillna.set_index('var1', inplace=True)

df.var4 = df.var4.combine_first(df.var1.map(fillna['mean']))
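A runnable sketch of the combine_first approach, for comparison. The sample data is again an assumption (the post does not show the real inputs), and lookup and mapped are illustrative names. The last lines also show Series.fillna with a Series argument, which should give the same result here; that variant is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: the real df and control table are not shown in the post.
df = pd.DataFrame({
    "var1": list("aaabbccccddd"),
    "var4": [1, 2, np.nan, 3, 2, np.nan, 1, 34, np.nan, np.nan, 12, 12],
})
fillna = pd.DataFrame({"var1": list("abcd"), "mean": [1.0, 2.5, 6.5, 10.0]})
lookup = fillna.set_index("var1")["mean"]

# Map every row's var1 to its fill value (no filtering on NaN needed)
mapped = df["var1"].map(lookup)

# Alternative spelling (an addition, not from the answer): fillna with an aligned Series
alt = df["var4"].fillna(mapped)

# combine_first keeps existing var4 values and takes the mapped value only where var4 is NaN
df["var4"] = df["var4"].combine_first(mapped)

print(df)
print(alt.tolist() == df["var4"].tolist())
```

The trade-off is that the mapping is computed for every row rather than only for the NaN rows, in exchange for a single expression with no boolean mask. Whether it is actually faster depends on the data, so it is worth timing on your own frame.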
