
How Can I Add Summary Rows To A Pandas Dataframe Calculated On Multiple Columns By Agg Functions Like Mean, Median, Etc

I have some data with multiple observations for a given Collector, Date, Sample, and Type where the observation values vary by ID.

import StringIO
import pandas as pd
data = '''Co

Solution 1:

You could pass aggfunc=[np.mean, np.median] to pivot_table to compute both the means and the medians. Adding margins=True also gives you the means and medians for each column and for each row, in an extra row and column labelled All.

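import numpy as np
# margins=True adds the "All" row and column; stack(level=0) moves the
# mean/median level from the columns into the row index
result = df.pivot_table(index=["Collector", "Date", "Sample", "Type"],
    columns="ID", values="Value", margins=True,
    aggfunc=[np.mean, np.median]).stack(level=0)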

yields

ID                                          A      B      C     D      All
Collector Date       Sample Type
Emily     2014-06-20 201    HV   mean    34.0  22.00  10.00  5.00  17.7500
                                 median  34.0  22.00  10.00  5.00  16.0000
          2014-06-23 203    HV   mean    33.0  35.00  13.00  1.00  20.5000
                                 median  33.0  35.00  13.00  1.00  23.0000
John      2014-06-22 221    HV   mean    40.0  39.00  11.00  2.00  23.0000
                                 median  40.0  39.00  11.00  2.00  25.0000
          2014-07-01 218    HV   mean    35.0  29.00  13.00  1.00  19.5000
                                 median  35.0  29.00  13.00  1.00  21.0000
All                              mean    35.5  31.25  11.75  2.25  20.1875
                                 median  34.5  32.00  12.00  1.50  17.5000
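For anyone who wants to run this end to end, here is a minimal sketch. The input frame is an assumption: the question's data is cut off, so the rows are reconstructed from the output table, with one observation per ID. The strings "mean" and "median" stand in for np.mean and np.median, since recent pandas versions accept aggregator names directly.

```python
import pandas as pd

# One observation per (Collector, Date, Sample, Type, ID), reconstructed
# from the output table; the original input data is an assumption.
rows = [
    ("Emily", "2014-06-20", 201, "HV", [34, 22, 10, 5]),
    ("Emily", "2014-06-23", 203, "HV", [33, 35, 13, 1]),
    ("John", "2014-06-22", 221, "HV", [40, 39, 11, 2]),
    ("John", "2014-07-01", 218, "HV", [35, 29, 13, 1]),
]
records = [
    (collector, date, sample, type_, id_, value)
    for collector, date, sample, type_, values in rows
    for id_, value in zip("ABCD", values)
]
df = pd.DataFrame(
    records, columns=["Collector", "Date", "Sample", "Type", "ID", "Value"]
)

# margins=True adds the "All" row and column; stack(level=0) moves the
# mean/median level from the columns into the row index.
result = df.pivot_table(
    index=["Collector", "Date", "Sample", "Type"],
    columns="ID",
    values="Value",
    margins=True,
    aggfunc=["mean", "median"],
).stack(level=0)

print(result)
```

Row and column order in the printed frame may differ slightly between pandas versions, so select by label rather than by position when you depend on a particular cell.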

Yes, result contains more data than you asked for, but

result.loc['All']

has the additional values:

ID                          A      B      C     D      All
Date Sample Type
                 mean    35.5  31.25  11.75  2.25  20.1875
                 median  34.5  32.00  12.00  1.50  17.5000

Or, you could further subselect result to get just the rows you are looking for:

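# Name the index level created by stack() so it can be selected by name
result.index.names = ['Collector', 'Date', 'Sample', 'Type', 'aggfunc']
# Keep every 'mean' row ...
mask = result.index.get_level_values('aggfunc') == 'mean'
# ... plus the last row, which is the 'All' median
mask[-1] = True
result = result.loc[mask]
print(result)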

yields

ID                                           A      B      C     D      All
Collector Date       Sample Type aggfunc
Emily     2014-06-20 201    HV   mean     34.0  22.00  10.00  5.00  17.7500
          2014-06-23 203    HV   mean     33.0  35.00  13.00  1.00  20.5000
John      2014-06-22 221    HV   mean     40.0  39.00  11.00  2.00  23.0000
          2014-07-01 218    HV   mean     35.0  29.00  13.00  1.00  19.5000
All                              mean     35.5  31.25  11.75  2.25  20.1875
                                 median   34.5  32.00  12.00  1.50  17.5000
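A variation on the mask above, as a sketch: instead of relying on the All median being the last row (mask[-1] = True), the summary rows can be picked by label. The frame here is a trimmed-down stand-in with only two index levels and the All column, built by hand from values in the output above; it is not the full result.

```python
import pandas as pd

# Hand-built stand-in for `result`: two index levels instead of five,
# with values copied from the "All" column of the output above.
index = pd.MultiIndex.from_tuples(
    [
        ("Emily", "mean"), ("Emily", "median"),
        ("John", "mean"), ("John", "median"),
        ("All", "mean"), ("All", "median"),
    ],
    names=["Collector", "aggfunc"],
)
result = pd.DataFrame(
    {"All": [17.75, 16.0, 23.0, 25.0, 20.1875, 17.5]}, index=index
)

# Keep every 'mean' row, plus every row of the 'All' summary block.
is_mean = result.index.get_level_values("aggfunc") == "mean"
is_summary = result.index.get_level_values("Collector") == "All"
subset = result.loc[is_mean | is_summary]

print(subset)
```

Selecting by label keeps working if the summary rows ever end up somewhere other than the bottom of the frame.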

Solution 2:

This might not be super clean, but you could assign to the new entries with .loc.

In [131]: table_mean = table.mean()

In [132]: table_median = table.median()

In [134]: table.loc['Mean', :] = table_mean.values

In [135]: table.loc['Median', :] = table_median.values

In [136]: table
Out[136]: 
ID                                   A      B      C     D
Collector Date       Sample Type
Emily     2014-06-20 201    HV    34.0  22.00  10.00  5.00
          2014-06-23 203    HV    33.0  35.00  13.00  1.00
John      2014-06-22 221    HV    40.0  39.00  11.00  2.00
          2014-07-01 218    HV    35.0  29.00  13.00  1.00
Mean                              35.5  31.25  11.75  2.25
Median                            34.5  32.00  12.00  1.50
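If assigning new rows through .loc on a MultiIndex feels fragile, a similar result can probably be had with pd.concat. This is a different technique from the one above, shown as a sketch: table is built by hand from the values in the output (an assumption, since the original table came from the question's data), and the summary rows use empty strings to pad the unused index levels.

```python
import pandas as pd

# Pivoted values copied from the output above; building the frame directly
# instead of pivoting the raw data is a shortcut for this sketch.
index = pd.MultiIndex.from_tuples(
    [
        ("Emily", "2014-06-20", 201, "HV"),
        ("Emily", "2014-06-23", 203, "HV"),
        ("John", "2014-06-22", 221, "HV"),
        ("John", "2014-07-01", 218, "HV"),
    ],
    names=["Collector", "Date", "Sample", "Type"],
)
table = pd.DataFrame(
    [
        [34.0, 22.0, 10.0, 5.0],
        [33.0, 35.0, 13.0, 1.0],
        [40.0, 39.0, 11.0, 2.0],
        [35.0, 29.0, 13.0, 1.0],
    ],
    index=index,
    columns=pd.Index(["A", "B", "C", "D"], name="ID"),
)

# Summary rows get padded keys so they fit the four index levels.
summary_index = pd.MultiIndex.from_tuples(
    [("Mean", "", "", ""), ("Median", "", "", "")], names=table.index.names
)
summary = pd.DataFrame(
    [table.mean().to_numpy(), table.median().to_numpy()],
    index=summary_index,
    columns=table.columns,
)

# Both statistics are computed from `table` before anything is appended,
# so the Median row is not influenced by the Mean row.
combined = pd.concat([table, summary])
print(combined)
```

As in the .loc version, the mean and median are taken before any row is added, which is what keeps the summary rows from feeding into each other.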
