How Can I Add Summary Rows To A Pandas Dataframe Calculated On Multiple Columns By Agg Functions Like Mean, Median, Etc
I have some data with multiple observations for a given Collector, Date, Sample, and Type, where the observation values vary by ID.
import StringIO
import pandas as pd
data = '''Co
Solution 1:
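You could use aggfunc=[np.mean, np.median] to compute both the means and the medians. Then you could use margins=True to also obtain the means and medians for each column and for each row. Finally, stack(level=0) moves the aggregation name (mean or median) out of the columns and into the row index.
import numpy as np
# On recent pandas, aggfunc=["mean", "median"] may avoid a deprecation warning.
result = df.pivot_table(index=["Collector", "Date", "Sample", "Type"],
                        columns="ID", values="Value", margins=True,
                        aggfunc=[np.mean, np.median]).stack(level=0)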
yields
ID                                          A      B      C     D      All
Collector Date       Sample Type
Emily     2014-06-20 201    HV   mean    34.0  22.00  10.00  5.00  17.7500
                                 median  34.0  22.00  10.00  5.00  16.0000
          2014-06-23 203    HV   mean    33.0  35.00  13.00  1.00  20.5000
                                 median  33.0  35.00  13.00  1.00  23.0000
John      2014-06-22 221    HV   mean    40.0  39.00  11.00  2.00  23.0000
                                 median  40.0  39.00  11.00  2.00  25.0000
          2014-07-01 218    HV   mean    35.0  29.00  13.00  1.00  19.5000
                                 median  35.0  29.00  13.00  1.00  21.0000
All                              mean    35.5  31.25  11.75  2.25  20.1875
                                 median  34.5  32.00  12.00  1.50  17.5000
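For readers who want to run Solution 1 end to end, here is a minimal sketch. Because the question's data string is truncated, the rows below are reconstructed from the values in the printed output, with Date kept as plain strings and Sample as integers (both assumptions). The sketch also passes the string names "mean" and "median" instead of the NumPy functions, which should request the same aggregations.

```python
import pandas as pd

# One value per ID (A-D) for each sample. These rows are reconstructed from
# the printed output, not taken from the original (truncated) question data.
samples = {
    ("Emily", "2014-06-20", 201, "HV"): [34, 22, 10, 5],
    ("Emily", "2014-06-23", 203, "HV"): [33, 35, 13, 1],
    ("John", "2014-06-22", 221, "HV"): [40, 39, 11, 2],
    ("John", "2014-07-01", 218, "HV"): [35, 29, 13, 1],
}
keys = ["Collector", "Date", "Sample", "Type"]
df = pd.DataFrame(
    [key + (id_, value)
     for key, values in samples.items()
     for id_, value in zip("ABCD", values)],
    columns=keys + ["ID", "Value"],
)

# margins=True adds an "All" row and an "All" column for each aggregation.
wide = df.pivot_table(index=keys, columns="ID", values="Value",
                      margins=True, aggfunc=["mean", "median"])

# Move the aggregation name from the column index into the row index.
result = wide.stack(level=0)
print(result)
```

Each original row then appears twice, once for the mean and once for the median, and the rows labelled All hold the column-wise summaries.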
Yes, result contains more data than you asked for, but result.loc['All'] has the additional values:
ID                          A      B      C     D      All
Date Sample Type
                 mean    35.5  31.25  11.75  2.25  20.1875
                 median  34.5  32.00  12.00  1.50  17.5000
Or, you could further subselect result to get just the rows you are looking for:
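# The level added by stack() has no name yet, so name all five levels.
result.index.names = ['Collector', 'Date', 'Sample', 'Type', 'aggfunc']
mask = result.index.get_level_values('aggfunc') == 'mean'
# Also keep the last row, which is the median of the 'All' margin.
mask[-1] = True
result = result.loc[mask]
print(result)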
yields
ID                                           A      B      C     D      All
Collector Date       Sample Type aggfunc
Emily     2014-06-20 201    HV   mean     34.0  22.00  10.00  5.00  17.7500
          2014-06-23 203    HV   mean     33.0  35.00  13.00  1.00  20.5000
John      2014-06-22 221    HV   mean     40.0  39.00  11.00  2.00  23.0000
          2014-07-01 218    HV   mean     35.0  29.00  13.00  1.00  19.5000
All                              mean     35.5  31.25  11.75  2.25  20.1875
                                 median   34.5  32.00  12.00  1.50  17.5000
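The positional step mask[-1] = True relies on the All/median row being the last one. A label-based variant is sketched below: it keeps every mean row plus everything under the All margin. The sample rows are reconstructed from the printed output (an assumption, since the question's data is truncated), and the names is_mean, is_margin, and subset are illustrative only.

```python
import pandas as pd

# Rows reconstructed from the printed output; not the original question data.
samples = {
    ("Emily", "2014-06-20", 201, "HV"): [34, 22, 10, 5],
    ("Emily", "2014-06-23", 203, "HV"): [33, 35, 13, 1],
    ("John", "2014-06-22", 221, "HV"): [40, 39, 11, 2],
    ("John", "2014-07-01", 218, "HV"): [35, 29, 13, 1],
}
keys = ["Collector", "Date", "Sample", "Type"]
df = pd.DataFrame(
    [key + (id_, value)
     for key, values in samples.items()
     for id_, value in zip("ABCD", values)],
    columns=keys + ["ID", "Value"],
)

result = df.pivot_table(index=keys, columns="ID", values="Value",
                        margins=True, aggfunc=["mean", "median"]).stack(level=0)
result.index.names = keys + ["aggfunc"]

# Select by label instead of by position.
is_mean = result.index.get_level_values("aggfunc") == "mean"
is_margin = result.index.get_level_values("Collector") == "All"
subset = result.loc[is_mean | is_margin]
print(subset)
```

Selecting by label should give the same six rows as the positional mask, without depending on the row order that stack happens to produce.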
Solution 2:
This might not be super clean, but you could assign the new rows with .loc. Here table is assumed to be the pivoted DataFrame (one column per ID) printed at the end of the session.
In [131]: table_mean = table.mean()
In [132]: table_median = table.median()
In [134]: table.loc['Mean', :] = table_mean.values
In [135]: table.loc['Median', :] = table_median.values
In [136]: table
Out[136]:
ID                                  A      B      C     D
Collector Date       Sample Type
Emily     2014-06-20 201    HV   34.0  22.00  10.00  5.00
          2014-06-23 203    HV   33.0  35.00  13.00  1.00
John      2014-06-22 221    HV   40.0  39.00  11.00  2.00
          2014-07-01 218    HV   35.0  29.00  13.00  1.00
Mean                             35.5  31.25  11.75  2.25
Median                           34.5  32.00  12.00  1.50
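As an alternative to assigning through .loc, the same summary rows can be built separately and appended with pd.concat. This is a different technique from the one in the answer, sketched under the assumption that table is the ID-by-sample pivot printed above. The data is reconstructed from that output, and the padded row keys such as ("Mean", "", "", "") are an illustrative choice.

```python
import pandas as pd

# Rows reconstructed from the printed output; not the original question data.
samples = {
    ("Emily", "2014-06-20", 201, "HV"): [34, 22, 10, 5],
    ("Emily", "2014-06-23", 203, "HV"): [33, 35, 13, 1],
    ("John", "2014-06-22", 221, "HV"): [40, 39, 11, 2],
    ("John", "2014-07-01", 218, "HV"): [35, 29, 13, 1],
}
keys = ["Collector", "Date", "Sample", "Type"]
df = pd.DataFrame(
    [key + (id_, value)
     for key, values in samples.items()
     for id_, value in zip("ABCD", values)],
    columns=keys + ["ID", "Value"],
)

# One observation per cell, so the default mean aggregation returns it unchanged.
table = df.pivot_table(index=keys, columns="ID", values="Value")

# Column-wise mean and median as a two-row frame.
summary = table.agg(["mean", "median"])
# Pad the labels so they fit the four-level row index of `table`.
summary.index = pd.MultiIndex.from_tuples(
    [("Mean", "", "", ""), ("Median", "", "", "")], names=keys)

combined = pd.concat([table, summary])
print(combined)
```

Computing both summaries before appending matters in either approach: if the Mean row were added first, a later table.median() would include it.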