Pandas - Groupby Multiple Values?
I have a DataFrame that contains cell phone minutes usage, logged by date of call and duration. It looks like this (30-row sample): id user_id call_date duration 0 1
Solution 1:
Something is not quite right in your setup. First of all, both of your tables are the same, so I am not sure whether this is a cut-and-paste error or something else. Here is what I did with your data. Load it like this; note that we explicitly convert call_date to datetime:
from io import StringIO
import pandas as pd
df = pd.read_csv(StringIO(
"""
id user_id call_date duration
0 1000_93 1000 2018-12-27 8.52
1 1000_145 1000 2018-12-27 13.66
2 1000_247 1000 2018-12-27 14.48
3 1000_309 1000 2018-12-28 5.76
4 1000_380 1000 2018-12-30 4.22
5 1000_388 1000 2018-12-31 2.20
6 1000_510 1000 2018-12-27 5.75
7 1000_521 1000 2018-12-28 14.18
8 1000_530 1000 2018-12-28 5.77
9 1000_544 1000 2018-12-26 4.40
10 1000_693 1000 2018-12-31 4.31
11 1000_705 1000 2018-12-31 12.78
12 1000_735 1000 2018-12-29 1.70
13 1000_778 1000 2018-12-28 3.29
14 1000_826 1000 2018-12-26 9.96
15 1000_842 1000 2018-12-27 5.85
16 1001_0 1001 2018-09-06 10.06
17 1001_1 1001 2018-10-12 1.00
18 1001_2 1001 2018-10-17 15.83
19 1001_4 1001 2018-12-05 0.00
20 1001_5 1001 2018-12-13 6.27
21 1001_6 1001 2018-12-04 7.19
22 1001_8 1001 2018-11-17 2.45
23 1001_9 1001 2018-11-19 2.40
24 1001_11 1001 2018-11-09 1.00
25 1001_13 1001 2018-12-24 0.00
26 1001_19 1001 2018-11-15 30.00
27 1001_20 1001 2018-09-21 5.75
28 1001_23 1001 2018-10-27 0.98
29 1001_26 1001 2018-10-28 5.90
30 1001_29 1001 2018-09-30 14.78
"""), sep=r'\s+', index_col=0)
df['call_date'] = pd.to_datetime(df['call_date'])
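As an aside, the conversion can also be done at load time. Here is a minimal sketch, assuming a three-row excerpt of the data above (the names `sample` and `df_small` are only for illustration). It passes `parse_dates` to `read_csv` and then checks the dtype, which matters because a frequency-based `pd.Grouper` needs a datetime-like column:

```python
from io import StringIO
import pandas as pd

# Three rows copied from the sample above (illustrative excerpt only).
sample = """
id user_id call_date duration
0 1000_93 1000 2018-12-27 8.52
1 1000_145 1000 2018-12-27 13.66
16 1001_0 1001 2018-09-06 10.06
"""

# parse_dates asks read_csv to convert the named column while loading.
df_small = pd.read_csv(StringIO(sample), sep=r"\s+", index_col=0,
                       parse_dates=["call_date"])

# call_date should show up as a datetime64 column rather than object.
print(df_small.dtypes)
```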
Then using
df.groupby(['user_id','call_date'])['duration'].sum()
does the expected grouping by user and by each date:
user_id  call_date
1000     2018-12-26    14.36
         2018-12-27    48.26
         2018-12-28    29.00
         2018-12-29     1.70
         2018-12-30     4.22
         2018-12-31    19.29
1001     2018-09-06    10.06
         2018-09-21     5.75
         2018-09-30    14.78
         2018-10-12     1.00
         2018-10-17    15.83
         2018-10-27     0.98
         2018-10-28     5.90
         2018-11-09     1.00
         2018-11-15    30.00
         2018-11-17     2.45
         2018-11-19     2.40
         2018-12-04     7.19
         2018-12-05     0.00
         2018-12-13     6.27
         2018-12-24     0.00
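The result of a multi-key groupby is a Series indexed by both keys. If you would rather have the keys back as ordinary columns, something like the following might help. It is only a sketch on a hand-built four-row frame taken from the sample data; the names `calls`, `daily` and `daily_flat` are made up for the example:

```python
import pandas as pd

# Small illustrative frame: a few rows taken from the sample data above.
calls = pd.DataFrame({
    "user_id": [1000, 1000, 1000, 1001],
    "call_date": pd.to_datetime(
        ["2018-12-26", "2018-12-26", "2018-12-29", "2018-09-06"]),
    "duration": [4.40, 9.96, 1.70, 10.06],
})

# Summing after a two-key groupby gives a Series with a
# (user_id, call_date) MultiIndex.
daily = calls.groupby(["user_id", "call_date"])["duration"].sum()

# reset_index() turns the index levels back into regular columns.
daily_flat = daily.reset_index()
print(daily_flat)
```

Passing `as_index=False` to `groupby` should give an equivalent flat frame in one step.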
If you want to group by month, as you seem to suggest, you can use pd.Grouper:
df.groupby(['user_id',pd.Grouper(key='call_date', freq='1M')])['duration'].sum()
which produces
user_id  call_date
1000     2018-12-31    116.83
1001     2018-09-30     30.59
         2018-10-31     23.71
         2018-11-30     35.85
         2018-12-31     13.46
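Note that recent pandas releases prefer the alias 'ME' over 'M' for month-end frequencies in Grouper. One way to sidestep the alias question is to group on a monthly period instead. This is a sketch of an alternative, not what the steps above do, run on a four-row excerpt of user 1001's calls; `calls`, `month` and `call_month` are names chosen for the example:

```python
import pandas as pd

# Four of user 1001's calls from the sample data above.
calls = pd.DataFrame({
    "user_id": [1001, 1001, 1001, 1001],
    "call_date": pd.to_datetime(
        ["2018-09-06", "2018-09-21", "2018-09-30", "2018-10-12"]),
    "duration": [10.06, 5.75, 14.78, 1.00],
})

# Label each call with its calendar month as a Period (e.g. 2018-09).
month = calls["call_date"].dt.to_period("M").rename("call_month")

# Group by the user_id column together with the derived month Series.
monthly = calls.groupby(["user_id", month])["duration"].sum()
print(monthly)
```

The index then carries month labels such as 2018-09 rather than month-end dates, which may or may not be what you want for display.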
Let me know if you get different results after following these steps.