
Pandas - Groupby Multiple Values?

I have a DataFrame that contains cell phone usage in minutes, logged by call date and duration. It looks like this (30-row sample): id user_id call_date duration 0 1

Solution 1:

Something is not quite right in your setup. First of all, both of your tables are identical, so I am not sure whether that is a cut-and-paste error or something else. Here is what I did with your data. Load it like so; note that we explicitly convert call_date to datetime:

from io import StringIO
import pandas as pd
df = pd.read_csv(StringIO(
"""
          id  user_id  call_date  duration
0    1000_93     1000 2018-12-27      8.52
1   1000_145     1000 2018-12-27     13.66
2   1000_247     1000 2018-12-27     14.48
3   1000_309     1000 2018-12-28      5.76
4   1000_380     1000 2018-12-30      4.22
5   1000_388     1000 2018-12-31      2.20
6   1000_510     1000 2018-12-27      5.75
7   1000_521     1000 2018-12-28     14.18
8   1000_530     1000 2018-12-28      5.77
9   1000_544     1000 2018-12-26      4.40
10  1000_693     1000 2018-12-31      4.31
11  1000_705     1000 2018-12-31     12.78
12  1000_735     1000 2018-12-29      1.70
13  1000_778     1000 2018-12-28      3.29
14  1000_826     1000 2018-12-26      9.96
15  1000_842     1000 2018-12-27      5.85
16    1001_0     1001 2018-09-06     10.06
17    1001_1     1001 2018-10-12      1.00
18    1001_2     1001 2018-10-17     15.83
19    1001_4     1001 2018-12-05      0.00
20    1001_5     1001 2018-12-13      6.27
21    1001_6     1001 2018-12-04      7.19
22    1001_8     1001 2018-11-17      2.45
23    1001_9     1001 2018-11-19      2.40
24   1001_11     1001 2018-11-09      1.00
25   1001_13     1001 2018-12-24      0.00
26   1001_19     1001 2018-11-15     30.00
27   1001_20     1001 2018-09-21      5.75
28   1001_23     1001 2018-10-27      0.98
29   1001_26     1001 2018-10-28      5.90
30   1001_29     1001 2018-09-30     14.78
"""), sep=r'\s+', index_col=0)  # sep=r'\s+' splits on whitespace; delim_whitespace is deprecated in pandas 2.2
df['call_date'] = pd.to_datetime(df['call_date'])
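If you want to confirm that the conversion took effect, a quick check along these lines may help. This is only a sketch: the three-row `sample` frame is a hypothetical stand-in built from a few of the rows above, not the full data.

```python
import pandas as pd

# Hypothetical three-row sample, using values taken from the data above.
sample = pd.DataFrame({
    "user_id": [1000, 1000, 1001],
    "call_date": ["2018-12-27", "2018-12-28", "2018-09-06"],
    "duration": [8.52, 5.76, 10.06],
})

# Before conversion the column holds plain strings, not timestamps.
was_datetime = pd.api.types.is_datetime64_any_dtype(sample["call_date"])

sample["call_date"] = pd.to_datetime(sample["call_date"])

# After conversion the column is datetime64, which pd.Grouper needs.
is_datetime = pd.api.types.is_datetime64_any_dtype(sample["call_date"])

print(was_datetime, is_datetime)
```

Grouping by day works on the raw strings too, but time-based grouping with a frequency requires a datetime-like column, so it is worth checking.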

Then using

df.groupby(['user_id','call_date'])['duration'].sum()

does the expected grouping by user and by each date:

user_id  call_date 
1000     2018-12-26    14.36
         2018-12-27    48.26
         2018-12-28    29.00
         2018-12-29     1.70
         2018-12-30     4.22
         2018-12-31    19.29
1001     2018-09-06    10.06
         2018-09-21     5.75
         2018-09-30    14.78
         2018-10-12     1.00
         2018-10-17    15.83
         2018-10-27     0.98
         2018-10-28     5.90
         2018-11-09     1.00
         2018-11-15    30.00
         2018-11-17     2.45
         2018-11-19     2.40
         2018-12-04     7.19
         2018-12-05     0.00
         2018-12-13     6.27
         2018-12-24     0.00
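The result of a two-key groupby is a Series with a two-level index. If you would rather have a flat table with one row per user and date, something like the following should work. It is a minimal sketch on a hypothetical four-row `sample` using values from your data; `daily` is just an illustrative name.

```python
import pandas as pd

# Hypothetical four-row sample, using values taken from the data above.
sample = pd.DataFrame({
    "user_id": [1000, 1000, 1000, 1001],
    "call_date": pd.to_datetime(
        ["2018-12-26", "2018-12-26", "2018-12-29", "2018-10-12"]
    ),
    "duration": [4.40, 9.96, 1.70, 1.00],
})

# Sum durations per (user, date), then move both index levels back into columns.
daily = (
    sample.groupby(["user_id", "call_date"])["duration"]
    .sum()
    .reset_index()
)

print(daily)
```

The two calls on 2018-12-26 collapse into a single row, so `daily` has three rows with columns user_id, call_date and duration.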

If you want to group by month, as you seem to suggest, you can use pd.Grouper. Note that pandas 2.2 deprecated the 'M' frequency alias in favor of 'ME':

df.groupby(['user_id',pd.Grouper(key='call_date', freq='1M')])['duration'].sum()

which produces

user_id  call_date 
1000     2018-12-31    116.83
1001     2018-09-30     30.59
         2018-10-31     23.71
         2018-11-30     35.85
         2018-12-31     13.46
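Another way to get monthly totals, if you prefer month labels such as 2018-09 over month-end dates, is to group on a monthly period derived from call_date. This is a sketch of an alternative, not what the Grouper call does internally. The `sample` frame and the `month` name are assumptions for illustration.

```python
import pandas as pd

# Hypothetical four-row sample, using values taken from the data above.
sample = pd.DataFrame({
    "user_id": [1001, 1001, 1001, 1001],
    "call_date": pd.to_datetime(
        ["2018-09-06", "2018-09-21", "2018-09-30", "2018-10-12"]
    ),
    "duration": [10.06, 5.75, 14.78, 1.00],
})

# Convert each timestamp to its calendar month (a Period such as 2018-09).
month = sample["call_date"].dt.to_period("M").rename("month")

# Group by the user_id column and the derived month Series together.
monthly = sample.groupby(["user_id", month])["duration"].sum()

print(monthly)
```

The totals should match the Grouper version for the same rows; only the index labels differ (periods instead of month-end timestamps).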

Let me know if you get different results after following these steps.
