Conditional Shift: Subtract 'previous Row Value' From 'current Row Value' With Multiple Conditions In Pandas
Solution 1:
You may try something like this:
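# Start a new block whenever the month is not the previous row's month + 1
block = df.MonthStart.dt.month.ne(df.MonthStart.dt.month.shift() + 1).cumsum()
# Diff within each (Disease, State, block); block-leading rows fall back to HeartRate
df['DiffHeartRate'] = (df.groupby(['Disease', 'State', block])['HeartRate']
                         .diff().fillna(df.HeartRate))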
   Disease  HeartRate    State  MonthStart    MonthEnd  DiffHeartRate
0    Covid         89    Texas  2020-02-28  2020-03-31           89.0
1    Covid         91    Texas  2020-03-31  2020-04-30            2.0
2    Covid         87    Texas  2020-07-31  2020-08-30           87.0
3   Cancer         90    Texas  2020-02-28  2020-03-31           90.0
4   Cancer         88  Florida  2020-03-31  2020-04-30           88.0
5    Covid         89  Florida  2020-02-28  2020-03-31           89.0
6    Covid         87  Florida  2020-03-31  2020-04-30           -2.0
7      Flu         90  Florida  2020-02-28  2020-03-31           90.0
The logic is the same as in the other answers, just expressed differently.
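To see how the extra grouping key behaves, here is a minimal sketch that rebuilds the sample frame (values copied from the table above) and prints the block label for each row. The variable names `month`, `block` and `diff` are only illustrative.

```python
import pandas as pd

# Sample data copied from the table above (MonthEnd is not needed here).
df = pd.DataFrame({
    "Disease": ["Covid", "Covid", "Covid", "Cancer", "Cancer", "Covid", "Covid", "Flu"],
    "HeartRate": [89, 91, 87, 90, 88, 89, 87, 90],
    "State": ["Texas", "Texas", "Texas", "Texas", "Florida", "Florida", "Florida", "Florida"],
    "MonthStart": pd.to_datetime([
        "2020-02-28", "2020-03-31", "2020-07-31", "2020-02-28",
        "2020-03-31", "2020-02-28", "2020-03-31", "2020-02-28",
    ]),
})

month = df["MonthStart"].dt.month
# A new block starts wherever the month is not "previous row's month + 1".
block = month.ne(month.shift() + 1).cumsum()

# Diff inside each (Disease, State, block). The first row of a block has no
# predecessor, so it falls back to its own HeartRate.
diff = (df.groupby(["Disease", "State", block])["HeartRate"]
          .diff()
          .fillna(df["HeartRate"]))

print(block.tolist())  # [1, 1, 2, 3, 3, 4, 4, 5]
print(diff.tolist())   # [89.0, 2.0, 87.0, 90.0, 88.0, 89.0, -2.0, 90.0]
```

Because the check compares month numbers only, a December row followed by a January row would start a new block (12 + 1 is not 1), so this variant is best suited to data that stays within one calendar year.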
Solution 2:
Try:
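import numpy as np
import pandas as pd

df.MonthStart = pd.to_datetime(df.MonthStart)
df.MonthEnd = pd.to_datetime(df.MonthEnd)

def cal_diff(x):
    # Use the diff only when the previous row's MonthEnd falls in this row's MonthStart month
    x['DiffHeartRate'] = np.where(
        x['MonthEnd'].shift().dt.month.eq(x['MonthStart'].dt.month),
        x['HeartRate'].diff(), x['HeartRate'])
    return x

# group_keys=False avoids adding the group keys to the result index (relevant on pandas 2.x)
df = df.groupby(['Disease', 'State'], group_keys=False).apply(cal_diff)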
Output
   Disease  HeartRate    State  MonthStart    MonthEnd  DiffHeartRate
0    Covid         89    Texas  2020-02-28  2020-03-31             89
1    Covid         91    Texas  2020-03-31  2020-04-30              2
2    Covid         87    Texas  2020-07-31  2020-08-30             87
3   Cancer         90    Texas  2020-02-28  2020-03-31             90
4   Cancer         88  Florida  2020-03-31  2020-04-30             88
5    Covid         89  Florida  2020-02-28  2020-03-31             89
6    Covid         87  Florida  2020-03-31  2020-04-30             -2
7      Flu         90  Florida  2020-02-28  2020-03-31             90
Solution 3:
You can do it with .mask() together with .groupby() and .transform() as follows:
df['HeartRateDiff'] = (df['HeartRate'].mask(
    df['MonthStart'].groupby([df['Disease'], df['State']]).transform('diff').lt(np.timedelta64(2,'M')),
    df.groupby(['Disease', 'State'])['HeartRate'].transform('diff')
    )
)
Details:
(1) First, make sure the date columns are datetimes rather than strings (skip this step if they already are):
df['MonthStart'] = pd.to_datetime(df['MonthStart'])
df['MonthEnd'] = pd.to_datetime(df['MonthEnd'])
(2) The HeartRate change (within group) is obtained by:
df.groupby(['Disease', 'State'])['HeartRate'].transform('diff')
We can simply pass 'diff' to .transform() instead of pd.Series.diff to achieve the same result.
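A quick way to convince yourself of this equivalence is the sketch below. The small frame is made up for illustration only.

```python
import pandas as pd

# Hypothetical toy data: three Covid/Texas rows and one Flu/Florida row.
df = pd.DataFrame({
    "Disease": ["Covid", "Covid", "Covid", "Flu"],
    "State": ["Texas", "Texas", "Texas", "Florida"],
    "HeartRate": [89, 91, 87, 90],
})

grouped = df.groupby(["Disease", "State"])["HeartRate"]

by_name = grouped.transform("diff")          # string alias
by_func = grouped.transform(pd.Series.diff)  # explicit function
direct = grouped.diff()                      # groupby method, no transform

# All three give NaN for the first row of each group and the change otherwise.
print(by_name.equals(by_func), by_name.equals(direct))
```

The string form is shorter and lets pandas dispatch to its built-in grouped diff rather than calling a Python function once per group.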
(3) Continuity of timeline (next month or not) is checked by the following condition:
df['MonthStart'].groupby([df['Disease'], df['State']]).transform('diff').lt(np.timedelta64(2,'M'))
We require the time difference from the previous date (within the group) to be strictly less than 2 months, which means the row falls in the next month. We cannot test for <= 1 month because the gap between the dates of two consecutive months can be longer than a month (for example, 2020-02-28 to 2020-03-31 is 32 days). Note that this check also works across a year break (December to January), where logic that compares only the month numbers (12 to 1) gives the wrong result.
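The two pitfalls mentioned here (a gap longer than one month, and the year break) can be illustrated with a short sketch. The dates are made up, and the `year * 12 + month` month index is an alternative check introduced only for this illustration, not the approach used in this answer.

```python
import pandas as pd

# Gap between two consecutive months in the sample data: more than 31 days.
gap = pd.Timestamp("2020-03-31") - pd.Timestamp("2020-02-28")
print(gap.days)  # 32

# Hypothetical dates that cross a year boundary, then skip ahead.
dates = pd.Series(pd.to_datetime(
    ["2020-11-30", "2020-12-31", "2021-01-31", "2021-04-30"]))

# Month-number check: fails at the year break because 12 + 1 != 1.
by_month_number = dates.dt.month.eq(dates.dt.month.shift() + 1)

# Month-index check (illustrative alternative): months counted continuously.
month_index = dates.dt.year * 12 + dates.dt.month
by_month_index = month_index.diff().eq(1)

print(by_month_number.tolist())  # [False, True, False, False]
print(by_month_index.tolist())   # [False, True, True, False]
```

If your pandas version does not accept the 'M' unit in np.timedelta64 for this comparison (newer releases may reject it as ambiguous), a month index like the one above is one possible substitute for the "less than 2 months" test.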
(4) Finally, we get the new column by calling .mask() on the existing HeartRate column:
.mask() tests the condition in its first parameter and, where the condition is true, replaces the row with the value from its second parameter. Rows where the condition is false keep their original values, which gives us the conditional replacement we want.
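As a minimal, self-contained illustration of .mask() on its own (the three series below are made-up values, not the real data):

```python
import pandas as pd

values = pd.Series([89, 91, 87])              # plays the role of HeartRate
replacement = pd.Series([0, 2, -4])           # plays the role of the grouped diff
condition = pd.Series([False, True, False])   # "is this the next month?"

# Where condition is True take the replacement, otherwise keep the original.
result = values.mask(condition, replacement)
print(result.tolist())  # [89, 2, 87]
```

.where() is the mirror image: it keeps values where the condition is true and replaces the rest.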
Output:
   Disease  HeartRate    State  MonthStart    MonthEnd  HeartRateDiff
0    Covid         89    Texas  2020-02-28  2020-03-31             89
1    Covid         91    Texas  2020-03-31  2020-04-30              2
2    Covid         87    Texas  2020-07-31  2020-08-30             87
3   Cancer         90    Texas  2020-02-28  2020-03-31             90
4   Cancer         88  Florida  2020-03-31  2020-04-30             88
5    Covid         89  Florida  2020-02-28  2020-03-31             89
6    Covid         87  Florida  2020-03-31  2020-04-30             -2
7      Flu         90  Florida  2020-02-28  2020-03-31             90
Solution 4:
I've used a combination of groupby, np.where and df.fillna() to accomplish this.
There may be more efficient methods, but I hope this helps.
The input df:
   Disease  HeartRate    State  MonthStart    MonthEnd
0    Covid         89    Texas  2020-02-28  2020-03-31
1    Covid         91    Texas  2020-03-31  2020-04-30
2    Covid         87    Texas  2020-07-31  2020-08-30
3   Cancer         90    Texas  2020-02-28  2020-03-31
4   Cancer         88  Florida  2020-03-31  2020-04-30
5    Covid         89  Florida  2020-02-28  2020-03-31
6    Covid         87  Florida  2020-03-31  2020-04-30
7      Flu         90  Florida  2020-02-28  2020-03-31
Compute DiffHeartRate within each group, just as you did:
df['DiffHeartRate'] = df.groupby(['Disease', 'State'])['HeartRate'].transform(pd.Series.diff)
For the consecutive months, I would add the previous row's month as a column, then check whether the months are consecutive using np.where:
df['MonthStart'] = pd.to_datetime(df['MonthStart'])
# Take the previous month within the same Disease/State group so rows from other groups are never compared
df['PrevMonth'] = df.groupby(['Disease', 'State'])['MonthStart'].shift().dt.month
df['DiffHeartRateFinal'] = np.where(df['PrevMonth']==df['MonthStart'].dt.month-1, df['DiffHeartRate'], df['HeartRate'])
Finally, fill any remaining NaN with HeartRate:
df['DiffHeartRateFinal'] = df['DiffHeartRateFinal'].fillna(df['HeartRate'])
Output
Disease  HeartRate    State  MonthStart    MonthEnd  DiffHeartRateFinal
  Covid         89    Texas  2020-02-28  2020-03-31                89.0
  Covid         91    Texas  2020-03-31  2020-04-30                 2.0
  Covid         87    Texas  2020-07-31  2020-08-30                87.0
 Cancer         90    Texas  2020-02-28  2020-03-31                90.0
 Cancer         88  Florida  2020-03-31  2020-04-30                88.0
  Covid         89  Florida  2020-02-28  2020-03-31                89.0
  Covid         87  Florida  2020-03-31  2020-04-30                -2.0
    Flu         90  Florida  2020-02-28  2020-03-31                90.0
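One thing to watch with the "previous month" column: it should come from the previous row of the same Disease/State group, not simply from the row above. The sketch below uses a hypothetical frame whose groups are interleaved to show how a plain .shift() and a grouped .shift() differ.

```python
import pandas as pd

# Hypothetical frame: Covid and Flu rows alternate instead of being contiguous.
df = pd.DataFrame({
    "Disease": ["Covid", "Flu", "Covid", "Flu"],
    "State": ["Texas", "Texas", "Texas", "Texas"],
    "MonthStart": pd.to_datetime(
        ["2020-02-28", "2020-05-31", "2020-03-31", "2020-06-30"]),
})

# Plain shift: previous physical row, regardless of group.
plain_prev = df["MonthStart"].shift().dt.month

# Grouped shift: previous row within the same (Disease, State) group.
grouped_prev = df.groupby(["Disease", "State"])["MonthStart"].shift().dt.month

print(plain_prev.tolist())    # [nan, 2.0, 5.0, 3.0]
print(grouped_prev.tolist())  # [nan, nan, 2.0, 5.0]
```

On the sample data above the groups happen to be contiguous, so both forms lead to the same final column; the grouped form is the safer choice when row order is not guaranteed.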