
Pandas Select Rows When Column Value Within Range From Another Row Column Value

I'm trying to create a subset from a DataFrame (100k-500k rows) with the following format: d = {'time':[1,2,3,5,7,9,9.5,10], 'val':['match','match','match','not','not','match','match','m

Solution 1:

In one line, keeping every row whose previous (`diff()`) or next (`diff(-1)`) `time` value is at most 1 away:

df.loc[(df['time'].diff()<=1)|(df['time'].diff(-1)>=-1)]
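A minimal runnable sketch of the one-liner, using the `time` values from the question (the `val` column is left out because the question text is truncated). It assumes `time` is sorted ascending, since `diff()` compares each row with its physical neighbours; unsorted data would need sorting first. The comparisons against the `NaN` produced at the first and last rows evaluate to `False`, so those rows are kept only if their single neighbour is close enough.

```python
import pandas as pd

# 'time' values from the question; 'val' omitted because the question is truncated
df = pd.DataFrame({'time': [1, 2, 3, 5, 7, 9, 9.5, 10]})

# diff() is time[i] - time[i-1]; diff(-1) is time[i] - time[i+1]
close_to_prev = df['time'].diff() <= 1     # previous row at most 1 below
close_to_next = df['time'].diff(-1) >= -1  # next row at most 1 above
subset = df.loc[close_to_prev | close_to_next]
print(subset)
```

Rows with `time` 5 and 7 are dropped because both of their neighbours are 2 away.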

Solution 2:

I found a solution, but I don't think it is the best one:

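# ascending sort: diff() gives the gap to the previous value
# descending sort: diff() gives the (negative) gap to the next value
dfasc = df.sort_values(['time'], ascending=True)
dfdesc = df.sort_values(['time'], ascending=False)

print(df[(dfasc['time'].diff() <= 1.0) | (dfdesc['time'].diff() >= -1.0)])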

   time    val
0   1.0  match
1   2.0  match
2   3.0  match
5   9.0  match
6   9.5  match
7  10.0  match
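The two sorts above can likely be collapsed into one: sort the column once, take both neighbour differences on the sorted Series (which keeps the original index labels), then reindex the mask back to the DataFrame's index. This is a sketch assuming the same `<= 1` threshold; the shuffled input is made-up data that only serves to show the original row order does not need to be sorted.

```python
import pandas as pd

# The question's times, deliberately shuffled (illustrative data only)
df = pd.DataFrame({'time': [9, 1, 10, 5, 2, 9.5, 7, 3]})

s = df['time'].sort_values()                  # single sort; index labels are kept
close = (s.diff() <= 1) | (s.diff(-1) >= -1)  # close to previous or next value
mask = close.reindex(df.index)                # back to the DataFrame's row order
subset = df[mask]
print(subset)
```

This keeps the rows in their original order; `df.loc[close[close].index]` would return them in `time` order instead.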

Solution 3:

If you want a vectorized approach, this will work. Vectorized operations are worth using since your DataFrame is so large. You may also want to wrap it in a function so that the intermediate arrays created below are released once it returns.

import numpy as np
import pandas as pd
df = pd.DataFrame({'time':[1,2,2.5,3,9,9.5,10,11,12],'val':
['not','match','match','match','match','match','match','not','not']})
'''
df
   time    val
0   1.0    not
1   2.0  match
2   2.5  match
3   3.0  match
4   9.0  match
5   9.5  match
6  10.0  match
7  11.0    not
8  12.0    not
'''
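x = df.time.values  # assumes df is already sorted by time
# tmp[i] is True when the gap between rows i and i+1 is strictly below 1
# (use <= 1 to also accept a gap of exactly 1, as Solutions 1 and 2 do)
tmp = (x[1:] - x[:-1]) < 1
fst = tmp[0]   # the first row only has a next neighbour
lst = tmp[-1]  # the last row only has a previous neighbour
# a middle row matches if the gap after it or the gap before it is small
mid = np.any([tmp[1:], tmp[:-1]], axis=0)
ans = np.concatenate([[fst], mid, [lst]])
df = df[ans]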
''' Output
   time    val
1   2.0  match
2   2.5  match
3   3.0  match
4   9.0  match
5   9.5  match
6  10.0  match
'''
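Following the suggestion to put this into a function, here is a sketch with a hypothetical name `select_close_rows` and a hypothetical `max_gap` parameter. It keeps the strict `< max_gap` comparison used above and, like the original, assumes the frame is sorted by the column. The `np.any`/`np.concatenate` steps are replaced by two in-place ORs on a boolean mask, which should select the same rows; it also returns an empty frame for fewer than two rows instead of failing on `tmp[0]`.

```python
import numpy as np
import pandas as pd

def select_close_rows(df, col='time', max_gap=1.0):
    """Hypothetical helper: keep rows whose `col` value is strictly less than
    `max_gap` away from an adjacent row. Assumes df is sorted by `col`."""
    x = df[col].to_numpy()
    if len(x) < 2:
        return df.iloc[0:0]          # a single row has no neighbour to match
    small = np.diff(x) < max_gap     # small[i]: gap between rows i and i+1
    keep = np.zeros(len(x), dtype=bool)
    keep[:-1] |= small               # close to the next row
    keep[1:] |= small                # close to the previous row
    return df[keep]

df = pd.DataFrame({'time': [1, 2, 2.5, 3, 9, 9.5, 10, 11, 12],
                   'val': ['not', 'match', 'match', 'match', 'match',
                           'match', 'match', 'not', 'not']})
print(select_close_rows(df))
```

On this data it returns rows 1 to 6, the same as the output shown above.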
