Pandas Select Rows When Column Value Within Range From Another Row Column Value
I'm trying to create a subset of a DataFrame (100k-500k rows) with the following format: d = {'time':[1,2,3,5,7,9,9.5,10], 'val':['match','match','match','not','not','match','match','m
Solution 1:
In one line, assuming the rows are already sorted by time, it would look like this:
df.loc[(df['time'].diff()<=1)|(df['time'].diff(-1)>=-1)]
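To see why the one-liner works, it can help to look at the two masks separately. This is a minimal sketch that uses only the time column from the question and assumes it is already sorted ascending. The names near_prev, near_next and subset are just illustrative.

```python
import pandas as pd

# Only the 'time' column from the question; assumed sorted ascending.
df = pd.DataFrame({'time': [1, 2, 3, 5, 7, 9, 9.5, 10]})

# Gap to the previous row: time[i] - time[i-1] (NaN for the first row).
near_prev = df['time'].diff() <= 1
# Gap to the next row: time[i] - time[i+1], a negative number (NaN for the last row).
near_next = df['time'].diff(-1) >= -1

# Comparisons against NaN are False, so each edge row depends on its single neighbour.
subset = df.loc[near_prev | near_next]
print(subset.index.tolist())  # [0, 1, 2, 5, 6, 7]
```

Calling diff() twice, as the one-liner does, is fine for readability. On a large frame you could probably keep the column in a variable first, as here, to avoid repeating the lookup.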
Solution 2:
I found a solution, but I don't think it is the best one:
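# Ascending diff() is the gap to the previous time; descending diff() is the (negative) gap to the next time.
dfasc = df.sort_values(['time'], ascending=True)
dfdesc = df.sort_values(['time'], ascending=False)
print(df[(dfasc['time'].diff() <= 1.0) | (dfdesc['time'].diff() >= -1.0)])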
time val
0 1.0 match
1 2.0 match
2 3.0 match
5 9.0 match
6 9.5 match
7 10.0 match
Solution 3:
If you want a vectorized approach, this will work. Vectorized operations are worth using since your DataFrame is so large. You may also want to wrap it in a function, so that the intermediate arrays I create below can be freed once it returns.
import numpy as np
import pandas as pd
df = pd.DataFrame({'time':[1,2,2.5,3,9,9.5,10,11,12],'val':
['not','match','match','match','match','match','match','not','not']})
'''
df
time val
0 1.0 not
1 2.0 match
2 2.5 match
3 3.0 match
4 9.0 match
5 9.5 match
6 10.0 match
7 11.0 not
8 12.0 not
'''
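x = df.time.values
# True where consecutive times are less than 1 apart (use <= to also count gaps of exactly 1)
tmp = (x[1:] - x[:-1]) < 1
# The first and last rows have only one neighbor each
fst = tmp[0]
lst = tmp[-1]
# Interior rows match if they are close to either neighbor
mid = np.any([tmp[1:],tmp[:-1]],axis =0)
ans = np.concatenate([[fst],mid,[lst]])
df = df[ans]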
''' Output
time val
1 2.0 match
2 2.5 match
3 3.0 match
4 9.0 match
5 9.5 match
6 10.0 match
'''
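The suggestion to put the vectorized snippet into a function could be sketched roughly as below. The function name rows_near_neighbor and the parameters col, max_gap and inclusive are hypothetical, not from the original answer. It assumes the frame is already sorted by the chosen column. The inclusive flag is there because the snippet above uses a strict < 1, while the question's data seems to treat a gap of exactly 1 as a match.

```python
import numpy as np
import pandas as pd

def rows_near_neighbor(df, col='time', max_gap=1.0, inclusive=False):
    """Keep rows whose `col` value is within `max_gap` of an adjacent row.

    Assumes `df` is already sorted by `col`. All names are illustrative.
    """
    x = df[col].to_numpy()
    if len(x) < 2:
        return df.iloc[0:0]  # no neighbours to compare against
    gaps = np.diff(x)  # gaps[i] = x[i+1] - x[i]
    close = gaps <= max_gap if inclusive else gaps < max_gap
    mask = np.zeros(len(x), dtype=bool)
    mask[:-1] |= close  # row i is close to row i+1
    mask[1:] |= close   # row i+1 is close to row i
    return df[mask]

df = pd.DataFrame({'time': [1, 2, 2.5, 3, 9, 9.5, 10, 11, 12],
                   'val': ['not', 'match', 'match', 'match', 'match',
                           'match', 'match', 'not', 'not']})
result = rows_near_neighbor(df)
print(result.index.tolist())  # [1, 2, 3, 4, 5, 6]
```

With inclusive=True, the same frame keeps every row, since no row is more than 1 away from both of its neighbours.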