Python: Splitting Trajectories Into Steps
Solution 1:
I think you can use groupby with apply and a custom function built on zip; the list comprehension is needed to get the output as a list of lists.
Notice: count returns only the number of non-NaN values, so if you want to filter by group length regardless of NaN, len is the better choice.
#filtering and sorting
filtered = df.groupby('user_id').filter(lambda x: len(x['user_id'])>1)
filtered = filtered.sort_values(by='timestamp')
f = lambda x: [list(a) for a in zip(x[:-1], x[1:])]
df2 = filtered.groupby('user_id')['cluster_labels'].apply(f).reset_index()
print (df2)
user_id cluster_labels
0 11011 [[86, 110], [110, 110]]
1 2139671 [[89, 125]]
2 3945641 [[36, 73], [73, 110], [110, 110]]
3 10024312 [[123, 27], [27, 97], [97, 97], [97, 97], [97,...
4 14270422 [[0, 110], [110, 174]]
5 14283758 [[110, 184]]
6 14373703 [[35, 97], [97, 97], [97, 97], [97, 17], [17, ...
A similar solution, where filtering is the last step and is done by boolean indexing:
filtered = df.sort_values(by='timestamp')
f = lambda x: [list(a) for a in zip(x[:-1], x[1:])]
df2 = filtered.groupby('user_id')['cluster_labels'].apply(f).reset_index()
df2 = df2[df2['cluster_labels'].str.len() > 0]
print (df2)
user_id cluster_labels
1 11011 [[86, 110], [110, 110]]
2 2139671 [[89, 125]]
3 3945641 [[36, 73], [73, 110], [110, 110]]
4 10024312 [[123, 27], [27, 97], [97, 97], [97, 97], [97,...
5 14270422 [[0, 110], [110, 174]]
6 14283758 [[110, 184]]
7 14373703 [[35, 97], [97, 97], [97, 97], [97, 17], [17, ...
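To try the groupby approach end to end, here is a minimal sketch on a small made-up frame. The column names follow the snippets above; the sample values and the helper name to_steps are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the snippets above.
df = pd.DataFrame({
    'user_id':        [1, 1, 1, 2, 2, 3],
    'timestamp':      [3, 1, 2, 1, 2, 1],
    'cluster_labels': [110, 86, 110, 89, 125, 50],
})

def to_steps(labels):
    # Pair each label with its successor: [a, b, c] -> [[a, b], [b, c]]
    values = labels.tolist()
    return [list(pair) for pair in zip(values[:-1], values[1:])]

# Keep users with more than one row, then order the rows by time.
filtered = df.groupby('user_id').filter(lambda g: len(g) > 1)
filtered = filtered.sort_values(by='timestamp')

# One list of [from, to] steps per user.
steps = filtered.groupby('user_id')['cluster_labels'].apply(to_steps)
print(steps)
```

With this data, user 3 has a single row and is dropped, user 1 yields [[86, 110], [110, 110]] and user 2 yields [[89, 125]].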
Solution 2:
If you zip your trajectory with itself offset by one, you get the desired result.
Code:
for id, traj in data.items():
    print(id, [[i[0], j[0]] for i, j in zip(traj[:-1], traj[1:])])
Test Data:
data = {
11011: [[86], [110], [110]],
2139671: [[89], [125]],
3945641: [[36], [73], [110], [110]],
10024312: [[123], [27], [97], [97], [97], [110]],
14270422: [[0], [110], [174]],
14283758: [[110], [184]],
14373703: [[35], [97], [97], [97], [17], [58]],
}
Results:
11011 [[86, 110], [110, 110]]
14373703 [[35, 97], [97, 97], [97, 97], [97, 17], [17, 58]]
3945641 [[36, 73], [73, 110], [110, 110]]
14283758 [[110, 184]]
14270422 [[0, 110], [110, 174]]
2139671 [[89, 125]]
10024312 [[123, 27], [27, 97], [97, 97], [97, 97], [97, 110]]
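The same zip idea can be wrapped in a small function so the steps are stored rather than printed. This is only a sketch: the name steps_from_points and the single-point user 99 are made up for illustration.

```python
def steps_from_points(traj):
    """Turn [[a], [b], [c]] into [[a, b], [b, c]]."""
    # zip stops at the shorter input, so a trajectory with fewer than
    # two points simply yields an empty list.
    return [[start[0], end[0]] for start, end in zip(traj, traj[1:])]

data = {
    11011: [[86], [110], [110]],
    14283758: [[110], [184]],
    99: [[10]],  # hypothetical single-point trajectory
}

steps = {user_id: steps_from_points(traj) for user_id, traj in data.items()}
print(steps)
# {11011: [[86, 110], [110, 110]], 14283758: [[110, 184]], 99: []}
```

Because zip truncates to the shorter argument, slicing off the last element with traj[:-1] is optional here.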
Solution 3:
My solution uses pandas' .apply() function. I believe this should work (I tested it on your sample data). Notice that I also added two extra data points at the end, for the case when there is only a single move and the case when there is no move.
# Python3.5
import pandas as pd
# Sample data from post
ids = [11011,2139671,3945641,10024312,14270422,14283758,14317445,14331818,14334591,14373703,10000,100001]
traj = [[[86], [110], [110]],[[89], [125]],[[36], [73], [110], [110]],[[123], [27], [97], [97], [97], [110]],[[0], [110], [174]],[[110], [184]],[[50], [88]],[[0], [22], [36], [131], [131]],[[107], [19]],[[35], [97], [97], [97], [17], [58]],[10],[]]
# Sample frame
df = pd.DataFrame({'user_ids':ids, 'trajectory':traj})
def f(x):
    # Create edges from a list of moves; with one move or none, return the input unchanged
    if len(x) <= 1:
        return x
    return [x[i] + x[i+1] for i in range(len(x) - 1)]
df['edges'] = df['trajectory'].apply(f)
Output:
print(df['edges'])
edges
0 [[86, 110], [110, 110]]
1 [[89, 125]]
2 [[36, 73], [73, 110], [110, 110]]
3 [[123, 27], [27, 97], [97, 97], [97, 97], [97,...
4 [[0, 110], [110, 174]]
5 [[110, 184]]
6 [[50, 88]]
7 [[0, 22], [22, 36], [36, 131], [131, 131]]
8 [[107, 19]]
9 [[35, 97], [97, 97], [97, 97], [97, 17], [17, ...
10 [10]
11 []
As for where to put this in your pipeline: put it right after you get your trajectory column (whether that is after you load the data or after whatever filtering you require).
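As a rough illustration of that placement, the sketch below builds a trajectory column from a long-format table and derives the edges immediately afterwards. The visits frame, its cluster column and the helper name edges are assumptions, not part of the original post; unlike f above, edges returns an empty list for trajectories with fewer than two moves.

```python
import pandas as pd

# Hypothetical long-format input: one row per visit.
visits = pd.DataFrame({
    'user_ids':  [11011, 11011, 11011, 14283758, 14283758],
    'timestamp': [1, 2, 3, 1, 2],
    'cluster':   [86, 110, 110, 110, 184],
})

def edges(moves):
    # Join each single-element move with the next one: [[a], [b]] -> [[a, b]]
    return [moves[i] + moves[i + 1] for i in range(len(moves) - 1)]

# Step 1: build the trajectory column (one list of [label] moves per user).
df = (visits.sort_values('timestamp')
            .groupby('user_ids')['cluster']
            .apply(lambda s: [[v] for v in s.tolist()])
            .reset_index(name='trajectory'))

# Step 2: derive the edges as soon as the trajectory column exists.
df['edges'] = df['trajectory'].apply(edges)
print(df[['user_ids', 'edges']])
```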