
Python: Splitting Trajectories Into Steps

I have trajectories created from moves between clusters, such as these:

user_id,trajectory
11011,[[86], [110], [110]]
2139671,[[89], [125]]
3945641,[[36], [73], [110], [110]]
10024

Solution 1:

I think you can use groupby with apply and a custom function built on zip; the list comprehension is needed to turn the zipped tuples into a list of lists:

Notice:

The count function returns only the number of non-NaN values, so if you want to filter by the number of rows regardless of NaN, len is the better choice.

# Keep only users with more than one row, then sort by time
filtered = df.groupby('user_id').filter(lambda x: len(x['user_id']) > 1)
filtered = filtered.sort_values(by='timestamp')

# Pair each label with its successor: a, b, c -> [[a, b], [b, c]]
f = lambda x: [list(a) for a in zip(x[:-1], x[1:])]
df2 = filtered.groupby('user_id')['cluster_labels'].apply(f).reset_index()
print(df2)
    user_id                                     cluster_labels
0     11011                            [[86, 110], [110, 110]]
1   2139671                                        [[89, 125]]
2   3945641                  [[36, 73], [73, 110], [110, 110]]
3  10024312  [[123, 27], [27, 97], [97, 97], [97, 97], [97,...
4  14270422                             [[0, 110], [110, 174]]
5  14283758                                       [[110, 184]]
6  14373703  [[35, 97], [97, 97], [97, 97], [97, 17], [17, ...
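
To make the count versus len note concrete, here is a minimal sketch on a made-up frame (the data and the names by_len and by_count are assumptions for illustration, not taken from the question). User 2 has two rows but one NaN label, so the two filters disagree:

```python
import numpy as np
import pandas as pd

# Hypothetical data: user 2 has two rows, one of them with a missing label
df = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "cluster_labels": [86.0, 110.0, 89.0, np.nan],
})

# len counts every row in the group, NaN included
by_len = df.groupby("user_id").filter(lambda g: len(g) > 1)

# count ignores NaN, so user 2 only has one countable label
by_count = df.groupby("user_id").filter(
    lambda g: g["cluster_labels"].count() > 1
)

users_by_len = sorted(by_len["user_id"].unique().tolist())
users_by_count = sorted(by_count["user_id"].unique().tolist())
print(users_by_len, users_by_count)
```

Which of the two is appropriate depends on whether rows with a missing label should count as visits.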

A similar solution, where filtering is the last step and is done by boolean indexing:

# Sort the unfiltered frame; filtering happens at the end
filtered = df.sort_values(by='timestamp')

f = lambda x: [list(a) for a in zip(x[:-1], x[1:])]
df2 = filtered.groupby('user_id')['cluster_labels'].apply(f).reset_index()
# Users with a single row produce an empty list; drop them
df2 = df2[df2['cluster_labels'].str.len() > 0]
print(df2)
    user_id                                     cluster_labels
1     11011                            [[86, 110], [110, 110]]
2   2139671                                        [[89, 125]]
3   3945641                  [[36, 73], [73, 110], [110, 110]]
4  10024312  [[123, 27], [27, 97], [97, 97], [97, 97], [97,...
5  14270422                             [[0, 110], [110, 174]]
6  14283758                                       [[110, 184]]
7  14373703  [[35, 97], [97, 97], [97, 97], [97, 17], [17, ...
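
As a self-contained check of the variant above, the following sketch runs it on a small invented frame (the data and the helper name to_steps are illustrative assumptions). User 3 has a single row, so its step list is empty and the final boolean mask drops it:

```python
import pandas as pd

# Hypothetical data; timestamps are deliberately out of order for user 1
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3],
    "timestamp": [3, 1, 2, 1, 2, 1],
    "cluster_labels": [110, 86, 110, 89, 125, 7],
})


def to_steps(labels):
    # Pair each label with its successor: a, b, c -> [[a, b], [b, c]]
    values = labels.tolist()
    return [list(pair) for pair in zip(values[:-1], values[1:])]


ordered = df.sort_values(by="timestamp")
steps = (
    ordered.groupby("user_id")["cluster_labels"]
    .apply(to_steps)
    .reset_index()
)
all_steps = steps["cluster_labels"].tolist()

# .str.len() also works on lists, so empty step lists can be masked out
steps = steps[steps["cluster_labels"].str.len() > 0]
print(steps)
```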

Solution 2:

If you zip your trajectory with itself offset by one, you get your desired result.

Code:

for user_id, traj in data.items():
    print(user_id, [[i[0], j[0]] for i, j in zip(traj[:-1], traj[1:])])

Test Data:

data = {
    11011: [[86], [110], [110]],
    2139671: [[89], [125]],
    3945641: [[36], [73], [110], [110]],
    10024312: [[123], [27], [97], [97], [97], [110]],
    14270422: [[0], [110], [174]],
    14283758: [[110], [184]],
    14373703: [[35], [97], [97], [97], [17], [58]],
}

Results:

11011 [[86, 110], [110, 110]]
14373703 [[35, 97], [97, 97], [97, 97], [97, 17], [17, 58]]
3945641 [[36, 73], [73, 110], [110, 110]]
14283758 [[110, 184]]
14270422 [[0, 110], [110, 174]]
2139671 [[89, 125]]
10024312 [[123, 27], [27, 97], [97, 97], [97, 97], [97, 110]]
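
The same idea can be wrapped in a small helper. This is only a sketch, assuming every trajectory point is a single-element list as in the test data; the name to_steps and the two short trajectories for users 10000 and 100001 are hypothetical additions to show the edge cases:

```python
def to_steps(trajectory):
    """Return consecutive [from, to] pairs for a list of single-element lists."""
    labels = [point[0] for point in trajectory]
    # zip stops at the shorter input, so labels[1:] alone sets the length
    return [[a, b] for a, b in zip(labels, labels[1:])]


data = {
    11011: [[86], [110], [110]],
    14283758: [[110], [184]],
    10000: [[10]],   # hypothetical: a single point, so no step
    100001: [],      # hypothetical: an empty trajectory
}

steps = {user_id: to_steps(traj) for user_id, traj in data.items()}
for user_id, pairs in steps.items():
    print(user_id, pairs)
```

On Python 3.10 or later, itertools.pairwise(labels) yields the same consecutive pairs as the zip call.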

Solution 3:

My solution uses pandas' .apply() function. I believe this should work (I tested it on your sample data). Notice that I also added two extra data points at the end to cover the cases where a trajectory has only a single entry and where it is empty.

# Python 3.5
import pandas as pd


# Sample data from post
ids = [11011,2139671,3945641,10024312,14270422,14283758,14317445,14331818,14334591,14373703,10000,100001]
traj = [[[86], [110], [110]],[[89], [125]],[[36], [73], [110], [110]],[[123], [27], [97], [97], [97], [110]],[[0], [110], [174]],[[110], [184]],[[50], [88]],[[0], [22], [36], [131], [131]],[[107], [19]],[[35], [97], [97], [97], [17], [58]],[10],[]]

# Sample frame
df = pd.DataFrame({'user_ids':ids, 'trajectory':traj})

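def f(x):
    # Creates edges given a list of moves; inputs with fewer than
    # two entries are returned unchanged
    if len(x) <= 1:
        return x
    # x[i] + x[i+1] concatenates two one-element lists into one edge
    return [x[i] + x[i+1] for i in range(len(x) - 1)]

df['edges'] = df['trajectory'].apply(f)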

Output:

print(df['edges'])

                                                edges
0                             [[86, 110], [110, 110]]
1                                         [[89, 125]]
2                   [[36, 73], [73, 110], [110, 110]]
3   [[123, 27], [27, 97], [97, 97], [97, 97], [97,...
4                              [[0, 110], [110, 174]]
5                                        [[110, 184]]
6                                          [[50, 88]]
7          [[0, 22], [22, 36], [36, 131], [131, 131]]
8                                         [[107, 19]]
9   [[35, 97], [97, 97], [97, 97], [97, 17], [17, ...
10                                               [10]
11                                                 []

As for where to put this in your pipeline: just put it right after you get your trajectory column (whether that's after you load the data or after whatever filtering you require).
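
One detail of the helper f in Solution 3 is worth spelling out: a trajectory with fewer than two entries comes back unchanged, so that row holds points rather than edges. The sketch below restates that logic under the hypothetical name edges and contrasts it with a hypothetical variant, edges_or_empty, that returns an empty list instead; whether that is preferable depends on what the downstream code expects:

```python
def edges(moves):
    # Same logic as f in Solution 3: short inputs are returned unchanged
    if len(moves) <= 1:
        return moves
    # [86] + [110] concatenates to [86, 110]
    return [moves[i] + moves[i + 1] for i in range(len(moves) - 1)]


def edges_or_empty(moves):
    # Hypothetical variant: fewer than two points means no edge at all
    return [a + b for a, b in zip(moves, moves[1:])]


full = edges([[86], [110], [110]])
single = edges([[10]])
single_empty = edges_or_empty([[10]])
print(full, single, single_empty)
```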
