
Convert Pandas Column Of Numpy Arrays To Numpy Array Of Higher Dimension

I have a pandas DataFrame of shape (75, 9). Only one of its columns holds NumPy arrays, each of shape (100, 4, 3). I have a strange phenomenon: data = self.df[self.colu

Solution 1:

In [42]: some_df = pd.DataFrame(columns=['A']) 
    ...: for i in range(4): 
    ...:         some_df.loc[i] = [np.random.randint(0,10,(1,3))] 
    ...:                                                                                  
In [43]: some_df                                                                          
Out[43]: 
             A
0  [[7, 0, 9]]
1  [[3, 6, 8]]
2  [[9, 7, 6]]
3  [[1, 6, 3]]

The numpy values of the column are an object dtype array, containing arrays:

In [44]: some_df['A'].to_numpy()                                                          
Out[44]: 
array([array([[7, 0, 9]]), array([[3, 6, 8]]), array([[9, 7, 6]]),
       array([[1, 6, 3]])], dtype=object)

If those arrays all have the same shape, stack does a nice job of concatenating them on a new dimension:

In [45]: np.stack(some_df['A'].to_numpy())                                                
Out[45]: 
array([[[7, 0, 9]],

       [[3, 6, 8]],

       [[9, 7, 6]],

       [[1, 6, 3]]])
In [46]: _.shape                                                                          
Out[46]: (4, 1, 3)
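
The same-shape condition is a hard requirement. As a minimal sketch (the `same` and `ragged` lists are made up for illustration and are not from the question), np.stack raises a ValueError as soon as one array differs in shape:

```python
import numpy as np

# Two arrays with identical shape stack cleanly along a new leading axis.
same = [np.zeros((1, 3)), np.ones((1, 3))]
stacked = np.stack(same)
print(stacked.shape)  # (2, 1, 3)

# Hypothetical mismatched input: the second array has an extra row.
ragged = [np.zeros((1, 3)), np.zeros((2, 3))]
failed = False
try:
    np.stack(ragged)
except ValueError as exc:
    failed = True
    print("stack failed:", exc)
```

If the column may contain arrays of different shapes, they would have to be padded or filtered before stacking.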

This works on one column at a time. stack, like every function in the concatenate family, treats its input argument as an iterable, effectively a list of arrays.

In [48]: some_df['A'].to_list()                                                           
Out[48]: 
[array([[7, 0, 9]]),
 array([[3, 6, 8]]),
 array([[9, 7, 6]]),
 array([[1, 6, 3]])]
In [50]: np.stack(some_df['A'].to_list()).shape                                           
Out[50]: (4, 1, 3)
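
Applied to the shapes mentioned in the question, the same approach can be sketched as follows. The frame here is a stand-in built from random data, and the column names `id` and `arr` are assumptions, since the original frame is not shown:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Build an object array explicitly so each cell holds one (100, 4, 3) array.
cells = np.empty(75, dtype=object)
for i in range(75):
    cells[i] = rng.random((100, 4, 3))

# Hypothetical frame: "id" and "arr" are illustrative column names.
df = pd.DataFrame({"id": range(75)})
df["arr"] = cells

# The column itself is 1-D with object dtype: one entry per row.
print(df["arr"].to_numpy().shape)  # (75,)

# Stacking the cells adds the row axis in front of the per-cell axes.
data = np.stack(df["arr"].to_list())
print(data.shape)  # (75, 100, 4, 3)
```

The result is a regular float array, so `data[i]` is the array that was stored in row `i` of the column.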

Solution 2:

What you're asking for is not quite possible. Pandas DataFrames are 2D. Yes, you can store NumPy arrays as objects (references) inside DataFrame cells, but this is not really well supported, and expecting to get a shape which has one dimension from the DataFrame and two from the arrays inside is not possible at all.

You should consider storing your data either entirely in NumPy arrays of the appropriate shape, or in a single, properly 2D DataFrame with a MultiIndex. For example, you can "pivot" a column of 1D arrays into a column of scalars by moving the extra dimension to a new level of a MultiIndex on the rows:

        A
x  [2, 3]
y  [5, 6]

becomes:

     A
x 0  2
  1  3
y 0  5
  1  6

or pivot to the columns:

   A
   0  1
x  2  3
y  5  6
