
Convert One-hot Encoded Data-frame Columns Into One Column

In the pandas data frame, the one-hot encoded vectors are present as columns, i.e.:

Rows  A  B  C  D  E
0     0  0  0  1  0
1     0  0  1  0  0
2     0  1  0  0  0
3     0  0  ...

Solution 1:

Try with argmax

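# Uncomment if 'Rows' is a regular column rather than the index
# df = df.set_index('Rows')

# argmax(1) gives the 0-based position of the 1 in each row; +1 makes it 1-based
df['New'] = df.values.argmax(1) + 1
df
Out[231]: 
      A  B  C  D  E  New
Rows                    
0     0  0  0  1  0    4
1     0  0  1  0  0    3
2     0  1  0  0  0    2
3     0  0  0  1  0    4
4     1  0  0  0  0    1
4     0  0  0  0  1    5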
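
For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the argmax approach. The sample frame is rebuilt from the rows shown in this post, with a plain 0..5 index, and the variable names `onehot_cols` and `df` are just illustrative choices.

```python
import pandas as pd

# Sample data rebuilt from the rows shown in this post (index 0..5 is an assumption)
onehot_cols = list("ABCDE")
df = pd.DataFrame(
    [
        [0, 0, 0, 1, 0],
        [0, 0, 1, 0, 0],
        [0, 1, 0, 0, 0],
        [0, 0, 0, 1, 0],
        [1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1],
    ],
    columns=onehot_cols,
)
df.index.name = "Rows"

# argmax along axis 1 returns the 0-based position of the first maximum in each row;
# adding 1 turns it into a 1-based category number (A=1, ..., E=5)
df["New"] = df[onehot_cols].to_numpy().argmax(axis=1) + 1
print(df)
```

Selecting `df[onehot_cols]` explicitly keeps the result stable if the frame already holds other columns, such as a previously computed `New`.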

Solution 2:

argmax is the way to go; here is another way, using idxmax and get_indexer:

df['New'] = df.columns.get_indexer(df.idxmax(1))+1
# Equivalent: df.idxmax(1).map(df.columns.get_loc) + 1
print(df)

      A  B  C  D  E  New
Rows                    
0     0  0  0  1  0    4
1     0  0  1  0  0    3
2     0  1  0  0  0    2
3     0  0  0  1  0    4
4     1  0  0  0  0    1
5     0  0  0  0  1    5
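
A runnable sketch of the idxmax route, in case the intermediate steps help. It assumes the same sample rows as above; `labels` and `positions` are illustrative names, not part of the original answer. Note that `idxmax` alone already gives the column label, which may be all you need.

```python
import pandas as pd

# Sample data rebuilt from the rows shown in this post
df = pd.DataFrame(
    [
        [0, 0, 0, 1, 0],
        [0, 0, 1, 0, 0],
        [0, 1, 0, 0, 0],
        [0, 0, 0, 1, 0],
        [1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1],
    ],
    columns=list("ABCDE"),
)

# idxmax(axis=1) returns, for each row, the label of the column holding the first maximum
labels = df.idxmax(axis=1)

# get_indexer maps those labels back to 0-based column positions; +1 makes them 1-based
positions = df.columns.get_indexer(labels) + 1

df["Label"] = labels
df["New"] = positions
print(df)
```

Both values are computed before any new column is assigned, so the extra columns cannot leak into the `idxmax` call.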

Solution 3:

The question also asks: some rows have multiple 1s, so how should those rows be handled, given that there can be only one category at a time?

In this case, take the dot product of your DataFrame of dummies with an array of powers of 2, one per column (1, 2, 4, ...). This ensures that every distinct combination of dummies (A, A+B, A+B+C, B+C, ...) gets a unique category label. (A few rows were added at the bottom to illustrate the unique labels.)

import numpy as np

df['Category'] = df.dot(2**np.arange(df.shape[1]))

      A  B  C  D  E  Category
Rows                         
0     0  0  0  1  0         8
1     0  0  1  0  0         4
2     0  1  0  0  0         2
3     0  0  0  1  0         8
4     1  0  0  0  0         1
5     0  0  0  0  1        16
6     1  0  0  0  1        17
7     0  1  0  0  1        18
8     1  1  0  0  1        19
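
To make the powers-of-2 idea concrete, here is a small sketch that encodes three of the rows from the table above and then decodes a label back into column names. The `decode` helper is hypothetical, added only for illustration; it is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Three rows taken from the table above: D only, A+E, and A+B+E
dummies = pd.DataFrame(
    [
        [0, 0, 0, 1, 0],
        [1, 0, 0, 0, 1],
        [1, 1, 0, 0, 1],
    ],
    columns=list("ABCDE"),
)

# One weight per column: [1, 2, 4, 8, 16]
weights = 2 ** np.arange(dummies.shape[1])

# Each row becomes the sum of the weights of its active columns
category = dummies.dot(weights)
print(category.tolist())


def decode(code, columns):
    """Hypothetical helper: recover the active column names from a category code."""
    return [col for i, col in enumerate(columns) if (int(code) >> i) & 1]


print(decode(19, dummies.columns))
```

Because each column owns one bit, the code is reversible, which is what makes every combination distinct.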

Solution 4:

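Another readable solution, on top of the other great solutions provided. It works for any type of variable in your dataframe, since every nonzero (truthy) value counts as set, but it requires exactly one nonzero entry per row; otherwise the result will not have one value per row: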

df['variables'] = np.where(df.values)[1]+1

output:

   A  B  C  D  E  variables
0  0  0  0  1  0          4
1  0  0  1  0  0          3
2  0  1  0  0  0          2
3  0  0  0  1  0          4
4  1  0  0  0  0          1
5  0  0  0  0  1          5
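
A self-contained sketch of the np.where variant follows. The guard that checks for exactly one nonzero entry per row is an addition for safety, not part of the original answer, and the names `values` and `col_idx` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the rows shown in this post
df = pd.DataFrame(
    [
        [0, 0, 0, 1, 0],
        [0, 0, 1, 0, 0],
        [0, 1, 0, 0, 0],
        [0, 0, 0, 1, 0],
        [1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1],
    ],
    columns=list("ABCDE"),
)
values = df.to_numpy()

# Guard (an assumption added here): np.where yields one entry per nonzero cell,
# so the result only lines up with the rows if each row has exactly one
if not ((values != 0).sum(axis=1) == 1).all():
    raise ValueError("each row must contain exactly one nonzero entry")

# With a single argument, np.where returns (row_indices, column_indices)
# of the nonzero cells in row-major order
row_idx, col_idx = np.where(values)
df["variables"] = col_idx + 1
print(df)
```

For rows with several 1s, the powers-of-2 encoding from Solution 3 is the safer choice.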
