Skip to content Skip to sidebar Skip to footer

Combining Multiple Columns In A DataFrame

I have a DataFrame with 40 columns (columns 0 through 39) and I want to group them four at a time: import numpy as np import pandas as pd df = pd.DataFrame(np.random.binomial(1, 0

Solution 1:

You could select out the columns and sum on the row axis, like this.

df['0-3'] = df.loc[:, 0:3].sum(axis=1)

A couple things to note:

  1. Summing like this will ignore missing data while df[0] + df[1] ... propagates it. Pass skipna=False if you want that behavior.
  2. Not necessarily any performance benefit, may actually be a little slower.

Solution 2:

Here's another way to do it:

new_df = df.transpose()  
new_df['Group'] = new_df.index / 4  
new_df = new_df.groupby('Group').sum().transpose()

Note that the divide-by operation here is integer division, not floating-point division.


Solution 3:

I don't know if it is the best way to go but I ended up using MultiIndex:

df.columns = pd.MultiIndex.from_product((range(10), range(4)))
new_df = df.groupby(level=0, axis=1).sum()

Update: Probably because of the index, this was faster than the alternatives. The same can be done with df.groupby(df.columns//4, axis=1).sum() faster if you take into account the time for constructing the index. However, the index change is a one time operation and I update the df and take the sum thousands of times so using a MultiIndex was faster for me.


Solution 4:

Consider a list comprehension:

df = # your data
df_slices = [df.iloc[x:x+4] for x in range(10)]

Or more generally

df_slices = [df.iloc[x:x+4] for x in range(len(df.columns)/4)]

Post a Comment for "Combining Multiple Columns In A DataFrame"