Combining Multiple Columns In A DataFrame
I have a DataFrame with 40 columns (columns 0 through 39) and I want to group them four at a time:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.binomial(1, 0
Solution 1:
You could select the columns and sum across each row (axis=1), like this:
df['0-3'] = df.loc[:, 0:3].sum(axis=1)
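To get all ten groups rather than just the first, the same line can be repeated in a loop. This is a minimal sketch that assumes the columns are the integer labels 0 through 39. The 5-row random 0/1 frame stands in for the question's data, whose exact construction is cut off above, so its shape and seed are arbitrary. The sums are collected into a separate DataFrame (the name `grouped` is just illustrative) instead of being added to df. That way, later label slices never run into the new string-labelled columns.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 5 rows of 0/1 values in 40 integer-labelled columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 2, size=(5, 40)))

# One summed column per block of four original columns.
# .loc label slicing is inclusive, so start:stop selects start..stop (4 columns).
sums = {}
for start in range(0, 40, 4):
    stop = start + 3
    sums[f"{start}-{stop}"] = df.loc[:, start:stop].sum(axis=1)

grouped = pd.DataFrame(sums)
print(grouped.columns.tolist())
# ['0-3', '4-7', '8-11', '12-15', '16-19', '20-23', '24-27', '28-31', '32-35', '36-39']
```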
A couple of things to note:
- Summing like this ignores missing data, while df[0] + df[1] + ... propagates it. Pass skipna=False to sum if you want that behavior.
- There is not necessarily any performance benefit; it may actually be a little slower.
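Here is a small sketch of the missing-data difference described in the notes. It uses a made-up two-row frame with a single NaN in the first row.

```python
import numpy as np
import pandas as pd

# Made-up frame: row 0 has a NaN in column 1, row 1 has no missing values.
df = pd.DataFrame({0: [1.0, 1.0], 1: [np.nan, 1.0], 2: [0.0, 1.0], 3: [1.0, 0.0]})

skipping = df.loc[:, 0:3].sum(axis=1)              # NaN treated as missing and skipped
strict = df.loc[:, 0:3].sum(axis=1, skipna=False)  # NaN propagates into the row sum
manual = df[0] + df[1] + df[2] + df[3]             # element-wise +, NaN propagates

print(skipping.tolist())  # [2.0, 3.0]
print(strict.tolist())    # [nan, 3.0]
print(manual.tolist())    # [nan, 3.0]
```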
Solution 2:
Here's another way to do it:
new_df = df.transpose()
new_df['Group'] = new_df.index // 4
new_df = new_df.groupby('Group').sum().transpose()
Note that the division here has to be integer (floor) division, which is written // in Python 3. Plain / produces floats such as 0.25, so every column would end up in its own group.
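Below is a self-contained run of the transpose-and-group approach, on hypothetical random 0/1 data (shape and seed chosen arbitrarily). It uses floor division so that columns 0-3 map to group 0, 4-7 to group 1, and so on. It also prints what true division would produce on the same labels.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 5 rows of 0/1 values in 40 integer-labelled columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 2, size=(5, 40)))

new_df = df.transpose()                  # original columns become rows 0..39
new_df["Group"] = new_df.index // 4      # 0-3 -> 0, 4-7 -> 1, ..., 36-39 -> 9
new_df = new_df.groupby("Group").sum().transpose()

# With true division the labels are 0.0, 0.25, 0.5, ..., so no columns get combined.
print((df.columns / 4)[:5].tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(new_df.shape)                   # (5, 10)
```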
Solution 3:
I don't know if it is the best way to go, but I ended up using a MultiIndex:
df.columns = pd.MultiIndex.from_product((range(10), range(4)))
new_df = df.groupby(level=0, axis=1).sum()
Update: Probably because of the index, this was faster than the alternatives. The same result can be obtained with df.groupby(df.columns//4, axis=1).sum(), which is faster if you include the time spent constructing the MultiIndex. However, the index change is a one-time operation, and I update the df and take the sum thousands of times, so using a MultiIndex was faster for me.
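This sketch runs both variants side by side on hypothetical random 0/1 data. In pandas 2.1 and later, `groupby(..., axis=1)` is deprecated, so both variants are written as transpose, group the rows, then transpose back. On older pandas, the axis=1 forms should give the same sums. The sketch only checks that the two variants agree; it does not reproduce the timings mentioned above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 5 rows x 40 columns of 0/1 values.
rng = np.random.default_rng(0)
values = rng.integers(0, 2, size=(5, 40))

# Variant A: label each column (group, position-in-group) once, then group on level 0.
df_multi = pd.DataFrame(values)
df_multi.columns = pd.MultiIndex.from_product((range(10), range(4)))
# Same as df.groupby(level=0, axis=1).sum(), written via transpose because
# groupby(..., axis=1) is deprecated as of pandas 2.1.
sums_multi = df_multi.T.groupby(level=0).sum().T

# Variant B: keep the integer labels 0..39 and group on column // 4 instead.
df_plain = pd.DataFrame(values)
sums_plain = df_plain.T.groupby(df_plain.columns // 4).sum().T

print(sums_multi.equals(sums_plain))  # True
```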
Solution 4:
Consider a list comprehension that slices out each block of four columns:
# df is your DataFrame with 40 columns
df_slices = [df.iloc[:, x:x+4] for x in range(0, 40, 4)]
Or, more generally:
df_slices = [df.iloc[:, x:x+4] for x in range(0, len(df.columns), 4)]
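A list comprehension like this only produces the slices. To get the combined columns, each slice still has to be summed. Here is a minimal sketch on hypothetical sample data that slices the columns positionally with iloc[:, ...], sums each slice across its columns, and places the results side by side with pd.concat.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 5 rows x 40 integer-labelled columns of 0/1 values.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 2, size=(5, 40)))

# Positional column slices of width 4: iloc[:, 0:4], iloc[:, 4:8], ...
# (iloc slicing excludes the stop position, unlike .loc label slicing).
df_slices = [df.iloc[:, x:x + 4] for x in range(0, len(df.columns), 4)]

# Sum each slice across its columns and line the results up side by side.
combined = pd.concat([s.sum(axis=1) for s in df_slices], axis=1)

print(len(df_slices), df_slices[0].columns.tolist())  # 10 [0, 1, 2, 3]
print(combined.shape)                                 # (5, 10)
```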