
How To Iterate Over Dataframe And Generate A New Dataframe

I have a data frame that looks like this:

   P  Q    L
   1  2    3
   2  3
   4  5  6,7

The objective is to check whether there is any value in L and, if so, extract the values of the L and P columns:

   P L
   1 3
   4,6
   4,7

N

Solution 1:

First, you can extract all rows of the L and P columns where L is not missing like so:

import pandas as pd
df2 = df[~pd.isnull(df.L)].loc[:, ['P', 'L']].set_index('P')
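
As a quick check of this filtering step, here is a minimal sketch on the sample data from the question. The way the frame is built (L stored as strings, the empty cell as NaN) is an assumption, since the question does not show how the data was loaded. The `dropna(subset=...)` spelling is shown only as an equivalent alternative, not as part of the original answer.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's data: L holds strings, NaN marks "no value".
df = pd.DataFrame({"P": [1, 2, 4], "Q": [2, 3, 5], "L": ["3", np.nan, "6,7"]})

# Keep rows where L is present, then keep only P and L, with P as the index.
df2 = df[~pd.isnull(df.L)].loc[:, ["P", "L"]].set_index("P")

# Equivalent spelling of the same filter using dropna on the L column.
alt = df.dropna(subset=["L"])[["P", "L"]].set_index("P")

print(df2)
```

Only the rows with P equal to 1 and 4 remain, because the row with P equal to 2 has no value in L.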

Next, you can deal with the multiple values in some of the remaining L rows as follows:

df2 = df2.L.str.split(',', expand=True).stack()
df2 = df2.reset_index().drop('level_1', axis=1).rename(columns={0: 'L'}).dropna()
df2.L = df2.L.str.strip()

To explain: with P as index, the code splits the string content of the L column on ',' and distributes the individual elements across various columns. It then stacks the various new columns into a single new column, and cleans up the result.
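
The explanation above can be followed step by step in the sketch below, which keeps the intermediate results in separate variables. The sample frame and the names `wide`, `stacked` and `result` are assumptions made for illustration. An explicit `dropna()` is added right after stacking so that the outcome does not depend on how `stack()` treats missing values.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's data.
df = pd.DataFrame({"P": [1, 2, 4], "Q": [2, 3, 5], "L": ["3", np.nan, "6,7"]})
df2 = df[~pd.isnull(df.L)].loc[:, ["P", "L"]].set_index("P")

# Step 1: split on ',' so each element gets its own column;
# rows with fewer elements are padded with missing values.
wide = df2["L"].str.split(",", expand=True)

# Step 2: stack the columns into one long Series indexed by (P, column position),
# then drop the padding explicitly.
stacked = wide.stack().dropna()

# Step 3: flatten back to a frame with columns P and L and trim stray spaces.
result = stacked.reset_index().drop("level_1", axis=1).rename(columns={0: "L"})
result["L"] = result["L"].str.strip()

print(result)
```

Printing `wide` before stacking is a convenient way to see why the stack step is needed: the value "6,7" occupies two columns in a single row at that point.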

Solution 2:

First, extract the multiple values in column L into a new Series s whose index repeats the original index for each extracted value. Then drop the unneeded columns L and Q from df, join s back onto df, and drop the rows with NaN values.

print(df)
   P  Q    L
0  1  2    3
1  2  3  NaN
2  4  5  6,7

s = df['L'].str.split(',', expand=True).stack()
s.index = s.index.droplevel(-1) # to line up with df's index
s.name = 'L'
print(s)
0    3
2    6
2    7
Name: L, dtype: object

df = df.drop( ['L', 'Q'], axis=1)
df = df.join(s)
print(df)
   P    L
0  1    3
1  2  NaN
2  4    6
2  4    7
df = df.dropna().reset_index(drop=True)
print(df)
   P  L
0  1  3
1  4  6
2  4  7
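
For comparison, the same result can probably be reached more compactly with `DataFrame.explode`, which is available in newer pandas versions. This is a different technique from the one in the answer above, not a restatement of it, and the sample frame and the name `out` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's data.
df = pd.DataFrame({"P": [1, 2, 4], "Q": [2, 3, 5], "L": ["3", np.nan, "6,7"]})

out = (
    df[["P", "L"]]
    .dropna(subset=["L"])                       # keep rows that have a value in L
    .assign(L=lambda d: d["L"].str.split(","))  # "6,7" -> ["6", "7"]
    .explode("L")                               # one row per list element
    .assign(L=lambda d: d["L"].str.strip())     # trim stray spaces
    .reset_index(drop=True)
)

print(out)
```

Here the index repetition that Solution 2 builds by hand with `droplevel` and `join` is handled by `explode` itself, which repeats the other columns for each element of the list.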
