How To Iterate Over Dataframe And Generate A New Dataframe
I have a data frame that looks like this:
P Q L
1 2 3
2 3
4 5 6,7
The objective is to check whether there is any value in L and, if so, extract the values of the L and P columns:
P L
1 3
4,6
4,7
Solution 1:
First, you can extract all rows of the L and P columns where L is not missing like so:
df2 = df[~pd.isnull(df.L)].loc[:, ['P', 'L']].set_index('P')
Next, you can deal with the multiple values in some of the remaining L rows as follows:
df2 = df2.L.str.split(',', expand=True).stack()
df2 = df2.reset_index().drop('level_1', axis=1).rename(columns={0: 'L'}).dropna()
df2.L = df2.L.str.strip()
To explain: with P as the index, the code splits the string content of the L column on ',' and distributes the individual elements across separate columns. It then stacks those new columns into a single column and cleans up the result.
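Put together, the steps above can be run end to end as in the following sketch. The sample frame is an assumption reconstructed from the question (L stored as strings, with None for the missing value), and the names `kept`, `stacked` and `result` are only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question (an assumption about how it is stored):
# L holds strings, and the missing entry is None.
df = pd.DataFrame({'P': [1, 2, 4], 'Q': [2, 3, 5], 'L': ['3', None, '6,7']})

# Keep only rows where L is present, and use P as the index.
kept = df[df.L.notna()].loc[:, ['P', 'L']].set_index('P')

# Split on ',' into separate columns, then stack them into one column.
stacked = kept.L.str.split(',', expand=True).stack()

# Drop the helper index level, name the value column, and remove leftover NaN.
result = (stacked.reset_index()
                 .drop('level_1', axis=1)
                 .rename(columns={0: 'L'})
                 .dropna())
result['L'] = result['L'].str.strip()
result = result.reset_index(drop=True)

print(result)
```

The final `reset_index(drop=True)` is an addition for tidiness; it only renumbers the rows.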
Solution 2:
First I extract the multiple values of column L into a new Series s, whose index repeats the labels of the original index. Then I remove the unnecessary columns L and Q, join s to the original df, and drop the rows with NaN values.
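To follow along, the sample frame can be built as in this sketch. The construction is an assumption, since the post only shows the printed frame: L is stored as strings so that `.str.split(',')` applies, with None for the missing value.

```python
import pandas as pd

# Hypothetical construction of the sample data; the original post only shows the printout.
# L is stored as strings, with None standing in for the missing value.
df = pd.DataFrame({
    'P': [1, 2, 4],
    'Q': [2, 3, 5],
    'L': ['3', None, '6,7'],
})
```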
print(df)
   P  Q    L
0  1  2    3
1  2  3  NaN
2  4  5  6,7
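# split each string on ',' into columns, then stack into one value per row;
# dropna() guards against stack() keeping the missing values
s = df['L'].str.split(',', expand=True).stack().dropna()
s.index = s.index.droplevel(-1) # to line up with df's index
s.name = 'L'
print(s)
0    3
2    6
2    7
Name: L, dtype: object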
df = df.drop(['L', 'Q'], axis=1)
df = df.join(s)
print(df)
   P    L
0  1    3
1  2  NaN
2  4    6
2  4    7
df = df.dropna().reset_index(drop=True)
print(df)
   P  L
0  1  3
1  4  6
2  4  7
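For comparison, pandas 0.25 and later provide `DataFrame.explode`, which can do the split-and-repeat step directly. This is a different technique from the two answers above, shown only as a sketch. The sample frame is again an assumption reconstructed from the question, and `out` is an illustrative name.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed: L as strings, None when missing).
df = pd.DataFrame({'P': [1, 2, 4], 'Q': [2, 3, 5], 'L': ['3', None, '6,7']})

out = (
    df.loc[df['L'].notna(), ['P', 'L']]           # keep rows that have a value in L
      .assign(L=lambda d: d['L'].str.split(','))  # '6,7' becomes ['6', '7']
      .explode('L')                               # one row per list element
      .reset_index(drop=True)
)
out['L'] = out['L'].str.strip()                   # remove stray spaces around values

print(out)
```

Filtering before the split means no NaN rows are produced, so no `dropna()` is needed afterwards.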