How To Extract A Specific Digit In Each Row Of A Pandas Series Containing Text
I have a pd.Series that looks as follows:
0 some texts...final exam marks:50 next level:10
1 some texts....final exam marks he has got:54 next level:15
2 some texts...f
Solution 1:
Try
s.str.extract(r'.*marks:\s?(\d+)', expand=False)
0 50
1 54
2 45
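A self-contained sketch of the call above. The question's rows are truncated, so the strings below are illustrative stand-ins that match the original version of the question (every row has "marks:"). The r'...' raw string keeps Python from treating \s and \d as string escapes:

```python
import pandas as pd

# Illustrative rows only; the real texts in the question are truncated.
s = pd.Series([
    "some texts...final exam marks:50 next level:10",
    "some texts....final exam marks:54 next level:15",
    "some texts...final marks: 45 next best level",
])

# Capture the digits right after "marks:", allowing one optional whitespace.
marks = s.str.extract(r'.*marks:\s?(\d+)', expand=False)
print(marks.tolist())  # ['50', '54', '45']
```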
With the updated question, where "marks" is not always directly followed by a colon and the number:
s.str.extract(r'.*marks.*?(\d+)', expand=False)
This regex allows any characters (or none) between "marks" and the number: the lazy .*? skips as little text as possible before the first run of digits.
You get
0 50
1 54
2 45
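To see why the update matters, here is a sketch comparing both regexes on stand-in rows shaped like the updated question (again illustrative, since the originals are truncated). The first regex needs the literal "marks:", so it finds nothing in row 1; the second one does:

```python
import pandas as pd

# Illustrative rows: row 1 has extra words between "marks" and the number.
s = pd.Series([
    "some texts...final exam marks:50 next level:10",
    "some texts....final exam marks he has got:54 next level:15",
    "some texts...final marks: 45 next best level",
])

strict = s.str.extract(r'.*marks:\s?(\d+)', expand=False)  # needs "marks:"
loose = s.str.extract(r'.*marks.*?(\d+)', expand=False)    # anything in between

print(strict.isna().tolist())  # [False, True, False]
print(loose.tolist())          # ['50', '54', '45']
```

The extracted values are strings; pd.to_numeric(loose) converts them to numbers, and any unmatched row stays NaN.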
Solution 2:
You can use the lookbehind syntax (?<=...), which asserts that the desired pattern is preceded by another pattern: (?<=marks:) *([0-9]+)
This extracts the digits that come after marks: and any optional spaces:
s
#0 some texts...final exam marks:50 next lev...
#1 some texts....final exam marks:54 next le...
#2 some texts...final marks: 45 next best le...
#Name: 1, dtype: object
s.str.extract("(?<=marks:) *([0-9]+)", expand=False)
#0 50
#1 54
#2 45
#Name: 1, dtype: object
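A small sketch of what the lookbehind does, using Python's re module directly and then the same pattern in pandas (the Series rows are illustrative stand-ins for the truncated originals). The lookbehind checks for "marks:" without consuming it, so the overall match starts right after the colon:

```python
import re

import pandas as pd

pattern = r"(?<=marks:) *([0-9]+)"

# "marks:" is asserted but not part of the match; the spaces are.
m = re.search(pattern, "final marks: 45 next best level")
print(repr(m.group(0)))  # ' 45'
print(m.group(1))        # 45

# Illustrative rows only; the real texts in the question are truncated.
s = pd.Series([
    "some texts...final exam marks:50 next level:10",
    "some texts....final exam marks:54 next level:15",
    "some texts...final marks: 45 next best level",
])
print(s.str.extract(pattern, expand=False).tolist())  # ['50', '54', '45']
```

The optional spaces sit outside the lookbehind because Python's re only allows fixed-width lookbehinds, so (?<=marks:\s*) would raise an error. Like the first regex in Solution 1, this pattern needs the literal "marks:".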