
Split String From Beautifulsoup Output In A List

I have the following output from my code.

Code: text = soup.get_text()

Output: Article Title Some text: Text blurb. More blurb. Even more blurb. Some more blurb. Seco

Solution 1:

If you look at the raw value of text, you can see that the articles and their paragraphs are separated by runs of repeated newlines (\n):

text
>>'Article Title\n\n    Some text: Text blurb.\n\nMore blurb.\n\nEven more blurb. \n\nSome more blurb. \n\n\n\n\n\nSecond Article Title\n\nSome text: Text blurb.\n\nMore blurb.\n\nEven more blurb. \n\nSome more blurb. '

You can pass the separator explicitly, as in text.split('\n\n\n\n\n'). If you call split() without an argument, Python splits on any run of whitespace, which would break the text into individual words. After this first split, you can split each resulting element by \n\n.

[i.split('\n\n') for i in text.split('\n\n\n\n\n')]

>>[['Article Title',
  '    Some text: Text blurb.',
  'More blurb.',
  'Even more blurb. ',
  'Some more blurb. '],
 ['\nSecond Article Title',
  'Some text: Text blurb.',
  'More blurb.',
  'Even more blurb. ',
  'Some more blurb. ']]
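Note that the two articles in this sample are separated by six newlines, so splitting on exactly five leaves a leading \n on 'Second Article Title', and some fields keep stray leading or trailing spaces. A minimal sketch of one way to tidy this up, assuming the gap between articles is always at least five newlines and the gap between fields is at least two; the helper name split_articles and its gap parameter are hypothetical and not part of the original answer:

```python
import re

# Sample text copied from the output shown above
text = (
    'Article Title\n\n    Some text: Text blurb.\n\nMore blurb.\n\n'
    'Even more blurb. \n\nSome more blurb. \n\n\n\n\n\n'
    'Second Article Title\n\nSome text: Text blurb.\n\nMore blurb.\n\n'
    'Even more blurb. \n\nSome more blurb. '
)


def split_articles(raw, gap=5):
    """Split raw text into articles, then each article into stripped fields.

    `gap` is an assumed minimum number of consecutive newlines that
    separates two articles; adjust it to match your own data.
    """
    # Treat any run of `gap` or more newlines as one article separator
    articles = re.split(r'\n{%d,}' % gap, raw.strip())
    # Within an article, fields are separated by two or more newlines;
    # strip each field and drop any that end up empty
    return [
        [part.strip() for part in re.split(r'\n{2,}', article) if part.strip()]
        for article in articles
    ]


result = split_articles(text)
```

Using re.split with a quantifier rather than a fixed separator string means the result does not depend on the exact number of newlines between articles, as long as it stays above the assumed threshold.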
