Find Data Within Html Tags Using Python
Solution 2:
Make sure to add the tag name along with your search string. This is how you can do that:
from bs4 import BeautifulSoup
htmldoc = """
<tr>
<td>Net Taxes Due</td>
<td class="value-column">$2,370.00</td>
<td class="value-column">$2,408.00</td>
</tr>
"""
soup = BeautifulSoup(htmldoc, "html.parser")
# string= is the current name for the older text= argument (bs4 4.4.0+)
item = soup.find('td', string='Net Taxes Due').find_next_sibling("td")
print(item)
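One caveat with the chained call above: find() returns None when nothing matches, so the chain raises AttributeError for a label that is not on the page. Here is a minimal sketch of a guarded version. The helper name value_after_label is hypothetical and not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

htmldoc = """
<tr>
<td>Net Taxes Due</td>
<td class="value-column">$2,370.00</td>
<td class="value-column">$2,408.00</td>
</tr>
"""

def value_after_label(soup, label):
    """Return the text of the <td> following the <td> whose text is `label`, or None."""
    label_cell = soup.find("td", string=label)
    if label_cell is None:
        return None  # label not present in the document
    value_cell = label_cell.find_next_sibling("td")
    if value_cell is None:
        return None  # label found, but no cell follows it
    return value_cell.get_text(strip=True)

soup = BeautifulSoup(htmldoc, "html.parser")
print(value_after_label(soup, "Net Taxes Due"))
print(value_after_label(soup, "No Such Label"))
```

The first call prints the first value cell; the second prints None instead of raising.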
Solution 3:
Your .select() call is not correct. # in a selector is used to match an element's ID, not its text contents, so #Net means to look for an element with id="Net". Spaces in a selector mean to look for descendants that match each successive selector. So #Net Taxes Due searches for something like:
<div id="Net"><taxes><due>...</due></taxes></div>
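To see the difference in practice, here is a small sketch that runs the same selector against two documents. The first markup string is hypothetical and exists only to give the selector something to match; the second is a shortened version of the row from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with an id="Net" element and nested <taxes>/<due> tags.
matching = '<div id="Net"><taxes><due>x</due></taxes></div>'
# The kind of row the question is actually about.
actual = '<tr><td>Net Taxes Due</td><td class="value-column">$2,370.00</td></tr>'

selector = "#Net Taxes Due"
hits = BeautifulSoup(matching, "html.parser").select(selector)
misses = BeautifulSoup(actual, "html.parser").select(selector)

print([tag.name for tag in hits])  # names of the tags matched in the hypothetical markup
print(misses)                      # matches in the real row
```

The selector only finds the nested due element in the made-up markup and finds nothing in the table row, because the row has no element with that ID.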
To search for an element containing a specific string, use .find() with the string keyword:
table = soup.find(string="Net Taxes Due")
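Note that find(string=...) returns the matching text node (a NavigableString), not the tag around it. A rough sketch of getting from that text node to the values next to it, reusing the row from the question (the variable names are just for illustration):

```python
from bs4 import BeautifulSoup

htmldoc = """
<tr>
<td>Net Taxes Due</td>
<td class="value-column">$2,370.00</td>
<td class="value-column">$2,408.00</td>
</tr>
"""

soup = BeautifulSoup(htmldoc, "html.parser")
label_text = soup.find(string="Net Taxes Due")  # the text node itself
label_cell = label_text.parent                  # the <td> that contains it
# Collect the text of every <td> that follows the label cell in the same row.
values = [td.get_text(strip=True) for td in label_cell.find_next_siblings("td")]
print(values)
```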
Solution 4:
Assuming that there's an actual HTML table involved:
<html>
<table>
<tr>
<td>Net Taxes Due</td>
<td class="value-column">$2,370.00</td>
<td class="value-column">$2,408.00</td>
</tr>
</table>
</html>
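from bs4 import BeautifulSoup
# html must hold the markup above as a string; BeautifulSoup does not fetch URLs
soup = BeautifulSoup(html, "html.parser")
row = soup.find('tr')
values = [x.text for x in row.find_all('td', {'class': 'value-column'})]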
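If the real table has more than one row, taking the first tr will not always be the row you want. Below is a minimal sketch that maps each row's first cell to its value-column cells, assuming every data row starts with a label cell. The markup is the table above held in a string, and the dictionary name rows is only for illustration.

```python
from bs4 import BeautifulSoup

html = """
<html>
<table>
<tr>
<td>Net Taxes Due</td>
<td class="value-column">$2,370.00</td>
<td class="value-column">$2,408.00</td>
</tr>
</table>
</html>
"""

soup = BeautifulSoup(html, "html.parser")
rows = {}
for tr in soup.find_all("tr"):
    cells = tr.find_all("td")
    if not cells:
        continue  # skip rows without <td> cells, e.g. header rows
    label = cells[0].get_text(strip=True)
    # Keep only the cells marked with the value-column class.
    rows[label] = [td.get_text(strip=True)
                   for td in tr.find_all("td", class_="value-column")]

print(rows["Net Taxes Due"])
```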
Solution 5:
These should work. If you are using bs4 4.7.0 or later, you could use select(). If you are on an older version, or just prefer the find interface, you can use that instead. As stated earlier, you cannot reference content with #; that matches an ID.
import bs4
markup = """
<td>Net Taxes Due</td>
<td class="value-column">$2,370.00</td>
<td class="value-column">$2,408.00</td>
"""
# Version 4.7.0 or later
soup = bs4.BeautifulSoup(markup, "html.parser")
cells = soup.select('td:contains("Net Taxes Due") ~ td.value-column')
cells = [ele.text.strip() for ele in cells]
print(cells)
# Version < 4.7.0 or if you prefer find
soup = bs4.BeautifulSoup(markup, "html.parser")
cells = soup.find('td', text="Net Taxes Due").find_next_siblings('td')
cells = [ele.text.strip() for ele in cells]
print(cells)
You would get this:
['$2,370.00', '$2,408.00']
['$2,370.00', '$2,408.00']
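One more note on the select() version: newer releases of Soup Sieve (the selector library behind select(), 2.1 and later) deprecate :contains() in favor of the prefixed :-soup-contains(). A sketch of the same query with the newer spelling, assuming a Soup Sieve version that supports it:

```python
import bs4

markup = """
<td>Net Taxes Due</td>
<td class="value-column">$2,370.00</td>
<td class="value-column">$2,408.00</td>
"""

soup = bs4.BeautifulSoup(markup, "html.parser")
# :-soup-contains() is the non-deprecated spelling of :contains()
cells = soup.select('td:-soup-contains("Net Taxes Due") ~ td.value-column')
cells = [ele.text.strip() for ele in cells]
print(cells)
```

On an older Soup Sieve this selector will fail to parse, so stick with :contains() or the find version there.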