Python - Issue Scraping With BeautifulSoup
I'm trying to scrape the Stack Overflow jobs page using Beautiful Soup 4 and urllib as a personal project. I'm having trouble scraping all the links to the 50 jo
Solution 1:
Disclaimer: I did some asking of my own for a part of this answer.
from bs4 import BeautifulSoup
import requests
import json
# note: link is slightly different; yours just redirects here
link = 'https://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab&sort=p'
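r = requests.get(link)
r.raise_for_status()  # fail early on an HTTP error instead of parsing an error page
soup = BeautifulSoup(r.text, 'html.parser')
# the job listings are embedded as JSON-LD in the first matching <script> tag
s = soup.find('script', type='application/ld+json')
# each entry under 'itemListElement' is a dict with a 'url' key
urls = [el['url'] for el in json.loads(s.text)['itemListElement']]
print(len(urls))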
50
Process:
- Use soup.find rather than soup.find_all. This returns a single bs4.element.Tag that holds the JSON.
- json.loads(s.text) is a nested dict. Its itemListElement key holds a list of dicts; take the url value from each one to build the list of links.
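The JSON step can be tried without hitting the site. Below is a minimal sketch using only the standard library; the sample payload and the helper name urls_from_json_ld are made up for illustration, and the real page's JSON may carry different or additional fields. In the answer's code, the string passed in would be the text of the script tag returned by soup.find.

```python
import json

# Hypothetical sample shaped like a schema.org ItemList; the real payload may differ.
SAMPLE_JSON_LD = """
{
  "@context": "http://schema.org",
  "@type": "ItemList",
  "itemListElement": [
    {"@type": "ListItem", "position": 1, "url": "https://example.com/jobs/1"},
    {"@type": "ListItem", "position": 2, "url": "https://example.com/jobs/2"},
    {"@type": "ListItem", "position": 3}
  ]
}
"""


def urls_from_json_ld(raw):
    """Return the 'url' of each entry under 'itemListElement' (hypothetical helper)."""
    data = json.loads(raw)
    # .get() keeps this from raising KeyError if the key is absent
    items = data.get("itemListElement", [])
    # skip entries that carry no 'url' instead of failing on them
    return [item["url"] for item in items if "url" in item]


urls = urls_from_json_ld(SAMPLE_JSON_LD)
print(len(urls))
print(urls)
```

Skipping entries without a url is a choice made here for robustness; the original one-liner would raise a KeyError on such an entry, which may be preferable if a missing link should be treated as an error.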