Scraping Web Content Using XPath Won't Work

June 09, 2024

I'm using XPath to scrape a particular Amazon web page, but it doesn't work. Can anyone give me some advice? Here's the link to that page: a link. I want to scrape these: 'Fun, cre

Solution 1:

The HTML that I download doesn't match the structure your XPath expects. Here is an expression that works for me:

tree.xpath('//div[@id="technicalProductFeaturesATF"]/ul/li[1]/text()')

Complete program (without the [1] index, so it returns every bullet rather than only the first):

from lxml import html
import requests
from pprint import pprint

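url = 'http://www.amazon.co.uk/dp/B009CX5VN2'
# Download the page and parse the HTML into an element tree.
page = requests.get(url)
tree = html.fromstring(page.text)
# Select the text of every <li> in the product features list.
feature_bullets = tree.xpath('//div[@id="technicalProductFeaturesATF"]/ul/li/text()')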

pprint(feature_bullets)

Result:

$ python foo.py 
['Fun, credit card-sized prints',
 'LCD film counter and shooting mode display',
 'Camera mounted mirror for self portraits',
 'Powered by CR2 Batteries, Built-in, Automatic electronic flash',
 'Fujifilm Instax Mini 25 + 30 Instax Mini Film']
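To see how the li[1] and li forms of the expression differ, and what a non-matching expression returns, here is a minimal offline sketch. It is not the program above: it swaps lxml for the standard library's xml.etree.ElementTree so that it runs without a network connection or extra packages, and ElementTree has no text() step, so the sketch reads each element's .text instead. The SAMPLE markup and the "no-such-id" value are hypothetical stand-ins, not the real Amazon page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the product page markup.
SAMPLE = """
<html><body>
  <div id="technicalProductFeaturesATF">
    <ul>
      <li>Fun, credit card-sized prints</li>
      <li>LCD film counter and shooting mode display</li>
    </ul>
  </div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)
base = './/div[@id="technicalProductFeaturesATF"]/ul/'

# li[1] keeps only the first list item; plain li keeps all of them.
first_only = [li.text for li in root.findall(base + 'li[1]')]
all_bullets = [li.text for li in root.findall(base + 'li')]

# An expression that does not match the document yields an empty list,
# not an error ("no-such-id" is a made-up id for illustration).
missing = root.findall('.//div[@id="no-such-id"]/ul/li')

print(first_only)
print(all_bullets)
print(missing)
```

An empty result like the last one is the usual symptom when the downloaded HTML differs from the page you inspected, so printing part of page.text and checking that the id you target is really present is a reasonable first debugging step.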
