
Scrape Images From 9gag: Unable To Read the Correct HTML Page

I'm trying to write a script that scrapes 9gag for images, and images only. However, I've run into a problem: requests or BeautifulSoup is getting the wrong HTML.

Solution 1:

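The posts are not in the static HTML markup that requests downloads, which is why BeautifulSoup cannot find the images. Instead, the page embeds the post data as a JSON string passed to JSON.parse(...) inside a <script> tag. Try extracting that JSON: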

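import re
import json

import requests

# ...
res = requests.get(...)  # the 9gag page you are scraping
html = res.text  # decoded str; res.content is bytes and cannot be searched with a str pattern

# The post data is the argument of a JSON.parse(...) call at the end of a <script> tag
m = re.search(r'JSON\.parse\((.*)\);</script>', html)
if m is None:
    raise ValueError('no JSON.parse(...) payload found in the page')

# The argument is a quoted JSON string literal whose value is itself JSON,
# so it has to be decoded twice: once to get the JSON text, once to parse it
double_encoded = m.group(1)
encoded = json.loads(double_encoded)
parsed = json.loads(encoded)

images = [p['images']['image700']['url'] for p in parsed['data']['posts']]
print(images)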

Output (the URLs will differ as the posts on the page change):

['https://img-9gag-fun.9cache.com/photo/abY9Wg8_460s.jpg', 'https://img-9gag-fun.9cache.com/photo/aLgy4o5_460s.jpg', 'https://img-9gag-fun.9cache.com/photo/aE2LVeM_460s.jpg', 'https://img-9gag-fun.9cache.com/photo/amBEGb4_700b.jpg', 'https://img-9gag-fun.9cache.com/photo/aKxrv56_460s.jpg', 'https://img-9gag-fun.9cache.com/photo/a5M8wXN_460s.jpg', 'https://img-9gag-fun.9cache.com/photo/aNY6QEv_700b.jpg', 'https://img-9gag-fun.9cache.com/photo/aYY2Deq_700b.jpg', 'https://img-9gag-fun.9cache.com/photo/aQR0AEw_460s.jpg', 'https://img-9gag-fun.9cache.com/photo/aLgy19P_700b.jpg']
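The double json.loads is the step that is easiest to get wrong, so here is a minimal offline sketch of it. It builds a small, made-up page whose script passes a JSON string to JSON.parse(...), then pulls the URLs out with the same regex and the same key path (data, posts, images, image700, url) as the answer above. The sample payload, the var data = wrapper and the extract_image_urls name are hypothetical, chosen for illustration; the real 9gag page is much larger and its structure may have changed since this answer was written.

```python
import json
import re

# Same pattern as the answer: the argument of a JSON.parse(...) call
# that closes a <script> tag.
PAYLOAD_RE = re.compile(r'JSON\.parse\((.*)\);</script>')


def extract_image_urls(html):
    """Return the image700 URL of every post embedded in the page's JSON."""
    m = PAYLOAD_RE.search(html)
    if m is None:
        raise ValueError('no JSON.parse(...) payload found')
    # First decode: the quoted JS string literal -> a str holding JSON text.
    encoded = json.loads(m.group(1))
    # Second decode: that JSON text -> nested dicts and lists.
    parsed = json.loads(encoded)
    return [p['images']['image700']['url'] for p in parsed['data']['posts']]


# Hypothetical, trimmed-down payload using only the keys the answer reads.
data = {'data': {'posts': [
    {'images': {'image700': {'url': 'https://example.com/a_700b.jpg'}}},
    {'images': {'image700': {'url': 'https://example.com/b_460s.jpg'}}},
]}}

# Encode twice to mimic the page: json.dumps(data) is the JSON text, and
# json.dumps of that string is the quoted, escaped literal inside JSON.parse(...).
html = '<script>var data = JSON.parse(' + json.dumps(json.dumps(data)) + ');</script>'

print(extract_image_urls(html))
# prints ['https://example.com/a_700b.jpg', 'https://example.com/b_460s.jpg']
```

Calling json.loads only once here would return the JSON text as a plain str rather than a dict, and indexing it with 'data' would raise a TypeError.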
