Scrape Images From 9gag, Unable To Read Correct Html-page
I'm trying to write a script that scrapes 9gag for images, and images only. But I've run into a problem: either requests or BeautifulSoup is getting the wrong HTML.
Solution 1:
The page embeds its post data in a JSON.parse(...) call inside a script tag. Try extracting that JSON instead of parsing the HTML:
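import re
import json
import requests
# ...
res = requests.get(...)
html = res.text  # decoded str; res.content is bytes and would not match a str pattern
# The post data sits in a <script> tag as JSON.parse("...")
m = re.search(r'JSON\.parse\((.*)\);</script>', html)
double_encoded = m.group(1)
# The payload is a JSON string literal that itself contains JSON, so decode it twice
encoded = json.loads(double_encoded)
parsed = json.loads(encoded)
images = [p['images']['image700']['url'] for p in parsed['data']['posts']]
print(images)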
Output:
['https://img-9gag-fun.9cache.com/photo/abY9Wg8_460s.jpg', 'https://img-9gag-fun.9cache.com/photo/aLgy4o5_460s.jpg', 'https://img-9gag-fun.9cache.com/photo/aE2LVeM_460s.jpg', 'https://img-9gag-fun.9cache.com/photo/amBEGb4_700b.jpg', 'https://img-9gag-fun.9cache.com/photo/aKxrv56_460s.jpg', 'https://img-9gag-fun.9cache.com/photo/a5M8wXN_460s.jpg', 'https://img-9gag-fun.9cache.com/photo/aNY6QEv_700b.jpg', 'https://img-9gag-fun.9cache.com/photo/aYY2Deq_700b.jpg', 'https://img-9gag-fun.9cache.com/photo/aQR0AEw_460s.jpg', 'https://img-9gag-fun.9cache.com/photo/aLgy19P_700b.jpg']
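To see why two json.loads calls are needed, here is a minimal offline sketch of the same idea. It does not touch the network: the sample page, the example.com URLs, and the helper name extract_image_urls are all made up for illustration, and the real page layout may differ or change over time.

```python
import json
import re


def extract_image_urls(html):
    """Return image URLs from a page that embeds its posts via JSON.parse("...")."""
    m = re.search(r'JSON\.parse\((.*?)\);</script>', html, re.DOTALL)
    if m is None:
        return []  # the page does not look the way this sketch assumes
    encoded = json.loads(m.group(1))  # first pass: JSON string literal -> str
    parsed = json.loads(encoded)      # second pass: str -> dict
    return [p['images']['image700']['url'] for p in parsed['data']['posts']]


# Hypothetical sample page: dump the data twice, then wrap it in a script tag.
sample_data = {'data': {'posts': [
    {'images': {'image700': {'url': 'https://example.com/a.jpg'}}},
    {'images': {'image700': {'url': 'https://example.com/b.jpg'}}},
]}}
sample_html = ('<html><script>window._config = JSON.parse('
               + json.dumps(json.dumps(sample_data))
               + ');</script></html>')

urls = extract_image_urls(sample_html)
print(urls)
```

Returning an empty list when the pattern is missing avoids the AttributeError that m.group(1) would raise on a None match, which is what you would hit if the site changes its markup.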