Scrapy Encounters Debug: Crawled (400)
I'm trying to scrape the page 'https://zhuanlan.zhihu.com/wangzhenotes' with Scrapy. I ran the command scrapy shell 'https://zhuanlan.zhihu.com/wangzhenotes' and got DEBUG: Craw
Solution 1:
Add this middleware to your project's middlewares.py file:
class CustomMiddleware(object):
    def process_request(self, request, spider):
        # Send a browser-like User-Agent with every outgoing request
        request.headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"
Then, in settings.py, replace all the previously listed downloader middlewares with the new one, like this:
DOWNLOADER_MIDDLEWARES = {
    'projectname.middlewares.CustomMiddleware': 543,
}
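Before running a full crawl, it can help to confirm that the middleware really sets the header. The following is a minimal sketch, not part of the original answer: FakeRequest and check_middleware are hypothetical helpers introduced only for illustration, and FakeRequest stands in for scrapy.Request so the check runs without Scrapy installed. The explicit return None is an addition too; it matches what the original method returns implicitly, which tells Scrapy to keep processing the request.

```python
class CustomMiddleware(object):
    # Same User-Agent string as in the answer above
    USER_AGENT = (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"
    )

    def process_request(self, request, spider):
        # Overwrite the User-Agent header on the outgoing request
        request.headers["User-Agent"] = self.USER_AGENT
        # Returning None lets Scrapy continue handling the request normally
        return None


class FakeRequest(object):
    """Hypothetical stand-in for scrapy.Request; it only carries a URL and headers."""

    def __init__(self, url):
        self.url = url
        self.headers = {}


def check_middleware():
    """Run the middleware once against a fake request and report what it did."""
    request = FakeRequest("https://zhuanlan.zhihu.com/wangzhenotes")
    result = CustomMiddleware().process_request(request, spider=None)
    return result, request.headers["User-Agent"]


if __name__ == "__main__":
    result, user_agent = check_middleware()
    print("process_request returned:", result)
    print("User-Agent header:", user_agent)
```

In a real project you would pass an actual scrapy.Request instead of FakeRequest; the header assignment works the same way there.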
With the middleware in place, you no longer need this setting:
DEFAULT_REQUEST_HEADERS = {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'}