Scrapy Encounters Debug: Crawled (400)

October 27, 2023

I'm trying to scrape the page 'https://zhuanlan.zhihu.com/wangzhenotes' with Scrapy. I ran the command scrapy shell 'https://zhuanlan.zhihu.com/wangzhenotes' and got DEBUG: Craw

Solution 1:

Add this middleware to the project's middlewares.py file:

class CustomMiddleware(object):
    def process_request(self, request, spider):
        # Send a browser-like User-Agent with every outgoing request.
        request.headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"
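As a rough way to see what this middleware does without starting a crawl, the sketch below calls process_request by hand. FakeRequest is a hypothetical stand-in written only for this check (it is not a Scrapy class), and its headers attribute is a plain dict rather than Scrapy's own headers object, so treat this as an illustration of the logic, not of Scrapy's internals. The explicit return None mirrors the original method, which returns nothing and so lets Scrapy keep processing the request.

```python
# Same User-Agent string as in the middleware above.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"
)


class CustomMiddleware:
    def process_request(self, request, spider):
        # Overwrite whatever User-Agent the request carried.
        request.headers["User-Agent"] = USER_AGENT
        # Returning None tells Scrapy to continue handling the request.
        return None


class FakeRequest:
    """Hypothetical stand-in for a Scrapy request, for this check only."""

    def __init__(self, url):
        self.url = url
        self.headers = {}  # assumption: a plain dict is enough here


request = FakeRequest("https://zhuanlan.zhihu.com/wangzhenotes")
result = CustomMiddleware().process_request(request, spider=None)
print(request.headers["User-Agent"])
```

If the printed value is the browser-like string, the header logic is doing what the solution intends; whether the site then stops answering with 400 still has to be confirmed with scrapy shell.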

Then, in settings.py, replace all the previous downloader middlewares with the new one, like this:

DOWNLOADER_MIDDLEWARES = {
    'projectname.middlewares.CustomMiddleware': 543,
}

With the middleware in place, you no longer need this setting:

DEFAULT_REQUEST_HEADERS = {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'}
