
Confused About Running Scrapy From Within A Python Script

Following the documentation, I can run Scrapy from a Python script, but I can't get the Scrapy result. This is my spider: from scrapy.spider import BaseSpider from scrapy.selector import Ht…

Solution 1:

The terminal prints the result because the default log level there is set to DEBUG.

When you run your spider from a script and call log.start(), the default log level is set to INFO.

Just replace:

log.start()

with

log.start(loglevel=log.DEBUG)
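
For context, here is roughly where that call sits in the pre-1.0 run-from-a-script pattern (a sketch based on the old practices docs; the MySpider import path is a hypothetical placeholder for your own spider):

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from scrapy.utils.project import get_project_settings

from myproject.spiders.myspider import MySpider  # hypothetical spider module

crawler = Crawler(get_project_settings())
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)  # stop the reactor when the spider finishes
crawler.configure()
crawler.crawl(MySpider())
crawler.start()
log.start(loglevel=log.DEBUG)  # DEBUG so the scraped items show up, just like in the terminal
reactor.run()  # blocks until the crawl is done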

Update:

To get the result as a string, you can log everything to a file and then read it back, e.g.:

log.start(logfile="results.log", loglevel=log.DEBUG, crawler=crawler, logstdout=False)  # send the log to a file instead of stdout
reactor.run()  # blocks until the crawl finishes

with open("results.log", "r") as f:
    result = f.read()
print(result)

Hope that helps.

Solution 2:

I found your question while asking myself the same thing, namely: "How can I get the result?" Since this wasn't answered here, I endeavoured to find the answer myself, and now that I have, I can share it:

from scrapy import signals
from scrapy.xlib.pydispatch import dispatcher

items = []

def add_item(item):
    items.append(item)  # collect each scraped item as the crawler emits it

dispatcher.connect(add_item, signal=signals.item_passed)

Or, for Scrapy 0.22 (http://doc.scrapy.org/en/latest/topics/practices.html#run-scrapy-from-a-script), replace the last line of my solution with:

crawler.signals.connect(add_item, signals.item_passed)
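
As a usage sketch, assuming the same Crawler/reactor setup as under Solution 1 (MySpider is again a placeholder): the collector is connected before crawler.start(), and items is populated by the time reactor.run() returns:

crawler.signals.connect(add_item, signal=signals.item_passed)  # register the collector before the crawl starts
crawler.configure()
crawler.crawl(MySpider())
crawler.start()
reactor.run()  # blocks until the crawl finishes

for item in items:  # populated by add_item during the crawl
    print(item)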

My solution is freely adapted from http://www.tryolabs.com/Blog/2011/09/27/calling-scrapy-python-script/.

Solution 3:

In my case, I placed the script file at the Scrapy project level, e.g. if the spiders live in scrapyproject/scrapyproject/spiders, then I placed the script at scrapyproject/myscript.py.
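
With a standard startproject layout, that placement looks roughly like this (myscript.py being the hypothetical run-from-a-script entry point):

scrapyproject/              # project root, next to scrapy.cfg
    scrapy.cfg
    myscript.py             # the script that runs the spider
    scrapyproject/
        settings.py
        spiders/
            ...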
