Confused About Running Scrapy From Within A Python Script
Solution 1:
Terminal prints the result because the default log level is set to DEBUG
.
When you are running your spider from the script and call log.start()
, the default log level is set to INFO
.
Just replace:
log.start()
with
log.start(loglevel=log.DEBUG)
UPD:
To get the result as string, you can log everything to a file and then read from it, e.g.:
log.start(logfile="results.log", loglevel=log.DEBUG, crawler=crawler, logstdout=False)
reactor.run()
with open("results.log", "r") as f:
result = f.read()
print result
Hope that helps.
Solution 2:
I found your question while asking myself the same thing, namely: "How can I get the result?". Since this wasn't answered here I endeavoured to find the answer myself and now that I have I can share it:
items = []
defadd_item(item):
items.append(item)
dispatcher.connect(add_item, signal=signals.item_passed)
Or for scrapy 0.22 (http://doc.scrapy.org/en/latest/topics/practices.html#run-scrapy-from-a-script) replace the last line of my solution by:
crawler.signals.connect(add_item, signals.item_passed)
My solution is freely adapted from http://www.tryolabs.com/Blog/2011/09/27/calling-scrapy-python-script/.
Solution 3:
in my case, i placed the script file at scrapy project level e.g. if scrapyproject/scrapyproject/spiders then i placed it at scrapyproject/myscript.py
Post a Comment for "Confused About Running Scrapy From Within A Python Script"