
I am relatively new to Python, so please bear with me. I need help understanding how to run Scrapy from a script.

I am trying to write a script that runs a Scrapy spider, but so far I realize I am doing something logically wrong, because running it produces the following error:

ERROR: An unexpected error occurred while tokenizing input 
The following traceback may be corrupted or invalid 
The error message is: ('EOF in multi-line statement', (157, 0)) 
... 
C:\Python27\lib\site-packages\twisted\internet\win32eventreactor.py:64: UserWarning: Reliable disconnection notification requires pywin32 215 or later 
    category=UserWarning) 
ERROR: An unexpected error occurred while tokenizing input 
The following traceback may be corrupted or invalid 
The error message is: ('EOF in multi-line statement', (157, 0)) 
Traceback (most recent call last): 
    File "<string>", line 1, in <module> 
    File "C:\Python27\lib\multiprocessing\forking.py", line 374, in main 
    self = load(from_parent) 
    File "C:\Python27\lib\pickle.py", line 1378, in load 
ERROR: An unexpected error occurred while tokenizing input 
The following traceback may be corrupted or invalid 
The error message is: ('EOF in multi-line statement', (157, 0)) 
    return Unpickler(file).load() 
    File "C:\Python27\lib\pickle.py", line 858, in load 
    dispatch[key](self) 
    File "C:\Python27\lib\pickle.py", line 1090, in load_global 
    klass = self.find_class(module, name) 
    File "C:\Python27\lib\pickle.py", line 1124, in find_class 
    __import__(module) 
    File "Webscrap.py", line 53, in <module> 
    class CrawlerWorker(Process): 
NameError: name 'Process' is not defined 
ERROR: An unexpected error occurred while tokenizing input 
The following traceback may be corrupted or invalid 
The error message is: ('EOF in multi-line statement', (157, 0)) 
... 
"PicklingError: <function remove at 0x07871CB0>: Can't pickle <function remove at 0x077F6BF0>: it's not found as weakref.remove". 

The code that produces this error is below:

from scrapy.contrib.loader import XPathItemLoader 
from scrapy.item import Item, Field 
from scrapy.selector import HtmlXPathSelector 
from scrapy.spider import BaseSpider 
from scrapy.crawler import CrawlerProcess 


class QuestionItem(Item): 
    """Our SO Question Item""" 
    title = Field() 
    summary = Field() 
    tags = Field() 

    user = Field() 
    posted = Field() 

    votes = Field() 
    answers = Field() 
    views = Field() 


class MySpider(BaseSpider): 
    """Our ad-hoc spider""" 
    name = "myspider" 
    start_urls = ["http://stackoverflow.com/"] 

    question_list_xpath = '//div[@id="content"]//div[contains(@class, "question-summary")]' 

    def parse(self, response): 
        hxs = HtmlXPathSelector(response) 

        for qxs in hxs.select(self.question_list_xpath): 
            loader = XPathItemLoader(QuestionItem(), selector=qxs) 
            loader.add_xpath('title', './/h3/a/text()') 
            loader.add_xpath('summary', './/h3/a/@title') 
            loader.add_xpath('tags', './/a[@rel="tag"]/text()') 
            loader.add_xpath('user', './/div[@class="started"]/a[2]/text()') 
            loader.add_xpath('posted', './/div[@class="started"]/a[1]/span/@title') 
            loader.add_xpath('votes', './/div[@class="votes"]/div[1]/text()') 
            loader.add_xpath('answers', './/div[contains(@class, "answered")]/div[1]/text()') 
            loader.add_xpath('views', './/div[@class="views"]/div[1]/text()') 

            yield loader.load_item() 

class CrawlerWorker(Process): 
    def __init__(self, spider, results): 
        Process.__init__(self) 
        self.results = results 

        self.crawler = CrawlerProcess(settings) 
        if not hasattr(project, 'crawler'): 
            self.crawler.install() 
        self.crawler.configure() 

        self.items = [] 
        self.spider = spider 
        dispatcher.connect(self._item_passed, signals.item_passed) 

    def _item_passed(self, item): 
        self.items.append(item) 

    def run(self): 
        self.crawler.crawl(self.spider) 
        self.crawler.start() 
        self.crawler.stop() 
        self.results.put(self.items) 


def main(): 
    results = Queue() 
    crawler = CrawlerWorker(MySpider(BaseSpider), results) 
    crawler.start() 
    for item in results.get(): 
        pass # Do something with item 

That is the code I have. I cannot find the mistake myself. Can anyone give me a hand getting this code to run?

Eventually I need a script that runs, scrapes the data I need, and saves it to a database, but for now I just want the scraping to work. I thought this code would do it, but so far no luck.

Answers


It sounds like you want a standalone spider/crawler... I have not used a custom Process myself, but a standalone crawler is actually very simple:

from scrapy.conf import settings 
from scrapy.crawler import CrawlerProcess 


class StandAloneSpider(CyledgeSpider): 
    # a regular spider; CyledgeSpider is the answerer's own spider class, 
    # any ordinary spider (e.g. the question's MySpider) works here 
    pass 


settings.overrides['LOG_ENABLED'] = True 
# more settings can be changed... 

crawler = CrawlerProcess(settings) 
crawler.install() 
crawler.configure() 

spider = StandAloneSpider() 

crawler.crawl(spider) 
crawler.start()
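
If you would rather keep the multiprocessing approach from the question, the traceback already shows the first problem: Process (and likewise Queue, settings, project, signals and dispatcher) is never imported, and on Windows multiprocessing re-imports and unpickles the module in the child process, which is where the NameError surfaces. Below is a minimal sketch of the header and entry point the question's file would need, assuming the Scrapy 0.14-era module layout (scrapy.conf.settings, scrapy.xlib.pydispatch) that both snippets already rely on. Note that MySpider() is instantiated without arguments, since BaseSpider's first parameter is the spider name, not a class:

from multiprocessing import Process, Queue 

from scrapy import project, signals 
from scrapy.conf import settings 
from scrapy.crawler import CrawlerProcess 
from scrapy.xlib.pydispatch import dispatcher 

# QuestionItem, MySpider and CrawlerWorker exactly as in the question ... 

if __name__ == '__main__': 
    # The guard is required on Windows: multiprocessing re-imports this 
    # module in the child process, so nothing may run at import time. 
    results = Queue() 
    crawler = CrawlerWorker(MySpider(), results) 
    crawler.start() 
    for item in results.get(): 
        pass # do something with each item, e.g. save it to the database 

Even with the imports in place, the standalone route above is the simpler one on Windows, since everything the worker holds (the crawler, the dispatcher connection) must otherwise survive being pickled into the child process, which is what the weakref PicklingError in the question hints at.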