2012-01-17 10 views
1

私は自分のウェブサイトをクロールするためにscrapyの新しいユーザーです。私はmysqlデータベースにクロールされたデータを保存したいと思います。 myspider.py:私たちはパイプラインでアイテムを使用しています

class MininovaSpider(CrawlSpider): 

    name = 'myspider' 
    allowed_domains = ['example.com'] 
    start_urls = ['http://www.example.com'] 
    rules = [Rule(SgmlLinkExtractor(allow=('/categorie/.*'),restrict_xpaths=('//div[@id="contLeftNavig"]',)), 'parse_t')] 

    def parse_t(self, response): 
     x = HtmlXPathSelector(response) 

     torrent = Torrent() 
     torrent['url'] = response.url 
     torrent['title']=x.select("//h1[@class='infoAneTitre']/text()").extract() 
     torrent['wilaya'] = x.select("//span[@class='ville_t']/text()").extract() 
     #torrent['prix'] = x.select("//div[@id='datail_ann']/ul[1]/li[4]/span/text()").extract() 
     #torrent['surface'] = x.select("//div[@id='datail_ann']/ul[3]/li[1]/span/text()").extract() 
     torrent['description'] = x.select("//div[@class='box_pad']/text()").extract() 
     return torrent 

とpipelines.pyのために、私はクロールを実行したときに、私はこのエラーを取得する修正とgoogldir.Soの例を使用:

  • exceptions.AttributeError: 'MininovaSpider' オブジェクトはありません属性 'iterkeys'
  • exceptions.TypeErrorを持っている: 'MininovaSpider' オブジェクトは、
  • を添字化されていません

pipeline.py:

from scrapy import log 
from twisted.enterprise import adbapi 
import time 
import MySQLdb.cursors 

class Pipeline(object): 
    def __init__(self): 


     self.dbpool = adbapi.ConnectionPool('MySQLdb', 
       db='test', 
       user='root', 
       passwd='', 
       cursorclass=MySQLdb.cursors.DictCursor, 
       charset='utf8', 
       use_unicode=True 
      ) 

    def process_item(self, spider, item): 

     query = self.dbpool.runInteraction(self._conditional_insert, item) 
     query.addErrback(self.handle_error) 

     return item 

    def _conditional_insert(self, tx, item): 


     tx.execute("select * from database where url = %s", (item['url'])) 
     result = tx.fetchone() 
     if result: 
      log.msg("Item already stored in db: %s" % item, level=log.DEBUG) 
     else: 
      tx.execute(\ 
      "insert into database (wilaya,titre, site, lien,resume,timestamp) " 
      "values (%s, %s, %s, %s,%s,%s)", 
      (item['wilaya'], 
      item['title'], 
      'example.com',item['url'],item['description'], 
      time.time()) 
      ) 
      log.msg("Item stored in db: %s" % item, level=log.DEBUG) 

    def handle_error(self, e): 
     log.err(e) 

とトレースバック:

Traceback (most recent call last): 
     File "/usr/lib/python2.7/twisted/internet/defer.py", line 287, in addCallbacks 
     self._runCallbacks() 
     File "/usr/lib/python2.7/twisted/internet/defer.py", line 545, in _runCallbacks 
     current.result = callback(current.result, *args, **kw) 
     File "/usr/lib/python2.7/site-packages/scrapy/core/scraper.py", line 208, in _itemproc_finished 
     item=output, response=response, spider=spider) 
     File "/usr/lib/python2.7/site-packages/scrapy/utils/signal.py", line 53, in send_catch_log_deferred 
     *arguments, **named) 
    --- <exception caught here> --- 
     File "/usr/lib/python2.7/twisted/internet/defer.py", line 134, in maybeDeferred 
     result = f(*args, **kw) 
     File "/usr/lib/python2.7/site-packages/scrapy/xlib/pydispatch/robustapply.py", line 47, in robustApply 
     return receiver(*arguments, **named) 
     File "/usr/lib/python2.7/site-packages/scrapy/contrib/feedexport.py", line 177, in item_scraped 
     slot.exporter.export_item(item) 
     File "/usr/lib/python2.7/site-packages/scrapy/contrib/exporter/__init__.py", line 109, in export_item 
     itemdict = dict(self._get_serialized_fields(item)) 
     File "/usr/lib/python2.7/site-packages/scrapy/contrib/exporter/__init__.py", line 60, in _get_serialized_fields 
     field_iter = item.iterkeys() 
    **exceptions.AttributeError: 'MininovaSpider' object has no attribute 'iterkeys' 

2012-01-18 16:00:43-0600 [scrapy] Unhandled Error 
    Traceback (most recent call last): 
     File "/usr/lib/python2.7/threading.py", line 503, in __bootstrap 
     self.__bootstrap_inner() 
     File "/usr/lib/python2.7/threading.py", line 530, in __bootstrap_inner 
     self.run() 
     File "/usr/lib/python2.7/threading.py", line 483, in run 
     self.__target(*self.__args, **self.__kwargs) 
    --- <exception caught here> --- 
     File "/usr/lib/python2.7/twisted/python/threadpool.py", line 207, in _worker 
     result = context.call(ctx, function, *args, **kwargs) 
     File "/usr/lib/python2.7/twisted/python/context.py", line 118, in callWithContext 
     return self.currentContext().callWithContext(ctx, func, *args, **kw) 
     File "/usr/lib/python2.7/twisted/python/context.py", line 81, in callWithContext 
     return func(*args,**kw) 
     File "/usr/lib/python2.7/twisted/enterprise/adbapi.py", line 448, in _runInteraction 
     result = interaction(trans, *args, **kw) 
     File "/opt/scrapy/test/pipelines.py", line 33, in _conditional_insert 
     tx.execute("select * from database where url = %s", (item['url'])) 
    **exceptions.TypeError: 'MininovaSpider' object is not subscriptable 
+0

質問は非常に明確ではありません。パイプラインのコードと完全な例外トレースバックを表示できますか? – warvariuc

+0

遅れて申し訳ありません。 – user1151311

答えて

3
exceptions.TypeError: 'MininovaSpider' object is not subscriptable 

あなたがどこかにクモ(MininovaSpider)インスタンスの代わりに、アイテムが得られているように見えます。私はあなたが示していないより多くのコードがあると思います。 Pipeline.process_item()

は、確認のためにこれを置く:

def process_item(self, spider, item): 

    assert isinstance(item, Torrent), 'Here should be Torrent instance!' 

    query = self.dbpool.runInteraction(self._conditional_insert, item) 
    query.addErrback(self.handle_error) 

    return item 
関連する問題