Problem storing two or more Items from Scrapy into MySQL

This problem is driving me crazy. I am trying to save scraped items to MySQL via pipelines.

Saving a single item works, but as soon as I add a second item I get this strange error:

Error 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '), 1)' at line 2

That is the error I get, and my code in pipelines.py is as follows:

import MySQLdb


class DropToDb(object):
    def __init__(self):
        self.conn = MySQLdb.connect(host="localhost", user="root", passwd="root", db="Test")
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        try:
            self.cursor.execute("""
                INSERT INTO Main (url, domain_id)
                VALUES (%s, %s)
            """, (item['url'], item['domain_id']))
            self.conn.commit()
        except MySQLdb.Error as e:
            print("Error %d: %s" % (e.args[0], e.args[1]))

        return item

If I remove one column and its item, it works fine, as shown below:

class DropToDb(object):
    def __init__(self):
        self.conn = MySQLdb.connect(host="localhost", user="root", passwd="root", db="Test")
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        try:
            # Parameters are passed as a one-element tuple (note the comma).
            self.cursor.execute("""
                INSERT INTO Main (url)
                VALUES (%s)
            """, (item['url'],))
            self.conn.commit()
        except MySQLdb.Error as e:
            print("Error %d: %s" % (e.args[0], e.args[1]))

        return item

My Scrapy file looks like this:

if datematch: 
    item['link_title'] = ogtitle 
    item['link_description'] = response.xpath('//meta[@property="og:description"]/@content').extract() 
    item['link_locale'] = response.xpath('//meta[@property="og:locale"]/@content').extract(), 
    yield item

There are many more items than shown above, but I just wanted to give an example.

Can anyone help me solve this problem?

My spider file:

import datetime
import re

import MySQLdb
import scrapy
from MySQLdb.cursors import SSCursor
from scrapy.exceptions import DropItem
from scrapy.http import Request

from Maintoo.items import MaintooSpider2Item


class Maintoospider2Spider(scrapy.Spider):
    name = "MaintooSpider2"

    #start_urls = readdomainsfromdb()

    def start_requests(self):
        for domain_id, url, id_sitemap_links in readdomainsfromdb():
            yield Request(
                url,
                callback=self.parse,
                meta={
                    'domain_id': domain_id,
                    'id_sitemap_links': id_sitemap_links
                },
                errback=self.error
            )

    def error(self, failure):
        # Scrapy calls the errback with a Failure object; it is ignored here.
        pass

    def parse(self, response):
        domainid = response.meta['domain_id']
        id_sitemap_links = response.meta['id_sitemap_links']
        #updateid(id_sitemap_links)
        ogtitle = response.xpath('//meta[@property="og:title"]/@content').extract()
        isporn = response.xpath('//meta[@content="RTA-5042-1996-1400-1577-RTA"]').extract()
        datematch = re.findall(r'(content="2015|2016")', response.body, re.IGNORECASE | re.DOTALL)
        item = MaintooSpider2Item()
        if '/tag/' in response.url:
            raise DropItem
        if isporn:
            updateporn(domainid)
            raise DropItem

        if datematch:
            item['link_title'] = ogtitle
            item['link_description'] = response.xpath('//meta[@property="og:description"]/@content').extract()
            item['link_locale'] = response.xpath('//meta[@property="og:locale"]/@content').extract()
            item['link_type'] = response.xpath('//meta[@property="og:type"]/@content').extract()
            item['link_url'] = response.xpath('//meta[@property="og:url"]/@content').extract()
            item['link_site_name'] = response.xpath('//meta[@property="og:site_name"]/@content').extract()
            item['link_article_tag'] = response.xpath('//meta[@property="article:tag"]/@content').extract()
            item['link_article_section'] = response.xpath('//meta[@property="article:section"]/@content').extract()
            item['link_article_published_time'] = response.xpath('//meta[@property="article:published_time"]/@content').extract()
            item['link_meta_keywords'] = response.xpath('//meta[@name="keywords"]/@content').extract()
            item['link_publisher'] = response.xpath('//meta[@property="article:publisher"]/@content').extract()
            item['link_article_author'] = response.xpath('//meta[@property="article:author"]/@content').extract()
            item['link_twitter_card'] = response.xpath('//meta[@name="twitter:card"]/@content').extract()
            item['link_twitter_description'] = response.xpath('//meta[@name="twitter:description"]/@content').extract()
            item['link_twitter_title'] = response.xpath('//meta[@name="twitter:title"]/@content').extract()
            item['link_twitter_image'] = response.xpath('//meta[@name="twitter:image"]/@content').extract()
            item['link_facebook_app_id'] = response.xpath('//meta[@property="fb:app_id"]/@content').extract()
            item['link_facebook_page_admins'] = response.xpath('//meta[@property="fb:admins"]/@content').extract()
            item['link_rss'] = response.xpath('//meta[@rel="alternate"]/@href').extract()
            item['link_twitter_image_source'] = response.xpath('//meta[@name="twitter:image:src"]/@content').extract()
            item['link_twitter_site'] = response.xpath('//meta[@name="twitter:site"]/@content').extract()
            item['link_twitter_url'] = response.xpath('//meta[@name="twitter:url"]/@content').extract()
            item['link_twitter_creator'] = response.xpath('//meta[@name="twitter:creator"]/@content').extract()
            item['link_apple_app'] = response.xpath('//meta[@name="apple-itunes-app"]/@content').extract()
            item['link_facebook_video'] = response.xpath('//meta[@property="og:video"]/@content').extract()
            item['link_facebook_page_id'] = response.xpath('//meta[@name="fb:page_id"]/@content').extract()
            item['link_id'] = response.xpath('//link[@rel="publisher"]/@href').extract()
            item['link_image'] = response.xpath('//meta[@property="og:image"]/@content').extract()
            item['url'] = response.url
            item['domain_id'] = domainid
            item['crawled_date'] = datetime.datetime.now().isoformat()
            yield item

My new pipeline file:

import MySQLdb
from scrapy.exceptions import DropItem


class dropifdescription(object):

    def process_item(self, item, spider):
        # to test if only "job_id" is empty,
        # change to:
        # if not(item["job_id"]):
        if not(item["link_title"]):
            raise DropItem()
        else:
            return item


class DropToDb(object):
    def __init__(self):
        self.conn = MySQLdb.connect(host="localhost", user="root", passwd="root", db="Maintoo", charset="utf8", use_unicode=True)
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        try:
            self.cursor.execute("""
                INSERT INTO Main (url, domain_id, link_title) VALUES (%s, %s, %s)""", (item['url'], item['domain_id'], item['link_title']))
            self.conn.commit()
        except MySQLdb.Error as e:
            print("Error %d: %s" % (e.args[0], e.args[1]))

        return item

My settings file:

ITEM_PIPELINES = { 
    'Maintoo.pipelines.dropifdescription': 200, 
    'Maintoo.pipelines.DropToDb': 300, 
}

Source

2016-04-25 Marketingexpert

The problem is coming from inside your spider:

item['link_locale'] = response.xpath('//meta[@property="og:locale"]/@content').extract(),

See the , at the end? It turns your item['link_locale'] into a tuple, which ultimately breaks your SQL query. Remove the comma.
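To see the effect in isolation, here is a minimal sketch in plain Python. It does not need Scrapy; the `extracted` list is a made-up stand-in for what `.extract()` would return, and the dictionary keys are only illustrative.

```python
# Stand-in for the list that response.xpath(...).extract() would return.
extracted = ["en_US"]

item = {}
item["with_comma"] = extracted,     # trailing comma: the value becomes a 1-tuple
item["without_comma"] = extracted   # no comma: the value is the list itself

print(type(item["with_comma"]).__name__)     # tuple
print(type(item["without_comma"]).__name__)  # list
print(item["with_comma"])                    # (['en_US'],)
```

A value wrapped like this is no longer a plain string when it is handed to the database driver as a query parameter.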

Additionally, to extract a single value instead of a list, you should use extract_first() rather than the usual extract().
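If some fields still arrive as lists, one option is to collapse them before they reach the INSERT. The sketch below is only an illustration: `first_or_none` and the sample `item` dict are hypothetical and not part of the original code. The helper mirrors what `extract_first()` gives you, namely the first match, or `None` when nothing matched.

```python
def first_or_none(value):
    """Return the first element of a list/tuple, None if it is empty,
    or the value unchanged if it is already a scalar."""
    if isinstance(value, (list, tuple)):
        return value[0] if value else None
    return value


# Made-up item for illustration; real values would come from the spider.
item = {
    "url": "http://example.com/post",
    "domain_id": 1,
    "link_title": ["Example title"],  # a list, as returned by .extract()
    "link_locale": [],                # nothing matched on the page
}

# Build query parameters that contain only scalars (or None).
params = tuple(first_or_none(item[key]) for key in ("url", "domain_id", "link_title"))
print(params)  # ('http://example.com/post', 1, 'Example title')
```

In the pipeline, `params` would be passed as the second argument to `self.cursor.execute(...)`; MySQLdb sends `None` as SQL NULL.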

Source

2016-04-25 17:21:48 alecxe

Your answer solved my problem with two items. If I add one more, the problem is the same. This is what I have in my code: INSERT INTO Main (url, domain_id, link_title) VALUES (%s, %s, %s)""", (item['url'], item['domain_id'], item['link_title'])) self.conn.commit() ... What is the problem now? – Marketingexpert

@BesnikHajredini Could you post the complete spider (edit the question and paste it there)? Thanks. – alecxe

I have added the complete spider file and the settings file as they stand so far. I hope you can help me with this, because I am stuck :) – Marketingexpert
