2017-08-23 10 views

scrapy TypeError: object() takes no parameters

I'm new to Scrapy and am trying to crawl a few links as a test. Every time I run scrapy crawl tier1, I get "TypeError: object() takes no parameters", as follows:

Traceback (most recent call last): 
    File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks 
    current.result = callback(current.result, *args, **kw) 
    File "/Users/btaek/TaeksProgramming/adv/crawler/adv_crawler/adv_crawler/spiders/tier1_crawler.py", line 93, in parse 
    mk_loader.add_xpath('title', 'h1[@class="top_title"]') # Title of the article 
    File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 167, in add_xpath 
    self.add_value(field_name, values, *processors, **kw) 
    File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 77, in add_value 
    self._add_value(field_name, value) 
    File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 91, in _add_value 
    processed_value = self._process_input_value(field_name, value) 
    File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 150, in _process_input_value 
    return proc(value) 
    File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/processors.py", line 28, in __call__ 
    next_values += arg_to_iter(func(v)) 
TypeError: object() takes no parameters 
2017-08-23 17:25:02 [tier1-parse-logger] INFO: Entered the parse function to parse and index: http://news.mk.co.kr/newsRead.php?sc=30000001&year=2017&no=535166 
2017-08-23 17:25:02 [tier1-parse-logger] ERROR: Error (object() takes no parameters) when trying to parse <<date>> from a mk article: http://news.mk.co.kr/newsRead.php?sc=30000001&year=2017&no=535166 
2017-08-23 17:25:02 [tier1-parse-logger] ERROR: Error (object() takes no parameters) when trying to parse <<author>> from a mk article: http://news.mk.co.kr/newsRead.php?sc=30000001&year=2017&no=535166 
2017-08-23 17:25:02 [scrapy.core.scraper] ERROR: Spider error processing <GET http://news.mk.co.kr/newsRead.php?sc=30000001&year=2017&no=535166> (referer: None) 
Traceback (most recent call last): 
    File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks 
    current.result = callback(current.result, *args, **kw) 
    File "/Users/btaek/TaeksProgramming/adv/crawler/adv_crawler/adv_crawler/spiders/tier1_crawler.py", line 93, in parse 
    mk_loader.add_xpath('title', 'h1[@class="top_title"]') # Title of the article 
    File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 167, in add_xpath 
    self.add_value(field_name, values, *processors, **kw) 
    File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 77, in add_value 
    self._add_value(field_name, value) 
    File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 91, in _add_value 
    processed_value = self._process_input_value(field_name, value) 
    File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 150, in _process_input_value 
    return proc(value) 
    File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/processors.py", line 28, in __call__ 
    next_values += arg_to_iter(func(v)) 
TypeError: object() takes no parameters 

And my spider file (tier1_crawler.py):

And my items.py file:

# -*- coding: utf-8 -*- 

import scrapy 
from scrapy.loader.processors import Join, MapCompose, TakeFirst 
from w3lib.html import remove_tags 

def filter_date(value):
    if isinstance(value, unicode):
        (year, month, day) = str(value.split(" ")[-2]).split(".")
        return year+"-"+month+"-"+day

def filter_utf(value):
    if isinstance(value, unicode):
        return value.encode('utf-8')

class AdvCrawlerItem(scrapy.Item): 
    author = scrapy.Field(input_processor=MapCompose(remove_tags, TakeFirst, filter_utf),) # Name of the publisher/author 
    content = scrapy.Field(input_processor=MapCompose(remove_tags, Join, filter_utf),) # Content of the article (entire contents) 
    content_type = scrapy.Field() 
    date = scrapy.Field(input_processor=MapCompose(remove_tags, TakeFirst, filter_date),) 
    timestamp = scrapy.Field() # timestamp of when the document is being indexed 
    title = scrapy.Field(input_processor=MapCompose(remove_tags, TakeFirst, filter_utf),) # title of the article 
    url = scrapy.Field() # url of the article 

And the pipelines.py file:

import json 
from scrapy import signals 
from scrapy.exporters import JsonLinesItemExporter 

class AdvCrawlerJsonExportPipeline(object):
    def open_spider(self, spider):
        self.file = open('crawled-articles1.txt', 'w')

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        line = json.dumps(dict(item)) + "\n"
        self.file.write(line)
        return item

The error "TypeError: object() takes no parameters" is usually thrown when a class's __init__ method is not defined at all, or is not defined to take parameters.
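A minimal sketch of that general case, independent of Scrapy. The names `Empty` and `try_construct` are hypothetical and exist only for illustration. A class that defines no `__init__` inherits the one from `object`, so calling it with an argument raises a TypeError. On Python 2.7 the message is the one shown in the traceback above; newer Python versions word it differently.

```python
class Empty(object):
    """Hypothetical class with no __init__ of its own."""
    pass


def try_construct(cls, *args):
    """Call cls(*args); return the exception type name, or None on success."""
    try:
        cls(*args)
    except TypeError as exc:
        return type(exc).__name__
    return None


print(try_construct(Empty))       # no arguments: construction succeeds
print(try_construct(Empty, "x"))  # one argument: TypeError is raised
```

So the error means that some class was called with an argument it does not accept, which suggests checking where a class is being called like a function.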

In the case above, however, how can I fix the error? Am I doing something wrong with the item loader or with nested item loaders?


It probably has to do with using processors like 'TakeFirst' and 'Join' inside 'MapCompose()' instead of functions.

Answers


When you use Scrapy processors, you have to instantiate the processor classes to create the objects that do the processing:

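from scrapy import Field
from scrapy.loader.processors import MapCompose, TakeFirst

# wrong: passes the TakeFirst class itself
field = Field(output_processor=MapCompose(TakeFirst))
# right: passes a TakeFirst instance (note the parentheses)
field = Field(output_processor=MapCompose(TakeFirst()))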
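The following sketch illustrates why the class and the instance behave differently. It does not import Scrapy: `MapCompose` and `TakeFirst` here are simplified stand-ins written for this example (the real `MapCompose` also flattens results and accepts loader context), and `strip`, `run`, and `titles` are hypothetical names. The stand-ins keep the one detail that matters, namely that `MapCompose` calls each of its functions once per value.

```python
class TakeFirst(object):
    """Simplified stand-in: return the first non-empty value of an iterable."""
    def __call__(self, values):
        for value in values:
            if value is not None and value != '':
                return value


class MapCompose(object):
    """Simplified stand-in: apply each function to every value in turn."""
    def __init__(self, *functions):
        self.functions = functions

    def __call__(self, values):
        for func in self.functions:
            next_values = []
            for v in values:
                result = func(v)  # the call that fails in the traceback
                if result is not None:
                    next_values.append(result)
            values = next_values
        return values


def strip(value):
    return value.strip()


def run(processor, values):
    """Return the processor's result, or the name of the TypeError raised."""
    try:
        return processor(values)
    except TypeError as exc:
        return type(exc).__name__


titles = [" First title ", " Second title "]

# Class passed: func(v) becomes TakeFirst("First title"), a constructor call.
print(run(MapCompose(strip, TakeFirst), titles))

# Instance passed: no TypeError, but TakeFirst now runs on each string
# and iterates over its characters.
print(run(MapCompose(strip, TakeFirst()), titles))

# Per-value functions in MapCompose, TakeFirst applied once to the whole list.
print(run(TakeFirst(), MapCompose(strip)(titles)))
```

Under these assumptions, the instance inside `MapCompose` removes the error but reduces each string to its first character. If the goal is one value per field, a common arrangement in Scrapy is to keep per-value functions in the input processor and use `TakeFirst()` or `Join()` as the field's output processor.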

Ah, you're right. I made a silly mistake. Thanks. – btaek
