Scrapy TypeError: object() takes no parameters
I am new to Scrapy and am trying to crawl a few links as a test. Whenever I run scrapy crawl tier1, I get "TypeError: object() takes no parameters", as follows:
Traceback (most recent call last):
  File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/Users/btaek/TaeksProgramming/adv/crawler/adv_crawler/adv_crawler/spiders/tier1_crawler.py", line 93, in parse
    mk_loader.add_xpath('title', 'h1[@class="top_title"]') # Title of the article
  File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 167, in add_xpath
    self.add_value(field_name, values, *processors, **kw)
  File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 77, in add_value
    self._add_value(field_name, value)
  File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 91, in _add_value
    processed_value = self._process_input_value(field_name, value)
  File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 150, in _process_input_value
    return proc(value)
  File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/processors.py", line 28, in __call__
    next_values += arg_to_iter(func(v))
TypeError: object() takes no parameters
2017-08-23 17:25:02 [tier1-parse-logger] INFO: Entered the parse function to parse and index: http://news.mk.co.kr/newsRead.php?sc=30000001&year=2017&no=535166
2017-08-23 17:25:02 [tier1-parse-logger] ERROR: Error (object() takes no parameters) when trying to parse <<date>> from a mk article: http://news.mk.co.kr/newsRead.php?sc=30000001&year=2017&no=535166
2017-08-23 17:25:02 [tier1-parse-logger] ERROR: Error (object() takes no parameters) when trying to parse <<author>> from a mk article: http://news.mk.co.kr/newsRead.php?sc=30000001&year=2017&no=535166
2017-08-23 17:25:02 [scrapy.core.scraper] ERROR: Spider error processing <GET http://news.mk.co.kr/newsRead.php?sc=30000001&year=2017&no=535166> (referer: None)
Traceback (most recent call last):
  File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/Users/btaek/TaeksProgramming/adv/crawler/adv_crawler/adv_crawler/spiders/tier1_crawler.py", line 93, in parse
    mk_loader.add_xpath('title', 'h1[@class="top_title"]') # Title of the article
  File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 167, in add_xpath
    self.add_value(field_name, values, *processors, **kw)
  File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 77, in add_value
    self._add_value(field_name, value)
  File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 91, in _add_value
    processed_value = self._process_input_value(field_name, value)
  File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 150, in _process_input_value
    return proc(value)
  File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/processors.py", line 28, in __call__
    next_values += arg_to_iter(func(v))
TypeError: object() takes no parameters
And my spider file (tier1_crawler.py):
And my items.py file:
# -*- coding: utf-8 -*-
import scrapy
from scrapy.loader.processors import Join, MapCompose, TakeFirst
from w3lib.html import remove_tags


def filter_date(value):
    if isinstance(value, unicode):
        (year, month, day) = str(value.split(" ")[-2]).split(".")
        return year+"-"+month+"-"+day


def filter_utf(value):
    if isinstance(value, unicode):
        return value.encode('utf-8')


class AdvCrawlerItem(scrapy.Item):
    author = scrapy.Field(input_processor=MapCompose(remove_tags, TakeFirst, filter_utf),)  # Name of the publisher/author
    content = scrapy.Field(input_processor=MapCompose(remove_tags, Join, filter_utf),)  # Content of the article (entire contents)
    content_type = scrapy.Field()
    date = scrapy.Field(input_processor=MapCompose(remove_tags, TakeFirst, filter_date),)
    timestamp = scrapy.Field()  # timestamp of when the document is being indexed
    title = scrapy.Field(input_processor=MapCompose(remove_tags, TakeFirst, filter_utf),)  # title of the article
    url = scrapy.Field()  # url of the article
And my pipelines.py file:
import json

from scrapy import signals
from scrapy.exporters import JsonLinesItemExporter


class AdvCrawlerJsonExportPipeline(object):
    def open_spider(self, spider):
        # Open the output file once, when the spider starts.
        self.file = open('crawled-articles1.txt', 'w')

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        # Write each item as one JSON object per line.
        line = json.dumps(dict(item)) + "\n"
        self.file.write(line)
        return item
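I know that the "TypeError: object() takes no parameters" error is usually thrown when the __init__ method of a class is not defined at all, or is not defined to take parameters.
However, in the case above, how can I fix the error? Am I doing something wrong in how I use the item loader or the nested item loader?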
It probably has to do with passing processors like 'TakeFirst' and 'Join' to 'MapCompose()' where plain functions are expected.
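To illustrate what the comment is getting at, here is a minimal sketch that reproduces the failure without Scrapy. The `TakeFirst` class and the `map_compose` helper below are simplified, hypothetical stand-ins written for this example, not Scrapy's actual implementations; they only mirror the `func(v)` call visible in the traceback. Passing the class itself means each value is handed to the class constructor, which accepts no arguments.

```python
# Simplified stand-ins for illustration only; NOT Scrapy's real code.

class TakeFirst(object):
    """Return the first value that is neither None nor an empty string."""

    def __call__(self, values):
        for value in values:
            if value is not None and value != '':
                return value


def map_compose(*functions):
    """Mimic a MapCompose-style processor: apply each function to every value."""
    def process(values):
        for func in functions:
            next_values = []
            for v in values:
                result = func(v)  # same call shape as in the traceback
                if result is None:
                    continue
                next_values += result if isinstance(result, list) else [result]
            values = next_values
        return values
    return process


def strip_text(value):
    """Hypothetical per-value cleaner, standing in for remove_tags."""
    return value.strip()


# Broken: the class is passed, so func(v) becomes TakeFirst(v),
# a constructor call with an argument the class does not accept.
broken = map_compose(strip_text, TakeFirst)
try:
    broken([' a ', ' b '])
    error = None
except TypeError as exc:
    error = str(exc)

# Working: only plain per-value functions go into the mapping step;
# an *instance* of TakeFirst is applied afterwards to the whole list.
clean = map_compose(strip_text)
take_first = TakeFirst()
result = take_first(clean([' a ', ' b ']))
```

If the same reasoning applies to the items.py above, the likely fix is to keep only per-value functions in the input processor, for example `MapCompose(remove_tags, filter_utf)`, and to set an instance such as `TakeFirst()` (or `Join()` for content) as the field's `output_processor`. This is an assumption based on the traceback, since the spider file is not shown.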