
When I try to scrape a page while passing custom headers and a body, I get the error below. The error occurs when setting the headers and body on a Scrapy Spider request.

I also tried converting the payload to JSON and to str before sending it, but when the dict is converted to a string I get no results at all.

コード

import scrapy

class TestingSpider(scrapy.Spider):
    name = "test"

    def start_requests(self):
        request_headers = {
            "Host": "host_here",
            "User-Agent": "Mozilla/5.0 20100101 Firefox/46.0",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.5",
            "Accept-Encoding": "gzip, deflate",
            "Connection": "keep-alive",
            "Cache-Control": "max-age=0"
        }

        url = "my_url_here"

        payload = {
            "searchargs.approvedFrom.input": "05/18/2017",
            "searchargs.approvedTO.input": "05/18/2017",
            "pagesize": -1
        }

        yield scrapy.Request(url, method="POST", callback=self.parse, headers=request_headers, body=payload)

    def parse(self, response):
        print("-------------------------------came here-------------------------------")
        print(response.body)

Error 1

Traceback (most recent call last): 
  File "/home/suventure/home/python/lib/python3.5/site-packages/scrapy/core/engine.py", line 127, in _next_request 
    request = next(slot.start_requests) 
  File "/home/suventure/Desktop/suventure-projects/python-projects/scraper_txrrc/scraper_txrrc/spiders/wells_spider.py", line 114, in start_requests 
    yield scrapy.Request(url, method="POST", callback=self.parse, headers=request_headers, body=payload) 
  File "/home/suventure/home/python/lib/python3.5/site-packages/scrapy/http/request/__init__.py", line 26, in __init__ 
    self._set_body(body) 
  File "/home/suventure/home/python/lib/python3.5/site-packages/scrapy/http/request/__init__.py", line 68, in _set_body 
    self._body = to_bytes(body, self.encoding) 
  File "/home/suventure/home/python/lib/python3.5/site-packages/scrapy/utils/python.py", line 117, in to_bytes 
    'object, got %s' % type(text).__name__) 
TypeError: to_bytes must receive a unicode, str or bytes object, got dict 

Error 2: no response at all. If anything needs to change in how I send the body, please tell me.

2017-05-19 22:39:38 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: scraper_) 
2017-05-19 22:39:38 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'scraper', 'NEWSPIDER_MODULE': 'scraper_.spiders', 'SPIDER_MODULES': ['scraper_.spiders'], 'ROBOTSTXT_OBEY': True} 
2017-05-19 22:39:39 [scrapy.middleware] INFO: Enabled extensions: 
['scrapy.extensions.telnet.TelnetConsole', 
'scrapy.extensions.corestats.CoreStats', 
'scrapy.extensions.logstats.LogStats'] 
2017-05-19 22:39:39 [scrapy.middleware] INFO: Enabled downloader middlewares: 
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware', 
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 
'scrapy.downloadermiddlewares.retry.RetryMiddleware', 
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 
'scrapy.downloadermiddlewares.stats.DownloaderStats'] 
2017-05-19 22:39:39 [scrapy.middleware] INFO: Enabled spider middlewares: 
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 
'scrapy.spidermiddlewares.referer.RefererMiddleware', 
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 
'scrapy.spidermiddlewares.depth.DepthMiddleware'] 
2017-05-19 22:39:39 [scrapy.middleware] INFO: Enabled item pipelines: 
['scrapy.pipelines.files.FilesPipeline'] 
2017-05-19 22:39:39 [scrapy.core.engine] INFO: Spider opened 
2017-05-19 22:39:39 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) 
2017-05-19 22:39:39 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023 
2017-05-19 22:39:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://website_link_here/robots.txt> (referer: None) 
2017-05-19 22:39:40 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <POST website_link_here> 
2017-05-19 22:39:40 [scrapy.core.engine] INFO: Closing spider (finished) 
2017-05-19 22:39:40 [scrapy.statscollectors] INFO: Dumping Scrapy stats: 
{'downloader/exception_count': 1, 
'downloader/exception_type_count/scrapy.exceptions.IgnoreRequest': 1, 
'downloader/request_bytes': 232, 
'downloader/request_count': 1, 
'downloader/request_method_count/GET': 1, 
'downloader/response_bytes': 258, 
'downloader/response_count': 1, 
'downloader/response_status_count/200': 1, 
'finish_reason': 'finished', 
'finish_time': datetime.datetime(2017, 5, 19, 17, 9, 40, 581949), 
'log_count/DEBUG': 3, 
'log_count/INFO': 7, 
'response_received_count': 1, 
'scheduler/dequeued': 1, 
'scheduler/dequeued/memory': 1, 
'scheduler/enqueued': 1, 
'scheduler/enqueued/memory': 1, 
'start_time': datetime.datetime(2017, 5, 19, 17, 9, 39, 332675)} 
2017-05-19 22:39:40 [scrapy.core.engine] INFO: Spider closed (finished) 
+1

The error message tells you that the body parameter is a dict (your payload), but it has to be unicode or str. The Scrapy documentation for Request says the same. So you need to convert your dict to a string, e.g. `request_body = json.dumps(payload)`. –
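A minimal sketch of what the comment suggests, using the payload from the question (whether the server actually accepts a JSON body is an assumption; form-encoded data may be what the site expects):

```python
import json

payload = {
    "searchargs.approvedFrom.input": "05/18/2017",
    "searchargs.approvedTO.input": "05/18/2017",
    "pagesize": -1,
}

# scrapy.Request's body must be str/bytes, so serialize the dict first
request_body = json.dumps(payload)
print(type(request_body).__name__)  # -> str
```

The resulting string can then be passed as `body=request_body`, which avoids the `to_bytes ... got dict` TypeError.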

+0

@FrankMartin I tried that, since this approach is mentioned on stackoverflow, and the error no longer occurs – Sharath

+0

Can you add the output of the `scrapy crawl test` command to your question? –

Answers

1
In settings.py, change:

ROBOTSTXT_OBEY = False 
+0

I get results now, but what exactly does this do? – Sharath

+0

With ROBOTSTXT_OBEY = False, scrapy ignores the robots.txt rules – Verz1Lka
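The behavior the middleware enforces can be illustrated with the standard library's robots.txt parser (the rules below are hypothetical, not the actual site's, and `example.com` is a placeholder):

```python
from urllib import robotparser

# Hypothetical robots.txt that disallows everything for all user agents
rules = ["User-agent: *", "Disallow: /"]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Every path is forbidden, which is why Scrapy's RobotsTxtMiddleware
# (active while ROBOTSTXT_OBEY = True) dropped the POST with IgnoreRequest
print(rp.can_fetch("Mozilla/5.0", "http://example.com/search"))  # -> False
```

This matches the `Forbidden by robots.txt` line in the log above: the request never reached the server, so no response arrived.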

+0

Thanks @Verz1Lka – Sharath
