How to export Scrapy results to a special JSON format?

I am using Scrapy to crawl and scrape StackOverflow.com. This is so.py:

import scrapy


class StackOverflowSpider(scrapy.Spider):
    name = 'stackoverflow'
    start_urls = ['http://stackoverflow.com']

    def parse(self, response):
        # Follow every question link found on the front page.
        for href in response.css('.question-summary h3 a::attr(href)'):
            full_url = response.urljoin(href.extract())
            yield scrapy.Request(full_url, callback=self.parse_question)

    def parse_question(self, response):
        # Emit one item per question page, holding only its URL.
        yield {
            'link': response.url,
        }

Expected result: so.json (valid JSON format)

[
    "http://stackoverflow.com/questions/36421917/exponential-number-in-custom-number-format-of-excel",
    "http://stackoverflow.com/questions/36421343/can-not-install-requirements-txt",
    "http://stackoverflow.com/questions/36418815/difference-between-two-approaches-to-pass-parameters-to-web-server",
    "http://stackoverflow.com/questions/36421743/sharing-an-oracle-database-connection-between-simultaneous-celery-tasks",
    "http://stackoverflow.com/questions/36421941/jquery-add-css-style"
]

Then I run:

scrapy runspider so.py -o so.json

The result is not what I expected above. I am stuck here.

Source

2016-04-05 Do Nhu Vy

Use the FEED_FORMAT=jsonlines setting:

scrapy runspider so.py -o so.json --set FEED_FORMAT=jsonlines

If you want to get

[
    "https://stackoverflow.com/questions/36421917/exponential-number-in-custom-number-format-of-excel",
    "https://stackoverflow.com/questions/36421343/can-not-install-requirements-txt",
    "https://stackoverflow.com/questions/36418815/difference-between-two-approaches-to-pass-parameters-to-web-server",
    "https://stackoverflow.com/questions/36421743/sharing-an-oracle-database-connection-between-simultaneous-celery-tasks",
    "https://stackoverflow.com/questions/36421941/jquery-add-css-style"
]

you should write your own ItemExporter; see this question.

Source

2016-04-05 09:58:33

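This is the result after running the command above: https://gist.github.com/donhuvy/7f75e0cf30ab0fe2ba79069ffa328b31 It still does not give the result I expect.

I have corrected the answer; please check again.

After applying your revised answer, I ran the command and got this result: https://gist.github.com/donhuvy/cc21a2a99b64fa367dbaec70f27b564c. It is not the expected result. Please help me solve the problem!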
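If writing an ItemExporter feels like too much, one possible workaround is to keep the original command (scrapy runspider so.py -o so.json), which writes a JSON array of objects such as {"link": "..."}, and flatten that file afterwards. The sketch below assumes that default output; the function names and the output file name so_links.json are hypothetical.

```python
import json


def extract_links(items):
    """Turn [{'link': url}, ...] into a plain list of URL strings."""
    return [item["link"] for item in items]


def convert(src_path, dst_path):
    """Read Scrapy's default JSON feed and write a bare array of links."""
    with open(src_path, encoding="utf-8") as src:
        items = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(extract_links(items), dst, indent=4)


if __name__ == "__main__":
    # so.json is the file produced by the crawl; so_links.json is a made-up name.
    convert("so.json", "so_links.json")
```

This needs a second step after the crawl, but it relies only on the standard library and leaves the spider untouched.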
