Scrapy - How to scrape JavaScript-generated content?

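I am trying to scrape some ads from a particular website, http://www.head-fi.org/f/6550/headphones-for-sale-trade. I have created a spider that can scrape the title, price, description, and so on, and it works fine, but I can't figure out how pagination works on that page. I assume the listing is generated with JavaScript, since the URL doesn't change?

My code scrapes the first page and returns something like this (I have included one entry):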

{"img": ["http://cdn.head-fi.org/9/92/80x80px-ZC-9228072e_image.jpeg"], "title": ["Hifiman HE1000 Mint"], "saletype": ["For Sale"], "price": ["$2,000"], "currency": ["(USD)"], "link": ["/t/819200/hifiman-he1000-mint"]},

This is the spider that returns that for the first page:

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from headfi_headphones.items import HeadfiHeadphonesItem


class MySpider(CrawlSpider):
    name = "headfiheadphones"
    allowed_domains = ["head-fi.org"]
    start_urls = ["http://www.head-fi.org/f/6550/headphones-for-sale-trade"]

    # Link-following rule, currently disabled. If it is re-enabled, rename the
    # method below to parse_items: CrawlSpider uses parse() internally, so the
    # rules stop working when parse() is overridden.
    # rules = (
    #     Rule(LinkExtractor(allow=(), restrict_xpaths=("//a[@class='tooltip']",)), callback="parse_items", follow=True),
    # )

    def parse(self, response):
        # Each classified ad is one <tr class="thread"> row of the listing table.
        rows = response.xpath("//tr[@class='thread']")
        items = []
        for row in rows:
            item = HeadfiHeadphonesItem()
            # .extract() returns a list of all matches, hence the one-element lists in the output.
            item["title"] = row.xpath("td[@class='thread-col']/div[@class='shazam']/div[@class='thumbnail_body']/a[@class='classified-title']/text()").extract()
            item["link"] = row.xpath("td[@class='thread-col']/div[@class='shazam']/div[@class='thumbnail_body']/a[@class='classified-title']/@href").extract()
            item["img"] = row.xpath("td[@class='thread-col']/div[@class='shazam']/div[@class='thumbnail']/a[@class='thumb']/img/@src").extract()
            item["saletype"] = row.xpath("td/strong/text()").extract()
            item["price"] = row.xpath("td/div[@class='price']/span[@class='ctx-price']/text()").extract()
            item["currency"] = row.xpath("td/div[@class='price']/span[@class='currency']/text()").extract()
            items.append(item)
        return items

Is there a way, with my code, to scrape every page (1-80 or so), given that the table is, as I assume, being populated by JavaScript?

Source

2017-01-29 Jayndoodle

To handle JavaScript properly, you should consider using Selenium. The package is available at https://pypi.python.org/pypi/selenium.
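As a rough illustration of that suggestion, here is a minimal sketch of driving the pagination with Selenium and collecting the rendered HTML of each page. It is not tested against head-fi.org: the `NEXT_XPATH` expression for the "next page" link is a hypothetical placeholder you would need to replace after inspecting the page, the helper names `collect_pages` and `run` are made up for this example, and the fixed `delay` is a crude stand-in for a proper explicit wait.

```python
import time

# Hypothetical locator for the "next page" link; inspect the real page and adjust.
NEXT_XPATH = "//a[@class='next']"


def collect_pages(driver, start_url, next_xpath, max_pages=80, delay=2.0):
    """Load start_url, then keep clicking the 'next' link.

    Returns a list with the rendered HTML of every page visited.
    """
    driver.get(start_url)
    pages = []
    while True:
        time.sleep(delay)  # crude wait so the JavaScript can fill the table
        pages.append(driver.page_source)
        # find_elements returns an empty list when nothing matches;
        # "xpath" is the value of selenium's By.XPATH constant.
        next_links = driver.find_elements("xpath", next_xpath)
        if not next_links or len(pages) >= max_pages:
            break
        next_links[0].click()
    return pages


def run():
    # Imported here so collect_pages itself has no hard dependency on selenium.
    from selenium import webdriver

    driver = webdriver.Firefox()  # assumes Firefox and geckodriver are installed
    try:
        url = "http://www.head-fi.org/f/6550/headphones-for-sale-trade"
        pages = collect_pages(driver, url, NEXT_XPATH)
        print("collected", len(pages), "pages")
    finally:
        driver.quit()
```

Calling `run()` opens a browser and walks the listing. Each HTML string in the result can then be wrapped with `scrapy.Selector(text=html)` and queried with the same XPath expressions the spider already uses, so the item-building code does not have to change.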

Source

2017-01-30 00:48:43
