ネストされたdivをスクラップでスクラップして解析する方法

このgithubページに従って、facebookのネストされたdivを覚えようとしています。ネストされたdivをスクラップでスクラップして解析する方法

parse_info_text_onlyまたはファイル内parse_info_has_imagehttps://github.com/talhashraf/major-scrapy-spiders/blob/master/mss/spiders/facebook_profile.pyは罰金しかしresult_id div要素自体である、私は、ネストされたdiv要素からresult_idを取得しようとしています同様のページを持っているスパン情報

を取得して動作します。

私はスクラップにしようとしていますdiv要素は、2行目にあるので、私は、私はネストされたdivの上からデータ-XTを取得できますか

def parse_profile(self, response): 
     item["BrowseResultsContainer"] = self.parse_info_has_id(response.css('#BrowseResultsContainer')) 
return item 

def parse_info_has_id(self, css_path): 
      text = css_path.xpath('div/div').extract() 
      text = [t.strip() for t in text] 
      text = [t for t in text if re.search('result_id', t)] 
      return "\n".join(text)

ような何かをしようと理解して何から？

私が思う

出典

2016-11-18 u.gen

：

import json 
... 
    def parse_info_has_id(self, css_path): 
     text = css_path.xpath('div::attr(data-gt)').extract_first() 
     d = json.loads(text) 
     return d['result_id']

出典

2016-11-19 10:41:24 eLRuLL

、CSSを使用したすべてのデータ-XTをしたい場合は、

def parse_info_has_id(self, css_path): 
     text = css_path.xpath('//div[@data-xt != ""]').extract() 
     text = [t.strip() for t in text] 
     text = [t for t in text if re.search('result_id', t)] 
     return "\n".join(text)

出典

2016-11-19 10:32:10

ネストされたdivをスクラップでスクラップして解析する方法

答えて

関連する問題