2016-10-28 17 views

In Chrome's XPath Helper, my XPath returns the correct image links on http://tieba.baidu.com/f?kw=dota2&fr=index, but in my Scrapy spider the same img XPath gives no usable result, as this log shows:

> E:\ladder\tieba\tieba\spiders\tiebaSpiber.py:11: ScrapyDeprecationWarning: tieba.spiders.tiebaSpiber.tiebaSpider inherits from deprecated class scrapy.spiders.BaseSpider, please inherit from scrapy.spiders.Spider. (warning only on first subclass, there may be others) 
    class tiebaSpider(BaseSpider): 
img_url: 
['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ''] 

Spider code:

class tiebaSpider(BaseSpider):
    name = "tiebaSpider"
    allowed_domains = ["tieba.baidu.com"]
    download_delay = 1
    start_urls = ["http://tieba.baidu.com/f?ie=utf-8&kw=dota2", ]

    rules = (
        Rule(LinkExtractor(allow=(r'http://tieba.baidu.com/f?kw=dota2&ie=utf-8&pn=')),
             callback='parse_tieba',
             follow=True),
    )

    def parse_tieba(self, response):
        self.log("Fetch Dota2 Tieba Page:%s" % response.url)
        sel = Selector(response)

        rep_num = sel.xpath('//span[@class="threadlist_rep_num center_text"]/text()').extract()
        title = sel.xpath('//div[@class="threadlist_title pull_left j_th_tit "]/a/text()').extract()
        author = sel.xpath('//span[@class="frs-author-name-wrap"]/a/text()').extract()
        img_url = sel.xpath('//div[@class="threadlist_text pull_left"]//div[@class="small_wrap j_small_wrap"]//a[@class="thumbnail vpic_wrap"]/img/@src').extract()

        item = TiebaItem()
        item['rep_num'] = [n for n in rep_num]
        item['title'] = [n for n in title]
        item['author'] = [n for n in author]
        item['img_url'] = [n for n in img_url]

        print("img_url:\n")
        print(img_url)
        yield item

Answer


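If you check the HTML that the web server actually sends, you will notice that the src attribute of the <img> tags is empty (the browser presumably fills it in afterwards with JavaScript, which Scrapy does not run):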

$ scrapy shell 'http://tieba.baidu.com/f?kw=dota2&fr=index' 
2016-10-28 11:13:58 [scrapy] INFO: Scrapy 1.2.1 started (bot: scrapybot) 

2016-10-28 11:14:00 [scrapy] DEBUG: Crawled (200) <GET http://tieba.baidu.com/f?kw=dota2&fr=index> (referer: None) 

>>> print(response.xpath('//div[@class="threadlist_text pull_left"]//div[@class="small_wrap j_small_wrap"]//a[@class="thumbnail vpic_wrap"]').extract_first()) 
<a class="thumbnail vpic_wrap"><img src="" attr="71814" data-original="http://imgsrc.baidu.com/forum/wh%3D135%2C90/sign=d25862d404d79123e0b59c759e0175bb/a92cb751f3deb48f948c9302f81f3a292ff5785e.jpg" bpic="http://imgsrc.baidu.com/forum/pic/item/a92cb751f3deb48f948c9302f81f3a292ff5785e.jpg" class="threadlist_pic j_m_pic "></a> 
>>> 

However, you can also notice that the data-original attribute looks more interesting:

>>> from pprint import pprint 
>>> pprint(response.xpath('//div[@class="threadlist_text pull_left"]//div[@class="small_wrap j_small_wrap"]//a[@class="thumbnail vpic_wrap"]/img/@data-original').extract()) 
[u'http://imgsrc.baidu.com/forum/wh%3D135%2C90/sign=d25862d404d79123e0b59c759e0175bb/a92cb751f3deb48f948c9302f81f3a292ff5785e.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D90%2C90%3Bcrop%3D0%2C0%2C90%2C90/sign=4909678ffe246b607b5bba7ddbd4237c/9f396e094b36acafd9ddaf2074d98d1000e99c07.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D90%2C180%3Bcrop%3D0%2C0%2C90%2C90/sign=6d1bc479d943ad4ba67b4ec9b22e6b97/5c2c493d269759ee89455917bafb43166c22df2f.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D90%2C90/sign=46c1cc9483d4b31cf0699cb2b7fa1e4f/bd862d2ac65c10385f6f1915ba119313b17e892e.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D91%2C90/sign=de722bda78cf3bc7e855c5e5e02c8391/accf9e18367adab4f396cc9483d4b31c8501e4fe.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D90%2C90/sign=9549bad85182b2b7a7ca31cd0181f2df/9dc1673e6709c93d44c22c2b973df8dcd000540b.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D90%2C160%3Bcrop%3D0%2C0%2C90%2C90/sign=1361b72e751ed21b799c26ec9d42ecf2/caf91f0828381f307dd1ab75a1014c086c06f07c.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D90%2C160%3Bcrop%3D0%2C0%2C90%2C90/sign=003bc7ff7bf082022dc799367bd7cadb/0d38256d55fbb2fbee667bce474a20a44423dcf7.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D90%2C160%3Bcrop%3D0%2C0%2C90%2C90/sign=c30aaadd546034a829b7b088fb3f7862/c3fdcc0735fae6cd21a688bd07b30f2443a70f35.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D90%2C159%3Bcrop%3D0%2C0%2C90%2C90/sign=8ff4a1d85182b2b7a7ca31cd0181fada/3857980a19d8bc3e9f853f168a8ba61eaad345b6.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D90%2C159%3Bcrop%3D0%2C0%2C90%2C90/sign=fd928ccac7fc1e17fdea84387abcc736/5d2188529822720eb2c8d92673cb0a46f31fab3a.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D90%2C159%3Bcrop%3D0%2C0%2C90%2C90/sign=4cb4bdf006f41bd5da06e0fd61f6b0fe/6410b912c8fcc3cef25793e89a45d688d53f2051.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D90%2C159%3Bcrop%3D0%2C0%2C90%2C90/sign=f5025694962f07085f502209d90889ac/7ce22c9b033b5bb5bb3e64253ed3d539b400bc52.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D145%2C90/sign=86be70ceb4315c6043c063eeb984e72a/241923c79f3df8dcb740534ac511728b451028c6.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D90%2C90%3Bcrop%3D0%2C0%2C90%2C90/sign=f5d7eef34934970a47261826a5e6e8f8/c3fdcc0735fae6cd268d8dbd07b30f2443a70f02.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D90%2C120%3Bcrop%3D0%2C0%2C90%2C90/sign=2d5511d753b5c9ea62a60beae5158732/08b62ca85edf8db1d2dd5bb80123dd54544e7454.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D136%2C90/sign=24d6709da751f3dec3e7b165a7d8dc26/64983d1f95cad1c81e464470773e6709c83d513a.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D90%2C106%3Bcrop%3D0%2C0%2C90%2C90/sign=b985bccac3ef76093c5e91961ef192fc/fc05e51f4134970a853fa8789dcad1c8a6865d6b.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D90%2C90%3Bcrop%3D0%2C0%2C90%2C90/sign=04db575ec1ea15ce41bbe800862c03c3/edee83504fc2d56282d5e936ef1190ef74c66c65.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D90%2C90%3Bcrop%3D0%2C0%2C90%2C90/sign=e97ea5f2a0d3fd1f365caa3300621c2f/5df2b318972bd4075e0fe52173899e510db30973.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D90%2C110%3Bcrop%3D0%2C0%2C90%2C90/sign=e9f47f005ce736d158468401ab7c7ef3/99c76a8b4710b9129906c722cbfdfc0390452278.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D159%2C90/sign=0034aae273ec54e741b9121f8c01b769/02988a58d109b3debd89c3b4c4bf6c81810a4c09.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D160%2C90/sign=19b0661bf4039245a1e0e90eb1a488fb/d6d442afa40f4bfb21930e820b4f78f0f53618ff.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D160%2C90/sign=06e4f2d0a4af2eddd4a441e8bb202dd0/cdf3a4315c6034a82e323457c31349540b23766e.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D90%2C159%3Bcrop%3D0%2C0%2C90%2C90/sign=743f9e9fa98b87d65017a3163724190d/348f3d2dd42a2834f3b81aa553b5c9ea14cebf5c.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D90%2C159%3Bcrop%3D0%2C0%2C90%2C90/sign=a3aea53fc511728b3078842bf8d0f2fb/4ac19282b9014a9084ad6c13a1773912b11beee7.jpg', 
u'http://imgsrc.baidu.com/forum/wh%3D90%2C159%3Bcrop%3D0%2C0%2C90%2C90/sign=e82781cf07b30f2435cfe40af8b9e076/56de63f40ad162d9617d48b219dfa9ec8813cde7.jpg'] 
>>> 

So try using:

img_url = sel.xpath('//div[@class="threadlist_text pull_left"]//div[@class="small_wrap j_small_wrap"]//a[@class="thumbnail vpic_wrap"]/img/@data-original').extract()
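The idea in this answer (read data-original when src is empty) can also be checked without Scrapy. Below is a minimal sketch that uses only the standard library's html.parser. LazyImageCollector, image_urls and SAMPLE are hypothetical names for illustration, and the example.com URLs are placeholders rather than real Tieba data.

```python
from html.parser import HTMLParser


class LazyImageCollector(HTMLParser):
    """Collect one URL per <img>, preferring data-original over src."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # Lazy-loaded pages leave src empty and keep the real URL in
        # another attribute; fall back to src only if that is missing.
        url = attr_map.get("data-original") or attr_map.get("src") or ""
        self.urls.append(url)


def image_urls(html):
    """Return the image URLs found in an HTML string, in document order."""
    parser = LazyImageCollector()
    parser.feed(html)
    parser.close()
    return parser.urls


# Hypothetical fragment modelled on the markup in the shell session:
# the first <img> is lazy-loaded, the second has a normal src.
SAMPLE = (
    '<a class="thumbnail vpic_wrap">'
    '<img src="" data-original="http://example.com/thumb.jpg" '
    'class="threadlist_pic j_m_pic ">'
    '</a>'
    '<a class="thumbnail vpic_wrap">'
    '<img src="http://example.com/plain.jpg">'
    '</a>'
)

if __name__ == "__main__":
    print(image_urls(SAMPLE))
```

Falling back to src keeps the helper usable on pages that mix lazy-loaded and ordinary images. In a Scrapy spider, selecting @data-original in the XPath serves the same purpose.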