How do I get the link text from this XPath?

Using the Python library Scrapy, I run the following:

scrapy shell "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/"

From there, I would like to get each link plus the text of each returned item with

response.xpath('//div[@class="title-and-desc"]/a')

but only the links are returned, not the text. Here is a sample of what is returned:

response.xpath('//div[@class="title-and-desc"]/a') 
[<Selector xpath='//div[@class="title-and-desc"]/a' data=u'<a target="_blank" href="http://www.brpr'>, <Selector xpath='//div[@class="title-and-desc"]/a' data=u'<a target="_blank" href="http://www.dive'>, <Selector xpath='//div[@class="title-and-desc"]/a' data=u'<a target="_blank" href="http://rhodesmi'>,

I can loop through the results above, where i is the variable for each iteration:

i.xpath("text()").extract_first(), 
i.xpath("@href").extract_first()

However, only the @href value comes back; text() does not retrieve anything. What needs to change so that I also get the accompanying link text?

For reference, the complete Scrapy example comes from here: Scrapy Tutorial Example.

Source

2016-11-28 4thSpace

The text you are looking for sits inside the child div node, not directly under the a element:

<div class="title-and-desc">
    <a target="_blank" href="http://www.network-theory.co.uk/python/intro/">
        <div class="site-title">An Introduction to Python </div>
    </a>
</div>

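You can get all of a node's text (its own text plus that of its descendants) by using .//text() instead of text(); the leading dot keeps the expression relative to the current a element, whereas a bare //text() searches the whole document. Alternatively, use an explicit XPath: div/text() relative to the a element, which is a/div/text() from the enclosing div.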
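To see why text() comes back empty here, the following is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy selectors (an assumption made only so the snippet runs without Scrapy installed). The snippet mirrors the markup above with the outer div closed; a.text and a.itertext() only approximate what text() and .//text() select.

```python
import xml.etree.ElementTree as ET

# Same shape as the markup in the answer, with the outer <div> closed.
SNIPPET = """
<div class="title-and-desc">
    <a target="_blank" href="http://www.network-theory.co.uk/python/intro/">
        <div class="site-title">An Introduction to Python </div>
    </a>
</div>
""".strip()

root = ET.fromstring(SNIPPET)
a = root.find("a")

# a.text is the text directly inside <a> before its first child element.
# Here that is only whitespace, which is why text() yields nothing useful.
direct_text = (a.text or "").strip()

# itertext() walks <a> and all of its descendants, similar to .//text().
descendant_text = "".join(a.itertext()).strip()

href = a.get("href")

print(repr(direct_text))
print(repr(descendant_text))
print(href)
```

The link text lives one level down, in the nested div, so any approach that only looks at the direct children of the a element misses it.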

Try the following. Getting only the text would not solve the problem, because you would no longer get the URL, so the loop extracts both:

links = response.xpath('//div[@class="title-and-desc"]/a') 
for l in links: 
    # url: 
    print(l.xpath('@href').extract_first()) 
    # text with explicit xpath: 
    print(l.xpath('div/text()').extract_first()) 
    # or with all text elements with relative //text: 
    print(''.join(l.xpath('.//text()').extract()).strip())

Source

2016-11-28 02:29:50 Granitosaurus

I tried 'i.xpath("//text()").extract_first()', but it didn't work. – 4thSpace

@4thSpace It does work; take a look at my edited example. – Granitosaurus

Another handy option is to use XPath's 'string()' or 'normalize-space()' on the link: 'print(l.xpath('normalize-space(.)').extract_first(), l.xpath('@href').extract_first())' –
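For readers unfamiliar with normalize-space(), here is a small pure-Python sketch of what it does to a string: it trims leading and trailing whitespace and collapses each internal run of whitespace to a single space. The helper name normalize_space is hypothetical and is not part of Scrapy; it only imitates the XPath function so that the difference from a plain strip() is visible.

```python
import re


def normalize_space(text: str) -> str:
    """Imitate XPath normalize-space() on a Python string (hypothetical helper).

    Runs of XML whitespace (space, tab, CR, LF) collapse to one space,
    then the leading and trailing space is removed.
    """
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


# Text as it might be joined from the nodes inside the <a> element.
raw = "\n    An Introduction   to Python \n    "

stripped = raw.strip()             # keeps the run of spaces in the middle
normalized = normalize_space(raw)  # collapses it to a single space

print(repr(stripped))
print(repr(normalized))
```

In the XPath version the same cleanup happens inside the expression, so a single extract_first() call returns the tidy string without any joining in Python.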
