Python Scrapy: Skip if the XPath does not exist

I'm using this code to scrape hundreds of pages. However, sometimes the XPath for `a` doesn't exist at all. How can I keep the script from stopping, and still get `b` and use it for that particular page?

`a = response.xpath("//div[@class='headerDiv']/a/@title").extract()[0] 
b = response.xpath("//div[@class='headerDiv']/text()").extract()[0].strip() 
items['title'] = a + " " + b 
yield items`


2016-10-19 Blue Island

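Check the result of `extract()` before indexing into it. `extract()` returns a list, and `[0]` on an empty list raises `IndexError`, which is what stops the script:

nodes = response.xpath("//div[@class='headerDiv']/a/@title").extract()
a = nodes[0] if nodes else ""

nodes = response.xpath("//div[@class='headerDiv']/text()").extract()
b = nodes[0].strip() if nodes else ""

items['title'] = a + " " + b
yield items

With good advice from Padraic Cunningham, you can also use: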

a = response.xpath("//div[@class='headerDiv']/a/@title").extract_first(default='')
b = response.xpath("//div[@class='headerDiv']/text()").extract_first(default='').strip()
items['title'] = (a + " " + b).strip()
yield items
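To see what `extract_first(default='')` buys you without running a spider, here is a minimal pure-Python sketch that mimics its behaviour on plain lists of strings (each list stands in for what `.extract()` would return). The helper names `first_or_default` and `build_title` are hypothetical and not part of Scrapy.

```python
from typing import List


def first_or_default(values: List[str], default: str = "") -> str:
    """Mimic extract_first(default=...): the first item, or `default` if the list is empty."""
    return values[0] if values else default


def build_title(a_values: List[str], b_values: List[str]) -> str:
    """Combine the optional <a title> with the div text, as in the extract_first answer."""
    a = first_or_default(a_values)
    b = first_or_default(b_values).strip()
    # .strip() removes the stray leading space when `a` is empty
    return (a + " " + b).strip()


# Page where the <a> link exists
print(build_title(["Chapter 1"], ["  Introduction  "]))
# Page where the XPath for `a` matches nothing: no IndexError, just `b`
print(build_title([], ["  Introduction  "]))
```

The sample values "Chapter 1" and "Introduction" are made up for illustration.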


2016-10-19 11:43:31

`extract_first(default='')` works, but you are always adding a space, so if `a` is missing the title starts with a space. You need to concatenate `a` only when it exists, and otherwise use just `b`.

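`(a + " " + b).strip()` handles the case where `a` returns no match. Also, you don't want to index the result of `extract_first()`: it returns a string, so indexing it would give you only the first character.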

Thank you!
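The first comment's alternative, using each part only when it is non-empty, can be written with `str.join` over the non-empty pieces, which also covers a missing `b`. The second comment's warning is shown as well: `extract_first()` already returns a string, so indexing it with `[0]` silently yields one character instead of failing. This is a minimal sketch with plain strings standing in for the extracted values; `join_parts` is a hypothetical helper.

```python
def join_parts(*parts: str) -> str:
    """Strip each part and join only the non-empty ones with a single space."""
    stripped = (p.strip() for p in parts)
    return " ".join(s for s in stripped if s)


print(join_parts("Chapter 1", " Introduction "))  # Chapter 1 Introduction
print(join_parts("", " Introduction "))           # Introduction
print(join_parts("Chapter 1", ""))                # Chapter 1

# Pitfall from the comment: extract_first() returns a str, not a list,
# so indexing it returns only the first character.
first = "Chapter 1"  # stands in for .extract_first(default='')
print(first[0])      # C
```

Compared with `(a + " " + b).strip()`, the join form never produces a doubled or dangling space, whichever part is missing.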

Another option is to parse the page with lxml and check the lists before indexing:

import lxml.etree as etree

# `data` is the raw page body (for example response.body in a Scrapy callback).
# If the page is not well-formed XML, use etree.HTMLParser() instead.
parser = etree.XMLParser(strip_cdata=False, remove_comments=True)
root = etree.fromstring(data, parser)

# xpath() returns a list, so only take element 0 when the list is non-empty.
# Query the same tree (root) for both values.
a = root.xpath("//div[@class='headerDiv']/a/@title")
b = root.xpath("//div[@class='headerDiv']/text()")

b_text = b[0].strip() if b else ""
if a:
    items['title'] = (a[0].strip() + " " + b_text).strip()
else:
    items['title'] = b_text

yield items
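If lxml isn't available, the same check-before-indexing pattern works with the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset including `[@attr='value']` predicates. This is a swapped-in technique, not what the answer above uses, and it assumes the page is well-formed XML (ElementTree will not parse arbitrary real-world HTML). The sample markup and the `title_from_page` function are hypothetical.

```python
import xml.etree.ElementTree as ET


def title_from_page(markup: str) -> str:
    """Build 'a b' from the headerDiv, skipping whichever part is missing."""
    root = ET.fromstring(markup)
    divs = root.findall(".//div[@class='headerDiv']")  # a list, possibly empty
    if not divs:
        return ""
    div = divs[0]

    # Like .../a/@title: only index when an <a> with a title attribute exists
    links = [link for link in div.findall("a") if link.get("title") is not None]
    a = links[0].get("title").strip() if links else ""

    # Like .../text(): the div's own text plus the tail text after each child
    texts = [t for t in [div.text, *(child.tail for child in div)] if t]
    b = texts[0].strip() if texts else ""

    return (a + " " + b).strip()


with_link = '<html><body><div class="headerDiv"><a title="Chapter 1">x</a> Introduction</div></body></html>'
without_link = '<html><body><div class="headerDiv"> Introduction</div></body></html>'
print(title_from_page(with_link))     # Chapter 1 Introduction
print(title_from_page(without_link))  # Introduction
```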


2016-10-19 12:06:00
