Unable to parse links from XML content

I've written a script in Python to scrape the links from a site's XML content. I've never worked with XML before, so I can't figure out where I'm going wrong. Thanks in advance for any workaround. Here's what I've tried:

import requests 
from lxml import html 

response = requests.get("https://drinkup.london/sitemap.xml").text 
tree = html.fromstring(response) 
for item in tree.xpath('//div[@class="expanded"]//span[@class="text"]'): 
    print(item)

The links are inside the XML content, which looks like this:

<div xmlns="http://www.w3.org/1999/xhtml" class="collapsible" id="collapsible4"><div class="expanded"><div class="line"><span class="button collapse-button"></span><span class="html-tag">&lt;url&gt;</span></div><div class="collapsible-content"><div class="line"><span class="html-tag">&lt;loc&gt;</span><span class="text">https://drinkup.london/</span><span class="html-tag">&lt;/loc&gt;</span></div></div><div class="line"><span class="html-tag">&lt;/url&gt;</span></div></div><div class="collapsed hidden"><div class="line"><span class="button expand-button"></span><span class="html-tag">&lt;url&gt;</span><span class="text">...</span><span class="html-tag">&lt;/url&gt;</span></div></div></div>

The error thrown when I run it:

value = etree.fromstring(html, parser, **kw) 
    File "src\lxml\lxml.etree.pyx", line 3228, in lxml.etree.fromstring (src\lxml\lxml.etree.c:79593) 
    File "src\lxml\parser.pxi", line 1843, in lxml.etree._parseMemoryDocument (src\lxml\lxml.etree.c:119053) 
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

Source: 2017-08-07 SIM

You are assigning the text attribute of requests.get to the response variable. That is a Unicode string, hence the error. Use the content attribute instead of text. – peterfields
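To see the cause in isolation, here is a minimal sketch using made-up sample data (not the real drinkup.london sitemap). The traceback shows that lxml.html.fromstring hands the input to lxml.etree.fromstring, so the same rule applies here: a str that carries an encoding declaration is rejected, the same document as bytes is accepted, and a str is also accepted once the declaration is removed. The helper name parse is hypothetical.

```python
from lxml import etree

# Sample sitemap-like document with an XML declaration naming its encoding
# (made-up data for illustration only).
DOC = '<?xml version="1.0" encoding="UTF-8"?><urlset><url><loc>https://example.com/</loc></url></urlset>'


def parse(data):
    """Hypothetical helper: parse XML from str or bytes, return the root."""
    return etree.fromstring(data)


# A str (what response.text gives you) with an encoding declaration fails.
try:
    parse(DOC)
except ValueError as exc:
    print("str input:", exc)

# The same document as bytes (what response.content gives you) parses fine.
print("bytes input:", parse(DOC.encode("utf-8")).findtext("url/loc"))

# A str is accepted once the declaration is stripped off.
body = DOC.split("?>", 1)[1]
print("str without declaration:", parse(body).findtext("url/loc"))
```

Passing bytes is usually the better of the two fixes, because lxml can then read the encoding from the declaration instead of relying on however requests decoded the response.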

Switch to .content, which returns bytes, instead of .text, which returns Unicode:

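import requests
from lxml import html

# .content is bytes, so lxml can honour the XML encoding declaration itself
response = requests.get("https://drinkup.london/sitemap.xml").content
tree = html.fromstring(response)
# Select the text of every <loc> element inside a <url> element
for item in tree.xpath('//url/loc/text()'):
    print(item)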

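Note the fixed XPath expression as well: the div/span markup in the question is what the browser's XML viewer renders, whereas the sitemap itself consists of <url> elements that contain <loc> elements.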
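If you would rather parse the sitemap with lxml's XML parser (lxml.etree) than with lxml.html, namespaces start to matter. The sketch below assumes the sitemap uses the standard sitemaps.org namespace, which is typical but not shown in the question; the sample data, the prefix "sm", and the function name sitemap_links are all hypothetical. lxml.html's HTML parser does not treat xmlns as a namespace declaration, which is presumably why the unprefixed //url/loc works in the answer above; under the XML parser the same expression matches nothing.

```python
from lxml import etree

# Assumed sample in the standard sitemaps.org format; the real
# drinkup.london sitemap and its namespace are not shown in the question.
SITEMAP = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

# A prefix of our own choosing, mapped to the sitemap namespace URI.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_links(data: bytes) -> list[str]:
    """Hypothetical helper: return every <loc> URL from sitemap bytes."""
    root = etree.fromstring(data)  # bytes, so the encoding declaration is fine
    return [str(loc) for loc in root.xpath("//sm:url/sm:loc/text()", namespaces=NS)]


if __name__ == "__main__":
    root = etree.fromstring(SITEMAP)
    # Unprefixed names do not match namespaced elements under the XML parser.
    print(root.xpath("//url/loc/text()"))  # []
    for link in sitemap_links(SITEMAP):
        print(link)
```

Against the live site you would pass the downloaded bytes, for example sitemap_links(requests.get("https://drinkup.london/sitemap.xml").content), provided the sitemap really declares that namespace.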

Source: 2017-08-07 20:13:07 alecxe

You're simply awesome. Whenever I'm stuck, you're there. It worked like a charm. Thanks a lot. – SIM
