Web scraping: get everything within a tag that contains other tags


<div class="example"> 
    <p> text <a href="#"> link </a> text</p> 
</div>

I have the markup above. Inside the div with class example I want to get everything, tags included, that is:

<p> text <a href="#"> link </a> text</p>

My query returns a list of paragraph contents, which I then join back together with:

description = ' '.join('<p>{0}</p>'.format(paragraph) for paragraph in description)

That works, but there must be a way to get the content inside the div directly. This is what I am using:

import requests
from lxml import html

page = requests.get('X')
tree = html.fromstring(page.content)

description = tree.xpath('//div[@class="example"]/p//text()')

Thanks, Carl
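To make the problem concrete, here is a minimal sketch of what the `//text()` query returns. The inline `sample` string is an assumption that stands in for the fetched page, so the snippet runs without a network request.

```python
from lxml import html

# Hypothetical inline markup standing in for the downloaded page.
sample = '<div class="example"><p> text <a href="#"> link </a> text</p></div>'
tree = html.fromstring(sample)

# //text() selects only the text nodes below <p>, so the <a> element is lost.
pieces = tree.xpath('//div[@class="example"]/p//text()')
print(pieces)
```

The result is a flat list of strings, which is why wrapping each piece in `<p>` afterwards cannot restore the original markup.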

Source

2016-07-10 carl

You just need to get all the nodes inside the tag:

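dummy = tree.xpath('//div[@class="example"]/div[2]/div/node()')
description = ''
for paragraph in dummy:
    try:
        # encoding='unicode' makes tostring() return str rather than bytes
        description += html.tostring(paragraph, encoding='unicode')
    except TypeError:
        # node() also yields bare text nodes, which tostring() cannot serialize
        pass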

Source

2016-07-10 21:04:27 carl

I found a solution. It is not pretty, but it gives me what I want:

h = """<div class="example"> 
<p> text <a href="#"> link </a> text</p> 
<p> othertext <a href="#"> otherlink </a> text</p> 
</div>""" 

from lxml import html 

x = html.fromstring(h) 

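# encoding="unicode" makes tostring() return str, so the join also works on Python 3
print("".join(html.tostring(n, encoding="unicode") for n in x.xpath("//div[@class='example']/*")))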

Output:

<p> text <a href="#"> link </a> text</p> 
<p> othertext <a href="#"> otherlink </a> text</p>

Or use .iterchildren():

"".join(html.tostring(n, encoding="unicode") for n in x.xpath("//div[@class='example']")[0].iterchildren())

Neither version needs a try/except.

Source

2016-07-10 21:39:57
