Beautiful Soup: how to get the href

I can't seem to extract the href from the following soup of HTML (there is only one <strong>Website:</strong> on the page):

<div id='id_Website'> 
<strong>Website:</strong> 
<a href='http://google.com' target='_blank' rel='nofollow'>www.google.com</a> 
</div></div><div>

This is what I thought would work:

href = soup.find("strong" ,text=re.compile(r'Website')).next["href"]

Source

2011-09-12 howtodothis

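In this case, .next is a NavigableString containing the whitespace between the <strong> tag and the <a> tag, so it has no href to look up. Also, the text= argument matches NavigableStrings rather than elements. The following does what you want, I think: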

import re 
from BeautifulSoup import BeautifulSoup 

html = '''<div id='id_Website'> 
<strong>Website:</strong> 
<a href='http://google.com' target='_blank' rel='nofollow'>www.google.com</a> 
</div></div><div>''' 

soup = BeautifulSoup(html) 

for t in soup.findAll(text=re.compile(r'Website:')): 
    # Find the parent of the NavigableString, and see 
    # whether that's a <strong>: 
    s = t.parent 
    if s.name == 'strong': 
        print s.nextSibling.nextSibling['href']

... but that isn't very robust.

If the enclosing div has a predictable ID, it would be better to find that div and then find the first <a> element within it.
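A minimal sketch of that ID-based lookup, for illustration only. It assumes the newer bs4 package with the built-in html.parser, rather than the BeautifulSoup 3 import used above, and the helper name href_in_div is hypothetical:

```python
from bs4 import BeautifulSoup

html = """<div id='id_Website'>
<strong>Website:</strong>
<a href='http://google.com' target='_blank' rel='nofollow'>www.google.com</a>
</div>"""


def href_in_div(markup, div_id):
    """Return the href of the first <a> inside the div with the given id.

    Hypothetical helper: returns None if the div or the link is missing.
    """
    soup = BeautifulSoup(markup, "html.parser")
    div = soup.find("div", id=div_id)
    if div is None:
        return None
    # href=True restricts the match to <a> tags that carry an href attribute.
    link = div.find("a", href=True)
    return link["href"] if link is not None else None


print(href_in_div(html, "id_Website"))  # prints http://google.com
```

Searching inside the div avoids depending on the whitespace between <strong> and <a>, which is what made the sibling-based version fragile.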

Source

2011-09-12 13:39:31

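This is what I wanted. Thank you. How do I search by ID and get the next href value? – howtodothis

You could use something like soup.findAll('div', id=re.compile('Website$')) to consider all the divs whose ID ends in "Website", but without seeing other examples it's hard to say how best to pick them out. –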
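A rough sketch of that regex-on-ID idea, again assuming bs4 (where findAll is spelled find_all). The second and third divs and the helper name website_hrefs are made up for illustration; only the first div comes from the question:

```python
import re

from bs4 import BeautifulSoup

# Only the first div is from the question; the others are hypothetical.
html = """
<div id='id_Website'><strong>Website:</strong> <a href='http://google.com'>www.google.com</a></div>
<div id='other_Website'><strong>Website:</strong> <a href='http://example.com'>example.com</a></div>
<div id='id_Phone'><strong>Phone:</strong> 555-0100</div>
"""


def website_hrefs(markup):
    """Collect the first href from every div whose id ends in 'Website'."""
    soup = BeautifulSoup(markup, "html.parser")
    hrefs = []
    # A compiled regex passed as an attribute value is matched with search().
    for div in soup.find_all("div", id=re.compile(r"Website$")):
        link = div.find("a", href=True)
        if link is not None:
            hrefs.append(link["href"])
    return hrefs


print(website_hrefs(html))
```

Whether a suffix match like this is the right filter depends on what the other IDs on the real page look like.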
