How to extract "alt" with Beautiful Soup

I have just discovered Beautiful Soup, which is very nice. I am wondering whether there is an easy way to extract the text with the "alt" field included. A simple example would be

from bs4 import BeautifulSoup 

html_doc =""" 
<body> 
<p>Among the different sections of the orchestra you will find:</p> 
<p>A <img src="07fg03-violin.jpg" alt="violin" /> in the strings</p> 
<p>A <img src="07fg03-trumpet.jpg" alt="trumpet" /> in the brass</p> 
<p>A <img src="07fg03-woodwinds.jpg" alt="clarinet and saxophone"/> in the woodwinds</p> 
</body> 
""" 
soup = BeautifulSoup(html_doc, 'html.parser') 
print(soup.get_text())

which leads to:

Among the different sections of the orchestra you will find:
A in the strings
A in the brass
A in the woodwinds

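But I would like the text extraction to include the alt field, which would give:

Among the different sections of the orchestra you will find:
A violin in the strings
A trumpet in the brass
A clarinet and saxophone in the woodwinds

Thanks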

Source

2017-04-24 Portland

Check this out: http://stackoverflow.com/questions/2612548/extracting-an-attribute-value-with-beautifulsoup (possible duplicate of this question) – JacobIRR
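As a minimal sketch of what reading an attribute value looks like in Beautiful Soup: a tag can be indexed like a dictionary, or queried with `.get()` when the attribute may be missing. The HTML snippet below, including the `no-alt.jpg` image, is made up for illustration and is not part of the question.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the second <img> deliberately has no alt attribute.
html_doc = '<p>A <img src="07fg03-violin.jpg" alt="violin" /> in the strings <img src="no-alt.jpg" /></p>'
soup = BeautifulSoup(html_doc, 'html.parser')

images = soup.find_all('img')

first_alt = images[0]['alt']       # dictionary-style access; raises KeyError if 'alt' is absent
second_alt = images[1].get('alt')  # .get() returns None when the attribute is missing

# Collect every alt value, falling back to an empty string.
alt_values = [img.get('alt', '') for img in images]
print(first_alt, second_alt, alt_values)
```

This only pulls out the attribute values themselves; it does not put them back into the surrounding text.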

Consider this approach.

from bs4 import BeautifulSoup 

html_doc =""" 
<body> 
<p>Among the different sections of the orchestra you will find:</p> 
<p>A <img src="07fg03-violin.jpg" alt="violin" /> in the strings</p> 
<p>A <img src="07fg03-trumpet.jpg" alt="trumpet" /> in the brass</p> 
<p>A <img src="07fg03-woodwinds.jpg" alt="clarinet and saxophone"/> in the woodwinds</p> 
</body> 
""" 
soup = BeautifulSoup(html_doc, 'html.parser') 
ptag = soup.find_all('p') # get all tags of type <p> 

for tag in ptag:
    instrument = tag.find('img')  # search for the first <img> inside this <p>
    if instrument:  # if we found an <img> tag...
        # ...create a new string with the content of 'alt' inserted into 'tag.text'
        # (after the first two characters, i.e. the leading "A ")
        temp = tag.text[:2] + instrument['alt'] + tag.text[2:]
        print(temp)
    else:  # if we haven't found an <img> tag we just print 'tag.text'
        print(tag.text)

The output is

Among the different sections of the orchestra you will find: 
A violin in the strings 
A trumpet in the brass 
A clarinet and saxophone in the woodwinds

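The strategy is:

Find all <p> tags.
Search for an <img> tag within each of these <p> tags.
If we find an <img> tag, insert the content of its alt attribute into tag.text and print the result.
If we don't find an <img> tag, simply print tag.text.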

Source

2017-04-24 14:41:00 datell

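Thanks @datell. It works well. One more question. If there are two images in the same paragraph (for example, when the sentences about the strings, the brass and the woodwinds all sit in a single <p>, each with its own <img>), only the first alt text ("violin") is inserted; it does not extract the second one. Any ideas for 2 or more "img" in the same paragraph? – Portland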
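For paragraphs that contain more than one image, one possible approach (a sketch, not something proposed in this thread) is to replace every `<img>` tag with its alt text using `replace_with()` and only then call `get_text()`. The two-image paragraph below is an assumed example built from the question's markup. Note that this modifies the parsed tree in place.

```python
from bs4 import BeautifulSoup

# Assumed example: two <img> tags inside the same <p>.
html_doc = """
<body>
<p>Among the different sections of the orchestra you will find:</p>
<p>A <img src="07fg03-violin.jpg" alt="violin" /> in the strings and a <img src="07fg03-trumpet.jpg" alt="trumpet" /> in the brass</p>
</body>
"""
soup = BeautifulSoup(html_doc, 'html.parser')

# Swap every <img> for its alt text (empty string if there is no alt attribute).
for img in soup.find_all('img'):
    img.replace_with(img.get('alt', ''))

# The alt texts now sit where the images used to be.
lines = [p.get_text() for p in soup.find_all('p')]
for line in lines:
    print(line)
```

Because each image is replaced where it stands, the number of images per paragraph and their position in the sentence no longer matter.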

a = soup.findAll('img') 

for every in a: 
    print(every['alt'])

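This does the job.

Line 1 finds all the img tags (we use .findAll rather than .find to get all of them).

Then either loop over each of the results, or access them manually via soup.findAll('img')[0]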

# 'a' is a list-like ResultSet, so .text is read from each element, not from 'a' itself
for eachline in a:
    print(eachline.text)

for simple text

soup.findAll('img')[1] .. and so on

Source

2017-04-24 04:12:38

Thanks, but your code returns violin trumpet clarinet and saxophone. That was not my question. I would like to keep these texts "in the proper place", as in the original post. – Portland
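One way to keep each alt text in its proper place without modifying the parsed tree might be to walk the children of each `<p>` in document order, emitting plain strings as they are and the alt value wherever an `<img>` appears. This is only a sketch; the helper name `text_with_alt` is hypothetical and does not come from Beautiful Soup.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag


def text_with_alt(node):
    """Return the text of `node`, with each <img> replaced by its alt text."""
    parts = []
    for child in node.children:
        if isinstance(child, NavigableString):
            parts.append(str(child))             # ordinary text, kept as is
        elif isinstance(child, Tag):
            if child.name == 'img':
                parts.append(child.get('alt', ''))   # alt text, or '' if missing
            else:
                parts.append(text_with_alt(child))   # recurse into nested tags
    return ''.join(parts)


html_doc = """
<body>
<p>Among the different sections of the orchestra you will find:</p>
<p>A <img src="07fg03-violin.jpg" alt="violin" /> in the strings</p>
<p>A <img src="07fg03-trumpet.jpg" alt="trumpet" /> in the brass</p>
<p>A <img src="07fg03-woodwinds.jpg" alt="clarinet and saxophone"/> in the woodwinds</p>
</body>
"""
soup = BeautifulSoup(html_doc, 'html.parser')

result = [text_with_alt(p) for p in soup.find_all('p')]
for line in result:
    print(line)
```

Since the helper visits children in order, it handles any number of images per paragraph and leaves the `<img>` tags in the soup untouched.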
