Extract text from `p` within `div` with BeautifulSoup

I am very new to web scraping with Python, and I am struggling to extract text that is nested in the HTML (a p inside a div). Here is what I have so far:

from bs4 import BeautifulSoup 
import urllib 

url = urllib.urlopen('http://meinparlament.diepresse.com/') 
content = url.read() 
soup = BeautifulSoup(content, 'lxml')

This works fine:

links=soup.findAll('a',{'title':'zur Antwort'}) 
for link in links: 
    print(link['href'])

This extraction also works fine:

table = soup.findAll('div',attrs={"class":"content-question"}) 
for x in table: 
    print(x)

It outputs this:

<div class="content-question"> 
<p>[...] Die Verhandlungen über die mögliche Visabefreiung für  
türkische Staatsbürger per Ende Ju... 
<a href="http://meinparlament.diepresse.com/frage/10144/" title="zur 
Antwort">mehr »</a> 
</p> 
</div>

Now I want to extract the text between <p> and </p>. This is the code I use:

table = soup.findAll('div',attrs={"class":"content-question"}) 
for x in table: 
    print(x['p'])

However, Python raises a KeyError.

Source

2016-04-19 Johannes Schwaninger

The following code finds the divs with the class "content-question" and prints the text of each p element in them:

from bs4 import BeautifulSoup 
import urllib 

url = urllib.urlopen('http://meinparlament.diepresse.com/') 
content = url.read() 
soup = BeautifulSoup(content, 'lxml') 

table = soup.findAll('div',attrs={"class":"content-question"}) 
for x in table: 
    print(x.find('p').text)

# Another way to retrieve tables: 
# table = soup.select('div[class="content-question"]')

The printed text of the first p element in table is as follows:

[...] Die Verhandlungen über die mögliche Visabefreiung für türkische Staatsbürger per Ende Juni sind noch nicht abgeschlossen, sodass nicht mit Sicherheit gesagt werden kann, ob es zu diesem Zeitpunkt bereits zu einer Visabefreiung kommt. [...] Prinzipiell [...] jedoch so, dass Visaerleichterungen bzw. [...] Reziprozität sind, d.h. [...] Staaten gelten müssten. [...]
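
As for why the original attempt fails: subscripting a BeautifulSoup tag, as in x['p'], looks up an HTML attribute of that tag, not a child element, and the div has no attribute named p, hence the KeyError. Below is a minimal sketch of the difference. It assumes a shortened, made-up HTML snippet in place of the live page and uses the built-in 'html.parser' so that lxml is not required. The variable names (html, divs, texts, selected) are only illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the page's markup (not fetched from the site).
html = """
<div class="content-question">
<p>Erste Frage <a href="/frage/1/" title="zur Antwort">mehr</a></p>
</div>
<div class="content-question">
<p>Zweite Frage</p>
</div>
<div class="content-question"></div>
"""

soup = BeautifulSoup(html, "html.parser")
divs = soup.find_all("div", attrs={"class": "content-question"})

# Subscripting a Tag looks up an HTML attribute, so div['p'] raises KeyError,
# while div['class'] works because the div really has a class attribute.
try:
    divs[0]["p"]
    raised = False
except KeyError:
    raised = True

# find('p') searches the tag's descendants instead; it returns None when no <p> exists,
# so check for None before reading the text.
texts = []
for div in divs:
    p = div.find("p")
    if p is not None:
        texts.append(p.get_text(" ", strip=True))

# The same paragraphs selected in one step with a CSS selector.
selected = [p.get_text(" ", strip=True) for p in soup.select("div.content-question p")]

print(raised)
print(texts)
print(selected)
```

Note that the text of a p includes the text of any link nested inside it, so the link label ends up in the extracted string as well. If that is unwanted, one option is to remove the a element from the p before reading its text.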

Source

2016-04-19 22:16:16 Phillip
