How to extract a comment with BeautifulSoup?

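I am completely new to Python and data mining, so I have a question about extracting part of some output. I am using Python 3.6 and updated everything this morning. I anonymized the output and removed every line containing passwords, tokens and so on. How can I extract the comment text with BeautifulSoup?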

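from bs4 import BeautifulSoup

# Parse the saved page; the context manager closes the file afterwards.
with open("facebookoutput.html") as f:
    soup = BeautifulSoup(f, "html.parser")

# Each comment sits in a div with class "_2b06".
comments = soup.find_all('div', class_="_2b06")

print(comments[0])  # print the first entry: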

<div class="_2b06"><div class="_2b05"><a href="/stuartd?fref=nf&amp;rc=p&amp;__tn__=R-R">some Name </a></div><div data-commentid="100000000000000000222222000000000000000" data-sigil="comment-body">There is nice comment. I like stackoverflow. </div></div>

I want to extract only the text "There is nice comment. I like stackoverflow."

Thank you.

Source

2017-12-20 smurfit89

`comments[0].div.find_all('div')[-1].text` ... maybe? –

I tried this, but it raised 'IndexError: list index out of range'. – smurfit89

Try this:
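A note on why that suggestion most likely fails: `.div` is shorthand for `.find('div')`, so it returns the first nested div (class `_2b05`), which contains only the link and no further divs. The sketch below reproduces this on a shortened copy of the markup from the question (the `href` and attributes are trimmed for readability, which is my simplification) and shows that searching from the outer container instead does reach the comment div.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (attributes trimmed).
html = (
    '<div class="_2b06"><div class="_2b05"><a href="/stuartd">some Name </a></div>'
    '<div data-sigil="comment-body">There is nice comment. I like stackoverflow. </div></div>'
)
soup = BeautifulSoup(html, "html.parser")
comments = soup.find_all("div", class_="_2b06")

# .div returns the first nested div ("_2b05"); it holds only the <a> tag,
# so looking for divs inside it gives an empty result and [-1] would fail.
first_inner = comments[0].div
inner_divs = first_inner.find_all("div")
print(first_inner["class"], len(inner_divs))

# Searching from the outer container finds both nested divs;
# the last one is the comment body.
last_div_text = comments[0].find_all("div")[-1].text
print(last_div_text)
```

This relies on the comment body being the last div inside each `_2b06` container, which holds for the sample shown but may not hold for every entry in the real page.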

from bs4 import BeautifulSoup 

content=""" 
<div class="_2b06"><div class="_2b05"><a href="/stuartd?fref=nf&amp;rc=p&amp;__tn__=R-R">some Name </a></div><div data-commentid="100000000000000000222222000000000000000" data-sigil="comment-body">There is nice comment. I like stackoverflow. </div></div> 
""" 

soup = BeautifulSoup(content, "html.parser") 
comments = ' '.join([item.text for item in soup.select("[data-sigil='comment-body']")]) 
print(comments)

Output:

There is nice comment. I like stackoverflow.
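If the page holds several comments and you want each one kept separate rather than joined into a single string, something along these lines might work. It is only a sketch: it assumes every comment lives in a `_2b06` container with the author in `_2b05` and the text in the `data-sigil="comment-body"` div, as in the sample. The second entry in the test markup is made up purely for illustration.

```python
from bs4 import BeautifulSoup

# First entry is a shortened copy of the sample from the question;
# the second entry is invented to show the loop handling several comments.
html = """
<div class="_2b06"><div class="_2b05"><a href="/stuartd">some Name </a></div><div data-sigil="comment-body">There is nice comment. I like stackoverflow. </div></div>
<div class="_2b06"><div class="_2b05"><a href="/other">other Name </a></div><div data-sigil="comment-body">A second, made-up comment. </div></div>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for container in soup.find_all("div", class_="_2b06"):
    author = container.find("div", class_="_2b05")
    body = container.find("div", attrs={"data-sigil": "comment-body"})
    # Skip containers that do not match the assumed layout.
    if author is None or body is None:
        continue
    # get_text(strip=True) drops the surrounding whitespace.
    rows.append((author.get_text(strip=True), body.get_text(strip=True)))

for name, text in rows:
    print(f"{name}: {text}")
```

Keeping the author and the text as a pair makes it easier to write the result to a CSV or a DataFrame later.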

Source

2017-12-20 15:49:37 SIM
