BeautifulSoup can't find all p tags with a specific class

I'm using BeautifulSoup to find all p tags in a specific HTML page that I saved locally. My code:

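from bs4 import BeautifulSoup

with open("./" + str(filename) + ".txt", "r") as myfile:
    data = myfile.read().replace('\n', '')
# No parser argument: bs4 picks the best parser that is installed.
soup = BeautifulSoup(data)
# Collect every <p> whose class list contains "commentsParagraph".
t11 = soup.findAll("p", {"class": "commentsParagraph"})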

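This code works for part of the page, but part of the page is loaded via AJAX (I let it load before saving the source), and the code does not work on that part. To test this, I added the class commentsParagraph2 to one of the p tags in the AJAX part and changed my code to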

t11 = soup.findAll("p", {"class": "commentsParagraph2"})

but t11 is an empty list.

I have also attached the page file here.

Any ideas?
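A quick way to rule out the query itself: the sketch below runs the same kind of find_all on a made-up HTML snippet (the sample markup and the helper find_paragraphs are hypothetical, and it assumes bs4 with the built-in html.parser). Because bs4 treats class as a multi-valued attribute, a tag matches even when commentsParagraph2 is one of several classes, so an empty result more likely comes from the saved file or the parser than from the find_all call.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second tag keeps its original class and also gets the test class.
SAMPLE_HTML = """
<div>
  <p class="commentsParagraph">First comment</p>
  <p class="commentsParagraph commentsParagraph2">AJAX-loaded comment</p>
</div>
"""


def find_paragraphs(html: str, class_name: str) -> list:
    """Return the text of every <p> whose class list contains class_name."""
    soup = BeautifulSoup(html, "html.parser")
    # class is multi-valued in bs4, so this matches any <p> that has class_name
    # among its classes, not only tags whose class attribute equals it exactly.
    return [p.get_text(strip=True) for p in soup.find_all("p", class_=class_name)]


print(find_paragraphs(SAMPLE_HTML, "commentsParagraph"))
print(find_paragraphs(SAMPLE_HTML, "commentsParagraph2"))
```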

2016-10-10 Quantico

I think BeautifulSoup does not see the AJAX response you are preloading as part of the DOM. –
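One way to test this hypothesis on the saved file is to compare a plain substring search with what the parser actually builds. In the sketch below, diagnose is a hypothetical helper, and the second sample assumes the AJAX markup exists only inside a script string, which html.parser keeps as text rather than turning into tags.

```python
from bs4 import BeautifulSoup


def diagnose(raw_html: str, class_name: str, parser: str = "html.parser") -> dict:
    """Compare a plain substring search with what BeautifulSoup actually parses."""
    soup = BeautifulSoup(raw_html, parser)
    return {
        # Does the class name appear anywhere in the saved text?
        "in_raw_text": class_name in raw_html,
        # How many <p> elements with that class ended up in the parse tree?
        "parsed_matches": len(soup.find_all("p", class_=class_name)),
    }


# Markup saved as real tags: found both ways.
static_page = '<div><p class="commentsParagraph2">saved comment</p></div>'
# Markup that only exists inside a script string: present in the text, not parsed as a tag.
script_page = "<script>var tpl = '<p class=\"commentsParagraph2\">later</p>';</script>"

print(diagnose(static_page, "commentsParagraph2"))
print(diagnose(script_page, "commentsParagraph2"))
```

On the real page you would pass the saved file's contents instead. If the class name is in the raw text but the parsed count is 0, the markup is most likely there but not as tags the parser builds; if it is absent, the AJAX content probably never made it into the saved file.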

BS4 finds it without any problem with all three parsers. There is one p tag with the class commentsParagraph2 in your HTML:

In [8]: from bs4 import BeautifulSoup
    ...: # parse the same saved file with each of the three parsers
    ...: soup1 = BeautifulSoup(open("/home/padraic/t.html").read(), "html5lib")
    ...: soup2 = BeautifulSoup(open("/home/padraic/t.html"), "html.parser")
    ...: soup3 = BeautifulSoup(open("/home/padraic/t.html"), "lxml")
    ...: print(soup1.select_one("p.commentsParagraph2"))
    ...: print(soup2.select_one("p.commentsParagraph2"))
    ...: print(soup3.select_one("p.commentsParagraph2"))
    ...:
<p class="commentsParagraph2"> 
So much better than Ryder. Only take Econ 11 if she's one of the professors teaching it. Beware her tests though, which are much different from Ryder's. 
</p> 
<p class="commentsParagraph2"> 
So much better than Ryder. Only take Econ 11 if she's one of the professors teaching it. Beware her tests though, which are much different from Ryder's. 
</p> 
<p class="commentsParagraph2"> 
So much better than Ryder. Only take Econ 11 if she's one of the professors teaching it. Beware her tests though, which are much different from Ryder's. 
</p>

So you are using either BeautifulSoup 3, which is broken and no longer maintained, or an old version of bs4.
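The session above reads the original t.html, which is not reproduced here. As a self-contained sketch of the same comparison, assuming a made-up one-line stand-in for the page and a hypothetical helper compare_parsers, the code below tries html.parser (built in) plus lxml and html5lib, and catches bs4's FeatureNotFound for any parser that is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical stand-in for the saved page.
HTML = '<div><p class="commentsParagraph2">So much better than Ryder.</p></div>'


def compare_parsers(html: str, selector: str, parsers=("html.parser", "lxml", "html5lib")) -> dict:
    """Map each parser name to the text of the first match, None, or a note if missing."""
    results = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # lxml and html5lib are separate installs; record and move on.
            results[name] = "parser not installed"
            continue
        tag = soup.select_one(selector)
        results[name] = tag.get_text(strip=True) if tag else None
    return results


for parser, found in compare_parsers(HTML, "p.commentsParagraph2").items():
    print(parser, "->", found)
```

If every installed parser returns the text here but your own script still gets an empty list, the selector is probably fine and the difference lies in the installed library or in the saved file.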

2016-10-10 09:23:58

I'm importing BeautifulSoup from bs4. How can I check the version? – Quantico

'import bs4; bs4.__version__' –

Which version are you using? – Quantico
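A slightly longer take on the one-liner from the comments, as a sketch: it prints the installed bs4 version and turns it into a tuple so it can be compared against whatever minimum you settle on. The answer does not name a cutoff version, and version_tuple is a hypothetical helper.

```python
import re

import bs4


def version_tuple(version: str) -> tuple:
    """Turn a string such as '4.12.3' into (4, 12, 3).

    Only the leading digits of each dot-separated part are kept, so a
    pre-release like '4.13.0b2' becomes (4, 13, 0). Hypothetical helper.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


# Importing the bs4 package at all means BeautifulSoup 4 is in use;
# BeautifulSoup 3 was imported as `from BeautifulSoup import BeautifulSoup`.
print("bs4", bs4.__version__, version_tuple(bs4.__version__))
```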

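I downloaded your HTML and ran some tests: the BeautifulSoup module could find only three p nodes. There are several iframes in the HTML, which is probably why BS doesn't work. My suggestion is to use the re module instead of bs. Here is sample code for your reference: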

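import re

# Read the saved page as a single string.
with open('1.html', 'r') as f:
    data = f.read()

# Capture the text between <p class="commentsParagraph"> and the following </p>.
m = re.findall(r'(?<=<p class="commentsParagraph">)[\!\w\s.\'\,\-\(\)\@\#\$\%\^\&\*\+\=\/|\^<]+(?=</p>)', data)
print(m)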
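To see what this pattern returns without the downloaded 1.html, here is a sketch that runs it on a made-up sample string. The pattern only matches text that directly follows the exact opening tag <p class="commentsParagraph"> and only when every character is in the character class, so on this sample the paragraph containing double quotes is skipped.

```python
import re

# The answer's pattern, unchanged.
PATTERN = r'(?<=<p class="commentsParagraph">)[\!\w\s.\'\,\-\(\)\@\#\$\%\^\&\*\+\=\/|\^<]+(?=</p>)'

# Hypothetical sample text, not taken from the real page.
sample = (
    '<p class="commentsParagraph">Great lecturer, fair tests.</p>'
    '<p class="commentsParagraph2">Different class, so the lookbehind fails.</p>'
    '<p class="commentsParagraph">Uses "quotes"</p>'
)

print(re.findall(PATTERN, sample))
```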

2016-10-10 01:32:20 Enix
