Get content between tags using BeautifulSoup

I'm trying to get all the content between each </h2> and the next <h2>. For example, given:

<h2> Header 1 </h2> 
This is an example text for <a href="https://example.com">site</a> 
Any HTML-Code can appear 
<br /> 
<p> 
<h2> Header 2 </h2> 
Some other text with no tags 
<h2> Header 3 </h2>

The result should be:

This is an example text for <a href="https://example.com">site</a> 
Any HTML-Code can appear 
<br> 
<p>

And:

Some other text with no tags

Can anyone point me in the right direction?
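A minimal sketch of one approach, assuming the headings are siblings at the same level of the tree: walk each <h2>'s next_siblings and stop at the next <h2>. The sample HTML is a simplified version of the snippet above in which the <p> is closed, because how an unclosed <p> gets repaired (and therefore whether the later headings stay siblings) depends on the parser. The helper name sections_between_h2 is hypothetical.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Simplified version of the question's HTML; the <p> is closed so that
# every <h2> ends up as a sibling at the same level of the tree.
html = """<h2> Header 1 </h2>
This is an example text for <a href="https://example.com">site</a>
Any HTML-Code can appear
<br/>
<p></p>
<h2> Header 2 </h2>
Some other text with no tags
<h2> Header 3 </h2>"""


def sections_between_h2(soup):
    """Return (heading text, markup up to the next <h2>) for every <h2>."""
    sections = []
    for header in soup.find_all("h2"):
        parts = []
        for sibling in header.next_siblings:
            # The next heading ends the current section.
            if isinstance(sibling, Tag) and sibling.name == "h2":
                break
            parts.append(str(sibling))  # keep tags and text as markup
        sections.append((header.get_text(strip=True), "".join(parts).strip()))
    return sections


soup = BeautifulSoup(html, "html.parser")
sections = sections_between_h2(soup)
for heading, content in sections:
    print(heading, "->", repr(content))
```

Each entry pairs a heading's text with the markup that follows it; the last heading has nothing after it, so its content is an empty string.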

Source

2017-03-02 houdini2

Can you get the whole text and use decompose() to remove the h2 tags? – RoundFour

Have you tried anything so far? – Nobita

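Given what you said below, your question is unclear. I can't tell whether you're looking only for the text between the two tags, or whether you want to keep the tags together with the text. – DMPierre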
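To illustrate the distinction raised in this comment, here is a sketch that collects the nodes between the first two <h2> headings and renders them two ways: with the markup kept, or as plain text. The sample HTML is a trimmed-down, hypothetical version of the question's snippet.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = """<h2>Header 1</h2>
This is an example text for <a href="https://example.com">site</a>
Any HTML-Code can appear
<br/>
<h2>Header 2</h2>"""

soup = BeautifulSoup(html, "html.parser")
first = soup.find("h2")

# Collect the nodes between the first <h2> and the next one.
nodes = []
for sibling in first.next_siblings:
    if isinstance(sibling, Tag) and sibling.name == "h2":
        break
    nodes.append(sibling)

# Option 1: keep the markup (tags such as <a> and <br/> are preserved).
with_tags = "".join(str(node) for node in nodes).strip()

# Option 2: text only (tags are dropped, the text inside them is kept).
text_only = "".join(
    node.get_text() if isinstance(node, Tag) else str(node) for node in nodes
)
text_only = " ".join(text_only.split())  # collapse newlines and repeated spaces

print(with_tags)
print(text_only)
```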

Thanks, but that's not exactly what I need. I may have given too little information.

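There is a lot of content before and after this text, and I want to grab only the text between </h2> and <h2>. If I use decompose(), it removes only the h2 tags, but everything else is still there. My problem is similar to this one: Extracting text without tags of HTML with Beautifulsoup Python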

I found a possible solution:

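# soup is the BeautifulSoup object for the whole page
content = str(soup.find("div", class_="class"))  # class="class" is the page's container
start_marker = "Header 1</h2>"
begin = content.find(start_marker) + len(start_marker)  # start after the heading, not at it
end = content.find("<h2>Header 2")
print(content[begin:end])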
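Slicing the serialized HTML depends on the exact spelling and whitespace of the markers (in the sample HTML from the question the heading is written "<h2> Header 1 </h2>", which "Header 1</h2>" would not match). A sketch of a tree-based alternative, under the assumption that the headings are direct children of one container element: look up the heading by its text inside the container and collect siblings until the next <h2>. The class name content, the sample page, and the helper html_after_heading are hypothetical; searching only inside the container keeps unrelated content before and after the section out of the result.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical page with unrelated markup before and after the section.
html = """<div class="nav"><h2>Menu</h2>Other stuff</div>
<div class="content">
<h2>Header 1</h2>
This is an example text for <a href="https://example.com">site</a>
<br/>
<h2>Header 2</h2>
Some other text with no tags
</div>
<div class="footer">More content</div>"""


def html_after_heading(container, heading_text):
    """Markup between the <h2> whose text is heading_text and the next <h2>."""
    heading = next(
        (h for h in container.find_all("h2") if h.get_text(strip=True) == heading_text),
        None,
    )
    if heading is None:
        return None  # no such heading inside this container
    parts = []
    for sibling in heading.next_siblings:
        if isinstance(sibling, Tag) and sibling.name == "h2":
            break
        parts.append(str(sibling))
    return "".join(parts).strip()


soup = BeautifulSoup(html, "html.parser")
content = soup.find("div", class_="content")
print(html_after_heading(content, "Header 1"))
print(html_after_heading(content, "Header 2"))
```

Because the sibling walk ends at the container's closing tag, the last section ("Header 2") stops at the end of the div instead of running into the footer.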

Source

2017-03-02 14:20:40 houdini2

I would use decompose():

# find() returns the first matching element, or None when none are left
while soup.find("h2") is not None:
    soup.h2.decompose()  # remove the <h2> tag and its contents from the tree

>>> \nThis is an example text for <a href="https://example.com">site</a>\nAny HTML-Code can appear \n<br>\n<p>\n\nSome other text with no tags\n</p></br>

A more refined version:

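soup.h2.decompose()  # drop the first <h2> (Header 1)
second_text = soup.h2.next_sibling  # the node right after the next <h2> (Header 2)
while soup.find("h2") is not None:
    soup.h2.decompose()  # remove the remaining <h2> tags

print(soup, second_text)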


>>> This is an example text for <a href="https://example.com">site</a> 
    Any HTML-Code can appear 
    <br> 
    <p> 

    Some other text with no tags 
    </p></br> 
    Some other text with no tags
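A note on the second_text line in this answer: .next_sibling returns exactly one node. That is enough for the section under Header 2, which is a single text node, but for a section that contains tags it stops at the first tag. A small sketch with a hypothetical, simplified HTML string shows the difference:

```python
from bs4 import BeautifulSoup

html = """<h2>Header 1</h2>
This is an example text for <a href="https://example.com">site</a>
<h2>Header 2</h2>
Some other text with no tags
<h2>Header 3</h2>"""

soup = BeautifulSoup(html, "html.parser")
header1, header2, _ = soup.find_all("h2")

# .next_sibling is a single node: here only the text before <a>.
first_node = header1.next_sibling
# For a section made of one text node, that one node is the whole section.
second_node = header2.next_sibling

print(repr(first_node))
print(repr(second_node))
```

To capture a section that contains tags, the siblings have to be walked (for example with .next_siblings) until the next <h2>.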

Source

2017-03-02 10:48:16 DMPierre
