BeautifulSoup does not extract the text of a specific tag

I ran into a problem while using BeautifulSoup to harvest information from a specific tag. I want to extract the 'Item 4' text between the html tags, but the code below retrieves the text associated with 'Item 1'. What am I doing wrong (slicing, for example)?

Code:

primary_detail = page_section.findAll('div', {'class': 'detail-item'})
for item_4 in page_section.find('h3', string='Item 4'):
    if item_4:
        for item_4_content in page_section.find('html'):
            print(item_4_content)

HTML:

<div class="detail-item"> 
    <h3>Item 1</h3> 
    <html><body><p>Item 1 text here</p></body></html> 
</div> 

<div class="detail-item"> 
    <h3>Item 2</h3> 
    <html><body><p>Item 2 text here</p></body></html> 
</div> 

<div class="detail-item"> 
    <h3>Item 3</h3> 
    <html><body><p>Item 3 text here</p></body></html> 
</div> 

<div class="detail-item"> 
    <h3>Item 4</h3> 
    <html><body><p>Item 4 text here</p></body></html> 
</div>

Source

2017-04-24 Life is complex

It looks like you want to print the contents of the <p> tag that corresponds to a given <h3> text value, correct?

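Your code needs to do the following:

1. Load the html_source.
2. Search for all 'div' tags whose 'class' is equal to 'detail-item'.
3. For each occurrence, check whether the .text value of the <h3> tag matches the string 'Item 4'.
4. If it does, print the .text of the corresponding <p> tag.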

You can achieve this with the following code.

Code:

s = '''<div class="detail-item"> 
    <h3>Item 1</h3> 
    <html><body><p>Item 1 text here</p></body></html> 
</div> 

<div class="detail-item"> 
    <h3>Item 2</h3> 
    <html><body><p>Item 2 text here</p></body></html> 
</div> 

<div class="detail-item"> 
    <h3>Item 3</h3> 
    <html><body><p>Item 3 text here</p></body></html> 
</div> 

<div class="detail-item"> 
    <h3>Item 4</h3> 
    <html><body><p>Item 4 text here</p></body></html> 
</div>''' 

from bs4 import BeautifulSoup 

soup = BeautifulSoup(s, 'lxml') 

primary_detail = soup.find_all('div', {'class': 'detail-item'}) 

for tag in primary_detail:
    if 'Item 4' in tag.h3.text:
        print(tag.p.text)

Output:

'Item 4 text here'
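
As an alternative sketch, the lookup can start from the heading itself. The loop in the question calls page_section.find('html'), which appears to search from the top of page_section again and so returns the first match (the 'Item 1' block) no matter which <h3> was found. Navigating from the matched <h3> to its enclosing div avoids that. The helper name text_for_heading is hypothetical, the markup is a shortened copy of the HTML above, and 'html.parser' is used here only so the example needs no extra parser package.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (two of the four blocks).
html = '''
<div class="detail-item">
    <h3>Item 1</h3>
    <html><body><p>Item 1 text here</p></body></html>
</div>
<div class="detail-item">
    <h3>Item 4</h3>
    <html><body><p>Item 4 text here</p></body></html>
</div>
'''


def text_for_heading(soup, heading):
    """Return the <p> text of the detail-item whose <h3> equals `heading`, or None."""
    # Find the heading by its exact string content.
    h3 = soup.find('h3', string=heading)
    if h3 is None:
        return None
    # Walk up to the enclosing block, then search only inside that block.
    container = h3.find_parent('div', class_='detail-item')
    if container is None or container.p is None:
        return None
    return container.p.get_text(strip=True)


soup = BeautifulSoup(html, 'html.parser')
print(text_for_heading(soup, 'Item 4'))
```

Because the search for <p> is scoped to the div that contains the matched heading, the result no longer depends on the order of the blocks in the page.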

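EDIT: On the website you provided, the first occurrence in the loop has no <h3> tag, only an <h2>, so there is no .text value to read, correct?

You can avoid this error with a try/except clause, as in the following code.

Code: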

from bs4 import BeautifulSoup 
import requests 


url = 'https://fortiguard.com/psirt/FG-IR-17-097' 
html_source = requests.get(url).text 

soup = BeautifulSoup(html_source, 'lxml') 

primary_detail = soup.find_all('div', {'class': 'detail-item'}) 

for tag in primary_detail:
    try:
        if 'Solutions' in tag.h3.text:
            print(tag.p.text)
    except AttributeError:
        # tag.h3 (or tag.p) is None for this block, so skip it
        continue

If the code hits an exception, it continues with the next element in the loop, so the first item, which has no .text value, is ignored.

Output:

'Upgrade to FortiWLC-SD version 8.3.0'
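
The same skip can also be written as an explicit None check instead of try/except, which some readers find easier to follow. The sketch below is only an illustration: the markup is an invented stand-in (not the real page) whose first block has an <h2> but no <h3>, and the names heading and matches are hypothetical.

```python
from bs4 import BeautifulSoup

# Invented stand-in markup: the first block has only an <h2>,
# mirroring the case described above. It is not the real page.
html = '''
<div class="detail-item"><h2>Summary</h2><p>Summary text</p></div>
<div class="detail-item"><h3>Solutions</h3><p>Upgrade text here</p></div>
'''

soup = BeautifulSoup(html, 'html.parser')

matches = []
for tag in soup.find_all('div', class_='detail-item'):
    heading = tag.find('h3')
    # Skip blocks that lack an <h3> or a <p> instead of raising AttributeError.
    if heading is None or tag.p is None:
        continue
    if 'Solutions' in heading.get_text():
        matches.append(tag.p.get_text(strip=True))

print(matches)
```

Checking for None makes the skipped case visible in the code, whereas the try/except version handles it implicitly.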

Source

2017-04-24 16:40:10

I received this error: AttributeError: 'NoneType' object has no attribute 'text'. It points to this expression: tag.h3.text.

How did you load the html_source? In my example I used the source you provided, but for your real problem you can use something like s = requests.get(url).text to load the HTML source.

Yes, I am scraping the actual page. I can extract the text from the h2 tag inside the div class="detail-item" tag, but not the text under the h3 tag. Here is the line I use to get the page contents: itemSoupParser = BeautifulSoup(raw_html, 'html.parser'). I can get everything on the page except the text content of the h3.
