How to extract h2 and h3 titles from news articles


I am trying to build a web scraper that can extract the main title from a news article.

# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup

url = input('enter the url \n')

# Download the page and parse it with the built-in HTML parser
r = requests.get(url)
content = r.content
soup = BeautifulSoup(content, "html.parser")

# Collect every <h1> element, then print the text of the first one
heading = soup.find_all('h1')
print(heading)
print(heading[0].text.strip())

This works only for titles inside h1 tags, but it throws an error for titles in h2 or h3 tags. How can I change this code so that it also works for h2 and h3 tags? Thanks in advance!

Source

2016-06-23 Amit Singh

You can just pass a list of the tag names you want to find; BeautifulSoup is quite flexible:

soup.find_all(['h1', 'h2', 'h3'])

You can also do:

import re

soup.find_all(re.compile(r"^h\d$"))  # matches "h" followed by a single digit

Source

2016-06-23 19:58:58 alecxe
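
To try both calls without hitting the network, here is a minimal sketch that runs them against an inline HTML string. The sample markup and the helper names `extract_headings` and `extract_all_heading_levels` are made up for illustration and are not part of the original answer.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded article page.
SAMPLE_HTML = """
<html><body>
  <h1>Site name</h1>
  <h2>Thursday, June 23, 2016</h2>
  <h3> Main article title </h3>
  <p>Body text.</p>
  <header>Not a heading tag</header>
</body></html>
"""


def extract_headings(html, tags=("h1", "h2", "h3")):
    """Return (tag name, stripped text) pairs for the listed tags, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    return [(tag.name, tag.get_text(strip=True)) for tag in soup.find_all(list(tags))]


def extract_all_heading_levels(html):
    """Same idea, but match any tag named 'h' plus exactly one digit via a regex."""
    soup = BeautifulSoup(html, "html.parser")
    pattern = re.compile(r"^h\d$")
    return [(tag.name, tag.get_text(strip=True)) for tag in soup.find_all(pattern)]


if __name__ == "__main__":
    for name, text in extract_headings(SAMPLE_HTML):
        print(name, text)
```

The list form is the more explicit of the two when only specific levels are wanted. The regex form also picks up h4 and deeper headings, which may or may not be desirable on a real page.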

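Thanks a lot for the help, Alex, that worked. I was able to extract the h1 and h2 tags, but how do I extract the main title from articles like [this](http://android-developers.blogspot.in/2016/06/introducing-android-basics-nanodegree.html), where the main title is in an h3 tag and the date is in an h2?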

@AmitSingh Well, you can find the date by its class name: `soup.find(class_="date-header").get_text()`. The same goes for the article title: `soup.find(class_="post-title").get_text()`. – alecxe
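
A class-based lookup like the one in the comment above can be sketched as follows. The inline markup only imitates what such a blog page might look like (the real page structure is an assumption), and `text_by_class` is a hypothetical helper that returns `None` rather than raising when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page may be structured differently.
SAMPLE_HTML = """
<div class="date-outer">
  <h2 class="date-header"><span>Thursday, June 23, 2016</span></h2>
  <h3 class="post-title entry-title"> Example post title </h3>
</div>
"""


def text_by_class(soup, class_name):
    """Return the stripped text of the first element with the given CSS class, or None."""
    node = soup.find(class_=class_name)
    # soup.find returns None when nothing matches, so guard before calling get_text
    return node.get_text(strip=True) if node is not None else None


if __name__ == "__main__":
    soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
    print(text_by_class(soup, "date-header"))
    print(text_by_class(soup, "post-title"))
```

Looking elements up by class ties the scraper to one site's markup, so on a different news site the class names would need to be checked in the page source first.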
