Web scraping with Python to extract data

I am using the following code. Everything works except the "affiliations" part, which fails with AttributeError: 'NoneType' object has no attribute 'text'. Without the .text part it returns everything (the whole code inside the class).

import requests 
import bs4 
import re 

headers = {'User-Agent':'Mozilla/5.0'} 

url = 'http://pubs.acs.org/toc/jacsat/139/5' 
html = requests.get(url, headers=headers) 

soup = bs4.BeautifulSoup(html.text, 'lxml') 

tags = soup.find_all('a', href=re.compile("full"))

for tag in tags: 
    new_url = tag.get('href', None) 
    newurl = 'http://pubs.acs.org' + new_url 
    newhtml = requests.get(newurl, headers=headers) 
    newsoup = bs4.BeautifulSoup(newhtml.text, 'lxml') 

    article_title = newsoup.find(class_="articleTitle").text 
    print(article_title) 

    affiliations = newsoup.find(class_="affiliations").text 
    print(affiliations) 

    authors = newsoup.find(id="authors").text 
    print(authors) 

    citation_year = newsoup.find(class_="citation_year").text 
    print(citation_year) 

    citation_volume = newsoup.find(class_="citation_volume").text 
    print(citation_volume) 

    citation = newsoup.find(id="citation").text 
    print(citation) 

    pubdate = newsoup.find(id="pubDate").text 
    print(pubdate)


2017-02-10 wus

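The error comes from looking up an element with the class "affiliations". I checked the source HTML of the first URL your script picks up for an element with this class value (or any other attribute with that value) and could not find one.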
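To check this for any page, you can test each lookup for None before calling .text. This is a minimal sketch, assuming bs4 with the built-in "html.parser" backend; `FIELDS` and `missing_fields` are hypothetical names, and the inline HTML only stands in for a downloaded article page.

```python
from bs4 import BeautifulSoup

# Hypothetical mapping of field name -> keyword arguments for soup.find(),
# mirroring some of the lookups in the question's loop.
FIELDS = {
    "article_title": {"class_": "articleTitle"},
    "affiliations": {"class_": "affiliations"},
    "authors": {"id": "authors"},
    "pubdate": {"id": "pubDate"},
}


def missing_fields(soup, fields=FIELDS):
    """Return the names of fields whose element is absent from the page."""
    # soup.find() returns None when nothing matches; calling .text on that
    # None is what raises the AttributeError in the question.
    return [name for name, query in fields.items() if soup.find(**query) is None]


# Stand-in page: it has a title and authors but no affiliations or pubDate.
sample_html = """
<h1 class="articleTitle">Example title</h1>
<div id="authors">A. Author, B. Author</div>
"""
soup = BeautifulSoup(sample_html, "html.parser")
print(missing_fields(soup))  # ['affiliations', 'pubdate']
```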

If the element cannot be found, catch the error so that the script does not break, and return None or a default string instead.

Something like this would work:

try: 
    affiliations = newsoup.find(class_="affiliations").text 
    print(affiliations) 
except AttributeError: 
    affiliations = None
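Since every .text lookup in the loop can fail the same way, the check can be folded into a small helper. This is a minimal sketch, assuming bs4; `text_or_default` is a hypothetical name. It tests for None instead of catching AttributeError, so unrelated AttributeErrors are not silently swallowed, and it returns stripped text via get_text(strip=True), unlike plain .text.

```python
from bs4 import BeautifulSoup


def text_or_default(soup, default=None, **query):
    """Return the stripped text of the first element matching query, or default."""
    element = soup.find(**query)
    # find() returns None when nothing matches; return the default instead
    # of letting .text raise AttributeError.
    if element is None:
        return default
    return element.get_text(strip=True)


page = BeautifulSoup('<h1 class="articleTitle"> Example title </h1>', "html.parser")
print(text_or_default(page, class_="articleTitle"))  # Example title
print(text_or_default(page, default="N/A", class_="affiliations"))  # N/A
```

In the question's loop this would read, for example, `affiliations = text_or_default(newsoup, default="", class_="affiliations")`.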


2017-02-12 18:23:30
