Web scraping with Python and Beautiful Soup

I'm practicing building web scrapers. The one I'm working on now goes to a site, scrapes the links to the various cities listed on that site, then follows each city link and scrapes all the links on that page. When I print city_tags, I get the HTML I want.

import requests
from bs4 import BeautifulSoup

main_url = "http://www.chapter-living.com/"

# Get the URL of each individual city
re = requests.get(main_url)
soup = BeautifulSoup(re.text, "html.parser")
city_tags = soup.find_all('a', class_="nav-title")  # Bottom of the page is not loaded dynamically
# Links to cities (this is the line that raises the AttributeError)
cities_links = [main_url + tag["href"] for tag in city_tags.find_all("a")]


That is the code I'm using. However, when I print cities_links, I get AttributeError: 'ResultSet' object has no attribute 'find_all'.

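From what I gather from other questions here, this error occurs when city_tags returns nothing, but that can't be the case if printing it shows the HTML I want. I did notice that the HTML is wrapped in []. Does that make a difference?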

Source: 2017-03-16, Maverick

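As the error says, city_tags is a ResultSet, which is a list of nodes, and it has no find_all method. You either have to loop through the set and apply find_all to each individual node or, in your case, I think you can simply extract the href attribute from each node: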

[tag['href'] for tag in city_tags] 

#['https://www.chapter-living.com/blog/', 
# 'https://www.chapter-living.com/testimonials/', 
# 'https://www.chapter-living.com/events/']
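The two options described above can be sketched on a small inline document, so that it runs without a network request. The markup here is hypothetical and only stands in for the real page; in particular, the div elements with class "city" are an assumption added to illustrate the nested case.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, not the real chapter-living.com page.
html = """
<nav>
  <a class="nav-title" href="/blog/">Blog</a>
  <a class="nav-title" href="/events/">Events</a>
  <div class="city"><a href="/london/">London</a><a href="/leeds/">Leeds</a></div>
  <div class="city"><a href="/york/">York</a></div>
</nav>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a ResultSet, which is a list subclass;
# that is why it prints wrapped in [].
city_tags = soup.find_all("a", class_="nav-title")
is_list = isinstance(city_tags, list)

# Calling find_all on the ResultSet itself fails, as in the question.
try:
    city_tags.find_all("a")
    error_raised = False
except AttributeError:
    error_raised = True

# Option 1: the matched nodes are already <a> tags, so read href directly.
direct_links = [tag["href"] for tag in city_tags]

# Option 2: the matched nodes are containers, so call find_all on each one.
containers = soup.find_all("div", class_="city")
nested_links = [a["href"] for div in containers for a in div.find_all("a")]

print(is_list, error_raised)
print(direct_links)
print(nested_links)
```

Which option applies depends on whether the selector matches the links themselves or the elements that wrap them.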

Source: 2017-03-16 17:50:47, Psidom

Well, city_tags is a bs4.element.ResultSet (essentially a list) of tags, and you are calling find_all on it. You probably want to call find_all on every element of the result set or, in this specific case, just retrieve each element's href attribute.

import requests
from bs4 import BeautifulSoup

main_url = "http://www.chapter-living.com/"

# Get the URL of each individual city
re = requests.get(main_url)
soup = BeautifulSoup(re.text, "html.parser")
city_tags = soup.find_all('a', class_="nav-title")  # Bottom of the page is not loaded dynamically
# city_tags is a list of <a> tags, so read href from each tag directly
cities_links = [main_url + tag["href"] for tag in city_tags]  # Links to cities
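One thing to watch when building the final URLs: the sample output in the other answer shows href values that are already absolute, in which case prefixing main_url would produce malformed links. A minimal sketch of a possible alternative, assuming the standard library's urllib.parse.urljoin is acceptable here; the relative href values in the list are hypothetical examples, not taken from the site.

```python
from urllib.parse import urljoin

main_url = "http://www.chapter-living.com/"

# The first href is absolute, like those in the sample output;
# the two relative ones are hypothetical.
hrefs = [
    "https://www.chapter-living.com/blog/",
    "/events/",
    "testimonials/",
]

# Plain concatenation breaks on absolute and root-relative hrefs.
concatenated = [main_url + href for href in hrefs]

# urljoin resolves each href against the base URL and leaves
# absolute URLs unchanged.
joined = [urljoin(main_url, href) for href in hrefs]

for before, after in zip(concatenated, joined):
    print(before, "->", after)
```

With real tags, the same idea would read urljoin(main_url, tag["href"]) inside the list comprehension.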

Source: 2017-03-16 17:50:50
