Python: scraping href links

My goal is to scrape the href links from the base_url site.

My code:

from bs4 import BeautifulSoup 
from selenium import webdriver 
import requests, csv, re 

game_links = [] 
link_pages = [] 
base_url = "http://www.basket.fi/sarjat/ohjelma_tulokset/?season_id=93783&league_id=4#mbt:2-303$f&stage=177155:$p&0=" 


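# PhantomJS support was deprecated in later Selenium 3 releases and removed in
# Selenium 4; with newer Selenium, use a headless Chrome or Firefox driver instead.
browser = webdriver.PhantomJS()
browser.get(base_url)
table = BeautifulSoup(browser.page_source, 'lxml')
# Match every <a> tag whose game_id attribute contains digits.
for game in table.find_all("a", {'game_id': re.compile(r'\d+')}):
    href = game.get("href")
    print(href)
browser.quit()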

Result:

http://www.basket.fi/sarjat/ottelu/?game_id=3502579&season_id=93783&league_id=4 
http://www.basket.fi/sarjat/ottelu/?game_id=3502579&season_id=93783&league_id=4 
http://www.basket.fi/sarjat/ottelu/?game_id=3502523&season_id=93783&league_id=4 
http://www.basket.fi/sarjat/ottelu/?game_id=3502523&season_id=93783&league_id=4 

......

The problem is that I can't understand why every href link always shows up twice in the result. How can I fix this?


2017-08-22 Juho M

Maybe the links appear twice on the page? You could use 'set()' to filter out the duplicates (hmm, I'm not sure whether that works when you're dealing with Tag objects...) – PRMoureu
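One way to avoid the Tag-object question entirely is to deduplicate the href strings, which are plain hashable str values. A minimal sketch, assuming the hrefs have already been collected into a list; unique_hrefs and the scraped list are hypothetical, and dict.fromkeys is used instead of a bare set() so the links keep their page order.

```python
def unique_hrefs(hrefs):
    """Drop repeated hrefs while keeping first-seen order.

    dict keys are unique and (since Python 3.7) keep insertion order,
    so this removes duplicates like set() does without reordering them.
    Missing hrefs (None or empty strings) are skipped.
    """
    return list(dict.fromkeys(href for href in hrefs if href))


# Hypothetical scraped values, shaped like the output in the question.
scraped = [
    "http://www.basket.fi/sarjat/ottelu/?game_id=3502579&season_id=93783&league_id=4",
    "http://www.basket.fi/sarjat/ottelu/?game_id=3502579&season_id=93783&league_id=4",
    "http://www.basket.fi/sarjat/ottelu/?game_id=3502523&season_id=93783&league_id=4",
    "http://www.basket.fi/sarjat/ottelu/?game_id=3502523&season_id=93783&league_id=4",
]

for href in unique_hrefs(scraped):
    print(href)
```

In the question's loop, this would mean appending game.get("href") to a list (for example the existing game_links) and printing unique_hrefs(game_links) after the loop.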

As you can notice in the image, there are two links with the same game_id.
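To illustrate that cause: if a schedule row contains two a tags that both carry the same game_id attribute, any search that matches on that attribute returns both. The real page markup is not shown here, so SAMPLE_HTML is hypothetical (one link around a date, one around a score), and the sketch swaps BeautifulSoup for the standard library's html.parser so it runs without Selenium; GameLinkCollector is a made-up name.

```python
from html.parser import HTMLParser

# Hypothetical markup: one row that links to the same game twice.
SAMPLE_HTML = """
<table>
  <tr>
    <td><a game_id="3502579" href="http://www.basket.fi/sarjat/ottelu/?game_id=3502579&amp;season_id=93783&amp;league_id=4">22.08.</a></td>
    <td><a game_id="3502579" href="http://www.basket.fi/sarjat/ottelu/?game_id=3502579&amp;season_id=93783&amp;league_id=4">78-65</a></td>
  </tr>
</table>
"""


class GameLinkCollector(HTMLParser):
    """Collect the href of every <a> tag whose game_id attribute is numeric."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; values are already unescaped.
        attributes = dict(attrs)
        if tag == "a" and (attributes.get("game_id") or "").isdigit():
            self.hrefs.append(attributes.get("href"))


collector = GameLinkCollector()
collector.feed(SAMPLE_HTML)
for href in collector.hrefs:
    print(href)  # the same URL is printed twice, once per anchor
```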

Code: This should help you get only one link per game:

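for game in table.find_all("a", {'game_id': re.compile(r'\d+')}):
    # Tag.children is an iterator, which is always truthy; check the
    # .contents list instead so anchors with no child nodes are skipped.
    if game.contents:
        href = game.get("href")
        print(href)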
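If both anchors for a game have contents, or ever differ in some other query parameter, a more direct option is to key on the game_id query parameter that appears in every URL of the question's output. A sketch using the standard library's urllib.parse; links_by_game_id and the scraped list are hypothetical, and it assumes every game link carries game_id in its query string.

```python
from urllib.parse import parse_qs, urlparse


def links_by_game_id(hrefs):
    """Map each game_id to the first href that mentions it."""
    links = {}
    for href in hrefs:
        if not href:
            continue
        # parse_qs returns a list per key, e.g. {'game_id': ['3502579'], ...}
        game_id = parse_qs(urlparse(href).query).get("game_id", [None])[0]
        if game_id is not None and game_id not in links:
            links[game_id] = href
    return links


# Hypothetical scraped values, shaped like the output in the question.
scraped = [
    "http://www.basket.fi/sarjat/ottelu/?game_id=3502579&season_id=93783&league_id=4",
    "http://www.basket.fi/sarjat/ottelu/?game_id=3502579&season_id=93783&league_id=4",
    "http://www.basket.fi/sarjat/ottelu/?game_id=3502523&season_id=93783&league_id=4",
    "http://www.basket.fi/sarjat/ottelu/?game_id=3502523&season_id=93783&league_id=4",
]

for game_id, href in links_by_game_id(scraped).items():
    print(game_id, href)
```

Because the dictionary is keyed by game_id, each game appears once no matter how many anchors point to it.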


2017-08-22 11:38:48
