Scraping the h-index, i10-index, and total citations from Google Scholar

I am working on a project that scrapes Google Scholar data. I want to scrape an author's h-index, total citations, and i10-index (all). For example, for Louisa Gilbert I would like to scrape:

h-index = 36 
i10-index = 74 
citations = 4383

I have written this:

from bs4 import BeautifulSoup 
import urllib.request 
url="https://scholar.google.ca/citations?user=OdQKi7wAAAAJ&hl=en" 
page = urllib.request.urlopen(url) 
soup = BeautifulSoup(page, 'html.parser')

but I am not sure how to continue. (I understand that there are some libraries available, but none of them lets me scrape the h-index and i10-index.)

Source

2016-12-25 user7340115

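You are almost there. You need to find the HTML elements that contain the data you want to extract. In this particular case, the indexes are contained in <td class="gsc_rsb_std"> tags. You need to retrieve these tags from the soup object and then use the string attribute to recover the text from inside each tag.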

# A string passed as the second argument to find_all filters by CSS class.
indexes = soup.find_all("td", "gsc_rsb_std")
h_index = indexes[2].string
i10_index = indexes[4].string
citations = indexes[0].string
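
To see why positions 0, 2 and 4 are the ones to pick, here is a minimal offline sketch. It assumes each row of the statistics table holds two value cells, an "all" value followed by a "recent" value, which is what the indices above imply; the real page layout may differ or change. The sample HTML is hypothetical: the "all" numbers are the ones from the question and the "recent" numbers are made up. It uses the standard library's html.parser instead of BeautifulSoup so that it runs without network access or extra packages; StatCellParser and extract_stats are names invented for this example.

```python
from html.parser import HTMLParser

# Hypothetical sample mimicking the assumed table layout: every row has an
# "all" cell followed by a "recent" cell. The "all" values are taken from
# the question; the "recent" values are invented for illustration.
SAMPLE_HTML = """
<table>
  <tr><td>Citations</td>
      <td class="gsc_rsb_std">4383</td><td class="gsc_rsb_std">1000</td></tr>
  <tr><td>h-index</td>
      <td class="gsc_rsb_std">36</td><td class="gsc_rsb_std">20</td></tr>
  <tr><td>i10-index</td>
      <td class="gsc_rsb_std">74</td><td class="gsc_rsb_std">50</td></tr>
</table>
"""


class StatCellParser(HTMLParser):
    """Collect the text of every <td class="gsc_rsb_std"> cell, in order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "td" and "gsc_rsb_std" in classes:
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data.strip()


def extract_stats(html):
    """Return the "all" values as integers, keyed by statistic name."""
    parser = StatCellParser()
    parser.feed(html)
    cells = parser.cells
    if len(cells) < 6:
        raise ValueError("expected at least 6 cells, found %d" % len(cells))
    # Even positions hold the "all" column: citations, h-index, i10-index.
    return {
        "citations": int(cells[0]),
        "h_index": int(cells[2]),
        "i10_index": int(cells[4]),
    }


stats = extract_stats(SAMPLE_HTML)
print(stats)  # {'citations': 4383, 'h_index': 36, 'i10_index': 74}
```

Checking the number of cells before indexing means a layout change raises a clear error instead of silently returning the wrong numbers.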

Source

2016-12-27 15:10:31
