R - Web Page Scraping - Problem getting attribute values with rvest

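I am trying to use rvest to scrape ISO country information from Wikipedia, including the links to other pages. I can't find a way to get the links (the href attribute) without the attribute name being included as well; I tried the XPath string function, but it raises an error. The code is fairly simple to run and self-explanatory.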

library(rvest) 
library(dplyr) 

searchPage <- read_html("https://en.wikipedia.org/wiki/ISO_3166-2") 
nodes <- html_node(searchPage, xpath = '(//h2[(span/@id = "Current_codes")]/following-sibling::table)[1]') 
codes <- html_nodes(nodes, xpath = 'tr/td[1]/a/text()') 
names <- html_nodes(nodes, xpath = 'tr/td[2]//a[@title]/text()') 
#Following brings back data but attribute name as well 
links <- html_nodes(nodes, xpath = 'tr/td[2]//a[@title]/@href') 
#Following returns nothing 
links2 <- html_nodes(nodes, xpath = 'tr/td[2]//a[@title]/@href/text()') 
#Following Errors 
links3 <- html_nodes(nodes, xpath = 'string(tr/td[2]//a[@title]/@href)') 
#Following Errors 
links4 <- sapply(nodes, function(x) { x %>% read_html() %>% html_nodes("tr/td[2]//a[@title]") %>% html_attr("href") })

Source

2017-10-21 Martin Thompson

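Your question really should have included more detail. The "self-explanatory" bit almost made me ignore it (hint: out of respect for other people's time, consider describing the problem in words in addition to posting the broken code).

That said, I'm not certain this is exactly what you needed: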

library(rvest)   # also provides the %>% pipe
library(tibble)

# Fetch and parse the page
pg <- read_html("https://en.wikipedia.org/wiki/ISO_3166-2")

# Pick the table that contains the country list
tab <- html_node(pg, xpath=".//table[contains(., 'Zimbabwe')]")

# First column: links to the ISO code pages; second column: country name cells
iso_col <- html_nodes(tab, xpath=".//td[1]/a[contains(@href, 'ISO')]")
name_col <- html_nodes(tab, xpath=".//td[2]")

# Combine the text and href values into one table
tibble(
    iso2c = html_text(iso_col),
    iso2c_link = html_attr(iso_col, "href"),
    country_name = html_text(name_col),
    country_link = html_nodes(name_col, xpath=".//a[contains(@href, 'wiki')]") %>% html_attr("href")
)
## # A tibble: 249 x 4
##    iso2c iso2c_link          country_name         country_link
##    <chr> <chr>               <chr>                <chr>
##  1 AD    /wiki/ISO_3166-2:AD Andorra              /wiki/Andorra
##  2 AE    /wiki/ISO_3166-2:AE United Arab Emirates /wiki/United_Arab_Emirates
##  3 AF    /wiki/ISO_3166-2:AF Afghanistan          /wiki/Afghanistan
##  4 AG    /wiki/ISO_3166-2:AG Antigua and Barbuda  /wiki/Antigua_and_Barbuda
##  5 AI    /wiki/ISO_3166-2:AI Anguilla             /wiki/Anguilla
##  6 AL    /wiki/ISO_3166-2:AL Albania              /wiki/Albania
##  7 AM    /wiki/ISO_3166-2:AM Armenia              /wiki/Armenia
##  8 AO    /wiki/ISO_3166-2:AO Angola               /wiki/Angola
##  9 AQ    /wiki/ISO_3166-2:AQ Antarctica           /wiki/Antarctica
## 10 AR    /wiki/ISO_3166-2:AR Argentina            /wiki/Argentina
## # ... with 239 more rows
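
The key idea is to select the <a> elements themselves and read the attribute with html_attr(), rather than selecting @href in the XPath. The following is only a minimal offline sketch of that idea: the inline HTML is a hypothetical stand-in for the Wikipedia table, and the variable names are made up for illustration. It also shows a row-wise variant using html_node() (singular), which may help keep columns aligned if some cell had no link.

```r
library(rvest)  # re-exports read_html() and the %>% pipe

# Hypothetical inline HTML standing in for the Wikipedia table (assumption)
html <- '<table>
  <tr><th>Code</th><th>Name</th></tr>
  <tr><td><a href="/wiki/ISO_3166-2:AD">AD</a></td>
      <td><a href="/wiki/Andorra" title="Andorra">Andorra</a></td></tr>
  <tr><td><a href="/wiki/ISO_3166-2:AE">AE</a></td>
      <td><a href="/wiki/United_Arab_Emirates" title="United Arab Emirates">United Arab Emirates</a></td></tr>
</table>'

doc <- read_html(html)
tab <- html_node(doc, "table")

# Select the <a> elements, not their @href attribute nodes
name_links <- html_nodes(tab, xpath = ".//td[2]//a[@title]")

# html_attr() returns the bare attribute values as a character vector
links <- html_attr(name_links, "href")
country_names <- html_text(name_links)

# Row-wise variant: html_node() yields one result per cell,
# and html_attr() gives NA for a cell without a matching link
cells <- html_nodes(tab, xpath = ".//tr/td[2]")
links_by_row <- cells %>%
  html_node(xpath = ".//a[@title]") %>%
  html_attr("href")

print(links)
```

Here links and links_by_row both hold "/wiki/Andorra" and "/wiki/United_Arab_Emirates", with no attribute name attached.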

Source

2017-10-21 11:50:21 hrbrmstr

Thank you! Sorry, I thought the comments were good enough. I'll try to include more information in future!
