html_nodes returns {xml_nodeset (0)}

I am trying to scrape data from www.speedtest.net/awards/ca/ontario, but some paths that look like they should work with the standard functions return {xml_nodeset (0)}, and I don't know why.

For example, if I go to the head and look for a script, this returns {xml_nodeset (1)} as expected:

library(rvest) 
URL<-read_html("http://www.speedtest.net/awards/ca/ontario") 
test1<-html_nodes(URL,xpath='/html/head/script[1]') 
test1

This works.

However, if I try something similar inside the body, I get {xml_nodeset (0)}:

test2<-html_nodes(URL,xpath='/html/body/script[1]') 
test2

Why can't I reach the nodes under body?

This is the code I am actually trying to use; I traced its failure back to the issue above.

real<-html_nodes(URL,xpath='/html/body/div[1]/div[3]/div/div[2]/div/div[3]/div[2]/table') 
real
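
One way to investigate a path like this is to check what the parser actually placed under body and compare an absolute path with a relative one. The sketch below is only an illustration: the inline HTML is a hypothetical stand-in, not the speedtest.net markup, and it does not diagnose that page. It assumes the absolute path simply has the wrong nesting.

```r
library(rvest)

# Hypothetical markup: the table sits one <div> deeper than the
# absolute path below assumes.
html <- '<html><head><script>var a = 1;</script></head><body>
<div><div><table><tr><td>x</td></tr></table></div></div>
</body></html>'
page <- read_html(html)

# Names of the elements the parser placed directly under <body>.
body_children <- html_name(html_children(html_nodes(page, xpath = "/html/body")))

# An absolute path with the wrong nesting matches nothing.
absolute <- html_nodes(page, xpath = "/html/body/div/table")

# A relative path finds the table wherever it sits in the tree.
relative <- html_nodes(page, xpath = "//table")

length(absolute)
length(relative)
```

If the list of children under body does not match what the browser's inspector shows, the document rvest parsed differs from the one the browser rendered, and a path copied from the browser will not match.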

In rvest, I find it easier to search by CSS selector than by XPath:

library(rvest) 
URL<-read_html("http://www.speedtest.net/awards/ca/ontario") 
#find the table rows in the page 
table<-html_nodes(URL, "tbody tr") 

#pull info from the table rows 
num<-html_text(html_nodes(table, "td.u-align-right")) 
provider<-html_text(html_nodes(table, "td.cell-provider-name")) 

#final data.frame with a table of the results 
df<-data.frame(provider, matrix(num, ncol=3, byrow=TRUE))
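
The reshaping step above can be tried offline. This is a minimal sketch on made-up HTML: the class names are taken from the code above, while the providers and numbers are invented for illustration, and it assumes each row carries exactly three td.u-align-right cells.

```r
library(rvest)

# Hypothetical stand-in for the page: two providers, three numeric cells each.
html <- '<html><body><table><tbody>
<tr><td class="cell-provider-name">Provider A</td><td class="u-align-right">10</td><td class="u-align-right">20</td><td class="u-align-right">30</td></tr>
<tr><td class="cell-provider-name">Provider B</td><td class="u-align-right">40</td><td class="u-align-right">50</td><td class="u-align-right">60</td></tr>
</tbody></table></body></html>'
page <- read_html(html)

# Find the table rows, then pull the cells out of them.
rows <- html_nodes(page, "tbody tr")
num <- html_text(html_nodes(rows, "td.u-align-right"))
provider <- html_text(html_nodes(rows, "td.cell-provider-name"))

# num is one flat vector in document order, so byrow = TRUE puts each
# provider's three values back on a single row.
df <- data.frame(provider, matrix(num, ncol = 3, byrow = TRUE))
df
```

If a row had a different number of matching cells, the matrix would no longer line up with provider, so checking that length(num) equals 3 * length(provider) before building the data frame may be worthwhile.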


Source

2016-06-23 ColinTea

Are you sure there is a 'script' tag under 'body'? – splash58

Yes. I also tried to find the div tags that should be there, but that returns the same {xml_nodeset (0)}. – ColinTea

This may not be complete, but it should give you a head start on answering your question. Give it a try.

Source

2016-06-23 21:47:32 Dave2e

Thank you. Using the CSS selector search, I came up with this, which works well for getting the table I wanted (the one at the bottom right).

library(rvest) 
URL<-read_html("http://www.speedtest.net/awards/ca/ontario") 
table<-html_nodes(URL, "table") 
table<-html_table(table)[[2]]
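
For reference, html_table() applied to a set of table nodes returns a list with one data frame per table, which is why [[2]] selects the second table on the page. A small sketch on invented HTML (the table contents are assumptions for illustration only):

```r
library(rvest)

# Hypothetical page with two tables; only the second one is wanted.
html <- '<html><body>
<table><tr><th>rank</th><th>city</th></tr><tr><td>1</td><td>X</td></tr></table>
<table><tr><th>provider</th><th>speed</th></tr>
<tr><td>Provider A</td><td>10</td></tr>
<tr><td>Provider B</td><td>20</td></tr></table>
</body></html>'
page <- read_html(html)

# All <table> nodes, in document order.
tables <- html_nodes(page, "table")
length(tables)

# Convert every table to a data frame, then keep the second one.
second <- html_table(tables)[[2]]
second
```

Selecting by position breaks if the page adds or removes a table, so a more specific CSS selector may be more robust when one is available.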

Source

2016-06-24 14:04:51 ColinTea
