2016-03-29 9 views
1

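Scraping Yahoo Finance headlines and dates with R

I am trying to scrape news from a Yahoo Finance web page with R and build a table with two columns: the date and the news headline. Following the instructions here, I correctly created the column with the news headlines. The next step is to get the dates and add them to the table as a column.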

I think I just need to modify this command:

out_dt <- xpathSApply(d, "//ul[contains(@class,'newsheadlines')]/following::ul/li/a", xmlValue) 

so that it returns the dates instead of the headlines. As an example, here is part of the page source:

<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><title>BMPS.MI Headlines | BANCA MPS Stock - Yahoo! Finance</title><script type="text/javascript" src="http://l.yimg.com/a/i/us/fi/03rd/yg_csstare_nobgcolor.js"></script><link rel="stylesheet" href="http://l.yimg.com/zz/combo?kx/yucs/uh3/uh/1138/css/uh_non_mail-min.css&amp;kx/yucs/uh3s/atomic/84/css/atomic-min.css&amp;kx/yucs/uh_common/meta/3/css/meta-min.css&amp;kx/yucs/uh3/top-bar/366/css/no_icons-min.css&amp;kx/yucs/uh3/search/css/588/blue_border-min.css&amp;kx/yucs/uh3/get-the-app/151/css/get_the_app-min.css&amp;bm/lib/fi/common/p/d/static/css/2.0.356981/2.0.0/mini/yfi_yoda_legacy_lego_concat.css&amp;bm/lib/fi/common/p/d/static/css/2.0.356981/2.0.0/mini/yfi_symbol_suggest.css&amp;bm/lib/fi/common/p/d/static/css/2.0.356981/2.0.0/mini/yui_helper.css&amp;bm/lib/fi/common/p/d/static/css/2.0.356981/2.0.0/mini/yfi_theme_teal.css&amp;bm/lib/fi/common/p/d/static/css/2.0.356981/2.0.0/mini/yfi_follow_quote.css&amp;bm/lib/fi/common/p/d/static/css/2.0.356981/2.0.0/mini/yfi_follow_stencil.css" type="text/css"><script language="javascript"> 
ll_js = new Array(); 
</script><script type="text/javascript" src="http://l1.yimg.com/bm/combo?fi/common/p/d/static/js/2.0.356981/2.0.0/mini/yui-min-3.9.1.js&amp;fi/common/p/d/static/js/2.0.356981/yui_2.8.0/build/yuiloader-dom-event/2.0.0/mini/yuiloader-dom-event.js&amp;fi/common/p/d/static/js/2.0.356981/yui_2.8.0/build/container/2.0.0/mini/container.js&amp;fi/common/p/d/static/js/2.0.356981/yui_2.8.0/build/datasource/2.0.0/mini/datasource.js&amp;fi/common/p/d/static/js/2.0.356981/yui_2.8.0/build/autocomplete/2.0.0/mini/autocomplete.js"></script><script language="javascript"> 
YUI.YUICfg = {"base":"http:\/\/l.yimg.com\/","comboBase":"http:\/\/l.yimg.com\/zz\/combo?","combine":true,"allowRollup":true,"maxURLLength":"2000"} 
YUI.YUICfg.root = 'yui:'+YUI.version+'/build/'; 
YUI.applyConfig(YUI.YUICfg); 
</script><script language="javascript"> 
ll_js.push({ 
    'success_callback' : function() { 
      YUI().use('stencil', 'follow-quote', 'node', function (Y) { 
       var conf = {'xhrBase': '/', 'lang': 'en-US', 'region': 'US', 'loginUrl': 'https://login.yahoo.com/config/login_verify2?&.done=http://finance.yahoo.com/q?s=BMPS.MI&.intl=us'}; 

       Y.Media.FollowQuote.init(conf, function() { 
        var exchNode = null, 
         followSecClass = "", 
         followHtml = "", 
         followNode = null; 

        followSecClass = Y.Media.FollowQuote.getFollowSectionClass(); 
        followHtml = Y.Media.FollowQuote.getFollowBtnHTML({ ticker: 'BMPS.MI', addl_classes: "follow-quote-always-visible", showFollowText: true }); 
        followNode = Y.Node.create(followHtml); 
        exchNode = Y.one(".wl_sign"); 
        if (!Y.Lang.isNull(exchNode)) { 
         exchNode.append(followNode); 

        } 

       }); 
      }); 
    } 
}); 

Any suggestions?

Answers

3

You can use rvest. The code below produces the table that follows it:

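library(rvest)

# Download and parse the headlines page
doc <- read_html("http://finance.yahoo.com/q/h?s=AAPL+Headlines")
# One <li> node per news item
scope <- doc %>% html_nodes("#yfncsumtab li")
# One-row data frame per item: date from "cite span", headline from the link
res <- lapply(scope, function(li) {
  data.frame(
    date = li %>% html_node("cite span") %>% html_text(),
    headline = li %>% html_node("a") %>% html_text(),
    stringsAsFactors = FALSE
  )
})
# Stack the rows into a single table
do.call(rbind, res)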

   date                     headline 
1 (Tue 3:49AM EDT)         US hacks iPhone, ends legal battle but questions linger 
2 (Tue 1:27AM EDT)       Amazon Echo turns into a sleeper hit, offsetting Fire's failure 
3 (Tue 1:00AM EDT)          Why Everyone Loses in Apple’s Fight Against the FBI 
4 (Tue 12:36AM EDT) [$$] US drops Apple case, Japan's negative rate bounty and the criminals paid not to kill 
5 (Tue 12:25AM EDT)        U.S. succeeds in cracking Apple's iPhone, drops legal action 
6 (Tue 12:00AM EDT) [$$] Brussels Attacks: Belgium Turns to U.S. for Help in Scouring Seized Laptops, Phones 
7  (Mon, Mar 28)    [$$] FBI Opens San Bernardino Shooter’s iPhone; U.S. Drops Demand on Apple 
8  (Mon, Mar 28)            Wolverton: Encyption debate isn't going away 
9  (Mon, Mar 28)           [$$] US drops Apple case after cracking iPhone 
10  (Mon, Mar 28)   Words of warning — not celebration — in Silicon Valley after FBI ends Apple fight 
11  (Mon, Mar 28)        [$$] FBI Opens Shooter's iPhone; U.S. Drops Demand on Apple 
12  (Mon, Mar 28)           FBI hacks into terrorist’s iPhone without Apple 
13  (Mon, Mar 28)         Justice Department cracks iPhone; withdraws legal action 
14  (Mon, Mar 28)        Apple responds: 'This case should have never been brought' 
15  (Mon, Mar 28)       IPhone Security Is the Casualty in Apple's Victory Over the FBI 
16  (Mon, Mar 28)       Cracked Apple iPhone By F.B.I. Puts Spotlight On Apple Security 
17  (Mon, Mar 28)         DOJ Drops Apple Case: Bloomberg West (Full Show 03/28) 
18  (Mon, Mar 28)           Apple, Inc.'s New iPhone SE: Off to a Big Start? 
19  (Mon, Mar 28)            AP Explains: Apple vs. FBI _ What Happened? 
20  (Mon, Mar 28)             PRESS DIGEST- Financial Times - March 29 
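
Since the question uses the XML package, the same date nodes can probably also be reached with XPath. The following is only a sketch: the HTML string is a hand-written stand-in, and its structure (an element with id `yfncsumtab`, one `li` per item, the date in a `span` inside `cite`) is an assumption inferred from the CSS selectors in the rvest code, not a copy of the real page. The source names in it are placeholders.

```r
library(XML)

# Hand-written stand-in for the headline list (assumed structure, not the real page)
html <- '
<div id="yfncsumtab">
  <ul>
    <li><a href="#">US hacks iPhone, ends legal battle but questions linger</a>
        <cite>Source A <span>(Tue 3:49AM EDT)</span></cite></li>
    <li><a href="#">[$$] US drops Apple case after cracking iPhone</a>
        <cite>Source B <span>(Mon, Mar 28)</span></cite></li>
  </ul>
</div>'

# Parse the string as HTML instead of downloading a URL
d <- htmlParse(html, asText = TRUE)

# Headline text: the link inside each list item
headlines <- xpathSApply(d, "//*[@id='yfncsumtab']//li/a", xmlValue)
# Date text: the span inside each item's cite element
dates <- xpathSApply(d, "//*[@id='yfncsumtab']//li/cite/span", xmlValue)

out <- data.frame(date = dates, headline = headlines, stringsAsFactors = FALSE)
print(out)
```

Note that the two XPath queries run independently, so if some item had no date the vectors would differ in length and `data.frame` would fail. The per-`li` loop in the rvest version avoids that problem.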

I leave the parsing of the dates to you.
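
As a starting point for that parsing, here is a minimal sketch in base R. The helper name `parse_cite_date` and the `ref_date` argument are hypothetical. The labels carry no year, and the newest ones carry only a weekday and a time, so the sketch assumes that a weekday label means the most recent such weekday on or before `ref_date` (the day of the scrape), and that a month/day label later than `ref_date` belongs to the previous year. The time of day is ignored.

```r
# Hypothetical helper: convert labels like "(Tue 3:49AM EDT)" or "(Mon, Mar 28)"
# into Date objects, relative to an assumed scrape date `ref_date`.
parse_cite_date <- function(x, ref_date = Sys.Date()) {
  ref_date <- as.Date(ref_date)
  x <- gsub("^\\s*\\(|\\)\\s*$", "", x)   # drop the surrounding parentheses
  out <- rep(as.Date(NA), length(x))

  # Older items: "Mon, Mar 28" -> month and day, year borrowed from ref_date
  old <- grepl("^[A-Za-z]{3}, [A-Za-z]{3} [0-9]{1,2}$", x)
  if (any(old)) {
    month <- match(sub("^[A-Za-z]{3}, ([A-Za-z]{3}) .*$", "\\1", x[old]), month.abb)
    day <- as.integer(sub("^.* ([0-9]{1,2})$", "\\1", x[old]))
    year <- as.integer(format(ref_date, "%Y"))
    d <- as.Date(sprintf("%04d-%02d-%02d", year, month, day), format = "%Y-%m-%d")
    # Assumption: a date after ref_date must come from the previous year
    later <- !is.na(d) & d > ref_date
    if (any(later)) {
      d[later] <- as.Date(
        sprintf("%04d-%02d-%02d", year - 1L, month[later], day[later]),
        format = "%Y-%m-%d"
      )
    }
    out[old] <- d
  }

  # Recent items: "Tue 3:49AM EDT" -> step back from ref_date to that weekday
  recent <- grepl("^[A-Za-z]{3} [0-9]{1,2}:[0-9]{2}[AP]M", x)
  if (any(recent)) {
    wd_abb <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
    wd <- match(substr(x[recent], 1, 3), wd_abb)
    ref_wd <- as.integer(format(ref_date, "%u"))   # ISO weekday, Monday = 1
    out[recent] <- ref_date - (ref_wd - wd) %% 7
  }
  out
}

parse_cite_date(c("(Tue 3:49AM EDT)", "(Mon, Mar 28)"), ref_date = "2016-03-29")
# [1] "2016-03-29" "2016-03-28"
```

The month and weekday names are matched against fixed English vectors (`month.abb` and a hand-written weekday vector), so the result does not depend on the session locale. Labels that fit neither pattern come back as `NA`.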

Another way is to take the dates from the h3 headings:

library(rvest)
doc <- read_html("http://finance.yahoo.com/q/h?s=AAPL+Headlines")
# Restrict the search to the headlines container
scope <- doc %>% html_nodes("#yfncsumtab")
# Each h3 holds the date for the list that follows it
dates <- scope %>% html_nodes("h3 span") %>% html_text()
# For each list directly after an h3, collect its headline texts
headlines <- scope %>% html_nodes("h3 + ul") %>% lapply(. %>% html_nodes("li a") %>% html_text)

# combine both: pair every date with its headlines, then stack the pieces
do.call(rbind, Map(cbind, dates, headlines))

This gives the following matrix:

      [,1]                      [,2]
[1,] "Tuesday, March 29, 2016" "March 29 Premarket Briefing: 10 Things You Should Know"
[2,] "Tuesday, March 29, 2016" "You might soon be able to pay for goods in-store using Facebook Messenger"
[3,] "Tuesday, March 29, 2016" "FBI unlocks iPhone"
[4,] "Tuesday, March 29, 2016" "US hacks iPhone, ends legal battle but questions linger"
[5,] "Tuesday, March 29, 2016" "Amazon Echo turns into a sleeper hit, offsetting Fire's failure"
[6,] "Tuesday, March 29, 2016" "Why Everyone Loses in Apple’s Fight Against the FBI"
[7,] "Tuesday, March 29, 2016" "[$$] US drops Apple case, Japan's negative rate bounty and the criminals paid not to kill"
[8,] "Tuesday, March 29, 2016" "U.S. succeeds in cracking Apple's iPhone, drops legal action"
[9,] "Tuesday, March 29, 2016" "[$$] Brussels Attacks: Belgium Turns to U.S. for Help in Scouring Seized Laptops, Phones"
[10,] "Monday, March 28, 2016" "[$$] FBI Opens San Bernardino Shooter’s iPhone; U.S. Drops Demand on Apple"
[11,] "Monday, March 28, 2016" "Wolverton: Encyption debate isn't going away"
[12,] "Monday, March 28, 2016" "[$$] US drops Apple case after cracking iPhone"
[13,] "Monday, March 28, 2016" "Words of warning — not celebration — in Silicon Valley after FBI ends Apple fight"
[14,] "Monday, March 28, 2016" "[$$] FBI Opens Shooter's iPhone; U.S. Drops Demand on Apple"
[15,] "Monday, March 28, 2016" "FBI hacks into terrorist’s iPhone without Apple"
[16,] "Monday, March 28, 2016" "Justice Department cracks iPhone; withdraws legal action"
[17,] "Monday, March 28, 2016" "Apple responds: 'This case should have never been brought'"
[18,] "Monday, March 28, 2016" "IPhone Security Is the Casualty in Apple's Victory Over the FBI"
[19,] "Monday, March 28, 2016" "Cracked Apple iPhone By F.B.I. Puts Spotlight On Apple Security"
[20,] "Monday, March 28, 2016" "DOJ Drops Apple Case: Bloomberg West (Full Show 03/28)"

In this second case, too, I leave the date parsing to you.
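
For the long h3-style dates such as "Tuesday, March 29, 2016", a possible sketch in base R follows. The helper name `parse_long_date` is hypothetical, and the small matrix `m` is a two-row sample typed in from the output of the h3 approach rather than a live scrape. Month names are matched against the base constant `month.name`, so the sketch assumes English month names in the input but does not depend on the session locale.

```r
# Hypothetical helper: parse dates such as "Tuesday, March 29, 2016".
# The weekday is ignored; only "<Month> <day>, <year>" at the end is used.
parse_long_date <- function(x) {
  parts <- regmatches(x, regexec("([A-Za-z]+) ([0-9]{1,2}), ([0-9]{4})$", x))
  iso <- vapply(parts, function(p) {
    if (length(p) != 4) return(NA_character_)   # pattern did not match
    sprintf("%s-%02d-%02d", p[4], match(p[2], month.name), as.integer(p[3]))
  }, character(1))
  # Unknown month names produce an unparsable string, which becomes NA here
  as.Date(iso, format = "%Y-%m-%d")
}

# Two-row sample in the same shape as the matrix of dates and headlines
m <- rbind(
  c("Tuesday, March 29, 2016", "FBI unlocks iPhone"),
  c("Monday, March 28, 2016", "[$$] US drops Apple case after cracking iPhone")
)

# Two-column table with a real Date column
news <- data.frame(
  date = parse_long_date(m[, 1]),
  headline = m[, 2],
  stringsAsFactors = FALSE
)
str(news)
```

Because `date` is now of class `Date`, the table can be sorted or filtered by date directly, for example `news[news$date >= as.Date("2016-03-29"), ]`.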
