I think Yahoo changed its API recently. You can load the data into R easily. For Excel, go to
http://investexcel.net/multiple-stock-quote-downloader-for-excel/
and download the file from the link titled "Get the Excel spreadsheet to download bulk historical data from Google Finance".
You could also try something like this:
# assumes codes are known beforehand
codes <- c("MSFT","SBUX","S","AAPL","ADT")
urls <- paste0("https://www.google.com/finance/historical?q=",codes,"&output=csv")
paths <- paste0(codes, ".csv")
# identify tickers whose files have not been downloaded yet
missing <- !file.exists(paths)
missing
# simple error handling in case a file doesn't exist
downloadFile <- function(url, path, ...) {
# remove file if exists already
if(file.exists(path)) file.remove(path)
# download file
tryCatch(
download.file(url, path, ...), error = function(c) {
# remove file if error
if(file.exists(path)) file.remove(path)
# create error message
c$message <- paste(sub("\\.csv$", "", path), "failed")
message(c$message)
}
)
}
# wrapper of mapply
Map(downloadFile, urls[missing], paths[missing])
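If the downloads succeed, the files can be read back into one data frame. This is a minimal sketch, not part of the original script, assuming Google's usual Date/Open/High/Low/Close/Volume CSV layout:
# read each downloaded file and combine, tagging rows with the ticker code
stockData <- do.call(rbind, lapply(codes, function(code) {
  path <- paste0(code, ".csv")
  if (!file.exists(path)) return(NULL)  # skip tickers that failed to download
  data <- read.csv(path, stringsAsFactors = FALSE)
  data$Code <- code
  data
}))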
Or this:
## downloads historic prices for all constituents of SP500
library(zoo)
library(tseries)
## read in list of constituents, with company name in first column and
## ticker symbol in second column
## CREATE A FILE TO READ DATA FROM!!!
spComp <- read.csv("C:/Users/Excel/Desktop/stocks.csv")
## specify time period
dateStart <- "2013-01-01"
dateEnd <- "2015-05-08"
## extract symbols and number of iterations
symbols <- spComp[, 2]
nAss <- length(symbols)
## download data on first stock as zoo object
z <- get.hist.quote(instrument = symbols[1], start = dateStart,
end = dateEnd, quote = "AdjClose",
retclass = "zoo", quiet = TRUE)
## use ticker symbol as column name
dimnames(z)[[2]] <- as.character(symbols[1])
## download remaining assets in for loop
for (i in 2:nAss) {
## display progress by showing the current iteration step
cat("Downloading ", i, " out of ", nAss , "\n")
result <- try(x <- get.hist.quote(instrument = symbols[i],
start = dateStart,
end = dateEnd, quote = "AdjClose",
retclass = "zoo", quiet = TRUE))
if(inherits(result, "try-error")) {
next
}
else {
dimnames(x)[[2]] <- as.character(symbols[i])
## merge with already downloaded data to get assets on same dates
z <- merge(z, x)
}
}
## save data
# CREATE A FILE TO WRITE DATA TO!!!
write.zoo(z, file = "C:/Users/Excel/Desktop/all_sp500_price_data.csv", index.name = "time")
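To pick the data up again later, the saved CSV can be read back as a zoo object. A minimal sketch, assuming the file written above:
library(zoo)
# read the merged prices back; the first column ("time") holds the dates
z <- read.zoo("C:/Users/Excel/Desktop/all_sp500_price_data.csv",
              header = TRUE, sep = ",", format = "%Y-%m-%d")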
Here is one more option to consider.
Method #1:
---
layout: post
title: "2014-11-20-Download-Stock-Data-1"
description: ""
category: R
tags: [knitr,lubridate,stringr,plyr,dplyr]
---
{% include JB/setup %}
This article illustrates how to download stock price data files from Google, save them to a local drive and merge them into a single data frame. This script is slightly modified from a script that downloads RStudio package download log data. The original source can be found [here](https://github.com/hadley/cran-logs-dplyr/blob/master/1-download.r).
First of all, the following three packages are used.
{% highlight r %}
library(knitr)
library(lubridate)
library(stringr)
library(plyr)
library(dplyr)
{% endhighlight %}
The script begins with creating a folder to save data files.
{% highlight r %}
# create data folder
dataDir <- paste0("data","_","2014-11-20-Download-Stock-Data-1")
if(file.exists(dataDir)) {
unlink(dataDir, recursive = TRUE)
dir.create(dataDir)
} else {
dir.create(dataDir)
}
{% endhighlight %}
After creating the URLs and file paths, files are downloaded using the `Map` function, which is a wrapper of `mapply`. Note that, in case the function is broken by an error (e.g. when a file doesn't exist), `download.file` is wrapped in another function that includes an error handler (`tryCatch`).
{% highlight r %}
# assumes codes are known beforehand
codes <- c("MSFT", "TCHC") # codes <- c("MSFT", "1234") for testing
urls <- paste0("http://www.google.com/finance/historical?q=NASDAQ:",
codes,"&output=csv")
paths <- paste0(dataDir,"/",codes,".csv") # back slash on windows (\\)
# simple error handling in case a file doesn't exist
downloadFile <- function(url, path, ...) {
# remove file if exists already
if(file.exists(path)) file.remove(path)
# download file
tryCatch(
download.file(url, path, ...), error = function(c) {
# remove file if error
if(file.exists(path)) file.remove(path)
# create error message
c$message <- paste(sub("\\.csv$", "", basename(path)), "failed")
message(c$message)
}
)
}
# wrapper of mapply
Map(downloadFile, urls, paths)
{% endhighlight %}
Finally the files are read back using `llply` and combined using `rbind_all`. Note that, as the merged data holds records for multiple stocks, a `Code` column is created.
{% highlight r %}
# read all csv files and merge
files <- dir(dataDir, full.names = TRUE)
dataList <- llply(files, function(file){
data <- read.csv(file, stringsAsFactors = FALSE)
# get code from file path
pattern <- "/[A-Z][A-Z][A-Z][A-Z]"
code <- substr(str_extract(file, pattern), 2, nchar(str_extract(file, pattern)))
# first column's name is funny
names(data) <- c("Date","Open","High","Low","Close","Volume")
data$Date <- dmy(data$Date)
data$Open <- as.numeric(data$Open)
data$High <- as.numeric(data$High)
data$Low <- as.numeric(data$Low)
data$Close <- as.numeric(data$Close)
data$Volume <- as.integer(data$Volume)
data$Code <- code
data
}, .progress = "text")
data <- rbind_all(dataList)
{% endhighlight %}
Some of the values are shown below.
|Date | Open| High| Low| Close| Volume|Code |
|:----------|-----:|-----:|-----:|-----:|--------:|:----|
|2014-11-26 | 47.49| 47.99| 47.28| 47.75| 27164877|MSFT |
|2014-11-25 | 47.66| 47.97| 47.45| 47.47| 28007993|MSFT |
|2014-11-24 | 47.99| 48.00| 47.39| 47.59| 35434245|MSFT |
|2014-11-21 | 49.02| 49.05| 47.57| 47.98| 42884795|MSFT |
|2014-11-20 | 48.00| 48.70| 47.87| 48.70| 21510587|MSFT |
|2014-11-19 | 48.66| 48.75| 47.93| 48.22| 26177450|MSFT |
This way isn't as efficient as reading the files directly without saving them to a local drive. This option may be useful, however, if the files are large and the API server breaks the connection abruptly.
I hope this article is useful, and I'm going to write another article to show the second way.
Method #2:
---
layout: post
title: "2014-11-20-Download-Stock-Data-2"
description: ""
category: R
tags: [knitr,lubridate,stringr,plyr,dplyr]
---
{% include JB/setup %}
In an [earlier article](http://jaehyeon-kim.github.io/r/2014/11/20/Download-Stock-Data-1/), a way to download stock price data files from Google, save them to a local drive and merge them into a single data frame was illustrated. If the files are not large, however, that approach isn't very efficient and, in this article, the files are downloaded and merged in memory instead.
The following packages are used.
{% highlight r %}
library(knitr)
library(lubridate)
library(stringr)
library(plyr)
library(dplyr)
{% endhighlight %}
Taking URLs as file locations, files are directly read using `llply` and combined using `rbind_all`. As the merged data holds records for multiple stocks, a `Code` column is created. Note that, when an error occurs, the function returns a dummy data frame in order not to break the loop - the values of the dummy data frame(s) are filtered out at the end.
{% highlight r %}
# assumes codes are known beforehand
codes <- c("MSFT", "TCHC") # codes <- c("MSFT", "1234") for testing
files <- paste0("http://www.google.com/finance/historical?q=NASDAQ:",
codes,"&output=csv")
dataList <- llply(files, function(file, ...) {
# get code from file url
pattern <- "Q:[0-9a-zA-Z][0-9a-zA-Z][0-9a-zA-Z][0-9a-zA-Z]"
code <- substr(str_extract(file, pattern), 3, nchar(str_extract(file, pattern)))
# read data directly from a URL with only simple error handling
# for further error handling: http://adv-r.had.co.nz/Exceptions-Debugging.html
tryCatch({
data <- read.csv(file, stringsAsFactors = FALSE)
# first column's name is funny
names(data) <- c("Date","Open","High","Low","Close","Volume")
data$Date <- dmy(data$Date)
data$Open <- as.numeric(data$Open)
data$High <- as.numeric(data$High)
data$Low <- as.numeric(data$Low)
data$Close <- as.numeric(data$Close)
data$Volume <- as.integer(data$Volume)
data$Code <- code
data
},
error = function(c) {
c$message <- paste(code,"failed")
message(c$message)
# return a dummy data frame
data <- data.frame(Date=dmy(format(Sys.Date(),"%d%m%Y")), Open=0, High=0,
Low=0, Close=0, Volume=0, Code="NA")
data
})
})
# dummy data frame values are filtered out
data <- filter(rbind_all(dataList), Code != "NA")
{% endhighlight %}
Some of the values are shown below.
|Date | Open| High| Low| Close| Volume|Code |
|:----------|-----:|-----:|-----:|-----:|--------:|:----|
|2014-11-26 | 47.49| 47.99| 47.28| 47.75| 27164877|MSFT |
|2014-11-25 | 47.66| 47.97| 47.45| 47.47| 28007993|MSFT |
|2014-11-24 | 47.99| 48.00| 47.39| 47.59| 35434245|MSFT |
|2014-11-21 | 49.02| 49.05| 47.57| 47.98| 42884795|MSFT |
|2014-11-20 | 48.00| 48.70| 47.87| 48.70| 21510587|MSFT |
|2014-11-19 | 48.66| 48.75| 47.93| 48.22| 26177450|MSFT |
It took a bit longer to complete the script as I had to teach myself how to handle errors in R. And this is why I started to write articles in this blog.
I hope this article is useful.
Summarise Stock Returns from Multiple Files:
---
layout: post
title: "2014-11-27-Summarise-Stock-Returns-from-Multiple-Files"
description: ""
category: R
tags: [knitr,lubridate,stringr,reshape2,plyr,dplyr]
---
{% include JB/setup %}
This is a slight extension of the previous two articles ([2014-11-20-Download-Stock-Data-1](http://jaehyeon-kim.github.io/r/2014/11/20/Download-Stock-Data-1/), [2014-11-20-Download-Stock-Data-2](http://jaehyeon-kim.github.io/r/2014/11/20/Download-Stock-Data-2/)) and it aims to produce gross returns, standard deviation and correlation of multiple shares.
The following packages are used.
{% highlight r %}
library(knitr)
library(lubridate)
library(stringr)
library(reshape2)
library(plyr)
library(dplyr)
{% endhighlight %}
The script begins with creating a data folder in the format of *data_YYYY-MM-DD*.
{% highlight r %}
# create data folder
dataDir <- paste0("data","_",format(Sys.Date(),"%Y-%m-%d"))
if(file.exists(dataDir)) {
unlink(dataDir, recursive = TRUE)
dir.create(dataDir)
} else {
dir.create(dataDir)
}
{% endhighlight %}
Given company codes, URLs and file paths are created. Then data files are downloaded by `Map`, which is a wrapper of `mapply`. Note that R's `download.file` function is wrapped by `downloadFile` so that the function does not break when an error occurs.
{% highlight r %}
# assumes codes are known beforehand
codes <- c("MSFT", "TCHC")
urls <- paste0("http://www.google.com/finance/historical?q=NASDAQ:",
codes,"&output=csv")
paths <- paste0(dataDir,"/",codes,".csv") # backward slash on windows (\)
# simple error handling in case a file doesn't exist
downloadFile <- function(url, path, ...) {
# remove file if exists already
if(file.exists(path)) file.remove(path)
# download file
tryCatch(
download.file(url, path, ...), error = function(c) {
# remove file if error
if(file.exists(path)) file.remove(path)
# create error message
c$message <- paste(sub("\\.csv$", "", basename(path)), "failed")
message(c$message)
}
)
}
# wrapper of mapply
Map(downloadFile, urls, paths)
{% endhighlight %}
Once the files are downloaded, they are read back and combined using `rbind_all`. Some more details about this step are listed below.
* only the Date, Close and Code columns are kept
* codes are extracted from the file paths by matching a regular expression
* data is arranged by date, as the raw files are sorted in descending order
* errors are handled by returning a dummy data frame whose code value is "NA"
* individual data files are merged in a long format
* rows whose code is "NA" are filtered out
{% highlight r %}
# read all csv files and merge
files <- dir(dataDir, full.names = TRUE)
dataList <- llply(files, function(file){
# get code from file path
pattern <- "/[A-Z][A-Z][A-Z][A-Z]"
code <- substr(str_extract(file, pattern), 2, nchar(str_extract(file, pattern)))
tryCatch({
data <- read.csv(file, stringsAsFactors = FALSE)
# first column's name is funny
names(data) <- c("Date","Open","High","Low","Close","Volume")
data$Date <- dmy(data$Date)
data$Close <- as.numeric(data$Close)
data$Code <- code
# optional
data$Open <- as.numeric(data$Open)
data$High <- as.numeric(data$High)
data$Low <- as.numeric(data$Low)
data$Volume <- as.integer(data$Volume)
# select only 'Date', 'Close' and 'Code'
# raw data should be arranged in an ascending order
arrange(subset(data, select = c(Date, Close, Code)), Date)
},
error = function(c){
c$message <- paste(code,"failed")
message(c$message)
# return a dummy data frame not to break function
data <- data.frame(Date=dmy(format(Sys.Date(),"%d%m%Y")), Close=0, Code="NA")
data
})
}, .progress = "text")
# data is combined to create a long format
# dummy data frame values are filtered out
data <- filter(rbind_all(dataList), Code != "NA")
{% endhighlight %}
Some values of this long format data are shown below.
|Date | Close|Code |
|:----------|-----:|:----|
|2013-11-29 | 38.13|MSFT |
|2013-12-02 | 38.45|MSFT |
|2013-12-03 | 38.31|MSFT |
|2013-12-04 | 38.94|MSFT |
|2013-12-05 | 38.00|MSFT |
|2013-12-06 | 38.36|MSFT |
The data is converted into a wide format where the row and column variables are Date and Code respectively (`Date ~ Code`), while the value variable is Close (`value.var="Close"`). Some values of the wide format data are shown below.
{% highlight r %}
# data is converted into a wide format
data <- dcast(data, Date ~ Code, value.var="Close")
kable(head(data))
{% endhighlight %}
|Date | MSFT| TCHC|
|:----------|-----:|-----:|
|2013-11-29 | 38.13| 13.52|
|2013-12-02 | 38.45| 13.81|
|2013-12-03 | 38.31| 13.48|
|2013-12-04 | 38.94| 13.71|
|2013-12-05 | 38.00| 13.55|
|2013-12-06 | 38.36| 13.95|
The remaining steps just take the log of the close prices, difference them to obtain daily log returns, and apply `sum`, `sd` and `cor`.
{% highlight r %}
# select all columns except Date
data <- select(data, -Date)
# apply log difference column wise
dailyRet <- apply(log(data), 2, diff, lag=1)
# obtain gross return, standard deviation and correlation
returns <- apply(dailyRet, 2, sum, na.rm = TRUE)
std <- apply(dailyRet, 2, sd, na.rm = TRUE)
correlation <- cor(dailyRet)
returns
{% endhighlight %}
{% highlight text %}
## MSFT TCHC
## 0.2249777 0.6293973
{% endhighlight %}
{% highlight r %}
std
{% endhighlight %}
{% highlight text %}
## MSFT TCHC
## 0.01167381 0.03203031
{% endhighlight %}
{% highlight r %}
correlation
{% endhighlight %}
{% highlight text %}
## MSFT TCHC
## MSFT 1.0000000 0.1481043
## TCHC 0.1481043 1.0000000
{% endhighlight %}
Finally the data folder is deleted.
{% highlight r %}
# delete data folder
if(file.exists(dataDir)) { unlink(dataDir, recursive = TRUE) }
{% endhighlight %}
Thank you! Next time I should read more closely. – steich