As mentioned, use stringi :) As you can see, the stringi solution is shorter, easier to understand, and much faster (up to 30 times).
Short answer:
arr <- stri_extract_all_regex(x, "(?<=[\\[\\(,])[0-9.]+(?=[\\]\\),])", simplify = NA)
data.frame(low = as.numeric(arr[,1]), high = as.numeric(arr[,2]))
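To make the role of `simplify = NA` concrete, here is a minimal sketch on a tiny input. The vector `demo` and its strings are made up for illustration; they are not the `scorenames` data from the question. With `simplify = NA`, `stri_extract_all_regex` returns a character matrix, and elements without a match become `NA` rather than empty strings.

```r
library(stringi)

# Hypothetical sample input (assumption, not the original data):
# two interval-like strings and one string with no interval at all.
demo <- c("score[20,180)", "score(360,460]", "no interval here")

# Same pattern as in the answer: digits/dots preceded by "[", "(" or ","
# and followed by "]", ")" or ",".
pattern <- "(?<=[\\[\\(,])[0-9.]+(?=[\\]\\),])"

# simplify = NA -> character matrix, missing matches filled with NA
arr <- stri_extract_all_regex(demo, pattern, simplify = NA)

res <- data.frame(low = as.numeric(arr[, 1]), high = as.numeric(arr[, 2]))
print(res)
```

The third row comes out as `NA NA`, which is why no special handling for non-matching strings is needed in the stringi version.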
Long answer:
require(stringi)
require(microbenchmark)

# base R version: regmatches() + gregexpr() with a Perl-style regex
grepFun <- function(x){
    mat <- regmatches(x,
                      gregexpr("(?<=[\\[\\(,])[0-9.]+(?=[\\]\\),])", x, perl = TRUE))
    newnames <- lapply(mat, function(m) {
        # no match: return a row of NAs
        if (! length(m)) return(list(low = NA, high = NA))
        setNames(as.list(as.numeric(m)), nm = c("low", "high"))
    })
    do.call(rbind.data.frame, newnames)
}

# stringi version: simplify = NA returns a character matrix padded with NA
striFun <- function(x){
    arr <- stri_extract_all_regex(x, "(?<=[\\[\\(,])[0-9.]+(?=[\\]\\),])", simplify = NA)
    data.frame(low = as.numeric(arr[,1]), high = as.numeric(arr[,2]))
}
# both functions return the same result
grepFun(scorenames)
     low  high
1     NA    NA
2   20.0 180.0
3  360.0 460.0
4  460.0 629.0
...
12  25.0  49.0
striFun(scorenames)
     low  high
1     NA    NA
2   20.0 180.0
3  360.0 460.0
4  460.0 629.0
...
12  25.0  49.0
# generate a larger, more complicated vector
n <- 10000
x <- stri_paste(stri_rand_strings(n, length = 1:10), sample(c("(","["),n,TRUE),
                sample(1000,n,TRUE), ",", sample(1000,n,TRUE), sample(c(")","]"), n, TRUE))
head(x) # check first elements
[1] "O[68,434]" "Ql[783,151)" "Zk0(773,60)" "ETfV(446,518]" "Xixbr(576,855)" "G6QnHu(92,955)"
# short test using the new data
grepFun(x[1:6])
  low high
1  68  434
2 783  151
3 773   60
4 446  518
5 576  855
6  92  955
striFun(x[1:6])
  low high
1  68  434
2 783  151
3 773   60
4 446  518
5 576  855
6  92  955
# and a benchmark to show the performance difference
microbenchmark(grepFun(x), striFun(x))
Unit: milliseconds
       expr       min        lq      mean    median        uq       max neval
 grepFun(x) 330.27733 366.09306 416.56330 406.08914 465.29829 568.15250   100
 striFun(x)  11.57449  11.97825  13.38157  12.46927  13.67699  25.97455   100
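As a quick sanity check on the "up to 30 times" figure, the speedup can be computed directly from the timings reported above. This is only arithmetic on the numbers already shown; the variable names are illustrative, and timings on another machine will differ.

```r
# Timings copied from the microbenchmark output above (milliseconds)
grep_ms <- c(min = 330.27733, mean = 416.56330, median = 406.08914)
stri_ms <- c(min = 11.57449,  mean = 13.38157,  median = 12.46927)

# Ratio of base R time to stringi time for each summary statistic
speedup <- round(grep_ms / stri_ms, 1)
print(speedup)
```

On these numbers the ratio falls between roughly 28 and 33, depending on which statistic is compared.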
What language is this? – tilz0R
Sorry, this is R. –
Possible duplicate of [strsplit by parentheses](http://stackoverflow.com/questions/31292853/strsplit-by-parentheses) – BigDataScientist