2017-06-06 20 views

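R: Passing arguments to the match_fun function in fuzzyjoin::fuzzy_join

While answering two related questions I arrived at a workable solution, but I had trouble passing arguments through fuzzy_join to the match_fun that I extracted from fuzzyjoin::stringdist_join.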

The error message I get is:

# Error in mf(rep(u_x, n_y), rep(u_y, each = n_x), ...): object 'ignore_case' not found 

I am not sure whether something about how ... is handled needs to change in match_fun.

Pinging @dgrtwo of fuzzyjoin, who can probably identify the problem quickly.

Update: if there is a problem on the package side, it appears to be in the mf function in https://github.com/dgrtwo/fuzzyjoin/blob/master/R/fuzzy_join.R


# Data: 
library(data.table, quietly = TRUE) 
Address1 <- c("786, GALI NO 5, XYZ","rambo, 45, strret 4, atlast, pqr","23/4, 23RD FLOOR, STREET 2, ABC-E, PQR","45-B, GALI NO5, XYZ","HECTIC, 99 STREET, PQR") 
AREACODE <- c('10','10','14','20','30') 
Year1 <- c(2001:2005) 

Address2 <- c("abc, pqr, xyz","786, GALI NO 4 XYZ","45B, GALI NO 5, XYZ","del, 546, strret2, towards east, pqr","23/4, STREET 2, PQR","abc, pqr, xyz","786, GALI NO 4 XYZ","45B, GALI NO 5, XYZ","del, 546, strret2, towards east, pqr","23/4, STREET 2, PQR") 
Year2 <- c(2001:2010) 
AREA_CODE <- c('10','10','10','20','30','40','50','61','64', '99') 

data1 <- data.table(Address1, Year1, AREACODE) 
data2 <- data.table(Address2, Year2, AREA_CODE) 
data2[, unique_id := sprintf("%06d", 1:nrow(data2))] 

# Solution: 
library(fuzzyjoin, quietly = TRUE); library(dplyr, quietly = TRUE) 

# First, need to define match_fun_stringdist 
# Code from stringdist_join from https://github.com/dgrtwo/fuzzyjoin/blob/master/R/stringdist_join.R 
match_fun_stringdist <- function(v1, v2, ...) { 

    if (ignore_case) {
        v1 <- stringr::str_to_lower(v1)
        v2 <- stringr::str_to_lower(v2)
    }

    dists <- stringdist::stringdist(v1, v2, method = method, ...) 

    ret <- dplyr::data_frame(include = (dists <= max_dist)) 
    if (!is.null(distance_col)) {
        ret[[distance_col]] <- dists
    }
    ret 
} 

# Call fuzzy_join 
fuzzy_join(data1, data2, 
      by = list(x = c("Address1", "AREACODE", "Year1"), y = c("Address2", "AREA_CODE", "Year2")), 
      match_fun = list(match_fun_stringdist, `==`, `<=`), 
      mode = "left", 
      ignore_case = FALSE, 
      method = "dl", 
      max_dist = 99, 
      distance_col = "dist" 
) %>% 
    group_by(Address1, Year1, AREACODE) %>% 
    top_n(1, -Address1.dist) %>% 
    top_n(1, Year2) %>% 
    select(unique_id, Address1.dist, everything()) 
#> Error in mf(rep(u_x, n_y), rep(u_y, each = n_x), ...): object 'ignore_case' not found 

Answer


So far, I do not think it is possible to pass arguments to multiple match_funs, because each match_fun is different; for example, extra arguments cannot be passed to a match_fun such as >=.
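One possible workaround, offered as a sketch rather than as the package's intended usage, is to stop sending the settings through fuzzy_join at all. In the question's match_fun_stringdist, ignore_case, method, max_dist and distance_col are free variables, so R looks them up in the function's enclosing environment; values that arrive in ... are not bound to those names, which is consistent with the "object 'ignore_case' not found" error. A factory function can capture the settings in a closure instead. The name make_stringdist_match_fun is hypothetical and not part of fuzzyjoin; the body uses base R (tolower, data.frame) plus stringdist::stringdist.

```r
# Hypothetical helper (not part of fuzzyjoin): returns a two-argument
# match function whose settings are fixed when the function is created.
make_stringdist_match_fun <- function(max_dist = 2, method = "osa",
                                      ignore_case = FALSE,
                                      distance_col = NULL) {
  function(v1, v2) {
    if (ignore_case) {
      v1 <- tolower(v1)
      v2 <- tolower(v2)
    }

    # method, max_dist, etc. are found in the enclosing environment
    dists <- stringdist::stringdist(v1, v2, method = method)

    # First column: logical match indicator; optional second: the distance
    ret <- data.frame(include = dists <= max_dist)
    if (!is.null(distance_col)) {
      ret[[distance_col]] <- dists
    }
    ret
  }
}

# Build one configured match function and try it on two string pairs
mf <- make_stringdist_match_fun(max_dist = 1, method = "dl",
                                distance_col = "dist")
res <- mf(c("GALI NO 5", "STREET 2"), c("GALI NO 4", "ROAD 9"))

# Assumed usage with the question's data (not run here): no extra
# arguments are forwarded, so `==` and `<=` receive only two arguments.
# fuzzy_join(data1, data2,
#            by = list(x = c("Address1", "AREACODE", "Year1"),
#                      y = c("Address2", "AREA_CODE", "Year2")),
#            match_fun = list(mf, `==`, `<=`),
#            mode = "left")
```

Because every setting lives inside mf, each entry of the match_fun list can be configured independently, which sidesteps the limitation described above that a single set of extra arguments cannot suit all of the match functions.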
