Join on pattern matching

I have the following data tables:

> measures 
    source  measure 
1: my123 0.08130182 
2: 123my -1.45285168 
3: your123 -0.30460771 
4: 123your 0.94670380 
5: 12your3 -0.54728546 
> sources 
      name pattern 
1: My Source  my 
2: Your Source your

I would like to join them on like(measures.source, sources.pattern).

The tables were created using:

library(data.table)
measures <- data.table(source=c('my123', '123my', 'your123', '123your', '12your3'), measure=rnorm(5)) 
sources <- data.table(name=c('My Source', 'Your Source'), pattern=c('my', 'your'))

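I can do this in SQL (PostgreSQL, see below), but is there a good way to do it in R without having to do a cross join and then filter out the non-matching rows? I wonder whether there is a way to do this with R's data.table, or whether there are plans to introduce more custom join functionality in the future.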

drop table if exists measures; 
create table measures as (select * from (values 
    ('my123', 0.08130182), 
    ('123my', -1.45285168), 
    ('your123', -0.30460771), 
    ('123your', 0.94670380), 
    ('your123', 0.94670380) 
)t(source, measure)); 

drop table if exists sources; 
create table sources as (select * from (values 
    ('My Source', 'my'), 
    ('Your Sources', 'your') 
)t(name, pattern)); 

select * from measures join sources on measures.source ~ sources.pattern;

This returns the required result:

 source  |   measure   |     name     | pattern
---------+-------------+--------------+---------
 my123   |  0.08130182 | My Source    | my
 123my   | -1.45285168 | My Source    | my
 your123 | -0.30460771 | Your Sources | your
 123your |  0.94670380 | Your Sources | your
 your123 |  0.94670380 | Your Sources | your
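For reference, the PostgreSQL query above can be mirrored in base R as a cross join followed by a regex filter, which is exactly the approach the question hopes to avoid for large data. It is shown only as a correctness baseline. This sketch assumes plain data.frames (no data.table) and reuses the fixed measure values printed for the R tables above instead of rnorm(), so the result is reproducible.

```r
# Base-R mirror of:
#   select * from measures join sources on measures.source ~ sources.pattern
# Note: this materialises nrow(measures) * nrow(sources) rows before filtering.

measures <- data.frame(
  source  = c("my123", "123my", "your123", "123your", "12your3"),
  measure = c(0.08130182, -1.45285168, -0.30460771, 0.94670380, -0.54728546),
  stringsAsFactors = FALSE
)
sources <- data.frame(
  name    = c("My Source", "Your Source"),
  pattern = c("my", "your"),
  stringsAsFactors = FALSE
)

# merge() with by = NULL returns the Cartesian product of the two tables
cj <- merge(measures, sources, by = NULL)

# Keep the pairs whose pattern (a regex, like SQL's ~ operator) matches source
keep <- mapply(grepl, cj$pattern, cj$source, USE.NAMES = FALSE)
res <- cj[keep, , drop = FALSE]
rownames(res) <- NULL
print(res)
```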


2017-04-12 nikola

I assume you really mean 'measures.source', not 'measures.name'. – G5W

There is an open FR for this, https://github.com/Rdatatable/data.table/issues/1431, labelled high priority, fwiw. – Frank

@G5W Thanks, I fixed it. – nikola

Not sure if this is what you're after, but this does it. stringi handles the matching, so it should also cope with more complex patterns for your purposes.

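> library(stringi)
> do.call(rbind, lapply(seq_len(nrow(measures)), function(i){ 
     # indices of every pattern that matches this row's source
     matched_slice <- which(stri_detect_regex(measures$source[i], sources$pattern)) 
     if (length(matched_slice) == 0) return(NULL)  # no match: drop the row, as in an inner join
     data.frame(measures[i,], sources[matched_slice, ]) 
    })) 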
    source  measure  name pattern 
1 my123 0.75119183 My Source  my 
2 123my 0.55344334 My Source  my 
3 your123 -0.03498414 Your Source your 
4 123your 0.09364795 Your Source your 
5 12your3 0.47537732 Your Source your

For a large dataset you could run this with parallel::mclapply, or use a more data.table-ish approach:

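library(data.table)
library(stringi)
rbindlist(lapply(seq_len(nrow(measures)), function(i){ 
    matched_slice <- which(stri_detect_regex(measures$source[i], sources$pattern)) 
    if (length(matched_slice) == 0) return(NULL)  # unmatched rows are dropped, as in an inner join
    cbind(measures[i,], sources[matched_slice, ]) 
}))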
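When there are far fewer patterns than measures (two patterns in the example), one variant that may be worth trying is to loop over sources instead of measures. Each grepl() call is then vectorised over the whole source column, so there are nrow(sources) calls instead of nrow(measures), and the full cross product is never built. This is a sketch, not benchmarked on the asker's data; pattern_join is a hypothetical helper name, and the code uses base R only, with the fixed values from the printed table instead of rnorm().

```r
# Hypothetical helper (not from the thread): join x to y wherever the regex in
# y[[pattern_col]] matches x[[x_col]], looping over patterns rather than rows.
pattern_join <- function(x, y, x_col, pattern_col) {
  # For each pattern j, the indices of the rows of x that match it
  idx <- lapply(seq_len(nrow(y)), function(j) which(grepl(y[[pattern_col]][j], x[[x_col]])))
  # One entry per matching (row of x, row of y) pair
  xi <- as.integer(unlist(idx))
  yi <- rep(seq_len(nrow(y)), lengths(idx))
  # Restore the original row order of x (ties broken by pattern order)
  o <- order(xi, yi)
  out <- cbind(x[xi[o], , drop = FALSE], y[yi[o], , drop = FALSE])
  rownames(out) <- NULL
  out
}

measures <- data.frame(
  source  = c("my123", "123my", "your123", "123your", "12your3"),
  measure = c(0.08130182, -1.45285168, -0.30460771, 0.94670380, -0.54728546),
  stringsAsFactors = FALSE
)
sources <- data.frame(
  name    = c("My Source", "Your Source"),
  pattern = c("my", "your"),
  stringsAsFactors = FALSE
)

res <- pattern_join(measures, sources, "source", "pattern")
print(res)
```

If the patterns are plain substrings rather than regular expressions, passing fixed = TRUE to grepl() is usually faster.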


2017-04-12 17:32:08

Unfortunately that still isn't feasible: it takes far too long on the dataset I'm working with. – nikola
