Merging two tables on multiple conditions

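I couldn't find a previous question that answers exactly what I am trying to do. I want to merge two tables on multiple conditions.

DF1

chr position effect.exposure ...
1   12345    A               ...
2   54321    G               ...
2   6789     C               ...
3   9876     D               ...

DF2

chr position effect.outcome other ...
1   12345    A              C     ...
2   54321    T              G     ...
3   12314    C              A     ...
5   12321    C              D     ...

This is the general format of my data. There are multiple other columns that play no part in the merge but need to be kept.

I want to merge only the rows where "chr" and "position" are exactly the same, and also make sure that "effect.exposure" in df1 matches either "effect.outcome" or "other" in df2. Importantly, if "effect.exposure" matches neither "effect.outcome" nor "other", that row should be dropped.

I would like "chr" and "position" to be combined so that the resulting data has only one column for each, but both "effect" columns and the "other" column should remain in the final data table.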

Update:

I figured out a way around the problem. What I did was first merge the two data frames on "chr" and "position":

new.df <- merge(df1, df2, by = c("chr", "position"))

From there, I took the subset of this data frame where "effect.exposure" equals either "effect.outcome" or "other".

final.df <- new.df[new.df$effect.exposure == new.df$effect.outcome | 
        new.df$effect.exposure == new.df$other, ]

This may not be the most efficient way, but it works perfectly.
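The two steps above can be sketched end to end as follows. This is a minimal sketch, assuming small data frames modelled on the DF1 and DF2 tables in the question (the elided extra columns are left out). Wrapping the condition in `which()` is an optional variation that is not in the original post: it drops rows where a comparison evaluates to `NA` instead of returning them as all-`NA` rows.

```r
# Hypothetical sample data modelled on the tables in the question
df1 <- data.frame(chr = c(1L, 2L, 2L, 3L),
                  position = c(12345L, 54321L, 6789L, 9876L),
                  effect.exposure = c("A", "G", "C", "D"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(chr = c(1L, 2L, 3L, 5L),
                  position = c(12345L, 54321L, 12314L, 12321L),
                  effect.outcome = c("A", "T", "C", "C"),
                  other = c("C", "G", "A", "D"),
                  stringsAsFactors = FALSE)

# Step 1: keep only rows whose chr and position match in both tables
new.df <- merge(df1, df2, by = c("chr", "position"))

# Step 2: keep rows where the exposure allele matches either column of df2
keep <- new.df$effect.exposure == new.df$effect.outcome |
        new.df$effect.exposure == new.df$other

# which() turns the logical vector into row indices and skips NA comparisons
final.df <- new.df[which(keep), ]
```

With this sample data, `final.df` keeps the rows for chr 1 and chr 2 (position 54321), and "chr" and "position" each appear once because they are the merge keys.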

Source

2017-11-21 Dan

Possible duplicate of [R merge two data frames when either of two criteria matches](https://stackoverflow.com/questions/38753092) – duckmayr

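This follows one of the older answers, where two merges are performed and the results of each merge are combined with rbind. The problem with your data is how to combine results that have different numbers of columns. You can use tidyr::gather and tidyr::spread to deal with that.

Your data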

df1 <- structure(list(chr = c(1L, 2L, 2L, 3L), position = c(12345L, 
54321L, 6789L, 9876L), effect.exposure = c("A", "G", "C", "D" 
), misc = c("a", "b", "c", "d")), .Names = c("chr", "position", 
"effect.exposure", "misc"), class = "data.frame", row.names = c(NA, 
-4L)) 

df2 <- structure(list(chr = c(1L, 2L, 3L, 5L), position = c(12345L, 
54321L, 12314L, 12321L), effect.outcome = c("A", "T", "C", "C" 
), other = c("C", "G", "A", "D")), .Names = c("chr", "position", 
"effect.outcome", "other"), class = "data.frame", row.names = c(NA, 
-4L))

Old answer

library(dplyr)
library(tidyr)

# Merge where effect.exposure matches effect.outcome, then reshape to long form
result1 <- inner_join(df1, df2, by = c("chr", "position", "effect.exposure" = "effect.outcome")) %>%
  gather(key, value, -chr, -position, -effect.exposure)

#   chr position effect.exposure   key value
# 1   1    12345               A  misc     a
# 2   1    12345               A other     C

# Merge where effect.exposure matches other, then reshape to long form
result2 <- inner_join(df1, df2, by = c("chr", "position", "effect.exposure" = "other")) %>%
  gather(key, value, -chr, -position, -effect.exposure)

#   chr position effect.exposure            key value
# 1   2    54321               G           misc     b
# 2   2    54321               G effect.outcome     T

# Both long results now share the same columns, so they can be stacked and widened
ans <- rbind(result1, result2) %>%
  spread(key, value)

#   chr position effect.exposure effect.outcome misc other
# 1   1    12345               A           <NA>    a     C
# 2   2    54321               G              T    b  <NA>
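As a possible alternative to the gather/spread round trip, `dplyr::bind_rows` stacks data frames whose columns differ and fills the missing ones with `NA`. The sketch below is not part of the original answer. It assumes the same `df1` and `df2` as above, rebuilt here with `data.frame()` so the block runs on its own.

```r
library(dplyr)

# Same sample data as in the answer, rebuilt so this block is self-contained
df1 <- data.frame(chr = c(1L, 2L, 2L, 3L),
                  position = c(12345L, 54321L, 6789L, 9876L),
                  effect.exposure = c("A", "G", "C", "D"),
                  misc = c("a", "b", "c", "d"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(chr = c(1L, 2L, 3L, 5L),
                  position = c(12345L, 54321L, 12314L, 12321L),
                  effect.outcome = c("A", "T", "C", "C"),
                  other = c("C", "G", "A", "D"),
                  stringsAsFactors = FALSE)

# One join per matching rule; each result lacks the df2 column used as a key
m1 <- inner_join(df1, df2,
                 by = c("chr", "position", "effect.exposure" = "effect.outcome"))
m2 <- inner_join(df1, df2,
                 by = c("chr", "position", "effect.exposure" = "other"))

# bind_rows matches columns by name and fills absent columns with NA
ans2 <- bind_rows(m1, m2)
```

As with the gather/spread version, the df2 column used as a join key is absorbed into `effect.exposure`, so it shows up as `NA` in the stacked result. If both `effect.outcome` and `other` must keep their original values, joining on "chr" and "position" only and filtering afterwards avoids that.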

Source

2017-11-21 13:05:58 CPak

Thank you. I was actually able to figure out a way to do what I wanted. It may not be the most effective or efficient, but it works for this purpose. I have edited my original question so others can see it. – Dan

Hope this helps!

library(dplyr) 
final_df <- df1 %>% 
    inner_join(df2, by=c("chr", "position")) %>% 
    mutate(Resp_final = if_else((as.character(effect_exposure)==as.character(effect_outcome)) | 
           (as.character(effect_exposure)==as.character(other)), 1, 0)) %>% 
    filter(Resp_final==1) %>% 
    select(-Resp_final) 
final_df

Output is:

  chr position effect_exposure col4 effect_outcome other col5
1   1    12345               A Asdf              A     C 1234
2   2    54321               G  Abc              T     G  987

#Sample data 
> dput(df1) 
structure(list(chr = c(1L, 2L, 2L, 3L), position = c(12345L, 
54321L, 6789L, 9876L), effect_exposure = structure(c(1L, 4L, 
2L, 3L), .Label = c("A", "C", "D", "G"), class = "factor"), col4 = structure(c(2L, 
1L, 4L, 3L), .Label = c("Abc", "Asdf", "qwerty", "xyz"), class = "factor")), .Names = c("chr", 
"position", "effect_exposure", "col4"), class = "data.frame", row.names = c(NA, 
-4L)) 

> dput(df2) 
structure(list(chr = c(1L, 2L, 3L, 5L), position = c(12345L, 
54321L, 12314L, 12321L), effect_outcome = structure(c(1L, 3L, 
2L, 2L), .Label = c("A", "C", "T"), class = "factor"), other = structure(c(2L, 
4L, 1L, 3L), .Label = c("A", "C", "D", "G"), class = "factor"), 
    col5 = c(1234L, 987L, 675L, 3456L)), .Names = c("chr", "position", 
"effect_outcome", "other", "col5"), class = "data.frame", row.names = c(NA, 
-4L))
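The helper column in the answer above is not strictly required, since `filter()` accepts the condition directly and drops rows where it evaluates to `NA`. The following is a shorter sketch of the same idea, not the answerer's code. It assumes sample data equivalent to the `dput` output above, rebuilt with `stringsAsFactors = TRUE` so the effect columns are factors. The `as.character()` calls are kept because comparing two factors with different level sets using `==` raises an error in base R.

```r
library(dplyr)

# Sample data equivalent to the dput() output above (factor columns)
df1 <- data.frame(chr = c(1L, 2L, 2L, 3L),
                  position = c(12345L, 54321L, 6789L, 9876L),
                  effect_exposure = c("A", "G", "C", "D"),
                  col4 = c("Asdf", "Abc", "xyz", "qwerty"),
                  stringsAsFactors = TRUE)
df2 <- data.frame(chr = c(1L, 2L, 3L, 5L),
                  position = c(12345L, 54321L, 12314L, 12321L),
                  effect_outcome = c("A", "T", "C", "C"),
                  other = c("C", "G", "A", "D"),
                  col5 = c(1234L, 987L, 675L, 3456L),
                  stringsAsFactors = TRUE)

final_df <- df1 %>%
  inner_join(df2, by = c("chr", "position")) %>%
  # Compare as character so differing factor levels do not cause an error
  filter(as.character(effect_exposure) == as.character(effect_outcome) |
         as.character(effect_exposure) == as.character(other))
```

This keeps all columns from both tables, with "chr" and "position" appearing once each.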

Source

2017-11-21 13:34:12 Prem

Thank you. In the end, I actually found a way to solve it myself. If you are curious, see the edit to the original post. – Dan

Perfect! Glad you solved it on your own :) – Prem
