2016-03-28 11 views
0

私は結果のデータフレームを持っています。 Cruise_Strataの複数の比較があります。私はcruise_strata(Cruise1_Strata1とCruise2_Strata2)という2つの列を持っています。私が見つけた問題は、データフレームに「重複した」レコードがあることです。例えば、1つの行が2つの異なる列のデータフレームから行を削除するR

Cruise_Strata1 Cruise_Strata2 
201501.35   201502.35 

を有し、別の行は

Cruise_Strata1 Cruise_Strata2 
201502.35   201501.35 

行が残りの列についても同様の結果を有している必要があります。私はこれが起こる行を識別し、データセットから1行を削除することができたいと思いますが、それについてどうやって行くのか分かりません。彼らは重複していないので、私は重複を使用することはできません。

ご協力いただければ幸いです。

ここはデータフレームです。

dput(result5) 
structure(list(Cruise_Strata1 = structure(c(1L, 1L, 2L, 2L, 3L, 
3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 
11L, 11L, 12L, 12L, 13L, 13L, 14L, 14L, 15L, 15L, 16L, 16L, 17L, 
17L, 18L, 18L, 19L, 19L, 20L, 20L, 21L, 21L, 22L, 22L, 23L, 23L, 
24L, 24L, 25L, 25L, 26L, 26L, 27L, 27L, 28L, 28L, 29L, 29L, 30L, 
30L, 31L, 31L, 32L, 32L, 33L, 33L, 34L, 34L, 35L, 35L, 36L, 36L, 
37L, 37L, 38L, 38L, 39L, 39L, 40L, 40L, 41L, 41L, 42L, 42L, 43L, 
43L, 44L, 44L, 45L, 45L, 46L, 46L, 47L, 47L, 48L, 48L, 49L, 49L, 
50L, 50L, 51L, 51L, 52L, 52L, 53L, 53L, 54L, 54L, 55L, 55L, 56L, 
56L, 57L, 57L, 58L, 58L, 59L, 59L, 60L, 60L, 61L, 61L, 62L, 62L, 
63L, 63L, 64L, 64L, 65L, 65L, 66L, 66L), .Label = c("201501.10", 
"201501.11", "201501.13", "201501.14", "201501.15", "201501.17", 
"201501.18", "201501.19", "201501.21", "201501.22", "201501.23", 
"201501.24", "201501.25", "201501.26", "201501.27", "201501.29", 
"201501.30", "201501.31", "201501.33", "201501.34", "201501.35", 
"201501.9", "201502.10", "201502.11", "201502.13", "201502.14", 
"201502.15", "201502.17", "201502.18", "201502.19", "201502.21", 
"201502.22", "201502.23", "201502.24", "201502.25", "201502.26", 
"201502.27", "201502.29", "201502.30", "201502.31", "201502.33", 
"201502.34", "201502.35", "201502.9", "201503.10", "201503.11", 
"201503.13", "201503.14", "201503.15", "201503.17", "201503.18", 
"201503.19", "201503.21", "201503.22", "201503.23", "201503.24", 
"201503.25", "201503.26", "201503.27", "201503.29", "201503.30", 
"201503.31", "201503.33", "201503.34", "201503.35", "201503.9" 
), class = "factor"), Cruise_Strata2 = structure(c(23L, 45L, 
24L, 46L, 25L, 47L, 26L, 48L, 27L, 49L, 28L, 50L, 29L, 51L, 30L, 
52L, 31L, 53L, 32L, 54L, 33L, 55L, 34L, 56L, 35L, 57L, 36L, 58L, 
37L, 59L, 38L, 60L, 39L, 61L, 40L, 62L, 41L, 63L, 42L, 64L, 43L, 
65L, 44L, 66L, 1L, 45L, 2L, 46L, 3L, 47L, 4L, 48L, 5L, 49L, 6L, 
50L, 7L, 51L, 8L, 52L, 9L, 53L, 10L, 54L, 11L, 55L, 12L, 56L, 
13L, 57L, 14L, 58L, 15L, 59L, 16L, 60L, 17L, 61L, 18L, 62L, 19L, 
63L, 20L, 64L, 21L, 65L, 22L, 66L, 1L, 23L, 2L, 24L, 3L, 25L, 
4L, 26L, 5L, 27L, 6L, 28L, 7L, 29L, 8L, 30L, 9L, 31L, 10L, 32L, 
11L, 33L, 12L, 34L, 13L, 35L, 14L, 36L, 15L, 37L, 16L, 38L, 17L, 
39L, 18L, 40L, 19L, 41L, 20L, 42L, 21L, 43L, 22L, 44L), .Label = c("201501.10", 
"201501.11", "201501.13", "201501.14", "201501.15", "201501.17", 
"201501.18", "201501.19", "201501.21", "201501.22", "201501.23", 
"201501.24", "201501.25", "201501.26", "201501.27", "201501.29", 
"201501.30", "201501.31", "201501.33", "201501.34", "201501.35", 
"201501.9", "201502.10", "201502.11", "201502.13", "201502.14", 
"201502.15", "201502.17", "201502.18", "201502.19", "201502.21", 
"201502.22", "201502.23", "201502.24", "201502.25", "201502.26", 
"201502.27", "201502.29", "201502.30", "201502.31", "201502.33", 
"201502.34", "201502.35", "201502.9", "201503.10", "201503.11", 
"201503.13", "201503.14", "201503.15", "201503.17", "201503.18", 
"201503.19", "201503.21", "201503.22", "201503.23", "201503.24", 
"201503.25", "201503.26", "201503.27", "201503.29", "201503.30", 
"201503.31", "201503.33", "201503.34", "201503.35", "201503.9" 
), class = "factor"), P_value = c(0.63, 0.6793, 0.0319, 0.0289, 
0.9516, 0.8128, 0.9967, 0.3071, 0.9641, 0.0246, 0.7967, 0.2551, 
0.2329, 0.3725, 0.0269, 0.3796, 0.0245, 0.5562, 0.9952, 0.5176, 
0.5596, 0.9966, 0.32, 0.6402, 0.7691, 0.9671, 0.9396, 0.9, 0.9024, 
0.3624, 0.0433, 0.3402, 0.5302, 0.787, 0.0295, 0.3638, 0.006, 
0.701, 0.6323, 0.0366, 2e-04, 0.0011, 0.8849, 0.3, 0.63, 0.9738, 
0.0319, 0.5197, 0.9516, 0.7369, 0.9967, 0.2276, 0.9641, 0.0158, 
0.7967, 0.6332, 0.2329, 0.0322, 0.0269, 0.3013, 0.0245, 0.0129, 
0.9952, 0.795, 0.5596, 0.7277, 0.32, 0.747, 0.7691, 0.3817, 0.9396, 
0.7961, 0.9024, 0.4164, 0.0433, 0.0028, 0.5302, 0.2864, 0.0295, 
0.7036, 0.006, 0, 0.6323, 0.002, 2e-04, 0.9548, 0.8849, 0.0546, 
0.6793, 0.9738, 0.0289, 0.5197, 0.8128, 0.7369, 0.3071, 0.2276, 
0.0246, 0.0158, 0.2551, 0.6332, 0.3725, 0.0322, 0.3796, 0.3013, 
0.5562, 0.0129, 0.5176, 0.795, 0.9966, 0.7277, 0.6402, 0.747, 
0.9671, 0.3817, 0.9, 0.7961, 0.3624, 0.4164, 0.3402, 0.0028, 
0.787, 0.2864, 0.3638, 0.7036, 0.701, 0, 0.0366, 0.002, 0.0011, 
0.9548, 0.3, 0.0546), Cruise1 = structure(c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("201501", 
"201502", "201503"), class = "factor"), Cruise1_Strata1 = structure(c(1L, 
1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 
9L, 10L, 10L, 11L, 11L, 12L, 12L, 13L, 13L, 14L, 14L, 15L, 15L, 
16L, 16L, 17L, 17L, 18L, 18L, 19L, 19L, 20L, 20L, 21L, 21L, 22L, 
22L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 
8L, 8L, 9L, 9L, 10L, 10L, 11L, 11L, 12L, 12L, 13L, 13L, 14L, 
14L, 15L, 15L, 16L, 16L, 17L, 17L, 18L, 18L, 19L, 19L, 20L, 20L, 
21L, 21L, 22L, 22L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 
6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L, 11L, 12L, 12L, 13L, 
13L, 14L, 14L, 15L, 15L, 16L, 16L, 17L, 17L, 18L, 18L, 19L, 19L, 
20L, 20L, 21L, 21L, 22L, 22L), .Label = c("10", "11", "13", "14", 
"15", "17", "18", "19", "21", "22", "23", "24", "25", "26", "27", 
"29", "30", "31", "33", "34", "35", "9"), class = "factor"), 
    Cruise2 = structure(c(2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 
    3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 
    2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 
    3L, 2L, 3L, 2L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 
    1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 
    3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 
    1L, 3L, 1L, 3L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
    2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
    1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
    2L, 1L, 2L), .Label = c("201501", "201502", "201503"), class = "factor"), 
    Cruise2_Strata2 = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 
    4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L, 
    11L, 12L, 12L, 13L, 13L, 14L, 14L, 15L, 15L, 16L, 16L, 17L, 
    17L, 18L, 18L, 19L, 19L, 20L, 20L, 21L, 21L, 22L, 22L, 1L, 
    1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 
    9L, 9L, 10L, 10L, 11L, 11L, 12L, 12L, 13L, 13L, 14L, 14L, 
    15L, 15L, 16L, 16L, 17L, 17L, 18L, 18L, 19L, 19L, 20L, 20L, 
    21L, 21L, 22L, 22L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 
    6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L, 11L, 12L, 
    12L, 13L, 13L, 14L, 14L, 15L, 15L, 16L, 16L, 17L, 17L, 18L, 
    18L, 19L, 19L, 20L, 20L, 21L, 21L, 22L, 22L), .Label = c("10", 
    "11", "13", "14", "15", "17", "18", "19", "21", "22", "23", 
    "24", "25", "26", "27", "29", "30", "31", "33", "34", "35", 
    "9"), class = "factor"), adjuste_p = c(1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.792, 1, 1, 1, 0.0264, 
    0.1452, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.3696, 1, 
    1, 1, 1, 0.792, 0, 1, 0.264, 0.0264, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 0.3696, 1, 1, 1, 1, 1, 0, 1, 0.264, 
    0.1452, 1, 1, 1)), .Names = c("Cruise_Strata1", "Cruise_Strata2", 
"P_value", "Cruise1", "Cruise1_Strata1", "Cruise2", "Cruise2_Strata2", 
"adjuste_p"), row.names = c(1453L, 2905L, 1520L, 2972L, 1587L, 
3039L, 1654L, 3106L, 1721L, 3173L, 1788L, 3240L, 1855L, 3307L, 
1922L, 3374L, 1989L, 3441L, 2056L, 3508L, 2123L, 3575L, 2190L, 
3642L, 2257L, 3709L, 2324L, 3776L, 2391L, 3843L, 2458L, 3910L, 
2525L, 3977L, 2592L, 4044L, 2659L, 4111L, 2726L, 4178L, 2793L, 
4245L, 2860L, 4312L, 23L, 2927L, 90L, 2994L, 157L, 3061L, 224L, 
3128L, 291L, 3195L, 358L, 3262L, 425L, 3329L, 492L, 3396L, 559L, 
3463L, 626L, 3530L, 693L, 3597L, 760L, 3664L, 827L, 3731L, 894L, 
3798L, 961L, 3865L, 1028L, 3932L, 1095L, 3999L, 1162L, 4066L, 
1229L, 4133L, 1296L, 4200L, 1363L, 4267L, 1430L, 4334L, 45L, 
1497L, 112L, 1564L, 179L, 1631L, 246L, 1698L, 313L, 1765L, 380L, 
1832L, 447L, 1899L, 514L, 1966L, 581L, 2033L, 648L, 2100L, 715L, 
2167L, 782L, 2234L, 849L, 2301L, 916L, 2368L, 983L, 2435L, 1050L, 
2502L, 1117L, 2569L, 1184L, 2636L, 1251L, 2703L, 1318L, 2770L, 
1385L, 2837L, 1452L, 2904L), class = "data.frame") 

R情報

R version 3.2.1 (2015-06-18) 
Platform: i386-w64-mingw32/i386 (32-bit) 
Running under: Windows 7 x64 (build 7601) Service Pack 1 
+0

Cruise_Strata1'と 'Cruise_Strata2'の代わりに' Cruise_Strata1'と 'Cruise_Strata2'変数を使用していますか? – DatamineR

+0

あなたの出力は実際にそのような重複を含んでいますか? –

+0

はい申し訳ありませんが、私はCruise_Strata1とCruise_Strata2を意味しました。 – user41509

答えて

2

これはあなたの望ましい結果が得られていますか?

duplicated(apply(cbind(result5$Cruise_Strata1, df$Cruise_Strata2), 1, 
    function(x) paste(min(x), max(x)))) 

結果論理ベクトルを使用してデータをサブセット化することができます。

最初に、Cruise_Strata1Cruise_Strata2の値を貼り付けるベクトルを作成します。これを行うと、2つのうちの小さいものを前面に移動し、大きいものを最後に移動します(またはその逆も可能です)。これは、ちょうどあなたがduplicated関数を適用し、重複を認識できるようにするためのトリックです。

注:このアプローチは、フォームの重複を削除します:

Cruise_Strata1 Cruise_Strata2 
x    y 
y    x 

だけでなく(これが望ましくないなら、私に知らせて):

Cruise_Strata1 Cruise_Strata2 
x    y 
x    y 
+0

はい、そのトリックをしました。あなたが気にしないなら、コードが何をしているのかを説明して、あなたがしたことを理解できるか教えてください。 – user41509

+0

あなたの説明をありがとう。私はすでに提供した2番目の例の形で重複を取り除いていましたが、このコードを使用して同様のことを行うことができることを知っておくとよいでしょう。 – user41509

0

と、一般的なデータフレームdfについて重複値:Cruise_Strata1Cruise_Strata2

df$dupe <- 0 
for(i in 1:(length(df$Cruise_Strata1)-1)) 
{ 
    for(j in (i+1):length(df$Cruise_Strata1)) 
    if(df$Cruise_Strata1[i]==df$Cruise_Strata2[j]) 
     {print(df[c(i,j),]); df$dupe[i] = 1;break} 
} 

    df[df$dupe != 1,] 
関連する問題