How to use the unique function with blank/missing values

Is there a way to make the rows of the data frame below unique based on the second column when that column contains blank/missing values?

> head(interproscan) 
       V1  V14 
1 sp0000001-mRNA-1   
2 sp0000001-mRNA-1   
3 sp0000001-mRNA-1   
4 sp0000005-mRNA-1 GO:0003723 
5 sp0000006-mRNA-1 GO:0016021 
6 sp0000006-mRNA-1 GO:0016021 


> head(unique(interproscan[ , 1:2])) 
       V1        V14 
1 sp0000001-mRNA-1         
4 sp0000005-mRNA-1      GO:0003723 
5 sp0000006-mRNA-1      GO:0016021 
7 sp0000006-mRNA-2      GO:0016021 
9 sp0000006-mRNA-3      GO:0016021

The desired output looks like this:

    V1        V14 
1 sp0000001-mRNA-1         
4 sp0000005-mRNA-1      GO:0003723 
5 sp0000006-mRNA-1      GO:0016021

Thank you in advance.

Source

2017-09-13 user977828

`library(tidyverse); interproscan %>% distinct(V14, .keep_all = T)` would work for you. Is there anything more to it? – Tunn

`library(tidyverse); > interproscan %>% distinct(V14, .keep_all = T) V1 V14 1: sp0000001-mRNA-1 NA > head(interproscan) V1 V14 1: sp0000001-mRNA-1 NA 2: sp0000001-mRNA 1: sp0000006-mRNA-1 NA: sp0000006-mRNA-1 NA: 01200006-mRNA-1` – user977828

Try this on a data frame or data table:

interproscan <- data.frame(interproscan) 

unique(interproscan)

Output:

   V1  V14 
1 sp0000001-mRNA-1   
4 sp0000005-mRNA-1 GO:0003723 
5 sp0000006-mRNA-1 GO:0016021

Sample data:

require(data.table) 
interproscan <- fread("V1,    V14 
         sp0000001-mRNA-1,   
         sp0000001-mRNA-1,   
         sp0000001-mRNA-1,    
         sp0000005-mRNA-1, GO:0003723 
         sp0000006-mRNA-1, GO:0016021 
         sp0000006-mRNA-1, GO:0016021")
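
The sample data above contains only blank strings. When a real file mixes blanks with NA, `unique()` treats the two as different values and keeps both. The following is a minimal base R sketch of that behaviour, assuming (this is not stated in the question) that blanks and NA should count as the same missing annotation. The toy data frame is hypothetical and only mirrors the question's layout.

```r
# Hypothetical toy data: V14 mixes blank strings, NA, and GO terms
interproscan <- data.frame(
  V1  = c("sp0000001-mRNA-1", "sp0000001-mRNA-1", "sp0000001-mRNA-1",
          "sp0000005-mRNA-1", "sp0000006-mRNA-1", "sp0000006-mRNA-1"),
  V14 = c("", NA, "", "GO:0003723", "GO:0016021", "GO:0016021"),
  stringsAsFactors = FALSE
)

# unique() compares "" and NA like any other value, so both rows survive
before <- unique(interproscan)

# Assumption: a blank means "missing", so recode it to NA first.
# %in% never returns NA, which keeps the logical index safe.
interproscan$V14[interproscan$V14 %in% ""] <- NA
result <- unique(interproscan)

print(before)
print(result)
```

Recoding to NA rather than to "" is a design choice: it lets `is.na()` find the unannotated rows later.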

Source

2017-09-14 00:01:06 www

You need to alter V1 so that it groups the way you intend. I use gsub to discard the trailing -number suffix.

library(dplyr) 
ans <- df %>% 
     group_by(gene = gsub("-\\d+$", "", V1), V14) %>% # strip the trailing -number suffix (any number of digits); now it groups the way you want 
     arrange(V1) %>% # unnecessary for your toy example but just in case for your full data 
     slice(1) %>%  # select top row-entry of each group 
     ungroup() %>% 
     select(-gene)  # discard intermediate grouping variable by name

Output

# A tibble: 3 x 3 
    id    V1  V14 
    <int>   <chr>  <chr> 
1  1 sp0000001-mRNA-1   
2  4 sp0000005-mRNA-1 GO:0003723 
3  5 sp0000006-mRNA-1 GO:0016021

Data

df <- structure(list(id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 9L), V1 = c("sp0000001-mRNA-1", 
"sp0000001-mRNA-1", "sp0000001-mRNA-1", "sp0000005-mRNA-1", "sp0000006-mRNA-1", 
"sp0000006-mRNA-1", "sp0000006-mRNA-2", "sp0000006-mRNA-3"), 
    V14 = c("", "", "", "GO:0003723", "GO:0016021", "GO:0016021", 
    "GO:0016021", "GO:0016021")), class = "data.frame", .Names = c("id", 
"V1", "V14"), row.names = c(NA, -8L)) 


    id    V1  V14 
1 1 sp0000001-mRNA-1   
2 2 sp0000001-mRNA-1   
3 3 sp0000001-mRNA-1   
4 4 sp0000005-mRNA-1 GO:0003723 
5 5 sp0000006-mRNA-1 GO:0016021 
6 6 sp0000006-mRNA-1 GO:0016021 
7 7 sp0000006-mRNA-2 GO:0016021 
8 9 sp0000006-mRNA-3 GO:0016021
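
If you would rather avoid the dplyr dependency, the same idea (strip the suffix, then keep the first row per group) can be sketched in base R with `duplicated()`. This is an illustrative alternative rather than part of the answer above. The `gene` and `keep` names are hypothetical helpers, and the sketch assumes the rows are already in the order you want to keep.

```r
# Same toy data as above
df <- data.frame(
  id  = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 9L),
  V1  = c("sp0000001-mRNA-1", "sp0000001-mRNA-1", "sp0000001-mRNA-1",
          "sp0000005-mRNA-1", "sp0000006-mRNA-1", "sp0000006-mRNA-1",
          "sp0000006-mRNA-2", "sp0000006-mRNA-3"),
  V14 = c("", "", "", "GO:0003723", "GO:0016021", "GO:0016021",
          "GO:0016021", "GO:0016021"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: V1 without its trailing "-<number>" suffix
gene <- sub("-[0-9]+$", "", df$V1)

# duplicated() flags every repeat of a (gene, V14) pair after the first one
keep <- !duplicated(data.frame(gene = gene, V14 = df$V14))

# Keep only the first row of each pair; the original columns are untouched
ans <- df[keep, ]
print(ans)
```

Because `duplicated()` keeps the first occurrence it sees, sort `df` beforehand if a particular row per group should win.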

Source

2017-09-13 23:18:29 CPak
