Replace consecutive zeros at both ends of the rows of an R data frame

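I need to replace the zeros in the first and last columns of a data frame with NA. When I replace a first or last zero, I also need to replace any zeros that run consecutively from it in that particular row. Consider the following example data frame: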

df <- data.frame(a = c(1,0,1,0,1,1,1,0,1,1,1), 
       b = c(1,1,1,0,1,1,1,0,1,1,1), 
       c = c(1,0,1,1,1,0,1,0,1,1,1), 
       d = c(1,1,1,0,1,1,1,1,1,1,1), 
       e = c(1,0,1,0,1,1,1,1,1,1,1), 
       f = c(1,1,1,1,1,1,1,1,1,0,1)) 
df

I need it to return:

df.result <- data.frame(a = c(1,NA,1,NA,1,1,1,NA,1,1,1), 
         b = c(1,1,1,NA,1,1,1,NA,1,1,1), 
         c = c(1,0,1,1,1,0,1,NA,1,1,1), 
         d = c(1,1,1,0,1,1,1,1,1,1,1), 
         e = c(1,0,1,0,1,1,1,1,1,1,1), 
         f = c(1,1,1,1,1,1,1,1,1,NA,1)) 
df.result

Thanks in advance.

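# TRUE for the zeros before the first and after the last non-zero value of each row
idx <- t(apply(df != 0, 1, function(x) cumsum(x) == 0 | rev(cumsum(rev(x)) == 0))) 
df[idx] <- NA

The result is equal to your desired output:

all.equal(df, df.result) 
#[1] TRUE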
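To see how the cumsum trick marks only the zeros at the ends of a row, here is a minimal sketch on a single hypothetical row x (not taken from the question). A running count of non-zero values stays at 0 only while we are still inside the leading run of zeros, and the same count taken from the right does the same for the trailing run, so an interior zero is never selected.

```r
# Hypothetical row: two leading zeros, one interior zero, one trailing zero
x <- c(0, 0, 1, 0, 1, 0)

nz <- x != 0                           # TRUE where the value is non-zero
leading <- cumsum(nz) == 0             # no non-zero value seen yet from the left
trailing <- rev(cumsum(rev(nz)) == 0)  # no non-zero value seen yet from the right

x[leading | trailing] <- NA
print(x)
# [1] NA NA  1  0  1 NA   (the interior zero in position 4 is kept)
```

apply() with MARGIN = 1 returns one column per row of df, which is why the answer wraps the result in t() before using it as an index.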

Source

2017-05-11 Ross

Another approach, which avoids apply and row-wise operations and works on the columns instead:

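# one logical vector per column, TRUE where the value is 0
g <- lapply(df, "==", 0) 
# running AND from the left marks leading zeros; with right = TRUE, trailing zeros
df[do.call(cbind, Reduce("&", g, accumulate = TRUE)) | do.call(cbind, Reduce("&", g, accumulate = TRUE, right = TRUE))] <- NA 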
identical(df,df.result) 
#[1] TRUE
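The key step in this answer is Reduce("&", g, accumulate = TRUE): walking across the columns from left to right, it keeps a running logical AND, so element k of the result is TRUE only for rows whose first k columns are all zero; right = TRUE does the same starting from the last column. A small sketch on a hypothetical two-row data frame toy (not from the question):

```r
# Hypothetical data: row 1 has leading zeros, row 2 has trailing zeros
toy <- data.frame(a = c(0, 1), b = c(0, 0), c = c(1, 0))

g <- lapply(toy, "==", 0)  # list of columns, TRUE where the value is 0

# Running AND left to right: column k is TRUE for rows whose first k columns are all 0
lead <- do.call(cbind, Reduce("&", g, accumulate = TRUE))
# Running AND right to left: column k is TRUE for rows whose columns k..ncol are all 0
trail <- do.call(cbind, Reduce("&", g, accumulate = TRUE, right = TRUE))

lead   # row 1: TRUE TRUE FALSE; row 2: all FALSE
trail  # row 1: all FALSE;       row 2: FALSE TRUE TRUE

toy[lead | trail] <- NA
toy    # row 1: NA NA 1; row 2: 1 NA NA
```

Because Reduce loops only over the columns and each & is vectorised over all rows, this avoids calling an R function once per row, which is the likely reason for the speed difference in the benchmark that follows.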

A quick benchmark:

docendo<-function(df) { 
    idx <- t(apply(df != 0, 1, function(x) cumsum(x) == 0 | rev(cumsum(rev(x)) == 0))) 
    df[idx] <- NA 
    df 
} 

nicola<-function(df) { 
    g<-lapply(df,"==",0) 
    df[do.call(cbind,Reduce("&",g,accumulate=TRUE)) | do.call(cbind,Reduce("&",g,accumulate=TRUE,right=TRUE))]<-NA 
    df 
} 

lmo<-function(df) { 
    reps.first <- max.col(df, ties.method = "first") - 1 
    reps.last <- max.col(df, ties.method = "last") 
    fill.last <- length(df)-reps.last 
    is.na(df[cbind(rep(seq_len(nrow(df))[reps.first > 0], reps.first[reps.first > 0]), 
       sequence(reps.first))]) <- TRUE 
    is.na(df[cbind(rep(seq_len(nrow(df))[fill.last > 0], fill.last[fill.last > 0]), 
       length(df)-(sequence(fill.last) - 1))]) <- TRUE 
    df 
} 
#create a bigger dataset 
df<-df[rep(1:nrow(df),each=10000),] 
system.time(res<-docendo(df)) 
# user system elapsed 
# 2.088 0.020 2.145 
system.time(res2<-nicola(df)) 
# user system elapsed 
# 0.016 0.000 0.017 
identical(res,res2) 
#[1] TRUE 
system.time(res3<-lmo(df)) 
# user system elapsed 
# 0.222 0.000 0.265 
identical(res2,res3) 
#[1] TRUE

Source

2017-05-11 14:55:42 nicola

Thanks! All of the options worked, but I marked this answer as accepted because of its speed. – Ross

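Try the following solution, which builds a logical matrix that is used to subset the data and assign NA. If you are concerned about performance or memory, you can also do this in two steps: first find the rows that have a 0 in the first or last column, then run the second step only on those rows:

# rows with a zero in the first or last column
idx1 <- rowSums(df[,c(1, ncol(df))] == 0)>0 
# leading/trailing zeros, computed only for those rows
idx2 <- t(apply(df[idx1,] != 0, 1, function(x) cumsum(x) == 0 | rev(cumsum(rev(x)) == 0))) 
df[idx1,][idx2] <- NA

As a side note, you can also skip the intermediate step of creating the index if you use the following (although I prefer creating the index):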

is.na(df) <- t(apply(df != 0, 1, function(x) cumsum(x) == 0 | rev(cumsum(rev(x)) == 0)))
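To check that the one-step version, the two-step version restricted to rows with a zero at either end, and the is.na<- shortcut all give the same result, the sketch below wraps each in a function and compares them on the question's df. The helper names edge_zeros, one_step, two_step and via_is_na are hypothetical, introduced only for this illustration.

```r
# The question's example data
df <- data.frame(a = c(1,0,1,0,1,1,1,0,1,1,1),
                 b = c(1,1,1,0,1,1,1,0,1,1,1),
                 c = c(1,0,1,1,1,0,1,0,1,1,1),
                 d = c(1,1,1,0,1,1,1,1,1,1,1),
                 e = c(1,0,1,0,1,1,1,1,1,1,1),
                 f = c(1,1,1,1,1,1,1,1,1,0,1))

# Hypothetical helper: TRUE for the zeros before the first / after the last
# non-zero value of each row
edge_zeros <- function(d) {
  t(apply(d != 0, 1, function(x) cumsum(x) == 0 | rev(cumsum(rev(x)) == 0)))
}

# Variant 1: one logical matrix for the whole data frame
one_step <- function(d) {
  d[edge_zeros(d)] <- NA
  d
}

# Variant 2: only rows with a zero in the first or last column can change
two_step <- function(d) {
  idx1 <- rowSums(d[, c(1, ncol(d))] == 0) > 0
  idx2 <- edge_zeros(d[idx1, ])
  d[idx1, ][idx2] <- NA
  d
}

# Variant 3: is.na<- assigns NA directly from the logical matrix
via_is_na <- function(d) {
  is.na(d) <- edge_zeros(d)
  d
}

r1 <- one_step(df)
r2 <- two_step(df)
r3 <- via_is_na(df)
all.equal(r1, r2)  # TRUE
all.equal(r1, r3)  # TRUE
sum(is.na(r1))     # 7 cells replaced, as in df.result
```

Rows with no zero in the first or last column cannot have leading or trailing zeros, which is why the two-step variant can skip them without changing the result.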

Source

2017-05-11 14:34:10

Thank you! – Ross

Here is another base R method: it uses max.col to identify the elements to fill in each row, and then uses matrix subsetting with is.na<- to fill in the NAs. The index matrices are built with rep and sequence.

# number of leading zeros: column of the first maximum (the first 1 for 0/1 data), minus 1
reps.first <- max.col(df, ties.method = "first") - 1 
# column of the last maximum (the last 1 for 0/1 data); the trailing zeros come after it
reps.last <- max.col(df, ties.method = "last") 
fill.last <- length(df)-reps.last 

# set the leading zeros to NA: columns 1..reps.first of each affected row
is.na(df[cbind(rep(seq_len(nrow(df))[reps.first > 0], reps.first[reps.first > 0]), 
       sequence(reps.first))]) <- TRUE 
# set the trailing zeros to NA: the last fill.last columns of each affected row
is.na(df[cbind(rep(seq_len(nrow(df))[fill.last > 0], fill.last[fill.last > 0]), 
       length(df)-(sequence(fill.last) - 1))]) <- TRUE 

all.equal(df, df.result) 
#[1] TRUE
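The sketch below prints the per-row counts that max.col produces for the question's df (recreated so the block runs on its own): reps.first is the number of leading zeros and fill.last the number of trailing zeros. It also shows an assumption built into this approach: because max.col locates each row's maximum, the counts are right only when every non-zero value in a row equals that row's maximum, as with the 0/1 data in the question. The hypothetical rows in y show what happens otherwise; note also that an all-zero row is left unchanged here, whereas the cumsum and Reduce answers turn such a row entirely into NA.

```r
# The question's example data
df <- data.frame(a = c(1,0,1,0,1,1,1,0,1,1,1),
                 b = c(1,1,1,0,1,1,1,0,1,1,1),
                 c = c(1,0,1,1,1,0,1,0,1,1,1),
                 d = c(1,1,1,0,1,1,1,1,1,1,1),
                 e = c(1,0,1,0,1,1,1,1,1,1,1),
                 f = c(1,1,1,1,1,1,1,1,1,0,1))

# max.col() gives the column of each row's maximum; for 0/1 data that is
# the first (or last) 1 in the row
reps.first <- max.col(df, ties.method = "first") - 1          # leading zeros
fill.last  <- length(df) - max.col(df, ties.method = "last")  # trailing zeros
reps.first  # 0 1 0 2 0 0 0 3 0 0 0
fill.last   # 0 0 0 0 0 0 0 0 0 1 0

# Hypothetical rows that break the 0/1 assumption
y <- rbind(c(0, 2, 1, 0),  # true counts: 1 leading zero, 1 trailing zero
           c(0, 0, 0, 0))  # all zeros
lead_y  <- max.col(y, ties.method = "first") - 1
trail_y <- ncol(y) - max.col(y, ties.method = "last")
lead_y   # 1 0: the all-zero row is reported as having no leading zeros
trail_y  # 2 0: row 1 would also lose the 1 in column 3
```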

Source

2017-05-11 15:05:05 lmo

I edited my answer to include your solution in the benchmark. – nicola

Thank you! – Ross
