How to write nested loops that resample with an increasing sample size

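I am new to loops in R and need help writing multiple nested loops. I have a data frame in which each row holds the species counts from one site within one region. There are 50 regions, and the number of sites differs between regions. For each region, I need to calculate a diversity index based on an increasing number of sites, replicating each incremental step 1000 times. For example: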

library(vegan) # assumed source of diversity(); the post does not name the package
R1 <- subset(df, region=="1") #this needs to be completed for all 50 regions 
R1$region<-NULL 

max<-nrow(R1)-1 

iter <- 1000 #the number of iterations 
n <- 1 # the number of rows to be sampled. This needs to increase until "max" 
outp <- rep(NA, iter) 

for (i in 1:iter){ 
    d <- sample(1:nrow(R1), size = n, replace=FALSE) 
    bootdata <- R1[d,] 
    x <- colSums(bootdata) #this is not applicable until n>1 
    outp[i] <- 1/diversity(x, index = "simpson") 
}

In short, here is a sample dataset:

structure(list(region = c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L), Sp1 = c(31L, 
85L, 55L, 71L, 81L, 22L, 78L, 64L), Sp2 = c(10L, 84L, 32L, 86L, 
47L, 93L, 55L, 35L), Sp3 = c(86L, 56L, 4L, 8L, 55L, 47L, 51L, 
95L)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-8L), .Names = c("region", "Sp1", "Sp2", "Sp3"), spec = structure(list(
cols = structure(list(region = structure(list(), class = 
c("collector_integer", 
"collector")), Sp1 = structure(list(), class = c("collector_integer", 
"collector")), Sp2 = structure(list(), class = c("collector_integer", 
"collector")), Sp3 = structure(list(), class = c("collector_integer", 
"collector"))), .Names = c("region", "Sp1", "Sp2", "Sp3")), 
default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))

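For each region, I need to calculate the "simpson" index for a single site, resampled at random 1000 times. Then the index needs to be calculated again for two sites, 1000 times, after the columns have been summed, then for three sites, and so on up to the maximum.

I am also struggling to write the output. I would like one data frame per region, with columns representing the iterations, up to 1000.

Many thanks.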

Source

2017-06-08 Jeremiah Plass-Johnson

Please provide a small reproducible dataset. https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

Sorry, I am really struggling to work out how to add a dataset concisely enough to fit within the character limit.

@lmo I have added it to the original post. This is the correct format, yes? I based it on https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

You can write a function that works on one generic region at a time. Then split the data by region into a list and use sapply to apply your custom function to each list element.

bootstrapByRegion <- function(R) { 
    rgn <- unique(R$region) 
    message(sprintf("Processing %s", rgn)) 
    R$region <- NULL 

    nmax <- nrow(R)-1 

    if (nmax == 0) stop(sprintf("Trying to work on one row. No dice. Manually exclude region %s or handle otherwise.", rgn)) 

    iter <- 1000 #the number of iterations 
    # pre-allocate the result 
    output <- matrix(NA, nrow = iter, ncol = nmax) 

    for (i in 1:nmax) { 
        # i is the number of sites (rows) sampled in this step 
        output[, i] <- replicate(iter, expr = { 
            d <- sample(1:nrow(R), size = i, replace=FALSE) 
            bootdata <- R[d, , drop = FALSE] 
            x <- colSums(bootdata) # sum species counts across the sampled sites 
            outp <- 1/diversity(x, index = "simpson") 
            outp 
        }) 
    } 
    output 
} 

xy <- split(df, f = df$region) 
result <- sapply(xy, FUN = bootstrapByRegion) # list element is taken as R
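
For readers who want to try the approach without installing anything, here is a minimal self-contained sketch. It is not the answerer's code: `simpson_index()` is a hypothetical base-R stand-in, assumed to match what `diversity(x, index = "simpson")` returns (1 minus the sum of squared proportions), `resample_region()` is an illustrative name, and `lapply` is swapped in for `sapply` so that the result is always a list with one data frame per region. Single-site regions are dropped up front.

```r
# Sample data from the question (regions 1-4, three species)
df <- data.frame(
  region = c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L),
  Sp1 = c(31L, 85L, 55L, 71L, 81L, 22L, 78L, 64L),
  Sp2 = c(10L, 84L, 32L, 86L, 47L, 93L, 55L, 35L),
  Sp3 = c(86L, 56L, 4L, 8L, 55L, 47L, 51L, 95L)
)

# Hypothetical stand-in for diversity(x, index = "simpson"):
# assumed to be 1 - sum(p^2), with p the species proportions.
simpson_index <- function(x) {
  p <- x / sum(x)
  1 - sum(p^2)
}

# Illustrative helper: rows = iterations, columns = number of sites sampled.
resample_region <- function(R, iter = 1000) {
  R$region <- NULL
  nmax <- nrow(R) - 1
  out <- sapply(seq_len(nmax), function(n) {
    replicate(iter, {
      rows <- sample(nrow(R), size = n, replace = FALSE)
      1 / simpson_index(colSums(R[rows, , drop = FALSE]))
    })
  })
  # Force an iter x nmax shape even when sapply() returns a plain vector
  out <- as.data.frame(matrix(out, nrow = iter, ncol = nmax))
  names(out) <- paste0("n", seq_len(nmax))
  out
}

set.seed(1)  # only to make the sketch repeatable
by_region <- split(df, df$region)
keep <- vapply(by_region, nrow, integer(1)) > 1  # drop single-site regions
result <- lapply(by_region[keep], resample_region, iter = 100)
str(result, max.level = 1)
```

Each element of `result` is a data frame whose column `n1` holds the values for one sampled site, `n2` for two summed sites, and so on. `iter = 100` is used only to keep the sketch fast.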

Because region 3 has only one row, this does not work for it (because of nrow(R)-1). Such regions can be excluded in various ways. Here is one:

result <- sapply(xy[sapply(xy, nrow) > 1], FUN = bootstrapByRegion)
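
Another option is to skip such regions inside a wrapper rather than filtering the list beforehand. The sketch below is only an illustration under stated assumptions: `safe_by_region()` and `count_sites()` are hypothetical names, `count_sites()` merely stands in for `bootstrapByRegion()`, and the toy data is a subset of the question's sample.

```r
# Hypothetical wrapper: skip regions with fewer than two sites
# instead of stopping with an error.
safe_by_region <- function(R, fun) {
  if (nrow(R) < 2) {
    message(sprintf("Skipping region %s: only one site", unique(R$region)))
    return(NULL)
  }
  fun(R)
}

# Toy data (subset of the question's sample): region 3 has a single site
df <- data.frame(region = c(1L, 1L, 2L, 2L, 3L),
                 Sp1 = c(31L, 85L, 71L, 81L, 22L),
                 Sp2 = c(10L, 84L, 86L, 47L, 93L))

# Stand-in for bootstrapByRegion(): here it just counts the rows
count_sites <- function(R) nrow(R)

res <- lapply(split(df, df$region), safe_by_region, fun = count_sites)
res <- Filter(Negate(is.null), res)  # drop the skipped regions
names(res)
```

The wrapper keeps a record of what was skipped through `message()`, which can be useful with 50 regions.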

Source

2017-06-08 12:32:59

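Many thanks, Roman. This addresses how to split the data by region, but I now need to run the "sample" function 1000 times for each n up to max. @RomanLuštrik

@JeremiahPlass-Johnson See my edit. Region 3 does not work because it has only one row. You need to handle this somehow inside the function, or exclude the region, before running the bootstrap procedure.

It works! Many thanks, @RomanLuštrik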
