ステップ1:マトリックス列ピボット
mat <- mat[, order(colnames(mat))]
# a1::A a1::B a1::C b1::A b1::D
# [1,] 1 0 0 1 0
# [2,] 1 0 0 0 1
# [3,] 1 1 1 1 1
# [4,] 1 0 1 0 0
# [5,] 0 1 0 0 1
ステップ2.1:カラム名分解
## decompose levels, get main levels (before ::) and sub levels (post ::)
decom <- strsplit(colnames(mat), "::")
main_levels <- sapply(decom, "[", 1)
# [1] "a1" "a1" "a1" "b1" "b1"
sub_levels <- sapply(decom, "[", 2)
# [1] "A" "B" "C" "A" "D"
ステップ2.2:グループ化インデックス生成
## generating grouping index
main_index <- paste(rep(main_levels, each = nrow(mat)), rep(1:nrow(mat), times = ncol(mat)), sep = "#")
sub_index <- rep(sub_levels, each = nrow(mat))
sub_index[!as.logical(mat)] <- "" ## 0 values in mat implies ""
## in unclear of what "main_index" and "sub_index" are, check:
## matrix(main_index, nrow(mat))
# [,1] [,2] [,3] [,4] [,5]
# [1,] "a1#1" "a1#1" "a1#1" "b1#1" "b1#1"
# [2,] "a1#2" "a1#2" "a1#2" "b1#2" "b1#2"
# [3,] "a1#3" "a1#3" "a1#3" "b1#3" "b1#3"
# [4,] "a1#4" "a1#4" "a1#4" "b1#4" "b1#4"
# [5,] "a1#5" "a1#5" "a1#5" "b1#5" "b1#5"
## matrix(sub_index, nrow(mat))
# [,1] [,2] [,3] [,4] [,5]
# [1,] "A" "" "" "A" ""
# [2,] "A" "" "" "" "D"
# [3,] "A" "B" "C" "A" "D"
# [4,] "A" "" "C" "" ""
# [5,] "" "B" "" "" "D"
ステップ2.3:私はこれで非常に満足していないが、代替を見つけることができませんでした後処理
:条件付き
## collapsed paste of "sub_index" conditional on "main_index"
x <- unname(tapply(sub_index, main_index, paste0, collapse = ""))
x[x == ""] <- "W"
# [1] "A" "A" "ABC" "AC" "B" "A" "D" "AD" "W" "D"
ステップ3を貼り付ける崩壊しました。
x <- sapply(strsplit(x, ""), paste0, collapse = ",")
# [1] "A" "A" "A,B,C" "A,C" "B" "A" "D" "A,D" "W" "D"
ステップ4:マトリックス
x <- matrix(x, nrow = nrow(mat))
colnames(x) <- unique(main_levels)
# a1 b1
# [1,] "A" "A"
# [2,] "A" "D"
# [3,] "A,B,C" "A,D"
# [4,] "A,C" "W"
# [5,] "B" "D"
効率考慮
方法自体は、ベクトル化を使用してかなり効率的であり、情報をグループ化する手動入力を必要としません。たとえば、数百ものメイングループ(::)と数百のサブグループ(投稿::)がある場合でも、同じコードを使用できます。
唯一の考慮事項は、不要なメモリコピーを減らすことです。この点で、上で説明したような明示的な行列の割り当てをせずに、できる限り、無名関数を使うべきです。これは良い(すでにテスト済み)でしょう:
decom <- strsplit(sort(colnames(mat)), "::")
main_levels <- sapply(decom, "[", 1)
sub_index <- rep(sapply(decom, "[", 2), each = nrow(mat))
sub_index[!as.logical(mat[, order(colnames(mat))])] <- ""
x <- unname(tapply(sub_index,
paste(rep(main_levels, each = nrow(mat)),
rep(1:nrow(mat), times = ncol(mat)),
sep = "#"),
paste0, collapse = ""))
x <- matrix(sapply(strsplit(x, ""), paste0, collapse = ","),
nrow = nrow(mat))
colnames(x) <- unique(main_levels)