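Adding a conditional group identifier using a rollup function
I have a data frame that contains subsequences (groups of rows), and the condition for identifying these subsequences is to watch for a surge in the column diff. This is what the data looks like: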
> dput(test)
structure(list(vid = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
.Label = "2a38ebc2-dd97-43c8-9726-59c247854df5", class = "factor"),
events = structure(c(3L, 2L, 4L, 1L, 3L, 2L, 4L, 1L, 3L,
2L, 4L, 1L, 3L, 2L, 4L, 1L, 3L, 2L, 4L, 1L), .Label = c("click",
"mousedown", "mousemove", "mouseup"), class = "factor"),
deltas = structure(6:25, .Label = c("154875", "154878", "154880",
"155866", "155870", "38479", "38488", "38492", "38775", "45595",
"45602", "45606", "45987", "50280", "50285", "50288", "50646",
"54995", "55001", "55005", "55317", "59528", "59533", "59537",
"59921", "63392", "63403", "63408", "63822", "66706", "66710",
"66716", "67002", "73750", "73755", "73759", "74158", "77999",
"78003", "78006", "78076", "81360", "81367", "81371", "82381",
"93365", "93370", "93374", "93872"), class = "factor"),
serial = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20), diff = c(0, 9, 4, 283, 6820, 7, 4, 381, 4293, 5, 3, 358, 4349, 6, 4,
312, 4211, 5, 4, 384)),
.Names = c("vid", "events", "deltas", "serial", "diff"),
row.names = c(NA, 20L), class = "data.frame")
I am trying to add a column that marks where a new subsequence begins and assigns a unique ID to the whole subsequence. The following example illustrates the grouping criterion.
The diff value in row 5 is 6820, which is more than 10 times the maximum value up to that row (283). The desired output is:
structure(list(vid = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
.Label = "2a38ebc2-dd97-43c8-9726-59c247854df5", class = "factor"),
events = structure(c(3L, 2L, 4L, 1L, 3L, 2L, 4L, 1L, 3L,
2L, 4L, 1L, 3L, 2L, 4L, 1L, 3L, 2L, 4L, 1L), .Label = c("click",
"mousedown", "mousemove", "mouseup"), class = "factor"),
deltas = structure(6:25, .Label = c("154875", "154878", "154880",
"155866", "155870", "38479", "38488", "38492", "38775", "45595",
"45602", "45606", "45987", "50280", "50285", "50288", "50646",
"54995", "55001", "55005", "55317", "59528", "59533", "59537",
"59921", "63392", "63403", "63408", "63822", "66706", "66710",
"66716", "67002", "73750", "73755", "73759", "74158", "77999",
"78003", "78006", "78076", "81360", "81367", "81371", "82381",
"93365", "93370", "93374", "93872"), class = "factor"), serial = c(1,
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20),
diff = c(0, 9, 4, 283, 6820, 7, 4, 381, 4293, 5,
3, 358, 4349, 6, 4, 312, 4211, 5, 4, 384),
group = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5)),
.Names = c("vid", "events", "deltas", "serial", "diff", "group"),
row.names = c(NA, 20L), class = "data.frame")
Any help is greatly appreciated.
How about df$group <- cumsum(df$diff > 500) + 1 (which follows the criterion you specified)? – Gopala
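As a quick check of this suggestion, here is a minimal sketch that rebuilds only the diff column from the question's dput output and compares the result with the group column of the desired output. The names diff_values and expected are introduced here for illustration, and the fixed threshold of 500 is the one from the comment, not something derived from the "10 times the maximum" wording.

```r
# diff column copied from the dput output in the question
diff_values <- c(0, 9, 4, 283, 6820, 7, 4, 381, 4293, 5,
                 3, 358, 4349, 6, 4, 312, 4211, 5, 4, 384)
df <- data.frame(serial = seq_along(diff_values), diff = diff_values)

# Every row with diff > 500 starts a new subsequence;
# adding 1 makes the group IDs start at 1 instead of 0
df$group <- cumsum(df$diff > 500) + 1

# group column from the desired output in the question
expected <- rep(1:5, each = 4)
stopifnot(all(df$group == expected))
```

The threshold only works because the surges in this sample (4211 and up) are well separated from the ordinary values (at most 384); other data may need a different cutoff.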
It works! But I don't understand why :-) Does cumsum get larger the further down the df R processes the rows? I don't see how this works, but it did –
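To see why it works, it may help to look at the intermediate values. The sketch below uses a short made-up vector x (not taken from the question). The comparison yields a logical vector, cumsum treats TRUE as 1 and FALSE as 0, and so the running total goes up by one only at a surge row and stays flat everywhere else. It is the number of surges seen so far that grows, not the row position.

```r
x <- c(3, 700, 2, 5, 900, 1)   # made-up values for illustration

is_surge <- x > 500            # FALSE TRUE FALSE FALSE TRUE FALSE
running  <- cumsum(is_surge)   # 0 1 1 1 2 2  (count of surges so far)
group    <- running + 1        # 1 2 2 2 3 3  (IDs starting at 1)

print(data.frame(x, is_surge, running, group))
```

Rows between two surges share the same running count, which is why every row of a subsequence ends up with the same ID.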