This is a somewhat clumsy implementation, but it works:
library(data.table)
# Read the data:
data<-fread("case_id event eventDate countof_eventA_before_B finishDate
1 1 A 2000-07-25 NA NA
2 1 A 2014-02-25 NA NA
3 1 B 2014-07-07 2 2017-05-24
4 2 A 2000-03-12 NA NA
5 2 A 2000-06-06 NA NA
6 2 A 2000-09-05 NA NA
7 2 B 2015-12-16 3 2016-07-28
8 2 A 2016-07-28 NA NA
9 2 A 2017-03-03 NA NA
10 3 A 2002-05-13 NA NA
11 3 A 2002-06-12 NA NA
12 3 B 2004-06-27 2 2004-06-27
13 3 A 2004-07-11 NA NA
14 4 B 2011-08-31 0 2012-04-21
15 4 A 2012-04-21 NA NA
16 4 B 2013-01-10 1 2017-05-24 ")
data[,V1:=NULL]
setnames(data,c("case_id","event","eventDate","countof_eventA_before_B","finishDate"))
#Order data by case_id and eventDate:
data[,eventDatenum:=as.numeric(gsub("-","",eventDate))]
data<-data[order(case_id,eventDate)]
#Obtain all events by case_id:
data[,all_cases:=list(list(event)),by="case_id"]
data[,all_dates:=list(list(eventDate)),by="case_id"]
#Obtain order number of an event within case_id
data[,seq_nr:=seq_len(.N),by=c("case_id")]
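# Count the A events that occur before position y within a case:
function_seq<-function(x,y){
  # seq_len(y-1) is empty when y == 1, so a B at the start of a case gets a count of 0
  as.numeric(sum(x[[1]][seq_len(y-1)]=="A"))
}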
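For a single case, the count that function_seq computes row by row can also be obtained for every position at once with a running sum. This is a base R sketch rather than part of the original answer. The helper name count_A_before is hypothetical, and it assumes `ev` holds the events of one case_id already sorted by date.

```r
# Hypothetical helper: number of "A" events strictly before each position,
# assuming `ev` is the date-sorted event vector of a single case_id.
count_A_before <- function(ev) {
  is_A <- ev == "A"
  # cumsum counts A events up to and including each position;
  # subtracting is_A removes the current row itself
  cumsum(is_A) - is_A
}

ev_case2 <- c("A", "A", "A", "B", "A", "A")
ev_case4 <- c("B", "A", "B")
count_A_before(ev_case2)[ev_case2 == "B"]  # 3
count_A_before(ev_case4)[ev_case4 == "B"]  # 0 1
```

Inside data.table this could be applied per group, for example `data[, n_before := count_A_before(event), by = case_id]` (`n_before` is a hypothetical column name), which avoids grouping by every row. Unlike the original, it also fills values for the A rows, so they would need to be set to NA afterwards if that matters.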
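# Return the date of the first A event after position z within a case,
# or today's date if no A follows:
function_date<-function(x,y,z){
  # positions strictly after z (empty when z is the last event of the case)
  later<-seq_along(x[[1]])[-seq_len(z)]
  x_aux<-x[[1]][later]
  y_aux<-y[[1]][later]
  if (any(x_aux=="A")){
    as.character(y_aux[x_aux=="A"][1])
  } else{
    as.character(Sys.Date())
  }
}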
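The lookup in function_date can be vectorised in a similar way with findInterval, which counts how many A positions lie at or before each row, so the next A (if there is one) is simply the following entry. This base R sketch makes the same assumptions as before (one case, sorted by date). The names next_A_date and its fallback argument are hypothetical; passing as.character(Sys.Date()) as fallback reproduces the original behaviour.

```r
# Hypothetical helper: date of the first "A" strictly after each position,
# or `fallback` when no A follows. Assumes `ev` and `dates` belong to one
# case_id and are sorted by date.
next_A_date <- function(ev, dates, fallback) {
  a_pos <- which(ev == "A")                  # positions of the A events
  k <- findInterval(seq_along(ev), a_pos)    # number of A events at or before each position
  out <- rep(fallback, length(ev))           # default when no A follows
  has_next <- k < length(a_pos)
  out[has_next] <- dates[a_pos[k[has_next] + 1L]]
  out
}

ev3    <- c("A", "A", "B", "A")
dates3 <- c("2002-05-13", "2002-06-12", "2004-06-27", "2004-07-11")
next_A_date(ev3, dates3, fallback = as.character(Sys.Date()))[ev3 == "B"]  # "2004-07-11"
```

Keeping the fallback as an argument makes the helper deterministic to test, while the original always used the current date.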
# Apply both functions row by row (by=1:nrow(data)); A rows get NA:
data[,neeed_nr_events:=ifelse(event=="A",as.numeric(NA),function_seq(all_cases,seq_nr)),by=1:nrow(data)]
data[,neeed_dates:=ifelse(event=="A",as.character(NA),function_date(all_cases,all_dates,seq_nr)),by=1:nrow(data)]
data
case_id event eventDate countof_eventA_before_B finishDate eventDatenum all_cases all_dates seq_nr neeed_nr_events neeed_dates
1: 1 A 2000-07-25 NA NA 20000725 A,A,B 2000-07-25,2014-02-25,2014-07-07 1 NA NA
2: 1 A 2014-02-25 NA NA 20140225 A,A,B 2000-07-25,2014-02-25,2014-07-07 2 NA NA
3: 1 B 2014-07-07 2 2017-05-24 20140707 A,A,B 2000-07-25,2014-02-25,2014-07-07 3 2 2017-05-24
4: 2 A 2000-03-12 NA NA 20000312 A,A,A,B,A,A 2000-03-12,2000-06-06,2000-09-05,2015-12-16,2016-07-28,2017-03-03 1 NA NA
5: 2 A 2000-06-06 NA NA 20000606 A,A,A,B,A,A 2000-03-12,2000-06-06,2000-09-05,2015-12-16,2016-07-28,2017-03-03 2 NA NA
6: 2 A 2000-09-05 NA NA 20000905 A,A,A,B,A,A 2000-03-12,2000-06-06,2000-09-05,2015-12-16,2016-07-28,2017-03-03 3 NA NA
7: 2 B 2015-12-16 3 2016-07-28 20151216 A,A,A,B,A,A 2000-03-12,2000-06-06,2000-09-05,2015-12-16,2016-07-28,2017-03-03 4 3 2016-07-28
8: 2 A 2016-07-28 NA NA 20160728 A,A,A,B,A,A 2000-03-12,2000-06-06,2000-09-05,2015-12-16,2016-07-28,2017-03-03 5 NA NA
9: 2 A 2017-03-03 NA NA 20170303 A,A,A,B,A,A 2000-03-12,2000-06-06,2000-09-05,2015-12-16,2016-07-28,2017-03-03 6 NA NA
10: 3 A 2002-05-13 NA NA 20020513 A,A,B,A 2002-05-13,2002-06-12,2004-06-27,2004-07-11 1 NA NA
11: 3 A 2002-06-12 NA NA 20020612 A,A,B,A 2002-05-13,2002-06-12,2004-06-27,2004-07-11 2 NA NA
12: 3 B 2004-06-27 2 2004-06-27 20040627 A,A,B,A 2002-05-13,2002-06-12,2004-06-27,2004-07-11 3 2 2004-07-11
13: 3 A 2004-07-11 NA NA 20040711 A,A,B,A 2002-05-13,2002-06-12,2004-06-27,2004-07-11 4 NA NA
14: 4 B 2011-08-31 0 2012-04-21 20110831 B,A,B 2011-08-31,2012-04-21,2013-01-10 1 0 2012-04-21
15: 4 A 2012-04-21 NA NA 20120421 B,A,B 2011-08-31,2012-04-21,2013-01-10 2 NA NA
16: 4 B 2013-01-10 1 2017-05-24 20130110 B,A,B 2011-08-31,2012-04-21,2013-01-10 3 1 2017-05-24
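And here is a one-line solution, assuming the data are already sorted and function_seq and function_date are defined as above: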
data[,c("all_cases","all_dates","seq_nr"):=list(list(event),list(eventDate),seq_len(.N)),by=c("case_id")][
  ,c("needed_nr_events","needed_dates"):=list(ifelse(event=="A",as.numeric(NA),function_seq(all_cases,seq_nr)),
                                               ifelse(event=="A",as.character(NA),function_date(all_cases,all_dates,seq_nr))),
  by=1:nrow(data)]
Please reformat your table to make it readable. – rgunning
Thanks for the edit, Marco. – rlearner