Handling NA in if_else in R

The following dataset has three columns containing dates.

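The if_else statement below almost gets me there.

I want an if/else statement such that df1$com is 1 when the difference between t1 or t2 and hire_date is in the range [395, 500] days, and 0 otherwise.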

library(dplyr) 

set.seed(45) 

df1 <- data.frame(hire_date = sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="week"), 10), 
       t1 = sample(seq(as.Date('2000/01/01'), as.Date('2001/01/01'), by="week"), 10), 
       t2 = sample(seq(as.Date('2000/01/01'), as.Date('2001/01/01'), by="day"), 10)) 

#this value is actually unknown 
df1[10,2] <- NA 

    hire_date         t1         t2
1  1999-08-20 2000-05-13 2000-02-17
2  1999-04-23 2000-11-11 2000-04-27
3  1999-03-26 2000-04-15 2000-08-01
4  1999-05-07 2000-06-03 2000-08-29
5  1999-04-30 2000-05-27 2000-11-19
6  1999-04-09 2000-12-30 2000-01-26
7  1999-03-12 2000-12-23 2000-12-07
8  1999-06-25 2000-02-12 2000-09-26
9  1999-02-26 2000-05-06 2000-08-23
10 1999-01-01       <NA> 2000-03-18

But the NA messes it up. Any ideas?

df1$com <- if_else((df1$t1 - df1$hire_date) >= 395 & 
       (df1$t1 - df1$hire_date) <= 500, 1, 
     if_else((df1$t2 - df1$hire_date) >= 395 & 
       (df1$t2 - df1$hire_date) <= 500, 1, 0))
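
To make the failure concrete, here is a minimal sketch using only the dates from row 10 of the table above. The variable names (`hire`, `d1`, `d2`, `cond1`, `res`) are illustrative and not part of the original code. A comparison against NA yields NA, and `if_else` returns NA for an NA condition, so the nested `if_else` that checks t2 never contributes to that row.

```r
library(dplyr)

hire <- as.Date("1999-01-01")
t1   <- as.Date(NA)              # the unknown date in row 10
t2   <- as.Date("2000-03-18")

d1 <- as.numeric(t1 - hire)      # NA: the gap is unknown
d2 <- as.numeric(t2 - hire)      # gap in days, known

cond1 <- d1 >= 395 & d1 <= 500   # NA & NA gives NA, not FALSE

# The outer condition is NA, so the result is NA even though
# the t2 gap on its own would satisfy the range check.
res <- if_else(cond1, 1, if_else(d2 >= 395 & d2 <= 500, 1, 0))
print(res)
```

Here the t2 gap falls inside the range, so the row arguably should get 1, but the NA from t1 wins.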

Source

2017-02-06 afleishman

You could also add a few `& !is.na` conditions here. – lmo

How do you want `NA` to be treated? –

`df1$com <- if_else(!is.na(df1$t1) & (df1$t1 - df1$hire_date) >= 395 & (df1$t1 - df1$hire_date) <= 500, 1, if_else(!is.na(df1$t2) & (df1$t2 - df1$hire_date) >= 395 & (df1$t2 - df1$hire_date) <= 500, 1, 0))`? – Gopala
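
The `!is.na` guard suggested in the comments can be sketched as a runnable example. The helper `in_range` and the three-row data frame `df` are hypothetical, introduced here only for illustration; the rows reuse values from rows 3, 1 and 10 of the question's table. Because `FALSE & NA` evaluates to `FALSE`, the guard turns an unknown gap into a plain "no match" instead of an NA.

```r
library(dplyr)

df <- data.frame(
  hire_date = as.Date(c("1999-03-26", "1999-08-20", "1999-01-01")),
  t1        = as.Date(c("2000-04-15", "2000-05-13", NA)),
  t2        = as.Date(c("2000-08-01", "2000-02-17", "2000-03-18"))
)

# Hypothetical helper: TRUE only when the gap is known and lies in [lo, hi] days
in_range <- function(later, earlier, lo = 395, hi = 500) {
  d <- as.numeric(later - earlier)
  !is.na(d) & d >= lo & d <= hi
}

# Same nesting as the question, but each condition is now never NA
df$com <- if_else(in_range(df$t1, df$hire_date), 1,
                  if_else(in_range(df$t2, df$hire_date), 1, 0))
print(df)
```

With this guard the row whose t1 is missing is decided by t2 alone.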

Instead of nesting if_else statements, you can use dplyr::case_when, which makes it easy to control how NA is treated. dplyr::between cleans up the date comparisons as well.

case_when now works inside mutate() in the development version of dplyr, 0.5.0.9000, together with the supporting package bindrcpp. Install them from GitHub with devtools::install_github(c("hadley/dplyr", "krlmlr/bindrcpp")).

df1 %>%
  mutate(com = case_when(
    is.na(t1) | is.na(t2) ~ 999,  # or however you want to treat NA cases
    between(t1 - hire_date, 395, 500) ~ 1,
    between(t2 - hire_date, 395, 500) ~ 1,
    TRUE ~ 0  # neither difference is between 395 and 500
  ))

#>     hire_date         t1         t2 com
#> 1  1999-08-20 2000-05-13 2000-02-17   0
#> 2  1999-04-23 2000-11-11 2000-04-27   0
#> 3  1999-03-26 2000-04-15 2000-08-01   1
#> 4  1999-05-07 2000-06-03 2000-08-29   1
#> 5  1999-04-30 2000-05-27 2000-11-19   0
#> 6  1999-04-09 2000-12-30 2000-01-26   0
#> 7  1999-03-12 2000-12-23 2000-12-07   0
#> 8  1999-06-25 2000-02-12 2000-09-26   1
#> 9  1999-02-26 2000-05-06 2000-08-23   1
#> 10 1999-01-01       <NA> 2000-03-18 999

Source

2017-02-06 19:42:41

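Thanks for the answer and for the tip about dplyr::between. I have never used case_when and will read up on it. The problem with your answer is that I don't care whether either one is missing. I just want to check whether the difference between t1 or t2 and hire_date is between 395 and 500. Your code sets df1$com to 999 if either t1 or t2 is missing. – afleishman

It was not clear from your post what behavior you wanted for NA. In that case, just remove the first line of the `case_when` call, the one starting with `is.na(...`. –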

`is.na(...`: it doesn't evaluate t2. – afleishman
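
Following the comment thread above, one possible way to get the behaviour afleishman describes is to keep `case_when` but make each condition explicitly NA-safe, so that a missing t1 simply fails its own check and t2 is still considered. This is a sketch under assumptions, not the answerer's code: the columns `d1` and `d2` and the three-row data frame `df` are hypothetical, and the gaps are converted to plain numbers of days before calling `between`.

```r
library(dplyr)

df <- data.frame(
  hire_date = as.Date(c("1999-03-26", "1999-08-20", "1999-01-01")),
  t1        = as.Date(c("2000-04-15", "2000-05-13", NA)),
  t2        = as.Date(c("2000-08-01", "2000-02-17", "2000-03-18"))
)

res <- df %>%
  mutate(
    d1 = as.numeric(t1 - hire_date),  # gap in days, NA when t1 is unknown
    d2 = as.numeric(t2 - hire_date),
    com = case_when(
      !is.na(d1) & between(d1, 395, 500) ~ 1,
      !is.na(d2) & between(d2, 395, 500) ~ 1,
      TRUE ~ 0                        # no known gap falls in the range
    )
  )
print(res)
```

The explicit `!is.na()` guards keep the result independent of how a particular dplyr version treats NA conditions inside `case_when`.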
