2017-08-07 8 views

Group data and calculate n minus n-1 across periods

This is my data frame:

dput(test) 
structure(list(Branch = c("11 Oktomvri", "11 Oktomvri", "11 Oktomvri", 
"11 Oktomvri", "11 Oktomvri", "11 Oktomvri", "11 Oktomvri", "Aerodrom", 
"Aerodrom", "Aerodrom", "Aerodrom", "Aerodrom", "Aerodrom", "Aerodrom", 
"Aerodrom 2", "Aerodrom 2", "Aerodrom 2", "Aerodrom 2", "Aerodrom 2", 
"Aerodrom 2", "Aerodrom 2", "Bitola", "Bitola", "Bitola", "Bitola", 
"Bitola", "Bitola", "Bitola"), period = c("January", "February", 
"March", "April", "May", "June", "July", "January", "February", 
"March", "April", "May", "June", "July", "January", "February", 
"March", "April", "May", "June", "July", "January", "February", 
"March", "April", "May", "June", "July"), value = c(1513, 1511, 
1520, 1524, 1508, 1504, 1517, 1364, 1381, 1400, 1403, 1401, 1406, 
1430, 674, 687, 689, 690, 696, 705, 715, 4400, 4393, 4365, 4342, 
4345, 4373, 4389)), .Names = c("Branch", "period", "value"), row.names = c(NA, 
-28L), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), vars = c("Branch", 
"period"), drop = TRUE, indices = list(3L, 1L, 0L, 6L, 5L, 2L, 
    4L, 10L, 8L, 7L, 13L, 12L, 9L, 11L, 17L, 15L, 14L, 20L, 19L, 
    16L, 18L, 24L, 22L, 21L, 27L, 26L, 23L, 25L), group_sizes = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), biggest_group_size = 1L, labels = structure(list(
    Branch = c("11 Oktomvri", "11 Oktomvri", "11 Oktomvri", "11 Oktomvri", 
    "11 Oktomvri", "11 Oktomvri", "11 Oktomvri", "Aerodrom", 
    "Aerodrom", "Aerodrom", "Aerodrom", "Aerodrom", "Aerodrom", 
    "Aerodrom", "Aerodrom 2", "Aerodrom 2", "Aerodrom 2", "Aerodrom 2", 
    "Aerodrom 2", "Aerodrom 2", "Aerodrom 2", "Bitola", "Bitola", 
    "Bitola", "Bitola", "Bitola", "Bitola", "Bitola"), period = c("April", 
    "February", "January", "July", "June", "March", "May", "April", 
    "February", "January", "July", "June", "March", "May", "April", 
    "February", "January", "July", "June", "March", "May", "April", 
    "February", "January", "July", "June", "March", "May")), row.names = c(NA, 
-28L), class = "data.frame", vars = c("Branch", "period"), drop = TRUE, .Names = c("Branch", 
"period"))) 

I don't know how to group the data by Branch and period and then compute, for each period, the value in period_n minus the value in period_n-1.

The output should look like this:

city  period value diff_n_1 
Bitola March  4365 -28 
Bitola April  2000 13 

My attempt:

results <- sample2 %>% 
    group_by(Branch, period) %>% 
    arrange(Branch) %>% 
    mutate(lagged_period = lag(value), client_diff = value - lagged_period) 

I'm not sure how to pull off the last line.

Any ideas?

Answer


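You almost have it, but you don't need period in group_by. period is effectively the variable you are lagging over. As written, every group you define contains exactly one row, so there is no previous row to lag to, and the code above produces nothing but NAs.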
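To make the point above concrete, here is a minimal sketch on a toy data frame. The branch names "A" and "B" and their values are made up for illustration and are not the question's data. It compares grouping by both columns with grouping by Branch alone.

```r
library(dplyr)

# Toy data: hypothetical branches and values, for illustration only
toy <- data.frame(
  Branch = rep(c("A", "B"), each = 3),
  period = rep(c("January", "February", "March"), times = 2),
  value  = c(10, 12, 15, 100, 90, 95),
  stringsAsFactors = FALSE
)

# Grouping by Branch AND period: every group holds a single row,
# so lag() has no previous row to return and gives NA everywhere.
both <- toy %>%
  group_by(Branch, period) %>%
  mutate(client_diff = value - lag(value)) %>%
  ungroup()

# Grouping by Branch only: lag() looks back one row within each branch.
by_branch <- toy %>%
  group_by(Branch) %>%
  mutate(client_diff = value - lag(value)) %>%
  ungroup()

all(is.na(both$client_diff))  # TRUE
by_branch$client_diff         # NA 2 3 NA -10 5
```

The first row of each branch is still NA in the second version, because the first period has no predecessor to subtract.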

This should work:

library(dplyr) 

sample2 %>% 
    group_by(Branch) %>% 
    arrange(Branch) %>% 
    mutate(lagged_period = lag(value), 
     client_diff = value - lagged_period) 

If you then want to drop the NAs from the calculated field, you can of course just add this to the end of the pipe:

filter(!is.na(client_diff)) 
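Putting the pieces together, the following is a sketch rather than a definitive version. It uses a small subset of the question's values (Aerodrom and Bitola, January to March), deliberately shuffled, as a stand-in for sample2. One assumption goes beyond the answer: the rows are sorted chronologically within each branch by matching period against base R's month.name constant, because arrange(Branch) alone does not put the months in order and lag() depends on row order.

```r
library(dplyr)

# Stand-in for sample2: a shuffled subset of the question's data
sample2 <- data.frame(
  Branch = c("Bitola", "Aerodrom", "Bitola", "Aerodrom", "Bitola", "Aerodrom"),
  period = c("March", "February", "January", "January", "February", "March"),
  value  = c(4365, 1381, 4400, 1364, 4393, 1400),
  stringsAsFactors = FALSE
)

results <- sample2 %>%
  # Assumption: period holds English month names, so month.name gives the order
  arrange(Branch, match(period, month.name)) %>%
  group_by(Branch) %>%
  mutate(client_diff = value - lag(value)) %>%
  ungroup() %>%
  # Drop each branch's first period, which has no predecessor
  filter(!is.na(client_diff))

results$client_diff  # 17 19 -7 -28
```

If the rows are already in chronological order within each branch, as they are in the dput output in the question, the explicit sort does no harm and can be left out.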

Thank you, Dave! – Prometheus