Row_number() and Fun - Redshift Postgres - Time Sequence and Restarting Numbering

I am looking for streaks in my data. The goal is to find where at least 3 consecutive rows are flagged by np. Here is a subset of my data:

drop table if exists bi_adhoc.test; 
create table bi_adhoc.test (id varchar(12),rd date,np decimal); 

insert into bi_adhoc.test 
select 'aaabbbccc', '2016-07-25'::date, 0 union all 
select 'aaabbbccc', '2016-08-01'::date, 0 union all 
select 'aaabbbccc', '2016-08-08'::date, 0 union all 
select 'aaabbbccc', '2016-08-15'::date, 0 union all 
select 'aaabbbccc', '2016-08-22'::date, 1 union all 
select 'aaabbbccc', '2016-08-29'::date, 0 union all 
select 'aaabbbccc', '2016-09-05'::date, 1 union all 
select 'aaabbbccc', '2016-09-12'::date, 0 union all 
select 'aaabbbccc', '2016-09-19'::date, 1;

I was hoping to use row_number() and count(), but they do not seem to give me the result I want.

select 
    * 
    ,row_number() over (partition by t.id order by t.rd) all_ctr 
    ,count(t.id) over (partition by t.id) all_count 
    ,row_number() over (partition by t.id,t.np order by t.rd) np_counter 
    ,count(t.id) over (partition by t.id,t.np) np_non_np 
from 
    bi_adhoc.test t 
order by 
    t.rd;

Here is my result, along with the desired result:

id         rd         np  all_ctr  all_count  np_counter  np_non_np  desired
aaabbbccc  7/25/2016  0   1        9          1           6          1
aaabbbccc  8/1/2016   0   2        9          2           6          2
aaabbbccc  8/8/2016   0   3        9          3           6          3
aaabbbccc  8/15/2016  0   4        9          4           6          4
aaabbbccc  8/22/2016  1   5        9          1           3          1
aaabbbccc  8/29/2016  0   6        9          5           6          1
aaabbbccc  9/5/2016   1   7        9          2           3          1
aaabbbccc  9/12/2016  0   8        9          6           6          1
aaabbbccc  9/19/2016  1   9        9          3           3          1

Source

2016-09-21 Josh

Where is the np_flag column, and what values does it hold? –

My apologies, I edited np_flag to np to make the table easier to read. np is binary. – Josh

First, I added some additional data to help fully illustrate the problem:

drop table if exists bi_adhoc.test; 
create table bi_adhoc.test (id varchar(12),period date,hit decimal); 

insert into bi_adhoc.test 
select 'aaabbbccc', '2016-07-25'::date, 0 union all 
select 'aaabbbccc', '2016-08-01'::date, 0 union all 
select 'aaabbbccc', '2016-08-08'::date, 0 union all 
select 'aaabbbccc', '2016-08-15'::date, 1 union all 
select 'aaabbbccc', '2016-08-22'::date, 1 union all 
select 'aaabbbccc', '2016-08-29'::date, 0 union all 
select 'aaabbbccc', '2016-09-05'::date, 0 union all 
select 'aaabbbccc', '2016-09-12'::date, 1 union all 
select 'aaabbbccc', '2016-09-19'::date, 0 union all 
select 'aaabbbccc', '2016-09-26'::date, 1 union all 
select 'aaabbbccc', '2016-10-03'::date, 1 union all 
select 'aaabbbccc', '2016-10-10'::date, 1 union all 
select 'aaabbbccc', '2016-10-17'::date, 1 union all 
select 'aaabbbccc', '2016-10-24'::date, 1 union all 
select 'aaabbbccc', '2016-10-31'::date, 0 union all 
select 'aaabbbccc', '2016-11-07'::date, 0 union all 
select 'aaabbbccc', '2016-11-14'::date, 0 union all 
select 'aaabbbccc', '2016-11-21'::date, 0 union all 
select 'aaabbbccc', '2016-11-28'::date, 0 union all 
select 'aaabbbccc', '2016-12-05'::date, 1 union all 
select 'aaabbbccc', '2016-12-12'::date, 1;

With the additional data in place, the key was to figure out what the streaks were and how to identify each one, so that there is something to partition the data by.

select 
    * 
    -- number rows inside each streak: hits restart per hit_partition, misses per miss_partition
    ,case 
     when t1.hit = 1 then row_number() over (partition by t1.id,t1.hit_partition order by t1.period) 
     when t1.hit = 0 then row_number() over (partition by t1.id,t1.miss_partition order by t1.period) 
     else null 
    end desired 
from 
(
select 
    * 
    ,row_number() over (partition by t.id order by t.id,t.period) 
    ,case 
     when t.hit = 1 then row_number() over (partition by t.id, t.hit order by t.period) 
     else null 
    end hit_counter 
    -- overall row number minus row number among hits: constant across a run of consecutive hits
    ,case 
     when t.hit = 1 then row_number() over (partition by t.id order by t.id,t.period) - row_number() over (partition by t.id, t.hit order by t.period) 
     else null 
    end hit_partition 
    ,case 
     when t.hit = 0 then row_number() over (partition by t.id, t.hit order by t.period) 
     else null 
    end miss_counter 
    ,case 
     when t.hit = 0 then row_number() over (partition by t.id order by t.id,t.period) - row_number() over (partition by t.id, t.hit order by t.period) 
     else null 
    end miss_partition 
from 
    bi_adhoc.test t 
) t1 
order by 
    t1.id 
    ,t1.period;

This gives the following result:

id         period      hit  row_number  hit_counter  hit_partition  miss_counter  miss_partition  desired
aaabbbccc  2016-07-25  0    1           NULL         NULL           1             0               1
aaabbbccc  2016-08-01  0    2           NULL         NULL           2             0               2
aaabbbccc  2016-08-08  0    3           NULL         NULL           3             0               3
aaabbbccc  2016-08-15  1    4           1            3              NULL          NULL            1
aaabbbccc  2016-08-22  1    5           2            3              NULL          NULL            2
aaabbbccc  2016-08-29  0    6           NULL         NULL           4             2               1
aaabbbccc  2016-09-05  0    7           NULL         NULL           5             2               2
aaabbbccc  2016-09-12  1    8           3            5              NULL          NULL            1
aaabbbccc  2016-09-19  0    9           NULL         NULL           6             3               1
aaabbbccc  2016-09-26  1    10          4            6              NULL          NULL            1
aaabbbccc  2016-10-03  1    11          5            6              NULL          NULL            2
aaabbbccc  2016-10-10  1    12          6            6              NULL          NULL            3
aaabbbccc  2016-10-17  1    13          7            6              NULL          NULL            4
aaabbbccc  2016-10-24  1    14          8            6              NULL          NULL            5
aaabbbccc  2016-10-31  0    15          NULL         NULL           7             8               1
aaabbbccc  2016-11-07  0    16          NULL         NULL           8             8               2
aaabbbccc  2016-11-14  0    17          NULL         NULL           9             8               3
aaabbbccc  2016-11-21  0    18          NULL         NULL           10            8               4
aaabbbccc  2016-11-28  0    19          NULL         NULL           11            8               5
aaabbbccc  2016-12-05  1    20          9            11             NULL          NULL            1
aaabbbccc  2016-12-12  1    21          10           11             NULL          NULL            2
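
The two partition columns above can probably be folded into one group key, because the difference between the overall row number and the per-value row number stays constant within a run of equal hit values. A minimal sketch, assuming the bi_adhoc.test table (id, period, hit) defined earlier; the names grp, streak_pos and streak_len are illustrative and do not appear in the original query:

```sql
select
    t1.id
    ,t1.period
    ,t1.hit
    -- position of the row inside its streak (same values as "desired" above)
    ,row_number() over (partition by t1.id, t1.hit, t1.grp order by t1.period) as streak_pos
    -- total number of rows in the streak this row belongs to
    ,count(*) over (partition by t1.id, t1.hit, t1.grp) as streak_len
from
(
select
    t.*
    -- constant within a run of equal hit values; identifies a streak only together with hit
    ,row_number() over (partition by t.id order by t.period)
     - row_number() over (partition by t.id, t.hit order by t.period) as grp
from
    bi_adhoc.test t
) t1
order by
    t1.id
    ,t1.period;
```

Wrapping this in one more subquery and filtering on hit = 1 and streak_len >= 3 should leave only the flagged runs of three or more rows; on the sample data that is the five rows from 2016-09-26 to 2016-10-24. Note that grp alone is not enough as a key: the value 3 occurs both as a hit_partition and as a miss_partition in the result above, which is why hit is part of the partition.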

Source

2016-09-23 20:11:14 Josh

One approach would be to compute the lag(np) value in a CTE and then compare the current np with the lagged np to detect a streak. This may not be the most optimal way, but it seems to work.

with source_cte as 
(
select 
    * 
    ,row_number() over (partition by t.id order by t.rd) row_num 
    ,lag(np,1) over (partition by t.id order by t.rd) as prev_np 
from 
    bi_adhoc.test t 
) 
, streak_cte as 
(
select 
    *, 
    case when np=prev_np or prev_np is NULL then 1 else 0 end as is_streak 
from 
    source_cte 
) 
select 
    *, 
    case when is_streak=1 then dense_rank() over (partition by id, is_streak order by rd) else 1 end as desired 
from 
    streak_cte 
order by 
    rd;
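
A variant of the same lag() idea, offered as a sketch rather than as the answerer's method: keep a running total of the rows where np changes and use it as a streak identifier, so that the numbering restarts at every change. It assumes the id, rd and np columns from the question; change_flag, streak_id and streak_pos are hypothetical names. The explicit frame clause is there because Redshift requires one when an aggregate window function has an order by.

```sql
with flagged as
(
select
    t.*
    -- 1 on the first row of each id and whenever np differs from the previous row
    ,case
     when lag(t.np, 1) over (partition by t.id order by t.rd) = t.np then 0
     else 1
    end as change_flag
from
    bi_adhoc.test t
)
, streaks as
(
select
    f.*
    -- running total of changes: all rows of one streak share the same value
    ,sum(f.change_flag) over (partition by f.id order by f.rd rows between unbounded preceding and current row) as streak_id
from
    flagged f
)
select
    s.id
    ,s.rd
    ,s.np
    -- restart the numbering at the start of every streak
    ,row_number() over (partition by s.id, s.streak_id order by s.rd) as streak_pos
from
    streaks s
order by
    s.id
    ,s.rd;
```

On the nine sample rows from the question, streak_pos comes out as 1, 2, 3, 4, 1, 1, 1, 1, 1, which matches the desired column.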

Source

2016-09-21 21:27:06 DotThoughts

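Thanks, this is what I was looking for. – Josh

Unfortunately, after applying this to the full data set, it turned out not to be what I needed. I wanted the numbering to restart with each streak... I came up with a solution: – Josh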
