2017-10-20

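Pandas: pivot_table fails while trying to keep the row order

I have the following dataframe, in which the weeks are not ISO weeks but fiscal weeks (week 1 is the first week of July and week 52 is the last week of June):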

> df 
    domain week count 
0  A 43  5 
1  A 45  1 
2  A 50  1 
3  A 51  4 
4  A  1  3 
5  A  3  12 
6  B 43  1 
7  B 44  1 
8  B 45  4 
9  B 50  11 
10  B  2  3 
11  B  3  12 
12  C 51  6 
13  C  1  14 
14  C  5  1 

I want to pivot this table while keeping the order of the weeks, to get a new dataframe like the following, where the values are count and the columns are domain:

> new_df 
week A  B  C 
43  5  1 NaN 
44 NaN  1 NaN 
45  1  4 NaN  
50  1 11 NaN 
51  4 NaN  6 
1  3 NaN 14 
2  NaN  3 NaN 
3  12 12 NaN 
5  NaN NaN  1 

I tried using groupby and unstack, but got this error:

> df = df.groupby(['week'], sort=False)['count'].unstack('domain') 
AttributeError: Cannot access callable attribute 'unstack' of 'SeriesGroupBy' objects, try using the 'apply' method 
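
For context, the error arises because selecting ['count'] from a groupby gives a SeriesGroupBy, which has not been aggregated yet and has no unstack method. Aggregating over both week and domain first produces a Series with a two-level index, which can then be unstacked. A minimal sketch, using a small subset of the data above chosen only for illustration; note that this alone sorts the weeks numerically and does not yet give the fiscal order:

```python
import pandas as pd

# Small subset of the question's data, for illustration only.
df = pd.DataFrame({
    "domain": ["A", "A", "B", "B"],
    "week": [43, 1, 43, 2],
    "count": [5, 3, 1, 3],
})

grouped = df.groupby(["week"])["count"]
print(type(grouped).__name__)  # SeriesGroupBy: nothing aggregated yet

# Group by both keys and aggregate; the result is a Series indexed by
# (week, domain), so domain can be moved into the columns with unstack.
wide = df.groupby(["week", "domain"])["count"].sum().unstack("domain")
print(wide)
```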

Answers


Option 1] Use a custom-ordered weeks index as a helper, together with .loc

In [4810]: weeks = pd.Index(list(range(26, 52)) + list(range(26))) 

In [4819]: dfp = df.groupby(['week','domain'])['count'].sum().unstack() 

In [4820]: dfp.loc[weeks.intersection(dfp.index)] 
Out[4820]: 
domain  A  B  C 
43  5.0 1.0 NaN 
44  NaN 1.0 NaN 
45  1.0 4.0 NaN 
50  1.0 11.0 NaN 
51  4.0 NaN 6.0 
1  3.0 NaN 14.0 
2  NaN 3.0 NaN 
3  12.0 12.0 NaN 
5  NaN NaN 1.0 

Option 2] Use pivot

In [4821]: dfp = df.pivot(index='week', columns='domain', values='count') 

In [4822]: dfp.loc[weeks.intersection(dfp.index)] 
Out[4822]: 
domain  A  B  C 
43  5.0 1.0 NaN 
44  NaN 1.0 NaN 
45  1.0 4.0 NaN 
50  1.0 11.0 NaN 
51  4.0 NaN 6.0 
1  3.0 NaN 14.0 
2  NaN 3.0 NaN 
3  12.0 12.0 NaN 
5  NaN NaN 1.0 

Option 3] Alternatively, use reindex instead of .loc

In [4830]: dfp.reindex(weeks.intersection(dfp.index)) 
Out[4830]: 
domain  A  B  C 
43  5.0 1.0 NaN 
44  NaN 1.0 NaN 
45  1.0 4.0 NaN 
50  1.0 11.0 NaN 
51  4.0 NaN 6.0 
1  3.0 NaN 14.0 
2  NaN 3.0 NaN 
3  12.0 12.0 NaN 
5  NaN NaN 1.0 

Details

In [4826]: weeks 
Out[4826]: 
Int64Index([26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 
      43, 44, 45, 46, 47, 48, 49, 50, 51, 0, 1, 2, 3, 4, 5, 6, 7, 
      8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 
      25], 
      dtype='int64') 

In [4827]: weeks.intersection(dfp.index) 
Out[4827]: Int64Index([43, 44, 45, 50, 51, 1, 2, 3, 5], dtype='int64') 
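
The options above can be combined into one self-contained script. The sketch below is an illustration rather than the exact code from the session: it rebuilds the question's dataframe, and it selects the weeks that are present with a boolean mask (isin) so that the resulting order visibly follows the custom weeks index. The name present is introduced here only for the example.

```python
import pandas as pd

# The dataframe from the question.
df = pd.DataFrame({
    "domain": list("AAAAAA") + list("BBBBBB") + list("CCC"),
    "week": [43, 45, 50, 51, 1, 3, 43, 44, 45, 50, 2, 3, 51, 1, 5],
    "count": [5, 1, 1, 4, 3, 12, 1, 1, 4, 11, 3, 12, 6, 14, 1],
})

# Custom week order used in this answer: 26..51 followed by 0..25.
weeks = pd.Index(list(range(26, 52)) + list(range(26)))

# One row per week, one column per domain (pivot sorts the weeks numerically).
dfp = df.pivot(index="week", columns="domain", values="count")

# Keep only the weeks that occur in the data, in the custom order.
present = weeks[weeks.isin(dfp.index)]

# Reorder the rows; dfp.loc[present] selects the same rows.
ordered = dfp.reindex(present)
print(ordered)
```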

You need a custom order for the weeks, so convert week to an ordered categorical with that custom order and omit sort=False:

cats = list(range(26, 52)) + list(range(26)) 
print (cats) 
[26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 
47, 48, 49, 50, 51, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 
16, 17, 18, 19, 20, 21, 22, 23, 24, 25] 

df['week'] = pd.Categorical(df['week'], categories=cats, ordered=True) 

df = df.groupby(['week','domain'], observed=True)['count'].sum().unstack() 
print (df) 
domain  A  B  C 
week      
43  5.0 1.0 NaN 
44  NaN 1.0 NaN 
45  1.0 4.0 NaN 
50  1.0 11.0 NaN 
51  4.0 NaN 6.0 
1  3.0 NaN 14.0 
2  NaN 3.0 NaN 
3  12.0 12.0 NaN 
5  NaN NaN 1.0 
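
A self-contained version of the categorical approach, as a sketch for recent pandas versions. Two details here are assumptions on my part rather than part of the original snippet: the dtype is built with CategoricalDtype, and observed=True is passed to groupby so that only the weeks occurring in the data appear in the result.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# The dataframe from the question.
df = pd.DataFrame({
    "domain": list("AAAAAA") + list("BBBBBB") + list("CCC"),
    "week": [43, 45, 50, 51, 1, 3, 43, 44, 45, 50, 2, 3, 51, 1, 5],
    "count": [5, 1, 1, 4, 3, 12, 1, 1, 4, 11, 3, 12, 6, 14, 1],
})

# Custom week order: 26..51 followed by 0..25.
cats = list(range(26, 52)) + list(range(26))
week_type = CategoricalDtype(categories=cats, ordered=True)
df["week"] = df["week"].astype(week_type)

# groupby sorts by category order, not by numeric value;
# observed=True drops the weeks that never occur in the data.
out = df.groupby(["week", "domain"], observed=True)["count"].sum().unstack()
print(out)
```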
The problem is that weeks 44 and 2 are misplaced. Week 44 should fall between 43 and 45, and week 2 between 1 and 3.

Is your order [26, 27, ..., 51, 0, 1, ..., 25]? – jezrael
