Pandas - how to group by and unstack multiple variables

I currently have a dataset structured as shown below. I would like to aggregate it so that each row holds the counts for each value of the repeated variables.

+----+---------------+-------------+----------------+----------+---------+ 
| | participant | step_name | title   | colour | class | 
|----+---------------+-------------+----------------+----------+---------| 
| 0 |   100 | first  | acceptable  | blue  | A  | 
| 1 |   101 | first  | acceptable  | blue  | B  | 
| 2 |   102 | second  | not acceptable | blue  | B  | 
| 3 |   103 | third  | acceptable  | green | A  | 
| 4 |   104 | second  | not acceptable | green | B  | 
| 5 |   105 | first  | acceptable  | blue  | A  | 
| 6 |   106 | first  | not acceptable | green | A  | 
| 7 |   107 | first  | acceptable  | blue  | A  | 
| 8 |   108 | second  | acceptable  | blue  | A  | 
| 9 |   109 | third  | acceptable  | green | B  | 
+----+---------------+-------------+----------------+----------+---------+

It was created like this:

import pandas as pd

data = {'participant': [100, 101, 102, 103, 104, 105, 106, 107, 108, 109],
     'step_name': ['first', 'first', 'second', 'third', 'second', 'first', 'first', 'first', 'second', 'third'],
     'title': ['acceptable', 'acceptable', 'not acceptable', 'acceptable', 'not acceptable', 'acceptable', 'not acceptable', 'acceptable', 'acceptable', 'acceptable'],
     'colour': ['blue', 'blue', 'blue', 'green', 'green', 'blue', 'green', 'blue', 'blue', 'green'],
     'class': ['A', 'B', 'B', 'A', 'B', 'A', 'A', 'A', 'A', 'B']}
df = pd.DataFrame(data, columns=['participant', 'step_name', 'title', 'colour', 'class'])

So far I have managed to do this along two variables (step_name and title):

count_df = df[['participant', 'step_name', 'title']].groupby(['step_name', 'title']).count() 
count_df = count_df.unstack() 
count_df.fillna(0, inplace=True) 
count_df.columns = count_df.columns.get_level_values(1) 
count_df 

+--------+--------------+------------------+ 
|  | acceptable | not acceptable | 
|--------+--------------+------------------| 
| first |   4 |    1 | 
| second |   1 |    2 | 
| third |   2 |    0 | 
+--------+--------------+------------------+

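However, I would also like an extra set of columns holding the values of the other variables (colour and class). Basically, I want to group by those variables and then unstack, but with more than two grouping variables. In the end I want my final table to look like this: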

+-------+--------+--------+------------+----------------+
| class | colour | step   | acceptable | not acceptable |
|-------+--------+--------+------------+----------------|
| A     | blue   | first  |          3 |              0 |
| B     | blue   | first  |          1 |              0 |
| A     | green  | first  |          0 |              1 |
| B     | green  | first  |          0 |              0 |
| A     | blue   | second |          1 |              0 |
| B     | blue   | second |          0 |              1 |
| A     | green  | second |          0 |              0 |
| B     | green  | second |          0 |              1 |
| A     | blue   | third  |          0 |              0 |
| B     | blue   | third  |          0 |              0 |
| A     | green  | third  |          1 |              0 |
| B     | green  | third  |          1 |              0 |
+-------+--------+--------+------------+----------------+

How do I restructure my data? Should I still be using the stack and groupby functions?

Source

2016-05-09 orange1

I think you need pivot_table with aggfunc=len, followed by reset_index and rename_axis (new in pandas 0.18.0):

df = df.pivot_table(index=['class','colour','step_name'], 
        columns='title', 
        aggfunc=len, 
        values='participant', 
        fill_value=0).reset_index().rename_axis(None, axis=1) 
print(df)
     class colour step_name acceptable not acceptable 
0   A blue  first   3    0 
1   A blue second   1    0 
2   A green  first   0    1 
3   A green  third   1    0 
4   B blue  first   1    0 
5   B blue second   0    1 
6   B green second   0    1 
7   B green  third   1    0
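One thing to note about the output above: pivot_table only returns the (class, colour, step_name) combinations that actually occur in the data, so it has 8 rows, while the table requested in the question has 12, including the all-zero combinations. If those rows are wanted, one possible approach is to reindex against the full Cartesian product of the observed values. The following is only a sketch under that assumption; the names `keys`, `full_index` and `result` are made up for illustration, and it uses aggfunc='count' rather than len (equivalent here because participant has no missing values).

```python
import pandas as pd

data = {'participant': [100, 101, 102, 103, 104, 105, 106, 107, 108, 109],
        'step_name': ['first', 'first', 'second', 'third', 'second',
                      'first', 'first', 'first', 'second', 'third'],
        'title': ['acceptable', 'acceptable', 'not acceptable', 'acceptable',
                  'not acceptable', 'acceptable', 'not acceptable',
                  'acceptable', 'acceptable', 'acceptable'],
        'colour': ['blue', 'blue', 'blue', 'green', 'green',
                   'blue', 'green', 'blue', 'blue', 'green'],
        'class': ['A', 'B', 'B', 'A', 'B', 'A', 'A', 'A', 'A', 'B']}
df = pd.DataFrame(data)

# Grouping keys (illustrative name)
keys = ['class', 'colour', 'step_name']

# Count participants per key combination, one column per title
counts = df.pivot_table(index=keys, columns='title', values='participant',
                        aggfunc='count', fill_value=0)

# Every combination of the observed key values, whether or not it occurs
full_index = pd.MultiIndex.from_product(
    [sorted(df[k].unique()) for k in keys], names=keys)

# Combinations missing from the data become rows of zeros
result = (counts.reindex(full_index, fill_value=0)
                .astype(int)
                .reset_index()
                .rename_axis(None, axis=1))
print(result)
```

The rows come out sorted by class, then colour, then step_name, which is a different order from the table in the question but covers the same combinations.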

Source

2016-05-09 17:22:46 jezrael

Thanks! The 'rename_axis' bit seems to give me an error - 'TypeError: must pass an index to rename' – orange1

It is new in 'pandas 0.18.0', but you can omit it. Use 'df.columns.name = None' instead. – jezrael
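To make the suggestion in the comment concrete, here is a minimal sketch of the same pivot with the column-axis label cleared by assignment instead of rename_axis. The variable name `table` is illustrative, and aggfunc='count' is used in place of len, which gives the same counts here because participant has no missing values.

```python
import pandas as pd

data = {'participant': [100, 101, 102, 103, 104, 105, 106, 107, 108, 109],
        'step_name': ['first', 'first', 'second', 'third', 'second',
                      'first', 'first', 'first', 'second', 'third'],
        'title': ['acceptable', 'acceptable', 'not acceptable', 'acceptable',
                  'not acceptable', 'acceptable', 'not acceptable',
                  'acceptable', 'acceptable', 'acceptable'],
        'colour': ['blue', 'blue', 'blue', 'green', 'green',
                   'blue', 'green', 'blue', 'blue', 'green'],
        'class': ['A', 'B', 'B', 'A', 'B', 'A', 'A', 'A', 'A', 'B']}
df = pd.DataFrame(data)

table = df.pivot_table(index=['class', 'colour', 'step_name'],
                       columns='title',
                       values='participant',
                       aggfunc='count',
                       fill_value=0).reset_index()

# After reset_index the columns axis still carries the name of the
# pivoted column; assigning None removes that label without rename_axis.
table.columns.name = None
print(table)
```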

You can use pivot_table() for this.

In [130]: df['count'] = 1 

In [134]: (df.pivot_table(index=['class','colour','step_name'], columns='title', 
    .....:     values='count', aggfunc='sum', fill_value=0) 
    .....: .reset_index() 
    .....:) 
Out[134]: 
title class colour step_name acceptable not acceptable 
0   A blue  first   3    0 
1   A blue second   1    0 
2   A green  first   0    1 
3   A green  third   1    0 
4   B blue  first   1    0 
5   B blue second   0    1 
6   B green second   0    1 
7   B green  third   1    0
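As for the original question of whether groupby and unstack can still be used: they can, and the helper count column is not strictly required. A possible sketch, not taken from either answer, groups by all four columns, takes the group sizes, and unstacks only the title level. The name `counts` is illustrative.

```python
import pandas as pd

data = {'participant': [100, 101, 102, 103, 104, 105, 106, 107, 108, 109],
        'step_name': ['first', 'first', 'second', 'third', 'second',
                      'first', 'first', 'first', 'second', 'third'],
        'title': ['acceptable', 'acceptable', 'not acceptable', 'acceptable',
                  'not acceptable', 'acceptable', 'not acceptable',
                  'acceptable', 'acceptable', 'acceptable'],
        'colour': ['blue', 'blue', 'blue', 'green', 'green',
                   'blue', 'green', 'blue', 'blue', 'green'],
        'class': ['A', 'B', 'B', 'A', 'B', 'A', 'A', 'A', 'A', 'B']}
df = pd.DataFrame(data)

counts = (df.groupby(['class', 'colour', 'step_name', 'title'])
            .size()                          # number of rows per group
            .unstack('title', fill_value=0)  # one column per title value
            .reset_index()                   # grouping keys back to columns
            .rename_axis(None, axis=1))      # drop the "title" axis label
print(counts)
```

Like pivot_table, this only produces rows for key combinations that occur in the data.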

Source

2016-05-09 17:10:48 MaxU
