Pandas TimeGrouper and pivot?

This is what my dataframe looks like:

             Timestamp  CAT
0  2016-12-02 23:35:28  200
1  2016-12-02 23:37:43  200
2  2016-12-02 23:40:49  300
3  2016-12-02 23:58:53  400
4  2016-12-02 23:59:02  300
...

And this is what I'm trying to do in pandas (note that the timestamps are grouped):

Timestamp BINS     200  300  400  500
2016-12-02 23:30     2    0    0    0
2016-12-02 23:40     0    1    0    0
2016-12-02 23:50     0    1    1    0
...

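I'm trying to create bins of 10-minute intervals so that I can make a bar graph. The columns should be the CAT values, so that I can count how many times each CAT occurs within each time bin.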
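For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the five sample rows shown above and illustrates what "10-minute bin" means by flooring each timestamp. The `BIN` column name is only an illustrative assumption and is not part of the original data; the rows hidden behind `...` are not reproduced.

```python
import pandas as pd

# Sample rows copied from the question; the elided "..." rows are omitted.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2016-12-02 23:35:28",
        "2016-12-02 23:37:43",
        "2016-12-02 23:40:49",
        "2016-12-02 23:58:53",
        "2016-12-02 23:59:02",
    ]),
    "CAT": [200, 200, 300, 400, 300],
})

# Floor each timestamp to the start of its 10-minute bin.
# "BIN" is a hypothetical helper column used only for illustration.
df["BIN"] = df["Timestamp"].dt.floor("10min")
print(df)
```

The first two rows land in the 23:30 bin, the third in 23:40, and the last two in 23:50, which matches the grouping in the desired output.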

What I have so far can create the time bins:

def create_hist(df, timestamp, freq, fontsize, outfile): 
    """ Create a histogram of the number of CATs per time period.""" 

    df.set_index(timestamp,drop=False,inplace=True) 
    to_plot = df[timestamp].groupby(pandas.TimeGrouper(freq=freq)).count() 
    ...

But my issue is that I cannot for the life of me figure out how to group by both the CATs and the time bins. My latest attempt was to use df.pivot(columns="CAT") before doing the groupby, but it just gives me errors:

def create_hist(df, timestamp, freq, fontsize, outfile): 
    """ Create a histogram of the number of CATs per time period.""" 

    df.pivot(columns="CAT") 
    df.set_index(timestamp,drop=False,inplace=True) 
    to_plot = df[timestamp].groupby(pandas.TimeGrouper(freq=freq)).count() 
    ...

Which gives me: ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

2017-02-09 andraiamatrix

Use get_dummies and resample:

In [11]: df1 = df.set_index("Timestamp")

In [12]: pd.get_dummies(df1["CAT"])
Out[12]:
                     200  300  400
Timestamp
2016-12-02 23:35:28    1    0    0
2016-12-02 23:37:43    1    0    0
2016-12-02 23:40:49    0    1    0
2016-12-02 23:58:53    0    0    1
2016-12-02 23:59:02    0    1    0

In [13]: pd.get_dummies(df1["CAT"]).resample("10min").sum()
Out[13]:
                     200  300  400
Timestamp
2016-12-02 23:30:00    2    0    0
2016-12-02 23:40:00    0    1    0
2016-12-02 23:50:00    0    1    1
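The same approach as a standalone script, in case it helps. This is only a sketch built on the five sample rows from the question. One assumption worth flagging: recent pandas versions return boolean dummy columns by default, so `dtype=int` is passed here to keep the 0/1 output shown in the session above.

```python
import pandas as pd

# Sample rows copied from the question; the elided "..." rows are omitted.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2016-12-02 23:35:28",
        "2016-12-02 23:37:43",
        "2016-12-02 23:40:49",
        "2016-12-02 23:58:53",
        "2016-12-02 23:59:02",
    ]),
    "CAT": [200, 200, 300, 400, 300],
})

# One indicator column per CAT value, indexed by timestamp.
dummies = pd.get_dummies(df.set_index("Timestamp")["CAT"], dtype=int)

# Summing the indicators within each 10-minute bin gives per-CAT counts.
counts = dummies.resample("10min").sum()
print(counts)
```

Because resample walks every bin between the first and last timestamp, a bin with no rows at all would still appear, filled with zeros.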

2017-02-09 23:15:22

This is much cleaner than mine. Thank you! – andraiamatrix

IIUC:

In [246]: df.pivot_table(index='Timestamp', columns='CAT', aggfunc='size', fill_value=0) \
      .resample('10T').sum()
Out[246]:
CAT                  200  300  400
Timestamp
2016-12-02 23:30:00    2    0    0
2016-12-02 23:40:00    0    1    0
2016-12-02 23:50:00    0    1    1
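A closely related variant, offered as a sketch rather than as what the answer above does: floor the timestamps to 10-minute bins and cross-tabulate them against CAT with `pd.crosstab`. This swaps pivot_table plus resample for a different technique, and it behaves differently in one respect: bins that contain no rows are simply absent instead of showing up as all-zero rows. Also, as far as I know the `'10T'` alias is deprecated in recent pandas in favor of `'10min'`, which is what this sketch uses.

```python
import pandas as pd

# Sample rows copied from the question; the elided "..." rows are omitted.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2016-12-02 23:35:28",
        "2016-12-02 23:37:43",
        "2016-12-02 23:40:49",
        "2016-12-02 23:58:53",
        "2016-12-02 23:59:02",
    ]),
    "CAT": [200, 200, 300, 400, 300],
})

# Start of the 10-minute bin each row falls into.
bins = df["Timestamp"].dt.floor("10min")

# Count occurrences of every (bin, CAT) pair; missing pairs become 0.
counts = pd.crosstab(bins, df["CAT"])
print(counts)
```

For the sample rows every bin between 23:30 and 23:50 has at least one event, so the result matches the table above.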

2017-02-09 22:52:02 MaxU

Using pd.TimeGrouper:

df.set_index('Timestamp') \
    .groupby([pd.TimeGrouper('10min'), 'CAT']) \
    .size().unstack(fill_value=0)

CAT                  200  300  400
Timestamp
2016-12-02 23:30:00    2    0    0
2016-12-02 23:40:00    0    1    0
2016-12-02 23:50:00    0    1    1
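Note for readers on current pandas: `pd.TimeGrouper` was deprecated and then removed in pandas 1.0, and `pd.Grouper(freq=...)` is its replacement. A minimal sketch of the same groupby with that substitution, using the five sample rows from the question:

```python
import pandas as pd

# Sample rows copied from the question; the elided "..." rows are omitted.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2016-12-02 23:35:28",
        "2016-12-02 23:37:43",
        "2016-12-02 23:40:49",
        "2016-12-02 23:58:53",
        "2016-12-02 23:59:02",
    ]),
    "CAT": [200, 200, 300, 400, 300],
})

counts = (
    df.set_index("Timestamp")
      # Group by 10-minute bins on the index and by the CAT column.
      .groupby([pd.Grouper(freq="10min"), "CAT"])
      .size()
      # Move CAT into the columns; absent (bin, CAT) pairs become 0.
      .unstack(fill_value=0)
)
print(counts)
```

From here `counts.plot.bar()` should give the bar graph the question asks for, assuming matplotlib is installed.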

2017-02-09 22:52:33 piRSquared
