2017-02-07

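I have a DataFrame that I converted to a time series, covering 2013 to 2017. I want to group all the data by day of the week: for example, all Mondays together, broken down by hour, then all Tuesdays, and so on. In the end there should be 168 (24 * 7) rows. What is the best way to do this?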

After resampling, I have this sample:

2017-01-17 00:00:00 NaN 
2017-01-17 01:00:00 NaN 
2017-01-17 02:00:00 NaN 
2017-01-17 03:00:00 NaN 
2017-01-17 04:00:00 1.0 
2017-01-17 05:00:00 NaN 
2017-01-17 06:00:00 NaN 
2017-01-17 07:00:00 NaN 
2017-01-17 08:00:00 NaN 
2017-01-17 09:00:00 1.0 
2017-01-17 10:00:00 3.0 
2017-01-17 11:00:00 3.0 
2017-01-17 12:00:00 3.0 
2017-01-17 13:00:00 5.0 
2017-01-17 14:00:00 2.0 
2017-01-17 15:00:00 1.0 
2017-01-17 16:00:00 2.0 
2017-01-17 17:00:00 1.0 
2017-01-17 18:00:00 1.0 
2017-01-17 19:00:00 1.0 
2017-01-17 20:00:00 NaN 
2017-01-17 21:00:00 NaN 
2017-01-17 22:00:00 NaN 
2017-01-17 23:00:00 NaN   
2017-01-24 10:00:00 14.0 
2017-01-24 11:00:00 14.0 
2017-01-24 12:00:00 5.0 
2017-01-24 13:00:00 21.0 
2017-01-24 14:00:00 14.0 
2017-01-24 15:00:00 7.0 
2017-01-24 16:00:00 9.0 
2017-01-24 17:00:00 2.0 
2017-01-24 18:00:00 1.0 
2017-01-24 19:00:00 NaN 
2017-01-24 20:00:00 NaN 
2017-01-24 21:00:00 2.0 

I would like something like this:

    (count sum)
Monday: 00:00  xx
      01:00  xx
      ...
      23:00  xx
Tuesday: 00:00  xx
      01:00  xx
      ...
      23:00  xx

Your description is a bit too high-level to give concrete suggestions. Can you show your code?

http://stackoverflow.com/questions/16266019/python-pandas-group-datetime-column-into-hour-and-minute-aggregations

Answer

I think you can group by day of week and hour and aggregate with some function, e.g. sum:

import numpy as np
import pandas as pd

np.random.seed(100)
start = pd.to_datetime('2013-02-24 04:00:00')
rng = pd.date_range(start, periods=100, freq='3h')

# DataFrame with a DatetimeIndex
df = pd.DataFrame({'a': np.random.randint(10, size=100)}, index=rng)
print(df)
        a 
2013-02-24 04:00:00 8 
2013-02-24 07:00:00 8 
2013-02-24 10:00:00 3 
2013-02-24 13:00:00 7 
2013-02-24 16:00:00 7 
2013-02-24 19:00:00 0 
2013-02-24 22:00:00 4 
2013-02-25 01:00:00 2 
2013-02-25 04:00:00 5 
... 
... 
print(df.index.day_name())  # weekday_name was removed in pandas 1.0
['Sunday' 'Sunday' 'Sunday' 'Sunday' 'Sunday' 'Sunday' 'Sunday' 'Monday' 
'Monday' 'Monday' 'Monday' 'Monday' 'Monday' 'Monday' 'Monday' 'Tuesday' 
'Tuesday' 'Tuesday' 'Tuesday' 'Tuesday' 'Tuesday' 'Tuesday' 'Tuesday' 
'Wednesday' 'Wednesday' 'Wednesday' 'Wednesday' 'Wednesday' 'Wednesday' 
'Wednesday' 'Wednesday' 'Thursday' 'Thursday' 'Thursday' 'Thursday' 
'Thursday' 'Thursday' 'Thursday' 'Thursday' 'Friday' 'Friday' 'Friday' 
'Friday' 'Friday' 'Friday' 'Friday' 'Friday' 'Saturday' 'Saturday' 
'Saturday' 'Saturday' 'Saturday' 'Saturday' 'Saturday' 'Saturday' 'Sunday' 
'Sunday' 'Sunday' 'Sunday' 'Sunday' 'Sunday' 'Sunday' 'Sunday' 'Monday' 
'Monday' 'Monday' 'Monday' 'Monday' 'Monday' 'Monday' 'Monday' 'Tuesday' 
'Tuesday' 'Tuesday' 'Tuesday' 'Tuesday' 'Tuesday' 'Tuesday' 'Tuesday' 
'Wednesday' 'Wednesday' 'Wednesday' 'Wednesday' 'Wednesday' 'Wednesday' 
'Wednesday' 'Wednesday' 'Thursday' 'Thursday' 'Thursday' 'Thursday' 
'Thursday' 'Thursday' 'Thursday' 'Thursday' 'Friday' 'Friday' 'Friday' 
'Friday' 'Friday'] 

print (df.index.hour) 
[ 4 7 10 13 16 19 22 1 4 7 10 13 16 19 22 1 4 7 10 13 16 19 22 1 4 
    7 10 13 16 19 22 1 4 7 10 13 16 19 22 1 4 7 10 13 16 19 22 1 4 7 
10 13 16 19 22 1 4 7 10 13 16 19 22 1 4 7 10 13 16 19 22 1 4 7 10 
13 16 19 22 1 4 7 10 13 16 19 22 1 4 7 10 13 16 19 22 1 4 7 10 13] 
print(df.groupby([df.index.day_name(), df.index.hour])['a'].sum())
Friday   1     13
         4     10
         7      6
         10    13
         13    11
         16     2
         19     0
         22     8
Monday   1      6
         4     12
         7      8
         10     5
         13    11
... 
... 
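To get exactly the 168 rows (7 days x 24 hours) the question asks for, with Monday first instead of alphabetical order, one option is to group by the numeric dayofweek (Monday=0 ... Sunday=6) and reindex against every (day, hour) pair. This is a minimal sketch, assuming a Series with a DatetimeIndex; the helper name weekly_profile is hypothetical. Slots with no data are filled with 0 here, which suits sums and counts; drop fill_value if you prefer NaN.

```python
import numpy as np
import pandas as pd


def weekly_profile(s: pd.Series) -> pd.Series:
    """Hypothetical helper: sum values per (day of week, hour), all 168 slots.

    Assumes `s` has a DatetimeIndex.
    """
    # dayofweek is Monday=0 ... Sunday=6, so sorted keys follow the calendar
    grouped = s.groupby([s.index.dayofweek, s.index.hour]).sum()
    # Every (day, hour) combination, so empty slots still get a row
    full = pd.MultiIndex.from_product([range(7), range(24)], names=["day", "hour"])
    out = grouped.reindex(full, fill_value=0)
    # Replace the numeric day level (level 0) with readable names
    names = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
    return out.rename(index=dict(enumerate(names)), level=0)


np.random.seed(100)
rng = pd.date_range("2013-02-24 04:00:00", periods=100, freq="3h")
s = pd.Series(np.random.randint(10, size=100), index=rng)
profile = weekly_profile(s)
print(len(profile))
print(profile.head(25))
```

Grouping on the integer day keeps the sort order correct without any extra step; the names are only attached at the end.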

DataFramedate列がある場合:DatetimeIndexSeries場合

np.random.seed(100)
start = pd.to_datetime('2013-02-24 04:00:00')
rng = pd.date_range(start, periods=100, freq='3h')

df = pd.DataFrame({'date': rng, 'a': np.random.randint(10, size=100)})
print(df)
   a                date
0  8 2013-02-24 04:00:00
1  8 2013-02-24 07:00:00
2  3 2013-02-24 10:00:00
3  7 2013-02-24 13:00:00
4  7 2013-02-24 16:00:00
5  0 2013-02-24 19:00:00
6  4 2013-02-24 22:00:00
7  2 2013-02-25 01:00:00
8  5 2013-02-25 04:00:00

print(df.groupby([df['date'].dt.day_name(), df['date'].dt.hour])['a'].sum())
date  date 
Friday  1  13 
      4  10 
      7  6 
      10  13 
      13  11 
      16  2 
      19  0 
      22  8 
Monday  1  6 
      4  12 
      7  8 
      10  5 
      13  11 
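The desired output in the question shows both a count and a sum per slot. Passing a list of function names to agg computes both in one pass. A minimal sketch on the same kind of DataFrame with a date column; the level names day and hour are chosen here for readability and are not from the original answer. Note that count only counts non-NaN values.

```python
import numpy as np
import pandas as pd

np.random.seed(100)
rng = pd.date_range("2013-02-24 04:00:00", periods=100, freq="3h")
df = pd.DataFrame({"date": rng, "a": np.random.randint(10, size=100)})

# Group by day name and hour of the 'date' column, then aggregate twice
out = df.groupby([df["date"].dt.day_name().rename("day"),
                  df["date"].dt.hour.rename("hour")])["a"].agg(["count", "sum"])
print(out.head(10))
```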

If you have a Series with a DatetimeIndex:

np.random.seed(100)
s = pd.Series(np.random.randint(10, size=100), index=rng)
print(s)
2013-02-24 04:00:00 8 
2013-02-24 07:00:00 8 
2013-02-24 10:00:00 3 
2013-02-24 13:00:00 7 
2013-02-24 16:00:00 7 
2013-02-24 19:00:00 0 
2013-02-24 22:00:00 4 
2013-02-25 01:00:00 2 
2013-02-25 04:00:00 5 
2013-02-25 07:00:00 2 
2013-02-25 10:00:00 2 
2013-02-25 13:00:00 2 

print(s.groupby([s.index.day_name(), s.index.hour]).sum())
Friday   1     13
         4     10
         7      6
         10    13
         13    11
         16     2
         19     0
         22     8
Monday   1      6
         4     12
         7      8
         10     5
         13    11

To get a final DataFrame, add reset_index():

df = s.groupby([s.index.day_name(), s.index.hour]).sum().reset_index()
df.columns = ['days', 'hours', 'val']
print(df)
     days hours val 
0  Friday  1 13 
1  Friday  4 10 
2  Friday  7 6 
3  Friday  10 13 
4  Friday  13 11 
5  Friday  16 2 
6  Friday  19 0 
7  Friday  22 8 
8  Monday  1 6 
9  Monday  4 12 
10  Monday  7 8 
11  Monday  10 5 
12  Monday  13 11 
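groupby sorts the day names alphabetically (Friday, Monday, ...). To list Monday first, as in the question, one option is to turn the days column into an ordered categorical and sort by it. A sketch assuming the days/hours/val column names used above, rebuilt from the same seeded Series:

```python
import numpy as np
import pandas as pd

np.random.seed(100)
rng = pd.date_range("2013-02-24 04:00:00", periods=100, freq="3h")
s = pd.Series(np.random.randint(10, size=100), index=rng)

df = s.groupby([s.index.day_name(), s.index.hour]).sum().reset_index()
df.columns = ["days", "hours", "val"]

# Ordered categorical: sorting now follows the calendar, not the alphabet
order = ["Monday", "Tuesday", "Wednesday", "Thursday",
         "Friday", "Saturday", "Sunday"]
df["days"] = pd.Categorical(df["days"], categories=order, ordered=True)
df = df.sort_values(["days", "hours"]).reset_index(drop=True)
print(df.head(10))
```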

EDIT (based on the comment):

print (s) 
2017-01-24 10:00:00 14.0 
2017-01-24 11:00:00 14.0 
2017-01-24 12:00:00  5.0 
2017-01-24 13:00:00 21.0 
2017-01-24 14:00:00 14.0 
2017-01-24 15:00:00  7.0 
2017-01-24 16:00:00  9.0 
2017-01-24 17:00:00  2.0 
2017-01-24 18:00:00  1.0 
2017-01-24 19:00:00  NaN 
2017-01-24 20:00:00  NaN 
2017-01-24 21:00:00  2.0 
Name: a, dtype: float64 

# min_count=1 keeps all-NaN hours as NaN (pandas >= 0.22 returns 0.0 by default)
df = s.groupby([s.index.day_name(), s.index.hour]).sum(min_count=1).reset_index()
df.columns = ['days', 'hours', 'val']
print(df)
     days hours val 
0 Tuesday  10 14.0 
1 Tuesday  11 14.0 
2 Tuesday  12 5.0 
3 Tuesday  13 21.0 
4 Tuesday  14 14.0 
5 Tuesday  15 7.0 
6 Tuesday  16 9.0 
7 Tuesday  17 2.0 
8 Tuesday  18 1.0 
9 Tuesday  19 NaN 
10 Tuesday  20 NaN 
11 Tuesday  21 2.0 
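Whether an hour containing only NaN comes out as NaN depends on the pandas version: since pandas 0.22 the sum of an all-NaN group is 0 by default, and min_count=1 restores NaN. A minimal sketch that rebuilds the Tuesday sample from the question and compares the two:

```python
import numpy as np
import pandas as pd

# Tuesday sample from the question (2017-01-24, 10:00 to 21:00)
vals = [14.0, 14.0, 5.0, 21.0, 14.0, 7.0, 9.0, 2.0, 1.0, np.nan, np.nan, 2.0]
idx = pd.date_range("2017-01-24 10:00", periods=12, freq="h")
s = pd.Series(vals, index=idx, name="a")

keys = [s.index.day_name(), s.index.hour]
default_sum = s.groupby(keys).sum()          # all-NaN hours become 0.0
nan_sum = s.groupby(keys).sum(min_count=1)   # all-NaN hours stay NaN
print(pd.DataFrame({"default": default_sum, "min_count=1": nan_sum}))
```

Keeping NaN makes it possible to tell "no data" apart from "a sum of zero"; call fillna(0) afterwards if zeros are what you want.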

Your solution is very close, but what about the NaNs? – datascana

Can you change your sample data to include these NaNs? – jezrael

@datascana - I tried it and it works; see the edit to my answer. Or do you need something else? – jezrael
