Can I use a date index to create dummies in pandas?

I was looking for a way to create dummies from a date index in pandas, but I have not found anything yet. Can I use a date index to create dummies in pandas?

I have a df that is indexed by date:

                       dew  temp
date
2010-01-02 00:00:00  129.0   -16
2010-01-02 01:00:00  148.0   -15
2010-01-02 02:00:00  159.0   -11
2010-01-02 03:00:00  181.0    -7
2010-01-02 04:00:00  138.0    -7
...

I know I can set date as a column using

df.reset_index(level=0, inplace=True)

and then use something like this to create the dummies:

df['main_hours'] = np.where((df['date'] >= '2010-01-02 03:00:00') & (df['date'] <= '2010-01-02 05:00:00'), 1, 0)

However, I would like to create the dummies on the fly from the indexed date, without turning date into a column. Is there a way to do that in pandas? Any suggestions would be appreciated.

Source

2017-08-21 i.n.n.m

What is your expected output? Do you want dummies for the time only, or for the date as well? – Alexander

@Alexander I wanted dummies of '1' and '0' in a 'main_hours' column, like the output in @MaxU's answer. –

IIUC：

df['main_hours'] = np.where(
    (df.index >= '2010-01-02 03:00:00') & (df.index <= '2010-01-02 05:00:00'),
    1,
    0)

Or:

In [8]: df['main_hours'] = ((df.index >= '2010-01-02 03:00:00') &
   ...:                     (df.index <= '2010-01-02 05:00:00')).astype(int)

In [9]: df 
Out[9]: 
                       dew  temp  main_hours
date
2010-01-02 00:00:00  129.0   -16           0
2010-01-02 01:00:00  148.0   -15           0
2010-01-02 02:00:00  159.0   -11           0
2010-01-02 03:00:00  181.0    -7           1
2010-01-02 04:00:00  138.0    -7           1
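
For anyone who wants to reproduce the output above, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the five rows shown in the question, so the construction code and the names `in_window` and `main_hours_alt` are assumptions for illustration, not the asker's actual setup.

```python
import numpy as np
import pandas as pd

# Rebuild the five sample rows from the question (assumed construction).
idx = pd.DatetimeIndex(
    ['2010-01-02 00:00:00', '2010-01-02 01:00:00', '2010-01-02 02:00:00',
     '2010-01-02 03:00:00', '2010-01-02 04:00:00'],
    name='date')
df = pd.DataFrame({'dew': [129.0, 148.0, 159.0, 181.0, 138.0],
                   'temp': [-16, -15, -11, -7, -7]}, index=idx)

# Compare the DatetimeIndex directly; no reset_index() is needed.
in_window = ((df.index >= '2010-01-02 03:00:00') &
             (df.index <= '2010-01-02 05:00:00'))

# Variant 1: np.where maps True/False to 1/0.
df['main_hours'] = np.where(in_window, 1, 0)

# Variant 2: cast the boolean mask to int (hypothetical second column,
# added only to show that both variants agree).
df['main_hours_alt'] = in_window.astype(int)

print(df)
```

Both columns should come out identical, since the two variants only differ in how the boolean mask is turned into integers.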

Timing for a 50,000-row DF:

In [19]: df = pd.concat([df.reset_index()] * 10**4, ignore_index=True).set_index('date') 

In [20]: pd.options.display.max_rows = 10 

In [21]: df 
Out[21]: 
                       dew  temp
date
2010-01-02 00:00:00  129.0   -16
2010-01-02 01:00:00  148.0   -15
2010-01-02 02:00:00  159.0   -11
2010-01-02 03:00:00  181.0    -7
2010-01-02 04:00:00  138.0    -7
...                    ...   ...
2010-01-02 00:00:00  129.0   -16
2010-01-02 01:00:00  148.0   -15
2010-01-02 02:00:00  159.0   -11
2010-01-02 03:00:00  181.0    -7
2010-01-02 04:00:00  138.0    -7

[50000 rows x 2 columns] 

In [22]: %timeit ((df.index >= '2010-01-02 03:00:00') & (df.index <= '2010-01-02 05:00:00')).astype(int) 
1.58 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 

In [23]: %timeit np.where((df.index >= '2010-01-02 03:00:00') & (df.index <= '2010-01-02 05:00:00'), 1, 0) 
1.52 ms ± 28.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 

In [24]: df.shape 
Out[24]: (50000, 2)
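
`%timeit` only exists inside IPython. As a rough plain-Python equivalent, the comparison can be sketched with the standard-library `timeit` module. The helper names `with_astype` and `with_where` and the hand-built sample frame are assumptions for illustration, and the absolute numbers will differ from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

# Five sample rows from the question (assumed construction).
base = pd.DataFrame(
    {'dew': [129.0, 148.0, 159.0, 181.0, 138.0],
     'temp': [-16, -15, -11, -7, -7]},
    index=pd.DatetimeIndex(
        ['2010-01-02 00:00:00', '2010-01-02 01:00:00', '2010-01-02 02:00:00',
         '2010-01-02 03:00:00', '2010-01-02 04:00:00'],
        name='date'))

# Repeat the rows 10**4 times, mirroring In [19] above.
big = pd.concat([base.reset_index()] * 10**4,
                ignore_index=True).set_index('date')

lo, hi = '2010-01-02 03:00:00', '2010-01-02 05:00:00'


def with_astype():
    # Boolean mask cast to int.
    return ((big.index >= lo) & (big.index <= hi)).astype(int)


def with_where():
    # Same mask, mapped through np.where.
    return np.where((big.index >= lo) & (big.index <= hi), 1, 0)


runs = 100
t_astype = timeit.timeit(with_astype, number=runs) / runs
t_where = timeit.timeit(with_where, number=runs) / runs
print(f"astype(int): {t_astype * 1e3:.3f} ms per call")
print(f"np.where:    {t_where * 1e3:.3f} ms per call")
```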

Source

2017-08-21 16:15:05 MaxU

This was great, it worked like a charm! I had tried this, but I didn't have 'astype(int)'. Thanks for the suggestion! –

@i.n.n.m, glad I could help :) – MaxU

Quick question: why didn't you use 'np.where'? No particular reason, but –

Or, using between:

pd.Series(df.index).between('2010-01-02 03:00:00', '2010-01-02 05:00:00', inclusive=True).astype(int) 

Out[1567]: 
0 0 
1 0 
2 0 
3 1 
4 1 
Name: date, dtype: int32
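
One thing to watch when writing this result back into the frame: `pd.Series(df.index)` carries a fresh 0..n-1 index, so assigning it directly to a column of the date-indexed df would align on labels that do not match. A minimal sketch of one way around that, assuming the five sample rows from the question, is to convert the mask to a plain array first. The `inclusive` argument is left out here because both endpoints are included by default, and as far as I know recent pandas versions expect a string such as 'both' there instead of `True`.

```python
import pandas as pd

# Five sample rows from the question (assumed construction).
idx = pd.DatetimeIndex(
    ['2010-01-02 00:00:00', '2010-01-02 01:00:00', '2010-01-02 02:00:00',
     '2010-01-02 03:00:00', '2010-01-02 04:00:00'],
    name='date')
df = pd.DataFrame({'dew': [129.0, 148.0, 159.0, 181.0, 138.0],
                   'temp': [-16, -15, -11, -7, -7]}, index=idx)

# between() includes both endpoints by default.
mask = pd.Series(df.index).between('2010-01-02 03:00:00',
                                   '2010-01-02 05:00:00')

# Drop the 0..n-1 index before assigning, so values are placed by position.
df['main_hours'] = mask.to_numpy().astype(int)

print(df)
```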

Source

2017-08-21 16:26:11 Wen

Nice work, and a quick check too! –

df = df.assign(main_hours=0) 
df.loc[df.between_time(start_time='3:00', end_time='5:00').index, 'main_hours'] = 1 
>>> df 
                     dew  temp  main_hours
2010-01-02 00:00:00  129   -16           0
2010-01-02 01:00:00  148   -15           0
2010-01-02 02:00:00  159   -11           0
2010-01-02 03:00:00  181    -7           1
2010-01-02 04:00:00  138    -7           1
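
Note that `between_time` filters on the time of day only, so on data that spans several days it flags 03:00 to 05:00 on every date, unlike the explicit timestamp range used in the other answers. A small sketch of that behaviour follows. The two-day index is made up for illustration and is not taken from the question's data.

```python
import pandas as pd

# Hypothetical two-day index, chosen only to show the time-of-day behaviour.
idx = pd.DatetimeIndex(
    ['2010-01-02 02:00:00', '2010-01-02 03:00:00',
     '2010-01-03 02:00:00', '2010-01-03 04:00:00'],
    name='date')
df = pd.DataFrame(index=idx).assign(main_hours=0)

# Rows whose time of day falls in 03:00..05:00 are flagged on both dates.
df.loc[df.between_time('3:00', '5:00').index, 'main_hours'] = 1

print(df)
```

If only one specific date range should be flagged, the index comparison from the first answer is the closer match.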

Source

2017-08-21 16:36:10 Alexander

Thank you, this is a new way to assign by condition! Great suggestion! –
