How to get the mean of nsmallest from a Pandas DataFrame using datetimes

I'm trying to get the nsmallest values, and the mean of those nsmallest values, from a DataFrame, selecting the rows by date. From this DataFrame:

    2  chk2  chk3  val 
0             
2016-08-01 31.340000 2016-05-09 2016-08-08 18.605 
2016-08-02 32.359999 2016-05-09 2016-08-08 18.605 
2016-08-03 32.089001 2016-05-09 2016-08-08 18.605 
2016-08-04 31.194001 2016-05-09 2016-08-08 18.605 
2016-08-05 30.585000 2016-05-09 2016-08-08 18.605 
2016-08-08 20.490000 2016-05-09 2016-08-08 18.605 
2016-08-09 20.135000 2016-08-08 2016-11-21 18.605 
2016-08-10 19.103000 2016-08-08 2016-11-21 18.605 
2016-08-11 19.452000 2016-08-08 2016-11-21 18.605 
2016-08-12 19.241001 2016-08-08 2016-11-21 18.605 
2016-08-15 19.645000 2016-08-08 2016-11-21 18.605 
2016-08-16 20.124000 2016-08-08 2016-11-21 18.605 
2016-08-17 19.863001 2016-08-08 2016-11-21 18.605 
2016-08-18 19.667999 2016-08-08 2016-11-21 18.605 
2016-08-19 19.083001 2016-08-08 2016-11-21 18.605 
2016-08-22 18.163000 2016-08-08 2016-11-21 18.605 
2016-08-23 18.948001 2016-08-08 2016-11-21 18.605 
2016-08-24 19.329999 2016-08-08 2016-11-21 18.605 
2016-08-25 19.735999 2016-08-08 2016-11-21 18.605 
2016-08-26 19.769999 2016-08-08 2016-11-21 18.605 
2016-08-29 18.704000 2016-08-08 2016-11-21 18.605 
2016-08-30 19.756000 2016-08-08 2016-11-21 18.605 
2016-08-31 19.931000 2016-08-08 2016-11-21 18.605

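This gives me the nsmallest of my whole DataFrame, but the chk2 and chk3 dates change after the first week, and I want them to control which rows the function is applied to: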

df.query('chk2 <= index <= chk3')[2].nsmallest(3) 

0 
2016-08-22 18.163000 
2016-08-29 18.704000 
2016-08-23 18.948001 
Name: 2, dtype: float64

It seems to ignore the fact that chk2 and chk3 changed, which I'd expect to give a different result for the dates in the first week.
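To see why every row comes out the same, here is a minimal sketch that rebuilds the sample data. The values are copied from the table above; building the frame with pd.bdate_range and NumPy datetime64 arrays is an assumption about how it was created. Each row's own date falls inside its own [chk2, chk3] window, so the query keeps all 23 rows and nsmallest(3) runs over the entire column.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (23 weekdays in August 2016).
idx = pd.bdate_range("2016-08-01", "2016-08-31")
values = [31.340000, 32.359999, 32.089001, 31.194001, 30.585000, 20.490000,
          20.135000, 19.103000, 19.452000, 19.241001, 19.645000, 20.124000,
          19.863001, 19.667999, 19.083001, 18.163000, 18.948001, 19.329999,
          19.735999, 19.769999, 18.704000, 19.756000, 19.931000]
df = pd.DataFrame(
    {
        2: values,
        # The window changes after 2016-08-08: 6 rows in the first, 17 in the second.
        "chk2": np.array(["2016-05-09"] * 6 + ["2016-08-08"] * 17, dtype="datetime64[ns]"),
        "chk3": np.array(["2016-08-08"] * 6 + ["2016-11-21"] * 17, dtype="datetime64[ns]"),
    },
    index=idx,
)

# Each row is compared against its own chk2/chk3, and every row passes,
# so nsmallest(3) is taken over the whole column rather than per window.
filtered = df.query("chk2 <= index <= chk3")
smallest = filtered[2].nsmallest(3)
print(len(filtered), len(df))        # 23 23
print(round(smallest.mean(), 3))     # 18.605
```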

def _test(row): 
    # row is never used here, so every row gets the same result
    return df.query('chk2 <= index <= chk3')[2].nsmallest(3).mean() 

    # Attempt to use each row's own dates, which raises:
    #return df.query('row[1] <= index <= row[2]')[2].nsmallest(3).mean() 
    #UndefinedVariableError: ("name 'row' is not defined", u'occurred at index 2016-08-01 00:00:00') 
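The commented-out attempt fails because DataFrame.query evaluates its string in its own namespace: bare names are resolved as columns or the index, and local Python variables have to be prefixed with @. A minimal per-row sketch under that rule, reusing the sample data from the table above. The lo/hi names are illustrative, and filling a val column with df.apply(..., axis=1) is an assumption about how _test was meant to be used.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (23 weekdays in August 2016).
idx = pd.bdate_range("2016-08-01", "2016-08-31")
values = [31.340000, 32.359999, 32.089001, 31.194001, 30.585000, 20.490000,
          20.135000, 19.103000, 19.452000, 19.241001, 19.645000, 20.124000,
          19.863001, 19.667999, 19.083001, 18.163000, 18.948001, 19.329999,
          19.735999, 19.769999, 18.704000, 19.756000, 19.931000]
df = pd.DataFrame(
    {
        2: values,
        "chk2": np.array(["2016-05-09"] * 6 + ["2016-08-08"] * 17, dtype="datetime64[ns]"),
        "chk3": np.array(["2016-08-08"] * 6 + ["2016-11-21"] * 17, dtype="datetime64[ns]"),
    },
    index=idx,
)


def _test(row):
    # Copy this row's window bounds into local variables (hypothetical names)...
    lo, hi = row["chk2"], row["chk3"]
    # ...and reference them with "@" so query() looks them up in this
    # function's scope instead of treating them as column names.
    window = df.query("@lo <= index <= @hi")
    return window[2].nsmallest(3).mean()


df["val"] = df.apply(_test, axis=1)
print(df["val"].round(3))
```

This re-filters the whole frame once per row, so on large frames the grouped approach in the answer below is likely to be faster.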


df.info() 
<class 'pandas.core.frame.DataFrame'> 
DatetimeIndex: 23 entries, 2016-08-01 to 2016-08-31 
Data columns (total 3 columns): 
2  23 non-null float64 
chk2 23 non-null datetime64[ns] 
chk3 23 non-null datetime64[ns] 
dtypes: datetime64[ns](2), float64(1) 
memory usage: 736.0 bytes


2017-03-02 Merlin

If I understand you correctly, you can use groupby to split the rows into groups wherever the dates change, and then use transform to run your operation on each of those groups.

(df.query('chk2 <= index <= chk3').groupby(['chk2', 'chk3']) 
            .transform(lambda x: x.nsmallest(3).mean()))

Demo

>>> df 
        2  chk2  chk3 
2016-08-01 31.340000 2016-05-09 2016-08-08 
2016-08-02 32.359999 2016-05-09 2016-08-08 
... 
2016-08-30 19.756000 2016-08-08 2016-11-21 
2016-08-31 19.931000 2016-08-08 2016-11-21 

>>> (df.query('chk2 <= index <= chk3').groupby(['chk2', 'chk3']) 
...      .transform(lambda x: x.nsmallest(3).mean())) 
       2 
2016-08-01 27.423 
2016-08-02 27.423 
2016-08-03 27.423 
2016-08-04 27.423 
2016-08-05 27.423 
2016-08-08 27.423 
2016-08-09 18.605 
2016-08-10 18.605 
2016-08-11 18.605 
2016-08-12 18.605 
2016-08-15 18.605 
2016-08-16 18.605 
2016-08-17 18.605 
2016-08-18 18.605 
2016-08-19 18.605 
2016-08-22 18.605 
2016-08-23 18.605 
2016-08-24 18.605 
2016-08-25 18.605 
2016-08-26 18.605 
2016-08-29 18.605 
2016-08-30 18.605 
2016-08-31 18.605
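A self-contained sketch of the same groupby idea on the sample data above, assuming you want the result stored back as a column. Selecting column 2 after groupby is an addition here so the result is a Series rather than a one-column DataFrame, and the agg call is an alternative (not part of the answer) for getting one value per (chk2, chk3) period instead of one per row.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (23 weekdays in August 2016).
idx = pd.bdate_range("2016-08-01", "2016-08-31")
values = [31.340000, 32.359999, 32.089001, 31.194001, 30.585000, 20.490000,
          20.135000, 19.103000, 19.452000, 19.241001, 19.645000, 20.124000,
          19.863001, 19.667999, 19.083001, 18.163000, 18.948001, 19.329999,
          19.735999, 19.769999, 18.704000, 19.756000, 19.931000]
df = pd.DataFrame(
    {
        2: values,
        "chk2": np.array(["2016-05-09"] * 6 + ["2016-08-08"] * 17, dtype="datetime64[ns]"),
        "chk3": np.array(["2016-08-08"] * 6 + ["2016-11-21"] * 17, dtype="datetime64[ns]"),
    },
    index=idx,
)

# Keep rows inside their own [chk2, chk3] window, then treat each distinct
# (chk2, chk3) pair as one period; [2] selects the value column.
periods = df.query("chk2 <= index <= chk3").groupby(["chk2", "chk3"])[2]

# transform: broadcast each period's result onto its rows, aligned on the
# original index (rows dropped by the query would become NaN).
df["val"] = periods.transform(lambda s: s.nsmallest(3).mean())

# agg: one value per (chk2, chk3) period.
summary = periods.agg(lambda s: s.nsmallest(3).mean())
print(summary.round(3))
```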


2017-03-02 17:58:37 miradulo
