Combining groupby and apply on MultiIndex DataFrames

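I am working with a MultiIndex DataFrame and want to perform some groupby/apply() operations. I am struggling with how to combine groupby and apply.

I would like to extract the values of two index levels of my DataFrame and compare them in the apply function.

Where the apply function returns true, I would like to perform a groupby/sum over the values of my DataFrame.

Is there a good way to do this without using for loops?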

import numpy as np
import pandas as pd

# Index specifier
ix = pd.MultiIndex.from_product(
    [['2015', '2016', '2017', '2018'],
     ['2016', '2017', '2018', '2019', '2020'],
     ['A', 'B', 'C']],
    names=['SimulationStart', 'ProjectionPeriod', 'Group']
)

df = pd.DataFrame(np.random.randn(60, 1), index=ix, columns=['Input'])

# Calculate sum over all projection periods for each simulation/group
all_periods = df.groupby(level=['SimulationStart', 'Group']).sum()

# This part of the code is not working yet:
# is there a way to extract data from the indices of the DataFrame?
# Calculate sum over all projection periods for each simulation/group,
# where the projection period is at most one year in the future
one_year_ahead = df.groupby(level=['SimulationStart', 'Group']) \
        .apply(lambda x: x['ProjectionPeriod'] - \
            x['SimulationStart'] <= 1).sum()

Asked 2016-11-03 by Andreas

You could compute the difference, ProjectionPeriod - SimulationStart, before performing the groupby/sum operation, and use it to select the rows to sum.

get_values = df.index.get_level_values
mask = (get_values('ProjectionPeriod') - get_values('SimulationStart')) <= 1
one_year_ahead = df.loc[mask].groupby(level=['SimulationStart', 'Group']).sum()

import numpy as np
import pandas as pd

# Same index as in the question, but with integer years so they can be subtracted
ix = pd.MultiIndex.from_product(
    [[2015, 2016, 2017, 2018],
     [2016, 2017, 2018, 2019, 2020],
     ['A', 'B', 'C']],
    names=['SimulationStart', 'ProjectionPeriod', 'Group'])
df = pd.DataFrame(np.random.randn(60, 1), index=ix, columns=['Input'])

# Boolean mask: True where the projection period is at most one year after the start
get_values = df.index.get_level_values
mask = (get_values('ProjectionPeriod') - get_values('SimulationStart')) <= 1

# Sum only the selected rows for each simulation/group
one_year_ahead = df.loc[mask].groupby(level=['SimulationStart', 'Group']).sum()
print(one_year_ahead)

yields

                          Input
SimulationStart Group
2015            A      0.821851
                B     -0.643342
                C     -0.140112
2016            A      0.384885
                B     -0.252186
                C     -1.057493
2017            A     -1.055933
                B      1.096221
                C     -4.150002
2018            A      0.584859
                B     -4.062078
                C      1.225105

Answered 2016-11-03 20:10:30 by unutbu

Thank you for your response. This is very helpful. My code still does not work, though, because the periods in my index are actually stored as strings. – Andreas
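Regarding the string-valued index mentioned in the comment above: one possible workaround, sketched here under the assumption that every label in the two year levels is a plain integer string such as '2015', is to convert the level values with `astype(int)` just for the comparison. The deterministic `np.arange` data is an illustrative substitute for the random input, not part of the original thread.

```python
import numpy as np
import pandas as pd

# Same shape as the question's frame, with the year levels stored as strings.
ix = pd.MultiIndex.from_product(
    [['2015', '2016', '2017', '2018'],
     ['2016', '2017', '2018', '2019', '2020'],
     ['A', 'B', 'C']],
    names=['SimulationStart', 'ProjectionPeriod', 'Group'])
# Deterministic values (assumption, for illustration only).
df = pd.DataFrame(np.arange(60.0).reshape(60, 1), index=ix, columns=['Input'])

# Convert the string labels to integers only for the comparison;
# the DataFrame's own index is left unchanged.
start = df.index.get_level_values('SimulationStart').astype(int)
period = df.index.get_level_values('ProjectionPeriod').astype(int)
mask = (period - start) <= 1

one_year_ahead = df.loc[mask].groupby(level=['SimulationStart', 'Group']).sum()
print(one_year_ahead)
```

The result keeps the original string labels in its index, since only the temporary `start` and `period` values are converted.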

Here is one way to do it.

df.reset_index().query('ProjectionPeriod - SimulationStart == 1') \
    .groupby(['SimulationStart', 'Group']).Input.sum()

SimulationStart  Group
2015             A        1.100246
                 B       -0.605710
                 C        1.366465
2016             A        0.359406
                 B       -2.077444
                 C       -0.004356
2017             A        0.604497
                 B       -0.362941
                 C        0.103945
2018             A       -0.861976
                 B       -0.737274
                 C        0.237512
Name: Input, dtype: float64

Since the values in the Group column are unique after this filter, the following also works without a groupby, but I don't believe it is what you want.

df.reset_index().query('ProjectionPeriod - SimulationStart == 1') \
    [['SimulationStart', 'Group', 'Input']]
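The queries above select rows where the difference is exactly one year, whereas the question asks for projection periods at most one year ahead. A minimal sketch of that variant, assuming integer year levels and using deterministic `np.arange` data purely for illustration, replaces `== 1` with `<= 1`; the name `at_most_one_year` is hypothetical.

```python
import numpy as np
import pandas as pd

# Integer year levels, so the subtraction inside query() is well defined.
ix = pd.MultiIndex.from_product(
    [[2015, 2016, 2017, 2018],
     [2016, 2017, 2018, 2019, 2020],
     ['A', 'B', 'C']],
    names=['SimulationStart', 'ProjectionPeriod', 'Group'])
# Deterministic values (assumption, for illustration only).
df = pd.DataFrame(np.arange(60.0).reshape(60, 1), index=ix, columns=['Input'])

# "<= 1" instead of "== 1": several rows can now fall into each
# (SimulationStart, Group) pair, so the groupby/sum is actually needed.
at_most_one_year = (
    df.reset_index()
      .query('ProjectionPeriod - SimulationStart <= 1')
      .groupby(['SimulationStart', 'Group'])['Input']
      .sum()
)
print(at_most_one_year)
```

Note that `<= 1` also keeps projection periods that lie before the simulation start (negative differences); a lower bound would have to be added to the query if those rows should be excluded.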

Answered 2016-11-03 20:03:12 by piRSquared
