2017-08-02

How can I add a "Sum" column to a pandas groupby DataFrame? I would like to compute a "Sum" of both the 'Bearish' and 'Bullish' columns of the groupby DataFrame below.

After that, I would like to add two more columns:

%Bearish = Bearish / Sum * 100

%Bullish = Bullish / Sum * 100

# pd.TimeGrouper is deprecated (and removed in later pandas); pd.Grouper(freq='h') replaces it
group_df = df[['sentiment','message']].groupby([pd.Grouper(freq='h'),'sentiment']).count()
group_df = group_df.unstack()

                    message
sentiment           Bearish Bullish
created
2017-08-01 23:00:00     2.0     2.0
2017-08-02 00:00:00     1.0     3.0
2017-08-02 01:00:00     NaN     4.0
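A minimal sketch of the row-wise calculation the question describes, where Sum is the total of Bearish and Bullish for each hour. It rebuilds the question's table by hand with flat column names (as if `group_df['message'].unstack()` had been used instead of a plain `unstack()`), and it assumes a NaN count means zero messages in that hour:

```python
import numpy as np
import pandas as pd

# Rebuild the unstacked table shown in the question (values copied from it).
idx = pd.to_datetime(['2017-08-01 23:00:00',
                      '2017-08-02 00:00:00',
                      '2017-08-02 01:00:00'])
group_df = pd.DataFrame({'Bearish': [2.0, 1.0, np.nan],
                         'Bullish': [2.0, 3.0, 4.0]}, index=idx)

# Assumption: a missing count (NaN) means no messages in that hour.
group_df = group_df.fillna(0)

# Row-wise total of both sentiment columns for each hour.
group_df['Sum'] = group_df[['Bearish', 'Bullish']].sum(axis=1)

# Each sentiment as a percentage of that hour's total.
group_df['%Bearish'] = group_df['Bearish'] / group_df['Sum'] * 100
group_df['%Bullish'] = group_df['Bullish'] / group_df['Sum'] * 100
print(group_df)
```

With this data every hour happens to total 4 messages, so the percentage pairs are 50/50, 25/75 and 0/100.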

Could you provide the head of your original DataFrame?

Answer


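You can compute the percentages as a new DataFrame and join it to the counts with concat. Note that group_df.sum() sums each column, so each percentage is that hour's share of the whole column (sentiment) total: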

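import pandas as pd

# '12min' is the current alias for 12 minutes ('12T' is deprecated in newer pandas)
idx = pd.date_range('2017-08-01 23:13:00', periods=12, freq='12min')
df = pd.DataFrame({'message':[1,1,2,2,2,2,2,2,3,3,3,3],
                   'sentiment':['Bearish'] * 5 + ['Bullish'] * 7}, index=idx)
print(df)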
                     message sentiment
2017-08-01 23:13:00        1   Bearish
2017-08-01 23:25:00        1   Bearish
2017-08-01 23:37:00        2   Bearish
2017-08-01 23:49:00        2   Bearish
2017-08-02 00:01:00        2   Bearish
2017-08-02 00:13:00        2   Bullish
2017-08-02 00:25:00        2   Bullish
2017-08-02 00:37:00        2   Bullish
2017-08-02 00:49:00        3   Bullish
2017-08-02 01:01:00        3   Bullish
2017-08-02 01:13:00        3   Bullish
2017-08-02 01:25:00        3   Bullish

group_df = df[['sentiment','message']].groupby([pd.Grouper(freq='h'),'sentiment']).count()
# select ['message'] to remove the MultiIndex from the columns
group_df = group_df['message'].unstack()

# divide each column by its column total, then multiply by 100
# add a prefix to the column names - https://stackoverflow.com/q/45453508/2901002
df1 = group_df.div(group_df.sum()).mul(100).add_prefix('%')
print(df1)
                     %Bearish   %Bullish
2017-08-01 23:00:00      80.0        NaN
2017-08-02 00:00:00      20.0  57.142857
2017-08-02 01:00:00       NaN  42.857143

# use a new name so the original df stays available for the GroupBy.size variant
out = pd.concat([group_df, df1], axis=1)
print(out)
                     Bearish  Bullish  %Bearish   %Bullish
2017-08-01 23:00:00      4.0      NaN      80.0        NaN
2017-08-02 00:00:00      1.0      4.0      20.0  57.142857
2017-08-02 01:00:00      NaN      3.0       NaN  42.857143
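Dividing by column totals is not the same as the question's Bearish / Sum * 100. A small sketch contrasting the two on the hourly counts from this answer: `DataFrame.div` with a Series aligns it on the columns by default, while `axis=0` aligns the row totals on the index. Treating NaN as zero before the row-wise division is an assumption:

```python
import numpy as np
import pandas as pd

# Hourly counts from this answer's example (after unstack); NaN = no messages.
idx = pd.to_datetime(['2017-08-01 23:00:00',
                      '2017-08-02 00:00:00',
                      '2017-08-02 01:00:00'])
group_df = pd.DataFrame({'Bearish': [4.0, 1.0, np.nan],
                         'Bullish': [np.nan, 4.0, 3.0]}, index=idx)

# Default alignment: the Series from sum() is matched to the columns,
# so each value is divided by its column total (Bearish = 5, Bullish = 7).
col_share = group_df.div(group_df.sum()).mul(100)

# axis=0: the row totals are matched to the index, so each value is
# divided by its own hour's total (Bearish + Bullish in that row).
counts = group_df.fillna(0)
row_share = counts.div(counts.sum(axis=1), axis=0).mul(100)

print(col_share)
print(row_share)
```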

If you need GroupBy.size instead of count (size counts all rows in each group, including NaN values):

group_df = df[['sentiment','message']].groupby([pd.Grouper(freq='h'),'sentiment']).size()
group_df = group_df.unstack()

df1 = group_df.div(group_df.sum()).mul(100).add_prefix('%')
print(df1)
                     %Bearish   %Bullish
2017-08-01 23:00:00      80.0        NaN
2017-08-02 00:00:00      20.0  57.142857
2017-08-02 01:00:00       NaN  42.857143

out = pd.concat([group_df, df1], axis=1)
print(out)
                     Bearish  Bullish  %Bearish   %Bullish
2017-08-01 23:00:00      4.0      NaN      80.0        NaN
2017-08-02 00:00:00      1.0      4.0      20.0  57.142857
2017-08-02 01:00:00      NaN      3.0       NaN  42.857143

See also: What is the difference between size and count in pandas?
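A small sketch of that difference, on hypothetical toy data with one missing value: `size` counts every row in a group, while `count` counts only the non-NaN values of the selected column. In the example above no 'message' values are missing, which is why both variants give the same table:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one 'message' value is missing.
df = pd.DataFrame({'sentiment': ['Bearish', 'Bearish', 'Bullish'],
                   'message': [1.0, np.nan, 3.0]})

g = df.groupby('sentiment')
sizes = g.size()               # rows per group, NaN rows included
counts = g['message'].count()  # non-NaN 'message' values per group

print(sizes)
print(counts)
```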
