2016-05-03

Pandas: groupby and count the number of column values

I have the following dataframe:

import pandas as pd

url = 'https://raw.githubusercontent.com/108michael/ms_thesis/master/mpl.Bspons.merge.1'
df = pd.read_csv(url, index_col=0)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df = df.set_index(['date'])

df.head(3) 

            state  year  unemployment  log_diff_unemployment  id.thomas  \
date
2006-05-01     AK  2006           6.6              -0.044452       1440
2006-05-01     AK  2006           6.6              -0.044452       1440
2007-03-27     AK  2007           6.3              -0.046520       1440

                 party  type      bills     id.fec  years_exp  session  \
date
2006-05-01  Republican   sen  s2686-109  S2AK00010         39      109
2006-05-01  Republican   sen  s2686-109  S2AK00010         39      109
2007-03-27  Republican   sen  s1000-110  S2AK00010         40      110

                                                       name  disposition  \
date
2006-05-01  National Cable & Telecommunications Association      support
2006-05-01  National Cable & Telecommunications Association      support
2007-03-27                National Treasury Employees Union      support

            catcode  naics
date
2006-05-01    C4500     81
2006-05-01    C4500    517
2007-03-27    L1100    NaN

I would like to count the number of bills within each group defined by catcode > disposition > id.fec. This is what I currently get:

df.head(3) 

            state  year  unemployment  log_diff_unemployment  id.thomas  \
date
2006-05-01     AK  2006           6.6              -0.044452       1440
2006-05-01     AK  2006           6.6              -0.044452       1440
2007-03-27     AK  2007           6.3              -0.046520       1440

                 party  type      bills     id.fec  years_exp  session  \
date
2006-05-01  Republican   sen  s2686-109  S2AK00010         39      109
2006-05-01  Republican   sen  s2686-109  S2AK00010         39      109
2007-03-27  Republican   sen  s1000-110  S2AK00010         40      110

                                                       name  disposition  \
date
2006-05-01  National Cable & Telecommunications Association      support
2006-05-01  National Cable & Telecommunications Association      support
2007-03-27                National Treasury Employees Union      support

            catcode  naics             billsum
date
2006-05-01    C4500     81  s2686-109s2686-109
2006-05-01    C4500    517  s2686-109s2686-109
2007-03-27    L1100    NaN           s1000-110

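Instead of returning the number of bills in each group, the code returns all of the bills in the group, concatenated into one string. This is the code I am using: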

df['billsum'] = df.groupby([pd.Grouper(level='date', freq='A'), 'catcode',
                            'disposition', 'id.fec']).bills.transform('sum')

I simply want the number of bills in each group. Does anyone have an idea how to make this work?

Answers

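I think you need transform with size, not sum. On a string column such as bills, sum concatenates the strings, whereas size returns the number of rows in each group: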

# freq='A' is the year-end alias; pandas >= 2.2 spells it 'YE'
df['billsum'] = df.groupby([pd.Grouper(level='date', freq='A'), 'catcode',
                            'disposition', 'id.fec']).bills.transform('size')

print(df.head(3))
            state    year  unemployment  log_diff_unemployment  id.thomas  \
date
2006-05-01     AK  2006.0           6.6              -0.044452       1440
2006-05-01     AK  2006.0           6.6              -0.044452       1440
2007-03-27     AK  2007.0           6.3              -0.046520       1440

                 party  type      bills     id.fec  years_exp  session  \
date
2006-05-01  Republican   sen  s2686-109  S2AK00010         39      109
2006-05-01  Republican   sen  s2686-109  S2AK00010         39      109
2007-03-27  Republican   sen  s1000-110  S2AK00010         40      110

                                                       name  disposition  \
date
2006-05-01  National Cable & Telecommunications Association      support
2006-05-01  National Cable & Telecommunications Association      support
2007-03-27                National Treasury Employees Union      support

            catcode  naics  billsum
date
2006-05-01    C4500     81        2
2006-05-01    C4500    517        2
2007-03-27    L1100    NaN        1
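
To see the difference without downloading the CSV, here is a minimal sketch on a three-row toy frame. The frame `toy`, the helper columns `bills_concat` and `unique_bills`, and grouping by `toy.index.year` in place of `pd.Grouper(level='date', freq='A')` are assumptions made for illustration, not part of the original data or code. The `nunique` line is only relevant if each distinct bill should be counted once per group, which the question does not state.

```python
import pandas as pd

# Hypothetical toy data mirroring the first three rows shown above
toy = pd.DataFrame(
    {
        "catcode": ["C4500", "C4500", "L1100"],
        "disposition": ["support", "support", "support"],
        "id.fec": ["S2AK00010", "S2AK00010", "S2AK00010"],
        "bills": ["s2686-109", "s2686-109", "s1000-110"],
    },
    index=pd.DatetimeIndex(
        ["2006-05-01", "2006-05-01", "2007-03-27"], name="date"
    ),
)

# Assumption: group by calendar year via index.year instead of pd.Grouper,
# which avoids depending on a frequency alias
keys = [toy.index.year, "catcode", "disposition", "id.fec"]
grouped = toy.groupby(keys)["bills"]

# 'sum' on strings concatenates them (the behaviour seen in the question)
toy["bills_concat"] = grouped.transform("sum")

# 'size' broadcasts the number of rows in each group back to every row
toy["billsum"] = grouped.transform("size")

# 'nunique' counts each distinct bill once per group (optional variant)
toy["unique_bills"] = grouped.transform("nunique")

print(toy[["bills", "bills_concat", "billsum", "unique_bills"]])
```

Because transform returns a result aligned with the original index, the counts can be assigned straight back as a new column, which is why it fits here better than a plain aggregation followed by a merge.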
Thanks again! :)