2017-05-17

Creating a column in pandas based on two conditions

I'm doing an analysis with pandas. I want to create a new column whose value is the sum of two rows. The original dataset is as follows:

       Admit  Gender  Dept  Freq
0   Admitted    Male     A   512
1   Rejected    Male     A   313
2   Admitted  Female     A    89
3   Rejected  Female     A    19
4   Admitted    Male     B   353
5   Rejected    Male     B   207
6   Admitted  Female     B    17
7   Rejected  Female     B     8
8   Admitted    Male     C   120
9   Rejected    Male     C   205
10  Admitted  Female     C   202
11  Rejected  Female     C   391
12  Admitted    Male     D   138
13  Rejected    Male     D   279
14  Admitted  Female     D   131
15  Rejected  Female     D   244
16  Admitted    Male     E    53
17  Rejected    Male     E   138
18  Admitted  Female     E    94
19  Rejected  Female     E   299
20  Admitted    Male     F    22
21  Rejected    Male     F   351
22  Admitted  Female     F    24
23  Rejected  Female     F   317

I want to create the new column using the following second dataframe:

    Dept  Gender  Freq
0      A  Female   108
1      A    Male   825
2      B  Female    25
3      B    Male   560
4      C  Female   593
5      C    Male   325
6      D  Female   375
7      D    Male   417
8      E  Female   393
9      E    Male   191
10     F  Female   341
11     F    Male   373

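The new column should take its values from the Freq column of the second dataframe: I need to insert a value such as 108 wherever Dept and Gender are the same in both dataframes. The new dataframe should look like this: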

       Admit  Gender  Dept  Freq  Total
0   Admitted    Male     A   512    825
1   Rejected    Male     A   313    825
2   Admitted  Female     A    89    108
3   Rejected  Female     A    19    108
4   Admitted    Male     B   353    560
5   Rejected    Male     B   207    560
6   Admitted  Female     B    17     25
7   Rejected  Female     B     8     25
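The matching rule described above ("use the second table's Freq where Dept and Gender both match") amounts to a lookup keyed on the pair (Dept, Gender). A plain-Python sketch of that rule, not taken from either answer, using only the Dept A and B rows shown; `lookup` is a hypothetical helper name. It raises `KeyError` if a pair in `data` has no row in `total_freq`.

```python
import pandas as pd

# Dept A and B rows of the first table (the question's `data`)
data = pd.DataFrame({
    "Admit": ["Admitted", "Rejected"] * 4,
    "Gender": ["Male", "Male", "Female", "Female"] * 2,
    "Dept": ["A"] * 4 + ["B"] * 4,
    "Freq": [512, 313, 89, 19, 353, 207, 17, 8],
})
# The matching rows of the second table (the question's `total_freq`)
total_freq = pd.DataFrame({
    "Dept": ["A", "A", "B", "B"],
    "Gender": ["Female", "Male", "Female", "Male"],
    "Freq": [108, 825, 25, 560],
})

# Hypothetical helper: map each (Dept, Gender) pair to its total
lookup = {
    (dept, gender): freq
    for dept, gender, freq in total_freq[["Dept", "Gender", "Freq"]].itertuples(index=False)
}

# For each row of data, take the total whose Dept and Gender both match
data["Total"] = [lookup[(d, g)] for d, g in zip(data["Dept"], data["Gender"])]
print(data)
```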

I tried the following code, but I get the error below:

for i in data.iterrows():
    for j in total_freq.iterrows():
        if i[1].Gender == total_freq.Gender & i[1].Dept == total_freq.Dept:
            data['Total'] = total_freq.Freq

... TypeError: cannot compare a dtyped [object] array with a scalar of type [bool]

Any help creating the column with the correct values would be appreciated.
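The error most likely comes from operator precedence: in Python, `&` binds more tightly than `==`, so the condition is parsed as `i[1].Gender == (total_freq.Gender & i[1].Dept) == total_freq.Dept`, and pandas raises the TypeError while evaluating `total_freq.Gender & i[1].Dept` (an object Series combined with a string). Separately, `data['Total'] = total_freq.Freq` replaces the whole column on every match instead of setting one row. A sketch of the loop with both issues fixed, assuming the question's names on the Dept A and B rows only; `totals` and `mask` are hypothetical names. It is O(rows × totals), so the merge or transform approaches in the answers scale better.

```python
import pandas as pd

# Dept A and B rows of the question's `data` and `total_freq`
data = pd.DataFrame({
    "Admit": ["Admitted", "Rejected"] * 4,
    "Gender": ["Male", "Male", "Female", "Female"] * 2,
    "Dept": ["A"] * 4 + ["B"] * 4,
    "Freq": [512, 313, 89, 19, 353, 207, 17, 8],
})
total_freq = pd.DataFrame({
    "Dept": ["A", "A", "B", "B"],
    "Gender": ["Female", "Male", "Female", "Male"],
    "Freq": [108, 825, 25, 560],
})

totals = []
for _, row in data.iterrows():
    # Parentheses are required: & binds tighter than ==
    mask = (total_freq["Gender"] == row["Gender"]) & (total_freq["Dept"] == row["Dept"])
    # Take the single matching Freq value for this row only
    totals.append(total_freq.loc[mask, "Freq"].iloc[0])

# Assign the whole column once, after the loop
data["Total"] = totals
print(data)
```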

Answers


You can use pandas.DataFrame.merge() to left-join the totals from the second dataframe onto the first one. The merged result looks like this:

       Admit  Gender  Dept  Freq  Total
0   Admitted    Male     A   512    825
1   Rejected    Male     A   313    825
2   Admitted  Female     A    89    108
3   Rejected  Female     A    19    108
4   Admitted    Male     B   353    560
5   Rejected    Male     B   207    560
6   Admitted  Female     B    17     25
7   Rejected  Female     B     8     25
8   Admitted    Male     C   120    325
9   Rejected    Male     C   205    325
10  Admitted  Female     C   202    593
11  Rejected  Female     C   391    593
12  Admitted    Male     D   138    417
13  Rejected    Male     D   279    417
14  Admitted  Female     D   131    375
15  Rejected  Female     D   244    375
16  Admitted    Male     E    53    191
17  Rejected    Male     E   138    191
18  Admitted  Female     E    94    393
19  Rejected  Female     E   299    393
20  Admitted    Male     F    22    373
21  Rejected    Male     F   351    373
22  Admitted  Female     F    24    341
23  Rejected  Female     F   317    341
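To get this, first rename Freq to Total in df1, then merge on Gender and Dept. The merge needs the key columns as well, so pass them together with Total rather than the df1['Total'] Series alone:

import pandas as pd
df1 = df1.rename(columns={'Freq': 'Total'})
df_totals = pd.merge(df, df1[['Dept', 'Gender', 'Total']], how='left', on=['Gender', 'Dept'])

A second answer suggests transform, which needs no second dataframe at all, because each total is the sum of Freq within the same Dept and Gender:

df['Total'] = df.groupby(['Dept', 'Gender']).Freq.transform('sum')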
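Because the second table is itself the per-(Dept, Gender) sum of the first, the merge approach and the transform approach should produce the same Total column. A sketch that checks this on the full data, using the answers' names `df` and `df1` and a hypothetical `merged` for the merge result. The `validate="many_to_one"` argument of `pd.merge` (available since pandas 0.21) makes the merge fail if a (Dept, Gender) pair appears more than once in `df1`, which would otherwise silently duplicate rows.

```python
import pandas as pd

# First table (the answers' `df`)
df = pd.DataFrame({
    "Admit": ["Admitted", "Rejected"] * 12,
    "Gender": (["Male"] * 2 + ["Female"] * 2) * 6,
    "Dept": [d for d in "ABCDEF" for _ in range(4)],
    "Freq": [512, 313, 89, 19, 353, 207, 17, 8,
             120, 205, 202, 391, 138, 279, 131, 244,
             53, 138, 94, 299, 22, 351, 24, 317],
})
# Second table (the answers' `df1`)
df1 = pd.DataFrame({
    "Dept": [d for d in "ABCDEF" for _ in range(2)],
    "Gender": ["Female", "Male"] * 6,
    "Freq": [108, 825, 25, 560, 593, 325, 375, 417, 393, 191, 341, 373],
})

# First answer: left merge on both keys; a left merge keeps df's row order
merged = pd.merge(
    df,
    df1.rename(columns={"Freq": "Total"}),
    how="left",
    on=["Gender", "Dept"],
    validate="many_to_one",
)

# Second answer: sum Freq within each (Dept, Gender) group, broadcast to every row
df["Total"] = df.groupby(["Dept", "Gender"])["Freq"].transform("sum")

# Both approaches agree row by row
print((merged["Total"] == df["Total"]).all())
```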