2017-07-31 4 views

Convert a categorical column to binary-encoded form - pandas

I have a dataset that looks like this:

    yyyy      month  tmax  tmin
0   1908    January   5.0  -1.4
1   1908   February   7.3   1.9
2   1908      March   6.2   0.3
3   1908      April   7.4   2.1
4   1908        May  16.5   7.7
5   1908       June  17.7   8.7
6   1908       July  20.1  11.0
7   1908     August  17.5   9.7
8   1908  September  16.3   8.4
9   1908    October  14.6   8.0
10  1908   November   9.6   3.4
11  1908   December   5.8  -0.3
12  1909    January   5.0   0.1
13  1909   February   5.5  -0.3
14  1909      March   5.6  -0.3
15  1909      April  12.2   3.3
16  1909        May  14.7   4.8
17  1909       June  15.0   7.5
18  1909       July  17.3  10.8
19  1909     August  18.8  10.7
20  1909  September  14.5   8.1
21  1909    October  12.9   6.9
22  1909   November   7.5   1.7
23  1909   December   5.3   0.4
24  1910    January   5.2  -0.5
...

It has four variables: yyyy, month, tmax (maximum temperature) and tmin (minimum temperature).

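I want to use the month column as a variable in my predictions, so I need to convert it into a binary-encoded version. Basically, I need to add 12 variables (January, February, and so on) to the dataset, so that if the month in a given row is "January", the January column is 1 and the remaining 11 columns are 0.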
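For illustration, here is a minimal sketch of the desired encoding built by hand, with one equality comparison per month name. The three-row DataFrame is a hypothetical excerpt of the dataset above; this only shows what the target columns look like and is not meant as the recommended approach.

```python
import pandas as pd

# Hypothetical excerpt of the dataset shown above
df = pd.DataFrame({
    "yyyy": [1908, 1908, 1908],
    "month": ["January", "February", "March"],
    "tmax": [5.0, 7.3, 6.2],
    "tmin": [-1.4, 1.9, 0.3],
})

months = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# One indicator column per month: 1 where the row's month matches, else 0
for m in months:
    df[m] = (df["month"] == m).astype(int)

print(df[["month", "January", "February", "March", "April"]])
```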

I looked into pivot tables, but they don't help with my problem. Any ideas on how to do this in a simple, elegant way?

Answers


I think you need get_dummies:

df = pd.get_dummies(df['month']) 
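A self-contained sketch of the call above on a few hypothetical rows. The dtype of the dummy columns depends on the pandas version: pandas 2.0 and later return bool columns by default, so dtype=int is added here as an assumption to get the 0/1 integers shown in this answer's output. The columns come out in alphabetical order, which is why the reordering step later in this answer can be useful.

```python
import pandas as pd

df = pd.DataFrame({"month": ["January", "February", "March", "January"]})

# One column per distinct month; dtype=int gives 0/1 instead of True/False
dummies = pd.get_dummies(df["month"], dtype=int)
print(dummies)
```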

If you need to add the new columns to the original DataFrame and remove month, use join with pop:

df2 = df.join(pd.get_dummies(df.pop('month'))) 
print (df2.head()) 
   yyyy  tmax  tmin  April  August  December  February  January  July  June  \
0  1908   5.0  -1.4      0       0         0         0        1     0     0
1  1908   7.3   1.9      0       0         0         1        0     0     0
2  1908   6.2   0.3      0       0         0         0        0     0     0
3  1908   7.4   2.1      1       0         0         0        0     0     0
4  1908  16.5   7.7      0       0         0         0        0     0     0

   March  May  November  October  September
0      0    0         0        0          0
1      0    0         0        0          0
2      1    0         0        0          0
3      0    0         0        0          0
4      0    1         0        0          0
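A minimal sketch of the join/pop pattern on hypothetical rows. DataFrame.pop removes the column from df in place and returns it as a Series, so the original DataFrame is modified. As an alternative not used in the answer above, pd.get_dummies can also be applied to the whole DataFrame with columns=['month']; it drops month and appends dummy columns prefixed with the column name (for example month_January).

```python
import pandas as pd

df = pd.DataFrame({
    "yyyy": [1908, 1908],
    "month": ["January", "February"],
    "tmax": [5.0, 7.3],
})

# Alternative: encode only the listed columns (returns a new DataFrame).
# Computed first here because pop() below mutates df.
df_alt = pd.get_dummies(df, columns=["month"], dtype=int)

# pop() removes 'month' from df in place and returns it as a Series
df2 = df.join(pd.get_dummies(df.pop("month"), dtype=int))
print(df2)
print(df_alt)
```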

If month should not be removed:

df2 = df.join(pd.get_dummies(df['month'])) 
print (df2.head()) 
   yyyy     month  tmax  tmin  April  August  December  February  January  \
0  1908   January   5.0  -1.4      0       0         0         0        1
1  1908  February   7.3   1.9      0       0         0         1        0
2  1908     March   6.2   0.3      0       0         0         0        0
3  1908     April   7.4   2.1      1       0         0         0        0
4  1908       May  16.5   7.7      0       0         0         0        0

   July  June  March  May  November  October  September
0     0     0      0    0         0        0          0
1     0     0      0    0         0        0          0
2     0     0      1    0         0        0          0
3     0     0      0    0         0        0          0
4     0     0      0    1         0        0          0
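Because this variant reads df['month'] instead of popping it, df itself is left unchanged and the dummies sit next to the original column. A quick sketch on hypothetical rows that also checks the defining property of this encoding: every row has exactly one 1 among the dummy columns.

```python
import pandas as pd

df = pd.DataFrame({
    "yyyy": [1908, 1908, 1909],
    "month": ["January", "March", "January"],
    "tmax": [5.0, 6.2, 5.0],
})

dummies = pd.get_dummies(df["month"], dtype=int)
df2 = df.join(dummies)

# df is untouched; df2 keeps 'month' plus one indicator column per month
print(df2)

# Exactly one indicator is set per row
assert (df2[dummies.columns].sum(axis=1) == 1).all()
```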

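If you need to reorder the columns, there are several possible solutions: use reindex, either with axis=1 or with columns= (older answers use reindex_axis, which was deprecated in pandas 0.21 and removed in 1.0):

months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
df1 = pd.get_dummies(df['month']).reindex(months, axis=1)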
print (df1.head()) 
   January  February  March  April  May  June  July  August  September  \
0        1         0      0      0    0     0     0       0          0
1        0         1      0      0    0     0     0       0          0
2        0         0      1      0    0     0     0       0          0
3        0         0      0      1    0     0     0       0          0
4        0         0      0      0    1     0     0       0          0

   October  November  December
0        0         0         0
1        0         0         0
2        0         0         0
3        0         0         0
4        0         0         0

df1 = pd.get_dummies(df['month']).reindex(columns=months) 
print (df1.head()) 
   January  February  March  April  May  June  July  August  September  \
0        1         0      0      0    0     0     0       0          0
1        0         1      0      0    0     0     0       0          0
2        0         0      1      0    0     0     0       0          0
3        0         0      0      1    0     0     0       0          0
4        0         0      0      0    1     0     0       0          0

   October  November  December
0        0         0         0
1        0         0         0
2        0         0         0
3        0         0         0
4        0         0         0
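One practical reason to reindex against the full months list: if a subset of the data (for example a test split) is missing some months, get_dummies only creates columns for the months that actually appear, and reindex alone fills the missing columns with NaN. Passing fill_value=0 to reindex gives a consistent 12-column layout. A sketch, assuming a hypothetical series that contains only three months:

```python
import pandas as pd

months = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Hypothetical subset that contains only three distinct months
s = pd.Series(["March", "January", "February"])

# Reorder to calendar order and add all-zero columns for absent months
df1 = pd.get_dummies(s, dtype=int).reindex(columns=months, fill_value=0)
print(df1)
```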

Or convert the month column to an ordered categorical:

df1 = pd.get_dummies(df['month'].astype(pd.CategoricalDtype(categories=months, ordered=True)))
print (df1.head()) 
   January  February  March  April  May  June  July  August  September  \
0        1         0      0      0    0     0     0       0          0
1        0         1      0      0    0     0     0       0          0
2        0         0      1      0    0     0     0       0          0
3        0         0      0      1    0     0     0       0          0
4        0         0      0      0    1     0     0       0          0

   October  November  December
0        0         0         0
1        0         0         0
2        0         0         0
3        0         0         0
4        0         0         0
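With a categorical dtype, get_dummies creates one column per category rather than per observed value, so the column order follows months and months absent from the data still get an all-zero column. A sketch using pd.CategoricalDtype, which is how the categories and ordered arguments are passed in current pandas; the three-row series is hypothetical:

```python
import pandas as pd

months = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Ordered categorical type whose categories are the 12 month names
month_type = pd.CategoricalDtype(categories=months, ordered=True)

s = pd.Series(["February", "January", "March"]).astype(month_type)

# One column per category, in category order, including unobserved months
df1 = pd.get_dummies(s, dtype=int)
print(df1)
```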


IIUC、

することができますassign,**開梱作業、およびpd.get_dummies

df.assign(**pd.get_dummies(df['month'])) 

Output:

    yyyy      month  tmax  tmin  April  August  December  February  January  \
0   1908    January   5.0  -1.4      0       0         0         0        1
1   1908   February   7.3   1.9      0       0         0         1        0
2   1908      March   6.2   0.3      0       0         0         0        0
3   1908      April   7.4   2.1      1       0         0         0        0
4   1908        May  16.5   7.7      0       0         0         0        0
5   1908       June  17.7   8.7      0       0         0         0        0
6   1908       July  20.1  11.0      0       0         0         0        0
7   1908     August  17.5   9.7      0       1         0         0        0
8   1908  September  16.3   8.4      0       0         0         0        0
9   1908    October  14.6   8.0      0       0         0         0        0
10  1908   November   9.6   3.4      0       0         0         0        0
11  1908   December   5.8  -0.3      0       0         1         0        0
12  1909    January   5.0   0.1      0       0         0         0        1
13  1909   February   5.5  -0.3      0       0         0         1        0
14  1909      March   5.6  -0.3      0       0         0         0        0
15  1909      April  12.2   3.3      1       0         0         0        0
16  1909        May  14.7   4.8      0       0         0         0        0
17  1909       June  15.0   7.5      0       0         0         0        0
18  1909       July  17.3  10.8      0       0         0         0        0
19  1909     August  18.8  10.7      0       1         0         0        0
20  1909  September  14.5   8.1      0       0         0         0        0
21  1909    October  12.9   6.9      0       0         0         0        0
22  1909   November   7.5   1.7      0       0         0         0        0
23  1909   December   5.3   0.4      0       0         1         0        0
24  1910    January   5.2  -0.5      0       0         0         0        1

    July  June  March  May  November  October  September
0      0     0      0    0         0        0          0
1      0     0      0    0         0        0          0
2      0     0      1    0         0        0          0
3      0     0      0    0         0        0          0
4      0     0      0    1         0        0          0
5      0     1      0    0         0        0          0
6      1     0      0    0         0        0          0
7      0     0      0    0         0        0          0
8      0     0      0    0         0        0          1
9      0     0      0    0         0        1          0
10     0     0      0    0         1        0          0
11     0     0      0    0         0        0          0
12     0     0      0    0         0        0          0
13     0     0      0    0         0        0          0
14     0     0      1    0         0        0          0
15     0     0      0    0         0        0          0
16     0     0      0    1         0        0          0
17     0     1      0    0         0        0          0
18     1     0      0    0         0        0          0
19     0     0      0    0         0        0          0
20     0     0      0    0         0        0          1
21     0     0      0    0         0        1          0
22     0     0      0    0         1        0          0
23     0     0      0    0         0        0          0
24     0     0      0    0         0        0          0
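df.assign(**mapping) adds one column per key. Unpacking a DataFrame with ** works because a DataFrame behaves like a mapping from column labels to columns (its keys() are the column labels); this relies on the dummy column names being strings, since keyword argument names must be strings. A sketch on hypothetical rows showing that the result matches the join-based version from the other answer:

```python
import pandas as pd

df = pd.DataFrame({
    "yyyy": [1908, 1908, 1908],
    "month": ["January", "February", "March"],
    "tmax": [5.0, 7.3, 6.2],
})

# ** unpacks the dummies DataFrame as {column label: Series}
out = df.assign(**pd.get_dummies(df["month"], dtype=int))

# Same result as joining the dummies; assign returns a new DataFrame
expected = df.join(pd.get_dummies(df["month"], dtype=int))
print(out)
```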