Adding pandas columns to a sparse matrix

I have additional derived values for the X variables that I want to use in my model.

import pandas as pd
from sklearn.model_selection import train_test_split

XAll = pd_data[['title','wordcount','sumscores','length']]
y = pd_data['sentiment']
X_train, X_test, y_train, y_test = train_test_split(XAll, y, random_state=1)

Since the title is text data, I first convert it separately to a document-term matrix (DTM):

from sklearn.feature_extraction.text import CountVectorizer

vect = CountVectorizer(max_df=0.5)
vect.fit(X_train['title'])
X_train_dtm = vect.transform(X_train['title'])
column_index = X_train_dtm.indices

print(type(X_train_dtm)) # This is <class 'scipy.sparse.csr.csr_matrix'> 
print("X_train_dtm shape",X_train_dtm.get_shape()) # This is (856, 2016) 
print("column index:",column_index)  # This is column index: [ 533 754 859 ..., 633 950 1339]

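Now that I have the text as a document-term matrix, I would like to add the other features, such as 'wordcount', 'sumscores', and 'length', which are numeric, to X_train_dtm. I will build the model on the new DTM, and I expect it to be more accurate with the additional features included.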

How can I add numeric columns from a pandas DataFrame to a sparse CSR matrix?

2017-01-30 Bonson

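I found a solution. We can do this using sparse.hstack:

from scipy.sparse import hstack
import numpy as np
# Reshape the column to (n_samples, 1) and append it to the right of the DTM
X_train_dtm = hstack((X_train_dtm, np.array(X_train['wordcount'])[:, None]))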
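To add all three numeric columns at once, the same idea could be sketched roughly as follows. This is only an illustration under assumptions: the small hand-built `dtm` and the `frame` DataFrame are hypothetical stand-ins for `X_train_dtm` and `X_train`, and the names `numeric_cols`, `numeric_block`, and `combined` are made up for the example.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix, hstack

# Hypothetical stand-in for X_train_dtm: 4 documents, 3 vocabulary terms.
dtm = csr_matrix(np.array([[1, 0, 2],
                           [0, 1, 0],
                           [0, 0, 1],
                           [3, 0, 0]]))

# Hypothetical stand-in for the numeric part of X_train (one row per document).
frame = pd.DataFrame({
    "wordcount": [2, 5, 3, 4],
    "sumscores": [1.5, -0.5, 2.0, -1.0],
    "length": [10, 31, 17, 22],
})
numeric_cols = ["wordcount", "sumscores", "length"]

# Turn the numeric columns into a sparse block with the same number of rows.
numeric_block = csr_matrix(frame[numeric_cols].to_numpy(dtype=float))

# Stack the blocks side by side; format="csr" requests a CSR result explicitly
# instead of letting hstack choose the output format.
combined = hstack([dtm, numeric_block], format="csr")

print(combined.shape)  # (4, 6): 3 term columns + 3 numeric columns
```

The test set would need the same treatment: transform `X_test['title']` with the already fitted `vect`, then stack the same numeric columns in the same order so the column layout matches the training matrix.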

2017-01-31 01:03:30 Bonson
