TF-IDF matrix in Python

My code is as follows:

# The code uses TfidfVectorizer, so import it (CountVectorizer/TfidfTransformer are unused)
from sklearn.feature_extraction.text import TfidfVectorizer

train_set = ["i have a ball", "he is good", "she played well"]
vectorizer = TfidfVectorizer(min_df=1)

train_array = vectorizer.fit_transform(train_set).toarray()
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
print(list(vectorizer.get_feature_names_out()))
print(train_array)

The output I get is:

['ball', 'good', 'have', 'he', 'is', 'played', 'she', 'well'] 

[[0.70710678, 0., 0.70710678, 0., 0., 0., 0., 0.], 
[0., 0.57735027, 0., 0.57735027, 0.57735027, 0., 0., 0.], 
[0., 0., 0., 0., 0., 0.57735027, 0.57735027, 0.57735027]]
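These numbers can be checked by hand. With scikit-learn's defaults (raw counts as TF, smooth_idf=True, norm="l2"), the IDF of a term is ln((1 + n) / (1 + df)) + 1, and each row is then scaled to unit length. The default token pattern keeps only tokens of two or more characters, which is why "i" and "a" are missing from the vocabulary. Every word in train_set occurs in exactly one document, so all eight words get the same IDF, ln(4/2) + 1 ≈ 1.693. After L2 normalization, a row with k distinct words therefore has 1/√k in each non-zero slot: 1/√2 ≈ 0.7071 for the first sentence and 1/√3 ≈ 0.5774 for the other two. A minimal NumPy sketch that recomputes the matrix, assuming those default parameters:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

train_set = ["i have a ball", "he is good", "she played well"]

vectorizer = TfidfVectorizer(min_df=1)
sk_matrix = vectorizer.fit_transform(train_set).toarray()

# Count terms with the vectorizer's own analyzer and vocabulary, so tokenization
# (lowercasing, dropping one-letter tokens) matches what scikit-learn did.
analyzer = vectorizer.build_analyzer()
vocab = vectorizer.vocabulary_
n_docs = len(train_set)
counts = np.zeros((n_docs, len(vocab)))
for row, doc in enumerate(train_set):
    for token in analyzer(doc):
        counts[row, vocab[token]] += 1

# Smoothed IDF (default smooth_idf=True): ln((1 + n) / (1 + df)) + 1
df = (counts > 0).sum(axis=0)
idf = np.log((1 + n_docs) / (1 + df)) + 1

# Raw TF times IDF, then L2-normalize each row (default norm="l2")
tfidf = counts * idf
tfidf /= np.linalg.norm(tfidf, axis=1, keepdims=True)

print(np.round(tfidf, 8))
print(np.allclose(tfidf, sk_matrix))
```

If the assumptions about the defaults hold, the last line prints True.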

My question is: how can I calculate the TF-IDF of the sentence "she is good"? The corpus is train_set from the code above.


2017-08-13 Jayanth

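Simply apply your already-fitted TF-IDF vectorizer to the new data with the .transform method. transform reuses the vocabulary and IDF weights learned from train_set instead of refitting, so the result has the same eight columns as train_array: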

In [16]: test = ["she is good"]

In [17]: test_array = vectorizer.transform(test)

In [18]: test_array.toarray()
Out[18]: array([[0., 0.57735027, 0., 0., 0.57735027, 0., 0.57735027, 0.]])

In [19]: list(vectorizer.get_feature_names_out())
Out[19]: ['ball', 'good', 'have', 'he', 'is', 'played', 'she', 'well']


2017-08-13 20:39:36 MaxU
