How to print the tf-idf score matrix from sklearn in Python

I am computing tf-idf values with sklearn as follows.

from sklearn.feature_extraction.text import TfidfVectorizer 
myvocabulary = ['life', 'learning'] 
corpus = {1: "The game of life is a game of everlasting learning", 2: "The unexamined life is not worth living", 3: "Never stop learning"} 
tfidf = TfidfVectorizer(vocabulary = myvocabulary, ngram_range = (1,3)) 
tfs = tfidf.fit_transform(corpus.values())

Now I want to print the computed tf-idf scores as a matrix.

I tried the following:

idf = tfidf.idf_ 
dic = dict(zip(tfidf.get_feature_names(), idf)) 
print(dic)

However, this prints only the following:

{'life': 1.2876820724517808, 'learning': 1.2876820724517808}

Please help me.

Source

2017-10-06 Anonymous

You can get this form with a simple change: the actual scores are what `tfidf.fit_transform()` returns. All you additionally need are the column names, which you get from `tfidf.get_feature_names()`. Put the two together in a DataFrame.

Thanks to σηγ, I was able to find the answer in this question:

feature_names = tfidf.get_feature_names() 
corpus_index = [n for n in corpus] 
import pandas as pd 
df = pd.DataFrame(tfs.T.todense(), index=feature_names, columns=corpus_index) 
print(df)
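
For context on why the original attempt printed a single number per term: `idf_` holds one inverse-document-frequency weight per vocabulary entry, not per-document scores. The following is a minimal sketch that recomputes those weights by hand, assuming the vectorizer's default settings (`smooth_idf=True`, lowercasing, tokens of two or more word characters). The helper `tokenize` and the name `idf_by_hand` are illustrative, not part of scikit-learn.

```python
import math
import re

corpus = {
    1: "The game of life is a game of everlasting learning",
    2: "The unexamined life is not worth living",
    3: "Never stop learning",
}
vocabulary = ["life", "learning"]


def tokenize(text):
    # Assumption: approximates the default tokenization
    # (lowercase, then tokens of two or more word characters).
    return re.findall(r"\b\w\w+\b", text.lower())


n_docs = len(corpus)
doc_tokens = [set(tokenize(text)) for text in corpus.values()]

idf_by_hand = {}
for term in vocabulary:
    # Document frequency: number of documents containing the term.
    df_t = sum(term in tokens for tokens in doc_tokens)
    # Smoothed idf, assuming smooth_idf=True: ln((1 + n) / (1 + df)) + 1
    idf_by_hand[term] = math.log((1 + n_docs) / (1 + df_t)) + 1

print(idf_by_hand)
```

Both terms occur in two of the three documents, which is why the question's output shows the same value for 'life' and 'learning'.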

Source

2017-10-06 08:57:20

The answer provided by the asker is right, but I would like to make one adjustment. The code above gives a matrix like this:

          Doc1  Doc2
feature1
feature2

whereas it should look like this:

      feature1  feature2
Doc1
Doc2

So you can change it to:

df = pd.DataFrame(tfs.todense(), index=corpus_index, columns=feature_names)
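
Putting the pieces together, here is a self-contained sketch of the documents-as-rows layout. It assumes both scikit-learn and pandas are installed. Because `get_feature_names()` was renamed to `get_feature_names_out()` in newer scikit-learn releases, the sketch looks up whichever method exists; the variable `get_names` is only an illustrative helper. It also uses `toarray()` instead of `todense()` so that pandas receives a plain NumPy array.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

myvocabulary = ["life", "learning"]
corpus = {
    1: "The game of life is a game of everlasting learning",
    2: "The unexamined life is not worth living",
    3: "Never stop learning",
}

tfidf = TfidfVectorizer(vocabulary=myvocabulary, ngram_range=(1, 3))
tfs = tfidf.fit_transform(corpus.values())

# Newer scikit-learn exposes get_feature_names_out(); older releases
# only have get_feature_names(). Use whichever is available.
get_names = getattr(tfidf, "get_feature_names_out", None) or tfidf.get_feature_names
feature_names = list(get_names())

# Rows are documents (the dict keys), columns are vocabulary terms.
corpus_index = list(corpus)
df = pd.DataFrame(tfs.toarray(), index=corpus_index, columns=feature_names)
print(df)
```

Each row is L2-normalized by default, so a document containing only one of the two vocabulary terms gets a score of 1.0 for that term and 0.0 for the other.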

Source

2017-10-18 08:04:48
