How to create a document-term frequency matrix in Python

I am using a text file, result.txt, as input to create a document-term matrix. I am trying to count word occurrences and get a result like this:

Counter({'STTP': 6, 'AVENUES': 4, 'ENGINEERING': 4, 'MINING': 4, 'THE': 4, 'SCOE': 4, 'HERE': 4, 'DATA': 4, 'TOOLS': 4, 'PROGRAMMING': 3, 'TEMPERATURE': 3})

However, I got the result in this form instead:

"degree,the,mituski,programming,national,it,high,sakal,engineering,paper,college,signed
1,4,2,3,1,2,1,1,4,1,1,1"

Here is the code I use:

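import glob
import textmining

tdm = textmining.TermDocumentMatrix()

files = glob.glob("result.txt")

# Add each matching file to the matrix as one document.
for f in files:
    with open(f) as fh:
        content = fh.read()
    # Put a space before every newline, as in the original code.
    content = content.replace('\n', ' \n')
    tdm.add_doc(content)

# Write the matrix once, after all documents have been added.
tdm.write_csv('matrix1.csv', cutoff=1)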

Source

2017-08-11 aneeket

From the Python Textmining Package:

Instead of writing out the matrix you can also access its rows directly.
# Let's print them to the screen.
for row in tdm.rows(cutoff=1):
    print row

– stamaimer

The result is a correctly formed CSV file. The first row is the header (the words), and the second row holds the word counts.

What you show looks like a dict passed to a class constructor; the code did not show the use of `Counter`. To get a dict like the one in your question, you can go with:

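# tdm.rows() returns a generator, so turn it into a list before indexing.
result_rows = list(tdm.rows(cutoff=1))
result_dict = {}

# Row 0 is the header (the words); row 1 holds the counts for the document.
for i in range(len(result_rows[0])):
    result_dict[result_rows[0][i]] = result_rows[1][i]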
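
The same pairing of words and counts can be written more compactly with zip. The sketch below is only an illustration, written for Python 3: `fake_rows` is a hypothetical stand-in for `tdm.rows(cutoff=1)`, assumed to yield the header row first and then one row of counts, and `rows_to_counter` is a made-up helper name. The sample words and counts are the first four columns of the CSV in the question.

```python
from collections import Counter


def fake_rows():
    """Hypothetical stand-in for tdm.rows(cutoff=1): header row, then one count row."""
    yield ["degree", "the", "mituski", "programming"]
    yield [1, 4, 2, 3]


def rows_to_counter(rows):
    """Pair the header row with the first document row and return a Counter."""
    rows = iter(rows)      # accepts generators and lists alike
    header = next(rows)    # first row: the words
    counts = next(rows)    # second row: counts for the first document
    return Counter(dict(zip(header, counts)))


result = rows_to_counter(fake_rows())
print(result)
```

Because the helper consumes the rows with next(), it never indexes into the generator, so no list() conversion is needed.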

Source

2017-08-11 09:24:36 Igle

I got this error: for i in range(len(result_rows[0])): TypeError: 'generator' object has no attribute '__getitem__'. Should I output to the CSV file? – aneeket

Your CSV file is actually valid CSV. The example you show with Counter looks like a dict. I have updated my answer and fixed the bug; I did not know that tdm.rows returns a generator rather than a list. – Igle
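
Since the file is valid CSV, the counts can also be read back with the standard csv module. This is a minimal sketch for Python 3, assuming the two-row layout shown in the question (a header row of words, then one row of counts). It reads from an in-memory string instead of matrix1.csv so that it runs on its own, and the helper name `counts_from_csv` is made up for illustration.

```python
import csv
import io
from collections import Counter

# Sample data: the first four columns of the output shown in the question.
SAMPLE_CSV = "degree,the,mituski,programming\n1,4,2,3\n"


def counts_from_csv(fileobj):
    """Read a header row of words and one row of counts into a Counter."""
    reader = csv.reader(fileobj)
    header = next(reader)       # first row: the words
    first_doc = next(reader)    # second row: the counts, read as strings
    return Counter({word: int(n) for word, n in zip(header, first_doc)})


counts = counts_from_csv(io.StringIO(SAMPLE_CSV))
print(counts.most_common(2))
```

With the real file, the same helper could be called as counts_from_csv(fh) inside `with open('matrix1.csv', newline='') as fh:`.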

@Igle – aneeket
