2016-10-29 4 views

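Simple implementation of doc2vec in Python?

I am trying to implement doc2vec from gensim, but I am running into some errors and there is not enough documentation or help on the web. Here is part of my working code: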

from gensim.models import Doc2Vec 
from gensim.models.doc2vec import LabeledSentence 

class LabeledLineSentence(object):
    """Stream one LabeledSentence per line of a text file."""

    def __init__(self, filename):
        self.filename = filename

    def __iter__(self):
        with open(self.filename, 'r') as f:
            for uid, line in enumerate(f):
                # Debug output: show each document as it is produced.
                print(LabeledSentence(line.split(), tags=['TXT_%s' % uid]))
                yield LabeledSentence(words=line.split(), tags=['TXT_%s' % uid])

sentences = LabeledLineSentence('myfile.txt') 

This is what my txt file looks like:

hi how are you
hi how are you
hi how are you
its such a great day
its such a great day
its such a great day
i like dogs
i like cats
i like snakes
the ice cream was yummy
the cake was awesome

Model init:

model = Doc2Vec(alpha=0.025, min_alpha=0.025, size=50, window=5, min_count=5,
                dm=1, workers=8, sample=1e-5)
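
A note on `min_count=5`: gensim ignores every word whose total frequency is below this threshold. The following is a rough stdlib-only sketch, not gensim's own code, that counts word frequencies the same way. It assumes the eleven lines shown above are the entire file and that the leading numbers are display line numbers rather than file content; `LINES` and `surviving_words` are hypothetical names introduced for illustration.

```python
from collections import Counter

# Assumption: these eleven lines are the whole of myfile.txt.
LINES = [
    "hi how are you",
    "hi how are you",
    "hi how are you",
    "its such a great day",
    "its such a great day",
    "its such a great day",
    "i like dogs",
    "i like cats",
    "i like snakes",
    "the ice cream was yummy",
    "the cake was awesome",
]


def surviving_words(lines, min_count):
    """Return {word: frequency} for words seen at least min_count times."""
    counts = Counter(word for line in lines for word in line.split())
    return {word: n for word, n in counts.items() if n >= min_count}


# No word in this corpus occurs five times, so nothing survives.
print(surviving_words(LINES, min_count=5))  # prints {}
# Lowering the threshold keeps the words that appear three times.
print(sorted(surviving_words(LINES, min_count=3)))
```

On a corpus this small the most frequent words occur only three times, so a threshold of 5 would probably leave the vocabulary empty. A lower value such as `min_count=1` may be more appropriate for experiments.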

Example print output:

LabeledSentence(['hi', 'how', 'are', 'you'], ['TXT_0']) 
LabeledSentence(['hi', 'how', 'are', 'you'], ['TXT_1']) 
LabeledSentence(['hi', 'how', 'are', 'you'], ['TXT_2']) 
LabeledSentence(['its', 'such', 'a', 'great', 'day'], ['TXT_3']) 
LabeledSentence(['its', 'such', 'a', 'great', 'day'], ['TXT_4']) 

This is where the error occurs:

for epoch in range(500):
    try:
        print('epoch %d' % epoch)
        model.train(sentences)
        model.alpha *= 0.99  # decay the learning rate by hand
        model.min_alpha = model.alpha
    except (KeyboardInterrupt, SystemExit):
        break

RuntimeError: you must first build vocabulary before training the model 

Any idea why?

Answers
