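Problem getting trigrams using Gensim
I want to extract bigrams and trigrams from the example sentences shown below.
My code works fine for bigrams. However, it does not capture the trigrams in the data (for example, "human computer interaction", which appears in 5 places in my sentences). Below is my code using Gensim's Phrases.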
from gensim.models import Phrases

documents = ["the mayor of new york was there", "human computer interaction and machine learning has now become a trending research area","human computer interaction is interesting","human computer interaction is a pretty interesting subject", "human computer interaction is a great and new subject", "machine learning can be useful sometimes","new york mayor was present", "I love machine learning because it is a new subject area", "human computer interaction helps people to get user friendly applications"]
# Tokenize each document on single spaces
sentence_stream = [doc.split(" ") for doc in documents]
# First pass: learn bigrams from the raw tokens
bigram = Phrases(sentence_stream, min_count=1, threshold=1, delimiter=b' ')
# Second pass: learn trigrams from the bigram-transformed sentences
trigram = Phrases(bigram[sentence_stream])

for sent in sentence_stream:
    bigrams_ = bigram[sent]
    trigrams_ = trigram[bigrams_]
    print(bigrams_)
    print(trigrams_)
Approach 2: I also tried using both Phraser and Phrases, but that did not work either.
from gensim.models import Phrases
from gensim.models.phrases import Phraser

documents = ["the mayor of new york was there", "human computer interaction and machine learning has now become a trending research area","human computer interaction is interesting","human computer interaction is a pretty interesting subject", "human computer interaction is a great and new subject", "machine learning can be useful sometimes","new york mayor was present", "I love machine learning because it is a new subject area", "human computer interaction helps people to get user friendly applications"]
sentence_stream = [doc.split(" ") for doc in documents]
bigram = Phrases(sentence_stream, min_count=1, threshold=2, delimiter=b' ')
bigram_phraser = Phraser(bigram)
trigram = Phrases(bigram_phraser[sentence_stream])

for sent in sentence_stream:
    bigrams_ = bigram_phraser[sent]
    trigrams_ = trigram[bigrams_]
    print(bigrams_)
    print(trigrams_)
Please help me fix this problem so that I can get the trigrams.
I am following Gensim's example documentation.
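One possible reason the second pass finds nothing: in both attempts the trigram model is created with Gensim's default settings (min_count=5, threshold=10.0), and the documented default scorer subtracts min_count from the pair count before scaling. The sketch below re-implements that formula in plain Python instead of calling Gensim. The function names and the vocabulary size of 60 are made up for illustration, and the counts of 5 assume the bigram pass has already merged "human computer" in all 5 sentences.

```python
def default_score(count_a, count_b, count_ab, len_vocab, min_count):
    """Plain-Python sketch of the formula documented for Gensim's default scorer:
    (count(a b) - min_count) * len_vocab / (count(a) * count(b))
    """
    return (count_ab - min_count) * len_vocab / (count_a * count_b)


def is_phrase(score, threshold):
    # A candidate pair is merged only when its score exceeds the threshold.
    return score > threshold


# Counts taken from the question: "human computer interaction" occurs 5 times.
COUNT_HUMAN_COMPUTER = 5   # assumes the bigram pass merged every occurrence
COUNT_INTERACTION = 5
COUNT_PAIR = 5
LEN_VOCAB = 60             # hypothetical vocabulary size, for illustration only

# Second pass with the defaults (min_count=5, threshold=10.0)
score_defaults = default_score(
    COUNT_HUMAN_COMPUTER, COUNT_INTERACTION, COUNT_PAIR, LEN_VOCAB, min_count=5
)
# Second pass with the same relaxed settings as the first pass
score_relaxed = default_score(
    COUNT_HUMAN_COMPUTER, COUNT_INTERACTION, COUNT_PAIR, LEN_VOCAB, min_count=1
)

print(score_defaults, is_phrase(score_defaults, threshold=10.0))
print(score_relaxed, is_phrase(score_relaxed, threshold=1))
```

With the defaults, a pair that occurs exactly 5 times scores zero regardless of vocabulary size, so it can never exceed the threshold. If that reading is right, passing min_count=1 and a low threshold to the second Phrases call as well should let the trigram through. A higher threshold means fewer pairs are merged.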
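Thank you for your very valuable answer. Cheers! :) By the way, it is not very clear to me, so could you explain what the "threshold" value does?
You're welcome! Yes, I have edited my answer; hopefully it is a bit clearer now. – stjernaluiht
Thanks! I found the answer very useful :)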