scikit-learn: getting the accuracy of a prediction

Following this SO question, I am currently trying to find the pronounceability of a list of words, using the following code:

import random 
def scramble(s): 
    return "".join(random.sample(s, len(s))) 

words = [w.strip() for w in open('/usr/share/dict/words') if w == w.lower()] 
scrambled = [scramble(w) for w in words] 

X = words+scrambled 
y = ['word']*len(words) + ['unpronounceable']*len(scrambled) 

from sklearn.model_selection import train_test_split 
X_train, X_test, y_train, y_test = train_test_split(X, y) 

from sklearn.pipeline import Pipeline 
from sklearn.feature_extraction.text import CountVectorizer 
from sklearn.naive_bayes import MultinomialNB 

text_clf = Pipeline([ 
    ('vect', CountVectorizer(analyzer='char', ngram_range=(1, 3))), 
    ('clf', MultinomialNB()) 
    ]) 

text_clf = text_clf.fit(X_train, y_train) 
predicted = text_clf.predict(X_test) 

from sklearn import metrics 
print(metrics.classification_report(y_test, predicted))

For a random word, it outputs this:

>>> text_clf.predict("scaroly".split()) 
['word']

I have been going through the scikit documentation, but I still can't figure out how to print the score of an input word.

Source

2017-02-08 nadermx

What exactly do you mean by "score"? How confident the model is that a given word is pronounceable? – blacksite

@not_a_robot Yes. – nadermx

Try sklearn.pipeline.Pipeline.predict_proba:

>>> text_clf.predict_proba(["scaroly"]) 
array([[ 5.87363027e-04, 9.99412637e-01]])

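It returns the probability that the given input (in this case, "scaroly") belongs to each of the classes on which you trained the model. So there is a 99.94% chance that "scaroly" is pronounceable. Conversely, the Welsh word for "new" is probably unpronounceable: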

>>> text_clf.predict_proba(["newydd"]) 
array([[ 0.99666533, 0.00333467]])
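
To make these arrays easier to read, each column can be paired with its label. The columns of predict_proba follow the order of the fitted estimator's classes_ attribute, which is sorted for string labels, so here column 0 is 'unpronounceable' and column 1 is 'word'. Below is a minimal sketch of that pairing; score_words is a hypothetical helper name, not something from the question or from scikit-learn itself:

```python
def score_words(clf, words):
    """Map each word to a {label: probability} dict.

    `clf` is assumed to be an already fitted classifier or Pipeline
    that exposes `classes_` and `predict_proba` (hypothetical helper,
    not part of scikit-learn).
    """
    words = list(words)
    # One row per input word, one column per class in clf.classes_ order.
    probabilities = clf.predict_proba(words)
    return {
        word: {str(label): float(p) for label, p in zip(clf.classes_, row)}
        for word, row in zip(words, probabilities)
    }
```

With the pipeline from the question, score_words(text_clf, ["scaroly", "newydd"]) should give one labelled dict per word. The exact probabilities will differ between runs, because both the scrambling and the train/test split are random.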

Source

2017-02-08 01:48:49 blacksite
