2017-09-25 11 views
0

Text classification using Naive Bayes in Python

I have built a model that runs Naive Bayes, expecting it to produce the output I want.

from textblob.classifiers import NaiveBayesClassifier as NBC 
from textblob import TextBlob 
training_corpus = [ 
    ('Agree Completely Agree Strongly Agree Somewhat Disagree Somewhat Disagree Strongly Completely Disagree','TRUE'),
    ('Concerned 2 3 4 5 6 7 - Comfortable','TRUE'),
    ('1 - disagree strongly 2 - disagree somewhat 3 - neither agree nor disagree 4 - agree somewhat 5 - agree strongly','TRUE'),
    ('1 - doesn\'t apply at all 2 3 4 5 6 7 - applies completely','TRUE'),
    ('1 - extremely new and different 2 3 4 5 6 7 - not at all new & different','TRUE'),
    ('1 - extremely relevant 2 3 4 5 6 7 - not at all relevant','TRUE'),
    ('1 - I don\'t want brands to engage with me at all on social media 2 3 4 5 6 7 - I love to engage with brands on social media','TRUE'),
    ('1 - Most Important 2 3 4 5 - Least Important','TRUE'),  
    ('pepsi','FALSE'), 
    ('coca cola','FALSE'), 
    ('hyundai','FALSE'),   
    ('Audio quality','FALSE'), 
    ('Product features ','FALSE'), 
    ('Content ','FALSE') 
] 
test_corpus = [ 
    ('1 - Agree Completely 2 - Agree Strongly 3 - Agree Somewhat 4 - Disagree Somewhat 5 - Disagree Strongly 6 - Completely Disagree','TRUE'), 
    ('1 - Concerned 2 3 4 5 6 7 - Comfortable','TRUE'), 
    ('Content ','FALSE'), 
    ('Ease of navigation','FALSE') 
] 
model = NBC(training_corpus) 
print(model.classify('pepsi')) 
print(model.accuracy(test_corpus)*100) 

When I run this code it reports 100% accuracy, yet it returns FALSE every time. I don't know what I am doing wrong, but this is not the result I expected.

Answers

0

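Your model is fine. It comes down to your data and the classifier. Let me test a bit to see what the training data you provided means to the classifier: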

def test(s): 
    prob_dist = model.prob_classify(s) 
    print("classifiying", s) 
    print("possibility of being FALSE:", round(prob_dist.prob("FALSE"), 2), 
      "possibility of being TRUE:" ,round(prob_dist.prob("TRUE"), 2)) 
    print('-'*70) 

test_cases = ['1', '1 - ', '2', '2 3 4 5', '1- 2 3 4 5', 'pepsi', 'coca', 'BMW'] 
for tc in test_cases: 
    test(tc) 
Here is the output, and it looks OK:

classifiying 1 
possibility of being FALSE: 1.0 possibility of being TRUE: 0.0 
---------------------------------------------------------------------- 
classifiying 1 - 
possibility of being FALSE: 1.0 possibility of being TRUE: 0.0 
---------------------------------------------------------------------- 
classifiying 2 
possibility of being FALSE: 1.0 possibility of being TRUE: 0.0 
---------------------------------------------------------------------- 
classifiying 2 3 4 5 
possibility of being FALSE: 0.05 possibility of being TRUE: 0.95 
---------------------------------------------------------------------- 
classifiying 1- 2 3 4 5 
possibility of being FALSE: 0.0 possibility of being TRUE: 1.0 
---------------------------------------------------------------------- 
classifiying pepsi 
possibility of being FALSE: 1.0 possibility of being TRUE: 0.0 
---------------------------------------------------------------------- 
classifiying coca 
possibility of being FALSE: 1.0 possibility of being TRUE: 0.0 
---------------------------------------------------------------------- 
classifiying BMW 
possibility of being FALSE: 1.0 possibility of being TRUE: 0.0 
----------------------------------------------------------------------

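Pretty good. Now you want to know why the classifier behaves this way. Look at your code: where did you specify the feature vectors? Nowhere, so the classifier falls back on its default function for extracting feature vectors, as described in the library's documentation (you can also have a look at the source code). You can see your model's features like this, for example: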

model.show_informative_features() 


>>> Most Informative Features 
      contains(4) = False   FALSE : TRUE =  5.6 : 1.0 
      contains(3) = False   FALSE : TRUE =  5.6 : 1.0 
      contains(5) = False   FALSE : TRUE =  5.6 : 1.0 
      contains(2) = False   FALSE : TRUE =  5.6 : 1.0 
      contains(1) = False   FALSE : TRUE =  3.3 : 1.0 
      contains(7) = False   FALSE : TRUE =  2.4 : 1.0 
      contains(6) = False   FALSE : TRUE =  2.4 : 1.0 
      contains(at) = False   FALSE : TRUE =  1.9 : 1.0 
      contains(all) = False   FALSE : TRUE =  1.9 : 1.0 
      contains(not) = False   FALSE : TRUE =  1.3 : 1.0 
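
To make the `contains(...)` features above more concrete, here is a rough sketch of what a word-presence feature extractor does. It is a simplified stand-in written in plain Python, not TextBlob's actual implementation: the names `tokenize`, `build_vocabulary`, `contains_features` and `toy_corpus` are made up for illustration, and the regex tokenizer is an assumption that only approximates the library's tokenization.

```python
import re


def tokenize(text):
    """Crude stand-in tokenizer (assumption): runs of letters, digits or apostrophes."""
    return set(re.findall(r"[A-Za-z0-9']+", text))


def build_vocabulary(corpus):
    """Collect every token that appears in the training texts."""
    vocab = set()
    for text, _label in corpus:
        vocab |= tokenize(text)
    return vocab


def contains_features(text, vocab):
    """Build one boolean 'contains(word)' feature per vocabulary word."""
    tokens = tokenize(text)
    return {"contains({})".format(word): (word in tokens) for word in sorted(vocab)}


# A small excerpt in the shape of the question's training data (hypothetical subset).
toy_corpus = [
    ("1 - Most Important 2 3 4 5 - Least Important", "TRUE"),
    ("1 - extremely relevant 2 3 4 5 6 7 - not at all relevant", "TRUE"),
    ("pepsi", "FALSE"),
    ("coca cola", "FALSE"),
]
vocab = build_vocabulary(toy_corpus)

# Show which features are switched on for a few inputs; all others are False.
for sample in ["pepsi", "1", "2 3 4 5", "BMW"]:
    feats = contains_features(sample, vocab)
    present = sorted(name for name, value in feats.items() if value)
    print(repr(sample), "->", present)
```

With features of this kind, a short input such as '1' switches on a single feature while every other `contains(...)` feature stays False, and the table above shows that those absent-number features lean towards FALSE. If the default features do not suit your data, `NaiveBayesClassifier` also accepts your own function through its `feature_extractor` argument.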
+1

Thank you, Iman... I am working on it and will let you know if I have any questions.

+0

You're welcome :)
