How is the VADER "compound" polarity score calculated in Python NLTK?

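I'm using the Vader SentimentAnalyzer to obtain polarity scores. Previously I used the probability scores for positive/negative/neutral, but I've just realized that the "compound" score, which ranges from -1 (most negative) to 1 (most positive), would provide a single measure of polarity. I wonder how the "compound" score is computed. Is it calculated from the [pos, neu, neg] vector?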

Source

2016-10-30 alicecongcong

The code is in nltk/sentiment/vader.py – alvas

The VADER algorithm outputs sentiment scores for four classes of sentiment (https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L441):

neg: Negative
neu: Neutral
pos: Positive
compound: Compound (i.e. aggregated score)

compound = normalize(sum_s)

The normalize() function is defined as follows at https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L107:

import math

def normalize(score, alpha=15):
    """
    Normalize the score to be between -1 and 1 using an alpha that
    approximates the max expected value
    """
    norm_score = score/math.sqrt((score*score) + alpha)
    return norm_score
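To make the formula concrete, here is a small sketch that applies the same expression to a few values. The sum_s inputs are hypothetical numbers chosen only for illustration, not scores taken from real text.

```python
import math

def normalize(score, alpha=15):
    """Same expression as the normalize() quoted above."""
    return score / math.sqrt(score * score + alpha)

# Hypothetical sum_s values, chosen only for illustration.
for sum_s in (-10.0, -2.0, 0.0, 0.5, 2.0, 10.0):
    print(f"sum_s={sum_s:6.1f} -> compound={normalize(sum_s):+.4f}")
```

For any alpha > 0 the denominator is strictly larger than the absolute value of the score, so the result always stays strictly between -1 and 1, and the function is symmetric around zero.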

So there is a hyperparameter, alpha. Walking through the code, the first instance of compound is at https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L421, where it is computed as compound = normalize(sum_s).

As for sum_s, it is the sum of the sentiments argument passed to the score_valence() function (https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L413).

If we trace this sentiments argument back, we see that it is computed when the polarity_scores() function is called, at https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L217:

def polarity_scores(self, text):
    """
    Return a float for sentiment strength based on the input text.
    Positive values are positive valence, negative value are negative
    valence.
    """
    sentitext = SentiText(text)
    #text, words_and_emoticons, is_cap_diff = self.preprocess(text)

    sentiments = []
    words_and_emoticons = sentitext.words_and_emoticons
    for item in words_and_emoticons:
        valence = 0
        i = words_and_emoticons.index(item)
        if (i < len(words_and_emoticons) - 1 and item.lower() == "kind" and \
            words_and_emoticons[i+1].lower() == "of") or \
            item.lower() in BOOSTER_DICT:
            sentiments.append(valence)
            continue

        sentiments = self.sentiment_valence(valence, sentitext, item, i, sentiments)

    sentiments = self._but_check(words_and_emoticons, sentiments)

Looking at the polarity_scores function, what it does is iterate over every token (word or emoticon) in the SentiText and check each one with the rule-based sentiment_valence() function (https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L243) to assign it a valence score. See Section 2.1.1 of http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf

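So, going back to the compound score, we see that:

the compound score is a normalized version of sum_s, and
sum_s is the sum of the valences computed from some heuristics and a sentiment lexicon (aka. sentiment intensity), and
the normalized score is simply sum_s divided by the square root of its square plus an alpha parameter, which increases the denominator of the normalization function.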
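Written out as a formula, the normalization described above is the following (alpha = 15 is the default in the normalize() code quoted earlier):

```latex
\text{compound} = \frac{\text{sum\_s}}{\sqrt{\text{sum\_s}^{2} + \alpha}}, \qquad \alpha = 15
```

Because $\sqrt{\text{sum\_s}^{2} + \alpha} > |\text{sum\_s}|$ whenever $\alpha > 0$, the compound score always lies strictly between -1 and 1.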

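Is it calculated from the [pos, neu, neg] vector?

Not really =)

If we look at the score_valence function (https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L411), we see that the compound score is computed from sum_s before the pos, neg and neu scores are computed. Those come from _sift_sentiment_scores(), which derives the individual pos, neg and neu scores from the raw scores of sentiment_valence(), without the sum.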
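The point that compound and pos/neu/neg are two separate derivations from the same per-token valences can be illustrated with a toy sketch. This is not the library's code: sketch_scores is a hypothetical helper, the +1/-1 offsets and the proportion step are assumptions about how the sifting works, and any punctuation-based adjustment is left out.

```python
import math

def normalize(score, alpha=15):
    return score / math.sqrt(score * score + alpha)

def sketch_scores(valences):
    """Toy version: derive compound and pos/neu/neg separately from the same valences."""
    # compound: sum all valences first, then squash the sum into (-1, 1)
    compound = normalize(float(sum(valences)))

    # pos/neu/neg: bucket each raw valence by sign; the overall sum is not used
    pos_sum = sum(v + 1 for v in valences if v > 0)  # +1 offset is an assumption
    neg_sum = sum(v - 1 for v in valences if v < 0)  # -1 offset is an assumption
    neu_count = sum(1 for v in valences if v == 0)

    total = pos_sum + abs(neg_sum) + neu_count
    if total == 0:  # no tokens at all
        return {"neg": 0.0, "neu": 0.0, "pos": 0.0, "compound": 0.0}
    return {
        "neg": abs(neg_sum) / total,
        "neu": neu_count / total,
        "pos": pos_sum / total,
        "compound": compound,
    }

# Hypothetical valences: one positive and one equally negative token.
print(sketch_scores([2.0, -2.0]))
```

In this toy input the positive and negative valences cancel in the sum, so compound is 0, while pos and neg are both non-zero. That is one way to see that compound is not a function of the [pos, neu, neg] vector alone.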

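If we take a look at this alpha mathemagic, the output of the normalization seems to be rather unstable (if left unconstrained), depending on the value of alpha:

[Figure: output of normalize() for alpha=0]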
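To get a feel for how alpha shapes the output, here is a minimal sketch that evaluates the same normalization for several alpha values. The scores and the alpha values other than 0 and 15 are arbitrary choices made for illustration.

```python
import math

def normalize(score, alpha):
    return score / math.sqrt(score * score + alpha)

# Arbitrary positive scores; 0.0 is left out because alpha=0 would divide by zero there.
scores = (0.5, 1.0, 2.0, 5.0, 10.0)

for alpha in (0, 1, 15, 100):
    row = ", ".join(f"{normalize(s, alpha):.3f}" for s in scores)
    print(f"alpha={alpha:>3}: {row}")
```

With alpha=0 the expression reduces to score/|score|, so every non-zero score maps to exactly -1 or 1 and a score of 0 is undefined. Larger alpha values pull the output for the same score closer to 0.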