How do I eliminate stop words in this code?

I am writing code to perform sentiment analysis. For this I use two different dictionaries in which the sentences are tagged as either positive or negative. My code snippet so far looks like this:

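from nltk.classify import NaiveBayesClassifier
from nltk.tokenize import word_tokenize

def format_sentence(sentence):
    # Map every token of the sentence to True (the feature dict NLTK expects)
    return {word: True for word in word_tokenize(sentence)}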

pos_data = []
with open('Positiv.txt') as f:
    for line in f:
        pos_data.append([format_sentence(line), 'pos'])

neg_data = []
with open('Negativ.txt') as f:
    for line in f:
        neg_data.append([format_sentence(line), 'neg'])

training_data = pos_data[:3] + neg_data[:3] 
test_data = pos_data[3:] + neg_data[3:] 

model = NaiveBayesClassifier.train(training_data)

Now I would like code that eliminates all stop words from the sentences in the dictionaries, but I am a beginner in Python programming and do not know how to implement this in my own code. I would be very grateful if anyone could help me with this :)

Source

2016-04-13 Tommy5

How do you define "stop words"? How do you define "eliminate"? – th3an0maly

Stop words are words like 'and', 'but', and so on. I do not want the classifier to include these kinds of words in the training data. – Tommy5

Possible duplicate of [Stopword removal with NLTK](http://stackoverflow.com/questions/19130512/stopword-removal-with-nltk) – alvas

If you are just using Python lists, try this code template, which creates a new list with the stop words removed:

list_without_stopwords = [word for word in original_list if word not in stopword_list]
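To make the template above concrete, here is a minimal sketch with made-up data. The contents of `stopword_list` and `original_list` are assumptions chosen for illustration only; they are not an official stop word list.

```python
# Hypothetical sample data: a tiny hand-picked stop word collection
# (an assumption for this example, not an official list).
stopword_list = {"and", "but", "the", "a", "is"}
original_list = ["the", "movie", "is", "long", "but", "fun"]

# Keep only the words that are not stop words; the original order is preserved.
list_without_stopwords = [word for word in original_list if word not in stopword_list]

print(list_without_stopwords)  # ['movie', 'long', 'fun']
```

Storing the stop words in a set rather than a list is optional, but membership tests on a set stay fast as the collection grows.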

Source

2016-04-13 14:25:07 mrEvgenX

It looks like you are using the Naive Bayes Classifier implementation from NLTK. NLTK also has built-in stop word lists for several languages.

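from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Built-in English stop word list shipped with NLTK
stops = stopwords.words('english')

def format_sentence(sentence):
    # Skip any token that appears in the stop word list
    return {word: True for word in word_tokenize(sentence) if word not in stops}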
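One thing to watch with the snippet above: NLTK's English stop word list is all lowercase, so a capitalized token such as "The" would not be filtered by a plain `word not in stops` check. The sketch below shows one possible way to handle that by comparing lowercased tokens against a set. The extra `stops` and `tokenize` parameters and the sample stop list are assumptions added so the example runs without NLTK data; with NLTK installed you could pass `stopwords.words('english')` and `word_tokenize` instead.

```python
def format_sentence(sentence, stops, tokenize=str.split):
    """Build a {word: True} feature dict, skipping stop words case-insensitively."""
    # A set gives fast membership tests; lowercasing makes "The" match "the".
    stop_set = {s.lower() for s in stops}
    return {word: True for word in tokenize(sentence) if word.lower() not in stop_set}

# Hypothetical stop list for illustration only (not NLTK's actual list).
sample_stops = ["and", "but", "the", "is"]

features = format_sentence("The plot is thin but the cast shines", sample_stops)
print(features)  # {'plot': True, 'thin': True, 'cast': True, 'shines': True}
```

The original casing of the kept words is left untouched here; whether to lowercase the features as well is a separate design choice that depends on the training data.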

Source

2016-04-13 15:11:58 aberger
