Removing stop words and string.punctuation

This does not work, and I can't figure out why.

import nltk 
from nltk.corpus import stopwords 
import string 

with open('moby.txt', 'r') as f: 
    moby_raw = f.read() 
    stop = set(stopwords.words('english')) 
    moby_tokens = nltk.word_tokenize(moby_raw) 
    text_no_stop_words_punct = [t for t in moby_tokens if t not in stop or t not in string.punctuation] 

    print(text_no_stop_words_punct)

I ran this and looked at the output:

[...';', 'surging', 'from', 'side', 'to', 'side', ';', 'spasmodically', 'dilating', 'and', 'contracting',...]

The punctuation still seems to be there. What am I doing wrong?

Source

2017-08-04 Lime In The Coconut

It must be "and", not "or":

if t not in stop and t not in string.punctuation

Or:

if not (t in stop or t in string.punctuation):

Or:

all_stops = stop | set(string.punctuation) 
if t not in all_stops:

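The last solution is the fastest: all_stops is built once, and each membership test on a set is a constant-time lookup on average.

Source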
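The set-union approach can be sketched end to end as follows. To keep it runnable without the NLTK corpora, the stop list and the token list here are small hypothetical stand-ins for stopwords.words('english') and the output of nltk.word_tokenize; they are assumptions for illustration, not the real data.

```python
import string

# Hypothetical stand-in for set(stopwords.words('english'))
stop = {"from", "to", "and"}

# Hypothetical stand-in for nltk.word_tokenize(moby_raw)
moby_tokens = [";", "surging", "from", "side", "to", "side", ";",
               "spasmodically", "dilating", "and", "contracting"]

# Build the combined set once, outside the comprehension.
all_stops = stop | set(string.punctuation)

# Keep only tokens that are neither stop words nor single punctuation characters.
filtered = [t for t in moby_tokens if t not in all_stops]
print(filtered)
```

One difference worth knowing: t in string.punctuation is a substring test on a string, so it can also match multi-character runs such as "()", whereas the set version only matches single punctuation characters.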

2017-08-04 22:21:23 DyZ

In this line, change "or" to "and", and the list will contain only tokens that are neither stop words nor punctuation marks.

text_no_stop_words = [t for t in moby_tokens if t not in stop or t not in string.punctuation]

Source

2017-08-04 22:21:05 vealkind

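Close. You need to use "and" rather than "or". With "or", a punctuation token such as ";" is not in stop, so the first condition is already true and Python never checks whether the token is in string.punctuation.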

text_no_stop_words_punct = [t for t in moby_tokens if t not in stop and t not in string.punctuation]
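A small check can make the difference between the two conditions concrete. This is only a sketch: the stop set below is a hypothetical three-word stand-in for the NLTK stop list, and the three sample tokens are chosen to cover a punctuation mark, a stop word, and an ordinary word.

```python
import string

# Hypothetical small stop list standing in for stopwords.words('english')
stop = {"from", "to", "and"}

rows = []
for t in [";", "from", "surging"]:
    # Original condition: false only for a token in BOTH collections.
    with_or = t not in stop or t not in string.punctuation
    # Corrected condition: true only for a token in NEITHER collection.
    with_and = t not in stop and t not in string.punctuation
    # De Morgan form of the corrected condition.
    de_morgan = not (t in stop or t in string.punctuation)
    rows.append((t, with_or, with_and, de_morgan))

for row in rows:
    print(row)
```

With "or", every sample token passes the filter, which is why the original list still contained both stop words and punctuation. The "and" form and the not (... or ...) form agree on every token.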

Source

2017-08-04 22:24:20
