Removing stopwords from a list using only Python and NumPy

I'm working on removing stopwords in Python using only numpy. The stopword file is imported as a list. Here is what I came up with:

Method 1: I tried looping through the stopwords list and removing each stopword from tw_line:

# loop through the stop words list, and remove each one from the split line list
for line in stopwords:
    if line in words:
        words.remove(line)
        continue
print(tw_line)

Result: no stopwords were removed:

0 my whole body feels itchy and like its on fire

Method 2: I tried looping through the words of the line and checking each one against the stopwords list:

# loop through the line, and check with stop words, if not in stop words, add to clean_line
clean_line = []
tw_line.split(" ")
for line in tw_line:
    if line not in stopwords:
        clean_line.append(line)
print(clean_line)

Result: every word is split into individual letters:

['m', 'y', 'w', 'h', 'o', 'l', 'e', 'b', 'o', 'd', 'y', 'f', 'e', 'e', 'l', 's', 'i', 'c', 'h', 'y', 'a', 'n', 'd', 'l', 'i', 'k', 'e', 'i', 's', 'o', 'n', 'f', 'i', 'r', 'e']

Any help?

>>> str1 = "my whole body feels itchy and like its on fire" 
>>> str1.split() 
['my', 'whole', 'body', 'feels', 'itchy', 'and', 'like', 'its', 'on', 'fire'] 
>>>

Then remove the words that are in the stopwords list.
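A minimal sketch of that answer, assuming a small hypothetical stopword list. It also shows why calling split() with no argument is convenient here: it splits on any run of whitespace and drops a trailing newline, whereas split(" ") splits only on single spaces.

```python
# Hypothetical stopword list for illustration only
stopwords = ["my", "and", "its", "on"]

line = "my whole body  feels itchy and like its on fire\n"

# split(" ") keeps an empty string for the double space and keeps the "\n"
print(line.split(" "))

# split() with no argument splits on runs of whitespace and drops the "\n"
words = line.split()
print(words)

# Keep only the words that are not stopwords
clean = [w for w in words if w not in stopwords]
print(clean)
```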


2017-02-12 Frank Hee

What is the question? And how is 'numpy' related to this? It would help to include an example of what your data looks like. –

The '.split' member function you are using in Method 2 does not work "in place" (how could it? It produces a new type, a list from a string); you need to _assign_ the return value to 'tw_line' or to a new variable. –
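A short illustration of that comment, using a hypothetical sample line: split returns a new list and leaves the string untouched, so looping over the original string visits it one character at a time, which is where Method 2's list of single letters comes from.

```python
tw_line = "my whole body feels itchy"

tw_line.split(" ")            # return value is discarded; tw_line is still a str
print(type(tw_line))          # <class 'str'>

# Iterating over a str yields its characters, not its words
print(list(tw_line)[:5])

# Assign the result to actually work with the list of words
tw_list = tw_line.split(" ")
print(tw_list)
```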

numpy is the only lib I'm allowed to use... I can't use built-in methods from other libs. –
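Since NumPy is allowed, here is one possible vectorized variant. It is an alternative the thread itself does not show, and the stopword list is hypothetical. The words go into an array, and np.isin (added in NumPy 1.13) builds a boolean mask of stopword positions, which is then inverted to keep the other words.

```python
import numpy as np

# Hypothetical stopword list for illustration only
stopwords = np.array(["my", "and", "its", "on"])

tw_line = "my whole body feels itchy and like its on fire"
words = np.array(tw_line.split())

# True for every word that is NOT a stopword
mask = ~np.isin(words, stopwords)
clean_words = words[mask]

print(" ".join(clean_words))
```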

I'll try applying this. By the way, I don't see anything here.


2017-02-12 01:12:02

Print words, not tw_line: words is the list you removed the stopwords from.

for line in stopwords:
    if line in words:
        words.remove(line)
print(words)
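A self-contained version of Method 1 with that fix, assuming words comes from splitting the line and using hypothetical inputs. Note that list.remove deletes only the first match, so this sketch swaps the if for a while loop so that a stopword appearing several times in the line is removed every time.

```python
# Hypothetical inputs for illustration only
stopwords = ["the", "and", "on"]
tw_line = "the cat and the dog sat on the mat"

words = tw_line.split()
for stop in stopwords:
    # list.remove drops only the first occurrence, so repeat until none are left
    while stop in words:
        words.remove(stop)

print(words)
```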


2017-02-12 01:44:16

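Method 2 makes it clear what you want to do, but a few things can be improved.

As Paul Panzer mentioned, split does not work in place, so you need to do

tw_list = tw_line.split(" ")

Instead of looping, you could use a list comprehension (or a generator expression if you are going to join the words back together later):

clean_line = [word for word in tw_list if word not in stopwords]

From your code comment I gather that stopwords is a list. For efficiency you could make it a set instead (https://wiki.python.org/moin/TimeComplexity).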
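A sketch combining both suggestions, with hypothetical data. The stopwords are converted once to a set, so each "in" check takes constant time on average instead of scanning the whole list. A generator expression feeds str.join directly when a single string is wanted rather than a list.

```python
# Hypothetical stopword list; converting it once to a set speeds up "in" checks
stopword_list = ["my", "and", "its", "on"]
stopwords = set(stopword_list)

tw_line = "my whole body feels itchy and like its on fire"
tw_list = tw_line.split(" ")

# List comprehension when a list of words is needed
clean_line = [word for word in tw_list if word not in stopwords]

# Generator expression when the result is joined back into one string
clean_text = " ".join(word for word in tw_list if word not in stopwords)

print(clean_line)
print(clean_text)
```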


2017-02-12 11:15:38

I tried it, and it worked on its own. But when I put it into my function, the lines gave strange results. print(stopwords[:10]) gives ['a', 'able', 'about', 'above', 'abst', 'across', 'acts'] –

def remove_stopwords(tw):
    with open('stop_words.txt') as f:
        stopwords = f.readlines()
    for index, line in enumerate(stopwords):
        line = line.strip('\n')
        stopwords[index] = line
    remove_punc(stopwords)
    for index, line in enumerate(tw):
        clean_line = []
        clean_line = [word for word in line if word not in stopwords]
        line = string.join(clean_line)
        tw[index] = line  # store line back to tw
    return tw[:5]
–

['0 tttttt 2 1 A tt Y t DCTD tt D \n', '0 ttttt F ttttt S t B \n', '0 KI tt M t 5 0 T tt \n', '0 tt \n', '0 tttt I tt \n'] –
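The odd output most likely comes from "for word in line" inside remove_stopwords: each line in tw is still a string, so the comprehension iterates over its characters rather than its words, just as in Method 2. Below is one possible corrected sketch. It splits each line first and uses " ".join(...), since string.join exists only in Python 2. To stay self-contained it takes the stopword list as a parameter instead of reading stop_words.txt. It also omits remove_punc, whose definition is not shown, and it returns a new list instead of overwriting tw. All three are assumptions of this sketch.

```python
def remove_stopwords(tw, stopwords):
    """Remove stopwords from each line of tw (a list of strings).

    Sketch only: stopwords is passed in rather than read from
    stop_words.txt, and the original remove_punc step is omitted.
    """
    # Strip newlines from the stopwords and use a set for fast lookups
    stop_set = {w.strip() for w in stopwords}
    cleaned = []
    for line in tw:
        # split() turns the line into words and drops the trailing "\n"
        clean_line = [word for word in line.split() if word not in stop_set]
        cleaned.append(" ".join(clean_line))
    return cleaned[:5]


# Hypothetical sample data for illustration only
stopwords = ["a\n", "and\n", "its\n", "on\n", "my\n"]
tw = ["0 my whole body feels itchy and like its on fire\n"]
print(remove_stopwords(tw, stopwords))
```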
