2017-10-17 11 views
0

Error with regex substring matching in a Python list

I have the following list.

mylist = [["the", "and", "fresh milk", "a loaf of bread", "the butter"], ["an apple", "eggs", "oranges", "cup of tea"]]

Now I want to remove the stop words from mylist so that my new list looks like this.

mylist = [["fresh milk", "loaf bread", "butter"], ["apple", "eggs", "oranges", "cup tea"]] 

My current code is as follows.

cleaned_mylist= [] 
stops = ['a', 'an', 'of', 'the'] 
pattern = re.compile(r'|'.join([r'(\s|\b){}\b'.format(x) for x in stops])) 
for item in mylist: 
    inner_list= [] 
    for words in item: 
     inner_list.append(pattern.sub('', item).strip()) 
    cleaned_mylist.append(inner_list) 

However, the code does not seem to work. Please help me.

+0

What do you mean when you say the code is not working? What actually happens?

Answers

1

You do not need a regular expression for this example. Splitting each string into words and filtering out the stop words is enough.

mylist = [["the", "and", "fresh milk", "a loaf of bread", "the butter"], ["an apple", "eggs", "oranges", "cup of tea"]] 
expected = [["fresh milk", "loaf bread", "butter"], ["apple", "eggs", "oranges", "cup tea"]] 

cleaned_mylist= [] 
stops = ['a', 'an', 'of', 'the', 'and'] 
for item in mylist: 
    inner_list= [] 
    for sentence in item: 
     out_sentence = [] 
     for word in sentence.split(): 
      if word not in stops: 
       out_sentence.append(word) 
     if len(out_sentence) > 0: 
      inner_list += [' '.join(out_sentence)] 
    cleaned_mylist.append(inner_list) 

print(expected == cleaned_mylist)
# True 
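
The same split-and-filter idea can be written more compactly. This is a minimal sketch rather than part of the original answer: the function name remove_stop_words and the use of a set for the stop words are assumptions made for illustration. Note that the comparison is case-sensitive, so "The" would not be removed.

```python
# Hypothetical helper; a compact variant of the nested loops above.
STOPS = {"a", "an", "of", "the", "and"}  # set gives fast membership tests


def remove_stop_words(groups, stops=STOPS):
    """Drop stop words from each phrase and discard phrases that end up empty."""
    cleaned = []
    for group in groups:
        # Rebuild each phrase from the words that are not stop words.
        phrases = (" ".join(w for w in phrase.split() if w not in stops)
                   for phrase in group)
        # An all-stop-word phrase becomes "", which is filtered out here.
        cleaned.append([p for p in phrases if p])
    return cleaned


mylist = [["the", "and", "fresh milk", "a loaf of bread", "the butter"],
          ["an apple", "eggs", "oranges", "cup of tea"]]
print(remove_stop_words(mylist))
# [['fresh milk', 'loaf bread', 'butter'], ['apple', 'eggs', 'oranges', 'cup tea']]
```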
0

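Your pattern is being applied to the sublist (item) instead of to the individual strings (words). Pass words to pattern.sub, and skip results that are empty after stripping.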

import re
mylist = [["the", "and","fresh milk", "a loaf of bread", "the butter"], ["an apple", "eggs", "oranges", "cup of tea"]] 
cleaned_mylist= [] 
stops = ['a', 'an', 'of', 'the','and'] 
pattern = re.compile(r'|'.join([r'(\s|\b){}\b'.format(x) for x in stops])) 
for item in mylist: 
    inner_list= [] 
    for words in item: 
     if pattern.sub('', words).strip() != '': 
      inner_list.append(pattern.sub('', words).strip()) 
    cleaned_mylist.append(inner_list) 
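
To make the cause concrete, the sketch below first reproduces the failure from the question (a list passed where a string is expected) and then shows one way to call the substitution only once per string. It is an illustration under assumptions, not the answerer's code: the helper name clean, the use of re.escape, and replacing matches with a space before collapsing whitespace are all choices made here.

```python
import re

stops = ["a", "an", "of", "the", "and"]
# One alternation wrapped in word boundaries; re.escape guards against
# stop words that contain regex metacharacters.
pattern = re.compile(r"\b(?:{})\b".format("|".join(map(re.escape, stops))))

# Passing a sublist instead of a string is what breaks the original loop.
try:
    pattern.sub("", ["the", "butter"])
    raised = False
except TypeError:
    raised = True
print(raised)
# True


def clean(phrase):
    """Replace stop words with a space, then collapse leftover whitespace."""
    return " ".join(pattern.sub(" ", phrase).split())


mylist = [["the", "and", "fresh milk", "a loaf of bread", "the butter"],
          ["an apple", "eggs", "oranges", "cup of tea"]]
# Apply clean to every string and keep only the non-empty results.
cleaned_mylist = [[c for c in map(clean, item) if c] for item in mylist]
print(cleaned_mylist)
# [['fresh milk', 'loaf bread', 'butter'], ['apple', 'eggs', 'oranges', 'cup tea']]
```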
-1

Use an if check to skip strings that are empty after the substitution.

import re 
mylist = [["the", "and", "fresh milk", "a loaf of bread", "the butter"], ["an apple", "eggs", "oranges", "cup of tea"]] 
cleaned_mylist= [] 
stops = ['a', 'an', 'of', 'the','and'] 
pattern = '|'.join([r'\b{}\b\s?'.format(x) for x in stops]) 
for item in mylist: 
    inner_list= [] 
    for words in item: 
     words = re.sub(pattern,'',words) 
     if(words != ""): 
      inner_list.append(words) 
    cleaned_mylist.append(inner_list) 

print(cleaned_mylist)
+0

This does not produce the expected output.
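
The comment does not say which input fails. One edge case worth checking, chosen here as an assumption and not taken from the thread, is a stop word at the end of a string: the optional \s? only consumes whitespace that follows the stop word, so the space before it is left behind. The input "made of" is hypothetical.

```python
import re

stops = ['a', 'an', 'of', 'the', 'and']
# Same pattern construction as in the answer above.
pattern = '|'.join([r'\b{}\b\s?'.format(x) for x in stops])

# Hypothetical input with a trailing stop word.
result = re.sub(pattern, '', 'made of')
print(repr(result))
# 'made '

# Stripping after the substitution removes the leftover space.
print(repr(result.strip()))
# 'made'
```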
