How to extract a word following a list of keywords in Python using regex?

I am trying to extract a location using regex in Python. Right now I am doing this:

def get_location(s):
    s = s.strip(STRIP_CHARS)
    keywords = "at|outside|near"
    location_pattern = "(?P<location>((?P<place>{keywords}\s[A-Za-z]+)))".format(keywords = keywords)
    location_regex = re.compile(location_pattern, re.IGNORECASE | re.MULTILINE | re.UNICODE | re.DOTALL | re.VERBOSE)

    for match in location_regex.finditer(s):
        match_str = match.group(0)
        indices = match.span(0)
        print ("Match", match)
        print (match_str)

get_location("Im at building 3")

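I have three problems:

1. It gives only "at" as output, but it should also give "building".
2. captures = match.capturesdict(): I cannot use this to extract the captures the way it works in other examples.
3. When I use just location_pattern = 'at|outside\s\w+, it seems to work. Can someone explain what I am doing wrong?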

Source

2017-10-12 user3667569

Could you post an example of the text you are searching? –

Please add some sample strings to your question. What is the expected output? – mabe02

The main problem here is that you need to put {keywords} inside a non-capturing group: (?:{keywords}). Here is a schematic example: a|b|c\s+\w+ matches either a, or b, or c + <whitespace(s)> + <word chars>. When you put the alternation list into a group, (a|b|c)\s+\w+, it matches either a, or b, or c, and only then does it try to match whitespace followed by word characters.

See the updated code (demo online):
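The difference in alternation scope can be checked with a minimal sketch that uses only the standard `re` module. The sample string and variable names below are made up for illustration and are not from the original answer.

```python
import re

# Hypothetical sample text, chosen so every keyword is followed by a word.
text = "a b c dog"

# Without a group the alternatives are "a", "b" and "c\s+\w+":
# only the last alternative carries the whitespace and the word.
ungrouped = re.compile(r"a|b|c\s+\w+")

# With a non-capturing group the alternation covers the keyword only,
# so whitespace and a word are required after any of the keywords.
grouped = re.compile(r"(?:a|b|c)\s+\w+")

ungrouped_matches = ungrouped.findall(text)
grouped_matches = grouped.findall(text)

print(ungrouped_matches)  # ['a', 'b', 'c dog']
print(grouped_matches)    # ['a b', 'c dog']
```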

import regex as re  # third-party "regex" module, needed for capturesdict()

def get_location(s):
    STRIP_CHARS = '*'
    s = s.strip(STRIP_CHARS)
    keywords = "at|outside|near"
    # The keywords are wrapped in a non-capturing group, (?:...), so that the
    # whitespace and the word are required after any of them.
    location_pattern = r"(?P<location>((?P<place>(?:{keywords})\s+[A-Za-z]+)))".format(keywords=keywords)
    location_regex = re.compile(location_pattern, re.IGNORECASE | re.UNICODE)

    for match in location_regex.finditer(s):
        match_str = match.group(0)
        indices = match.span(0)
        print ("Match", match)
        print (match_str)
        # Dictionary mapping each named group to the list of its captures
        captures = match.capturesdict()
        print(captures)

get_location("Im at building 3")

Output:

('Match', <regex.Match object; span=(3, 14), match='at building'>)
at building
{'place': ['at building'], 'location': ['at building']}

Note that location_pattern = 'at|outside\s\w+ is not working because at matches anywhere, while only outside must be followed by a whitespace and word characters. You may fix it the same way: (at|outside)\s\w+.
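As a rough illustration of "at matches anywhere", the sketch below runs both patterns over a made-up sentence. The third pattern adds a `\b` word boundary; that is an assumption of mine and not part of the answer, included only to show one way to stop `at` from matching inside a longer word such as "sat".

```python
import re

# Hypothetical sample sentence; "sat" contains the keyword "at".
sample = "He sat outside home"

def all_matches(pattern, text):
    """Return the full text of every match, ignoring capture groups."""
    return [m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE)]

# Pattern from the question: "at" alone is a complete alternative.
loose = all_matches(r"at|outside\s\w+", sample)

# Pattern from the answer: a word is required after either keyword,
# but "at" can still be found inside "sat".
grouped = all_matches(r"(at|outside)\s\w+", sample)

# Assumption, not in the answer: \b anchors the keyword at a word start.
bounded = all_matches(r"\b(at|outside)\s\w+", sample)

print(loose)    # ['at', 'outside home']
print(grouped)  # ['at outside']
print(bounded)  # ['outside home']
```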

Once the keywords are placed in a group, captures = match.capturesdict() works fine (see the output above).

Source

2017-10-12 07:28:51

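I removed the 'MULTILINE', 'VERBOSE' and 'DOTALL' modifiers from the demo since the regex does not use any of the features they affect. –

Thanks! Is there a way to get only "building" rather than "at"? – user3667569

@user3667569: See [this demo](http://rextester.com/RRX98021). Wrap the '[A-Za-z]+' part in capturing parentheses and get the corresponding '.group(3)' or '.group(4)' value (in this example I removed the outer numbered capturing group). –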
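A minimal sketch of the idea in the last comment, using the standard `re` module. Instead of the numbered `.group(3)` / `.group(4)` mentioned there, it uses a named group, which is a substitution of mine; the function name `get_place_words`, the group name `word`, and the `re.escape` step are likewise hypothetical and not taken from the linked demo.

```python
import re

def get_place_words(s, keywords=("at", "outside", "near")):
    """Return only the word that follows each keyword (hypothetical helper)."""
    # Escape each keyword so it is matched literally, then join with "|".
    alternation = "|".join(re.escape(k) for k in keywords)
    # The keyword sits in a non-capturing group; only the following
    # word is captured, in the named group "word".
    pattern = re.compile(
        r"(?:{0})\s+(?P<word>[A-Za-z]+)".format(alternation),
        re.IGNORECASE,
    )
    return [m.group("word") for m in pattern.finditer(s)]

words = get_place_words("Im at building 3")
print(words)  # ['building']
```

Referring to the captured word by name avoids recounting group numbers whenever an outer group is added or removed.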
