How can I extract multiple dicts from a string?

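I am trying to extract multiple Python dicts from a string. The regex I am currently using fails because it also matches the data between the dicts. I also tried a non-greedy regex, ({.+?}), but it gets confused by nested dicts and treats them as separate occurrences. How can I extract multiple dicts from a string?

Example string: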

mystring = '(2017-05-29, { "mydict": [{ "hello": "world"}, {"hello2":"world2"}]};;/url/string, {"dict2":{"world":"hello"}}'

Code:

>>> import re
>>> match_data = re.compile('({.+})')
>>> match_data.findall(mystring.strip())
['{ "mydict": [{ "hello": "world"}, {"hello2":"world2"}]};;/url/string, {"dict2":{"world":"hello"}}']

Expected output:

['{ "mydict": [{ "hello": "world"}, {"hello2":"world2"}]}', '{"dict2":{"world":"hello"}}']
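For illustration, here is a minimal sketch of the non-greedy attempt mentioned above, run against the same example string. The braces are escaped only for readability, and the variable name non_greedy is just a label for this sketch. Each match stops at the first closing brace, so the nested dicts are split into fragments:

```python
import re

mystring = '(2017-05-29, { "mydict": [{ "hello": "world"}, {"hello2":"world2"}]};;/url/string, {"dict2":{"world":"hello"}}'

# Non-greedy: each match ends at the first "}" after its opening "{",
# so nesting is not respected.
non_greedy = re.compile(r'(\{.+?\})')
fragments = non_greedy.findall(mystring)
print(fragments)
# ['{ "mydict": [{ "hello": "world"}', '{"hello2":"world2"}', '{"dict2":{"world":"hello"}']
```

None of the three fragments is one of the two complete top-level dicts, which is why a plain regex is a poor fit here.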

Source

2017-05-29 Rahul

I think you need to write a parser for Python dictionaries. – 0605002

Try re.findall(r'{.+?}', mystring). I'm not sure exactly what you are expecting, but you can parse the data easily that way. – Arun

Does the ";;/url/string" data always appear in the same place, i.e. between the two dicts? – DexJ

Regex is probably too simple for this problem. However, one possible solution is to match the braces yourself:

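s = '{ "mydict": [{ "hello": "wo}}rld"}, {"hello2":"world2"}]};;/url/string, {"dict2":{"world":"hello"}}'

depth = 0          # current brace nesting level
start_index = -1   # index of the outermost "{" of the dict being scanned
in_quotes = False  # True while inside a quoted string

for i, c in enumerate(s):
    # Toggle on any quote character so braces inside strings are ignored.
    # Escaped quotes and mixed quote styles are not handled.
    if c in ("'", '"'):
        in_quotes = not in_quotes
    if in_quotes:
        continue
    if c == "{":
        depth += 1
        if start_index == -1:
            start_index = i
    elif c == "}" and depth > 0:  # skip a stray "}" outside any dict
        depth -= 1
        if depth == 0:
            # Back at the top level: a complete dict has been found.
            print(s[start_index:i + 1])
            start_index = -1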

This results in:

{ "mydict": [{ "hello": "wo}}rld"}, {"hello2":"world2"}]} 
{"dict2":{"world":"hello"}}

Source

2017-05-29 05:00:18 Darkstarone
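As a different technique from the brace counting above: if the embedded dicts happen to be valid JSON (double-quoted strings, as in the example string), the standard library's json.JSONDecoder.raw_decode can parse one value starting at a given index and report where it ended. The following is only a sketch under that assumption; the helper name extract_json_objects is made up for illustration.

```python
import json

def extract_json_objects(text):
    """Return the substrings of `text` that parse as top-level JSON objects.

    Assumes the embedded dicts are valid JSON; Python-only literals
    (single-quoted strings, None, True) would not be recognised.
    """
    decoder = json.JSONDecoder()
    found = []
    pos = text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at `pos` and returns
            # (value, end), where `end` is the index just past that value.
            _, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # Not valid JSON at this brace; try the next opening brace.
            pos = text.find("{", pos + 1)
            continue
        found.append(text[pos:end])
        pos = text.find("{", end)
    return found

mystring = '(2017-05-29, { "mydict": [{ "hello": "world"}, {"hello2":"world2"}]};;/url/string, {"dict2":{"world":"hello"}}'
print(extract_json_objects(mystring))
# ['{ "mydict": [{ "hello": "world"}, {"hello2":"world2"}]}', '{"dict2":{"world":"hello"}}']
```

Because the JSON parser understands string literals, braces inside a value such as "wo}}rld" do not need any special quote tracking. Keeping the parsed value instead of discarding it would give the dicts themselves rather than their source text.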
