How to parse multiline block text with Python and regex when the content differs from block to block?

-1

I have a configuration file that I need to parse. The idea is to put it into a dictionary later on, using regex groups in Python.

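The problem I'm facing is that not every block of text contains exactly the same lines. My regex works so far for the block with the most lines, but of course it only matches that single block. How do I do a multiline match when, for instance, some "set" lines are omitted in some blocks?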

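Do I need to break up the regex and use if/elif and true/false checks to work through this? That doesn't seem pythonic, imho.
I'm fairly sure I will have to break up my big regex and work through it sequentially: if a line matches, then handle it, else skip to the next regex-matching line.
I was thinking of putting every block, from edit to next, into a list element to be parsed separately. Or can I just do the whole thing in one go?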

I have some ideas, but I would like a pythonic way of doing it, please.

As always, your help is much appreciated. Thank you.

TEXT, where each block to match runs from edit to next:

edit "port11" 
    set vdom "ACME_Prod" 
    set vlanforward enable 
    set type physical 
    set device-identification enable 
    set snmp-index 26 
next 
edit "port21" 
    set vdom "ACME_Prod" 
    set vlanforward enable 
    set type physical 
    set snmp-index 27 
next 
edit "port28" 
    set vdom "ACME_Prod" 
    set vlanforward enable 
    set type physical 
    set snmp-index 28 
next 
edit "port29" 
    set vdom "ACME_Prod" 
    set ip 174.244.244.244 255.255.255.224 
    set allowaccess ping 
    set vlanforward enable 
    set type physical 
    set alias "Internet-IRISnet" 
    set snmp-index 29 
next 
edit "port20" 
    set vdom "root" 
    set ip 192.168.1.1 255.255.255.0 
    set allowaccess ping https ssh snmp fgfm 
    set vlanforward enable 
    set type physical 
    set snmp-index 39 
next 
edit "port25" 
    set vdom "root" 
    set allowaccess fgfm 
    set vlanforward enable 
    set type physical 
    set snmp-index 40 
next

Code snippet:

import re, pprint
file = "interfaces_2016_10_12.conf"

try:
    fileopen = open(file, 'r')
    output = open('output.txt', 'w+')
except IOError:
    exit("Input file does not exist, exiting script.")

# read whole config in 1 go instead of iterating line by line
text = fileopen.read()

# my verbose regex, verbose so it is more readable !

pattern = r'''^                  # raw string, anchored at the start of a line
\s+edit\s\"(.*)\"\n              # group(1) match int name
\s+set\svdom\s\"(.*)\"\n         # group(2) match vdom name
\s+set\sip\s(.*)\n               # group(3) match interface ip
\s+set\sallowaccess\s(.*)\n      # group(4) match allowaccess
\s+set\svlanforward\s(.*)\n      # group(5) match vlanforward
\s+set\stype\s(.*)\n             # group(6) match type
\s+set\salias\s\"(.*)\"\n        # group(7) match alias
\s+set\ssnmp-index\s\d{1,3}\n    # match snmp-index but we don't need it
\s+next$'''                      # match end of config block

regexp = re.compile(pattern, re.VERBOSE | re.MULTILINE)

# For multiline regex matching use finditer():
for match in regexp.finditer(text):
    for z in range(1, 8):  # print groups 1 to 7 of every match
        print(match.group(z))

fileopen.close()  # always close file
output.close()  # always close file

Source

2016-10-31 bennethos

Why use a regex at all when this is a very simple structure to parse? Not every block contains the same "set" statements, so it is easier to read the file line by line:

data = {}
with open(file, 'r') as fileopen:  # file: path to the config, as in the question
    for line in fileopen:
        words = line.strip().split()
        if not words:  # Skip blank lines
            continue
        if words[0] == 'edit':  # Create a new block
            curr = data.setdefault(words[1].strip('"'), {})
        elif words[0] == 'set':  # Write config to block
            curr[words[1]] = words[2].strip('"') if len(words) == 3 else words[2:]
print(data)

Output:

{'port11': {'device-identification': 'enable', 
    'snmp-index': '26', 
    'type': 'physical', 
    'vdom': 'ACME_Prod', 
    'vlanforward': 'enable'}, 
'port20': {'allowaccess': ['ping', 'https', 'ssh', 'snmp', 'fgfm'], 
    'ip': ['192.168.1.1', '255.255.255.0'], 
    'snmp-index': '39', 
    'type': 'physical', 
    'vdom': 'root', 
    'vlanforward': 'enable'}, 
    ...

Source

2016-10-31 20:21:22 AChampion

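Very pythonic indeed, thank you! I may have left out the fact that I need to go through 8000 lines of text that do not always consist of the same blocks. But if I create a text file containing only those specific multiline blocks, your script works fine. So should I use a very small regex to match each block, and then use your structure to parse it line by line? – bennethos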
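
The idea raised in this comment (a small regex that only finds the blocks, followed by line-by-line parsing of each block) could look roughly like the sketch below. It is only an illustration under stated assumptions: block names are quoted, blocks are not nested, and the surrounding lines in `SAMPLE` are made up to stand in for the rest of the 8000-line file. The names `SAMPLE`, `BLOCK_RE` and `parse_blocks` are hypothetical and do not come from the thread.

```python
import re

# Hypothetical sample: interface blocks surrounded by unrelated config lines.
SAMPLE = '''config system global
    set hostname "fw01"
end
config system interface
    edit "port11"
        set vdom "ACME_Prod"
        set type physical
        set snmp-index 26
    next
    edit "port20"
        set vdom "root"
        set ip 192.168.1.1 255.255.255.0
        set allowaccess ping https ssh
        set snmp-index 39
    next
end
'''

# Small regex: one match per block, from an 'edit "<name>"' line up to the
# following "next" line. Assumes quoted names and no nested edit/next pairs.
BLOCK_RE = re.compile(
    r'^[ \t]*edit[ \t]+"([^"\n]+)"[ \t]*\n(.*?)^[ \t]*next[ \t]*$',
    re.MULTILINE | re.DOTALL,
)


def parse_blocks(text):
    """Return {block name: {setting: value}} for every edit/next block."""
    data = {}
    for match in BLOCK_RE.finditer(text):
        name, body = match.group(1), match.group(2)
        settings = data.setdefault(name, {})
        for line in body.splitlines():
            words = line.split()
            # Ignore anything that is not "set <key> <value ...>".
            if len(words) < 3 or words[0] != 'set':
                continue
            values = [w.strip('"') for w in words[2:]]
            # Single values stay strings, multi-word values become lists.
            settings[words[1]] = values[0] if len(values) == 1 else values
    return data


if __name__ == '__main__':
    import pprint
    pprint.pprint(parse_blocks(SAMPLE))
```

Because the regex only has to recognise where a block starts and ends, it does not care which "set" lines a block contains; lines outside any edit/next pair, such as the `set hostname` line in the sample, are never looked at.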

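How about:

import re

config = {}
with open('datafile') as f:
    text = f.read()
# Split the text into blocks at every "next" line
for block in re.split(r'\n\s*next\s*(?:\n|$)', text):
    for cmd in block.split("\n"):
        cmd = cmd.strip().split()
        if len(cmd) < 2:  # skip blank lines
            continue
        if cmd[0] == 'edit':  # start a new block
            current = cmd[1]
            config[current] = {}
            continue
        config[current][cmd[1]] = cmd[2]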

I think it is readable, but I prefer the other answer since it needs no thought (no regex). Upvoted it.

Source

2016-10-31 20:31:47 kabanus
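
For completeness, the single-regex route the question asks about can be made to tolerate omitted "set" lines by wrapping each line in an optional non-capturing group, `(?: ... )?`. The following is only a sketch, not a recommendation over the answers above: it assumes the lines always appear in the same order, and the sample text, the helper `set_line` and the function `parse` are hypothetical names introduced for illustration.

```python
import re

# Hypothetical sample: two blocks that carry different "set" lines.
SAMPLE = '''edit "port29"
    set vdom "ACME_Prod"
    set ip 174.244.244.244 255.255.255.224
    set allowaccess ping
    set vlanforward enable
    set type physical
    set alias "Internet-IRISnet"
    set snmp-index 29
next
edit "port21"
    set vdom "ACME_Prod"
    set vlanforward enable
    set type physical
    set snmp-index 27
next
'''


def set_line(key, value_pattern):
    """Regex for one OPTIONAL 'set <key> <value>' line."""
    return (r'(?:[ \t]*set[ \t]+' + re.escape(key) + r'[ \t]+'
            + value_pattern + r'[ \t]*\n)?')


PATTERN = re.compile(
    r'^[ \t]*edit[ \t]+"(?P<name>[^"\n]+)"[ \t]*\n'   # block name (required)
    + set_line('vdom', r'"(?P<vdom>[^"\n]*)"')
    + set_line('ip', r'(?P<ip>\S.*?)')
    + set_line('allowaccess', r'(?P<allowaccess>\S.*?)')
    + set_line('vlanforward', r'(?P<vlanforward>\S+)')
    + set_line('type', r'(?P<type>\S+)')
    + set_line('alias', r'"(?P<alias>[^"\n]*)"')
    + set_line('snmp-index', r'\d+')                   # matched, not captured
    + r'[ \t]*next[ \t]*$',                            # end of block
    re.MULTILINE,
)


def parse(text):
    """Return {block name: {field: value}} for every block the regex matches."""
    result = {}
    for match in PATTERN.finditer(text):
        # groupdict() holds None for optional lines that were absent; drop them.
        fields = {key: value for key, value in match.groupdict().items()
                  if value is not None and key != 'name'}
        result[match.group('name')] = fields
    return result


if __name__ == '__main__':
    import pprint
    pprint.pprint(parse(SAMPLE))
```

The limitation is visible in the design: a block containing a line the pattern does not list (for example `set device-identification enable` in the question's data), or listing its lines in a different order, is skipped entirely. That is a fair argument for the line-by-line approaches above.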
