Antlr for parsing Python setup files

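I have a Java program that needs to parse Python setup.py files and extract information from them. I have something sort of working, but I've hit a wall. I'm starting with a simple raw file first. Once that runs, I'll work on stripping out the noise in the real files that I don't care about.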

So here is where I'm stuck (as I said, stripping the noise out of the normal files is the next step). My current test file is

setup(
    setup_requires=['pytest-runner'], 
    tests_require=['pytest', 'unittest2'], 
)

and my grammar is

grammar SetupPy ; 

file_input: (NEWLINE | setupDeclaration)* EOF; 

setupDeclaration : 'setup' '(' method ')'; 
method : setupRequires testRequires; 
setupRequires : 'setup_requires' '=' '[' LISTVAL* ']' COMMA; 
testRequires : 'tests_require' '=' '[' LISTVAL* ']' COMMA; 

WS: [ \t\n\r]+ -> skip ; 
COMMA : ',' -> skip ; 
LISTVAL : SHORT_STRING ; 

UNKNOWN_CHAR 
: . 
; 

fragment SHORT_STRING 
: '\'' (STRING_ESCAPE_SEQ | ~[\\\r\n\f'])* '\'' 
| '"' (STRING_ESCAPE_SEQ | ~[\\\r\n\f"])* '"' 
; 

/// stringescapeseq ::= "\" <any source character> 
fragment STRING_ESCAPE_SEQ 
: '\\' . 
| '\\' NEWLINE 
; 

fragment SPACES 
: [ \t]+ 
; 

NEWLINE 
: ({atStartOfInput()}? SPACES 
    | ('\r'? '\n' | '\r' | '\f') SPACES? 
    ) 
    {
        String newLine = getText().replaceAll("[^\r\n\f]+", "");
        String spaces = getText().replaceAll("[\r\n\f]+", "");
        int next = _input.LA(1);
        if (opened > 0 || next == '\r' || next == '\n' || next == '\f' || next == '#') {
            // If we're inside a list or on a blank line, ignore all indents,
            // dedents and line breaks.
            skip();
        }
        else {
            emit(commonToken(NEWLINE, newLine));
            int indent = getIndentationCount(spaces);
            int previous = indents.isEmpty() ? 0 : indents.peek();
            if (indent == previous) {
                // skip indents of the same size as the present indent-size
                skip();
            }
            else if (indent > previous) {
                indents.push(indent);
                emit(commonToken(Python3Parser.INDENT, spaces));
            }
            else {
                // Possibly emit more than 1 DEDENT token.
                while(!indents.isEmpty() && indents.peek() > indent) {
                    this.emit(createDedent());
                    indents.pop();
                }
            }
        }
    }
;

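I need to tell ANTLR that setup_requires and tests_require contain arrays. I need the values in those arrays no matter whether someone uses single quotes or double quotes, puts each value on its own line, or uses any combination of the above. I don't know how to pull them out. Can anyone help? Maybe an example or two?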

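Please note that I can't use Jython and simply execute the file. Regex isn't an option either, because developer style varies hugely in this file.

And of course, after this problem I still have to figure out how to strip the noise out of the normal files. I tried using the Python3 grammar for this, but I'm an ANTLR newbie and it blew me away. I couldn't figure out how to write the rules to pull the values out, so I decided to try a much simpler grammar. I quickly hit another wall.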

Edit: here is an actual setup.py file that I eventually need to parse. Keep in mind that setup_requires and tests_require may or may not be present, and may or may not appear in that order.

# -*- coding: utf-8 -*- 
from __future__ import with_statement 

from setuptools import setup 


def get_version(fname='mccabe.py'):
    with open(fname) as f:
        for line in f:
            if line.startswith('__version__'):
                return eval(line.split('=')[-1])


def get_long_description():
    descr = []
    for fname in ('README.rst',):
        with open(fname) as f:
            descr.append(f.read())
    return '\n\n'.join(descr)


setup(
    name='mccabe', 
    version=get_version(), 
    description="McCabe checker, plugin for flake8", 
    long_description=get_long_description(), 
    keywords='flake8 mccabe', 
    author='Tarek Ziade', 
    author_email='[email protected]', 
    maintainer='Ian Cordasco', 
    maintainer_email='[email protected]', 
    url='https://github.com/pycqa/mccabe', 
    license='Expat license', 
    py_modules=['mccabe'], 
    zip_safe=False, 
    setup_requires=['pytest-runner'], 
    tests_require=['pytest'], 
    entry_points={
        'flake8.extension': [
            'C90 = mccabe:McCabeChecker',
        ],
    },
    classifiers=[
        'Development Status :: 5 - Production/Stable',
        'Environment :: Console',
        'Intended Audience :: Developers',
        'License :: OSI Approved :: MIT License',
        'Operating System :: OS Independent',
        'Programming Language :: Python',
        'Programming Language :: Python :: 2',
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.3',
        'Programming Language :: Python :: 3.4',
        'Programming Language :: Python :: 3.5',
        'Programming Language :: Python :: 3.6',
        'Topic :: Software Development :: Libraries :: Python Modules',
        'Topic :: Software Development :: Quality Assurance',
    ],
)
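Both keywords are optional and can come in any order. One alternative worth comparing is a small hand-written "island" scanner, used instead of ANTLR. This technique is swapped in here and is not what the thread ends up using. The scanner recognizes only string literals, identifiers and single punctuation characters, and it skips `#` comments. It reacts only to the sequence `name = [` for the names you ask for, so the `=` in `get_version(fname='mccabe.py')` is simply ignored. This is a minimal sketch under those assumptions. `RequiresScanner` and `extract` are hypothetical names. It does not treat triple-quoted strings as single tokens, and it cannot follow lists built from variables or concatenation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical "island" scanner (not ANTLR): tokenizes only what matters in a
// setup.py and collects the string values of NAME = [ ... ] for chosen names.
final class RequiresScanner {
    private RequiresScanner() {}

    enum Kind { NAME, STRING, PUNCT }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    static List<Token> tokenize(String src) {
        List<Token> tokens = new ArrayList<>();
        int n = src.length();
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '#') {                        // comment: skip to end of line
                while (i < n && src.charAt(i) != '\n') i++;
            } else if (c == '\'' || c == '"') {           // single-line string literal
                int start = i++;
                while (i < n && src.charAt(i) != c && src.charAt(i) != '\n') {
                    i += (src.charAt(i) == '\\') ? 2 : 1; // a backslash escapes the next char
                }
                if (i < n && src.charAt(i) == c) i++;     // consume the closing quote
                i = Math.min(i, n);
                tokens.add(new Token(Kind.STRING, src.substring(start, i)));
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < n && (Character.isLetterOrDigit(src.charAt(i)) || src.charAt(i) == '_')) i++;
                tokens.add(new Token(Kind.NAME, src.substring(start, i)));
            } else {                                       // any other character stands alone
                tokens.add(new Token(Kind.PUNCT, String.valueOf(c)));
                i++;
            }
        }
        return tokens;
    }

    // Drops the surrounding quotes; escape sequences are left as written.
    private static String stripQuotes(String s) {
        int end = (s.length() > 1 && s.charAt(s.length() - 1) == s.charAt(0)) ? s.length() - 1 : s.length();
        return s.substring(1, end);
    }

    // Maps each requested name that appears as NAME = [ ... ] to its string values.
    static Map<String, List<String>> extract(String src, String... names) {
        List<String> wanted = Arrays.asList(names);
        List<Token> t = tokenize(src);
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (int i = 0; i + 2 < t.size(); i++) {
            Token name = t.get(i);
            if (name.kind != Kind.NAME || !wanted.contains(name.text)) continue;
            if (!t.get(i + 1).text.equals("=") || !t.get(i + 2).text.equals("[")) continue;
            List<String> values = result.computeIfAbsent(name.text, k -> new ArrayList<>());
            int depth = 0;
            int j = i + 2;
            for (; j < t.size(); j++) {
                Token tok = t.get(j);
                if (tok.kind == Kind.PUNCT && tok.text.equals("[")) {
                    depth++;
                } else if (tok.kind == Kind.PUNCT && tok.text.equals("]")) {
                    if (--depth == 0) break;
                } else if (tok.kind == Kind.STRING && depth == 1) {
                    values.add(stripQuotes(tok.text));     // only top-level strings of the list
                }
            }
            i = j;  // resume after the closing bracket
        }
        return result;
    }
}
```

For example, read the mccabe file with `Files.readString(Path.of("setup.py"))` (Java 11+) and call `RequiresScanner.extract(source, "setup_requires", "tests_require")`. That should give `{setup_requires=[pytest-runner], tests_require=[pytest]}`. A key that never appears in the file is simply absent from the map.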

While trying to debug and simplify, I realized I don't need the method at all; I just need to find the values. So I'm playing with this grammar:

grammar SetupPy ; 

file_input: (ignore setupRequires ignore | ignore testRequires ignore)* EOF; 

setupRequires : 'setup_requires' '=' '[' dependencyValue* (',' dependencyValue)* ']'; 
testRequires : 'tests_require' '=' '[' dependencyValue* (',' dependencyValue)* ']'; 

dependencyValue: LISTVAL; 

ignore : UNKNOWN_CHAR? ; 

LISTVAL: SHORT_STRING; 
UNKNOWN_CHAR: . -> channel(HIDDEN); 

fragment SHORT_STRING: '\'' (STRING_ESCAPE_SEQ | ~[\\\r\n\f'])* '\'' 
| '"' (STRING_ESCAPE_SEQ | ~[\\\r\n\f"])* '"'; 

fragment STRING_ESCAPE_SEQ 
: '\\' . 
| '\\' 
;

Now it works fine for simple input and handles the ordering problem. But it doesn't work on the full file. It gets hung up on the equals sign in this line:

def get_version(fname='mccabe.py'):


2017-07-16 scphantm

Did you get a chance to evaluate my solution? – TomServo

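I finally got a chance to look at this. Unfortunately, it breaks down on the real file: it picks up the import statements and everything goes sideways. I've posted an example of the real file I need to parse. I'm going to play with this a little longer before I give up; I have some time set aside for it. – scphantm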

Yes, it pretty much only parses that, but your UNKNOWN_CHAR symbol is problematic. Almost everything that isn't an implicit lexer token gets bound to that rule. – TomServo
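The following sketch shows which tokens the parser actually receives from the line it trips over. It is a minimal Java imitation of the lexer generated from the second grammar, with the implicit literal tokens, `LISTVAL`, and `UNKNOWN_CHAR` on the HIDDEN channel. It is not ANTLR code. It assumes ANTLR's usual lexer policy: the longest match wins, and on a tie the rule defined first wins. Implicit literals such as `'='` are treated as defined before the grammar's own lexer rules. The class name `SetupPyLexerSim` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation (not generated ANTLR code) of the lexer for the
// second SetupPy grammar: implicit literals, LISTVAL, UNKNOWN_CHAR -> HIDDEN.
final class SetupPyLexerSim {
    private SetupPyLexerSim() {}

    // Implicit tokens created from the literals used in the parser rules.
    private static final String[] LITERALS = {"setup_requires", "tests_require", "=", "[", "]", ","};

    // Length of a SHORT_STRING match starting at i, or 0 if there is none.
    private static int shortString(String s, int i) {
        char quote = s.charAt(i);
        if (quote != '\'' && quote != '"') return 0;
        int j = i + 1;
        while (j < s.length()) {
            char c = s.charAt(j);
            if (c == quote) return j + 1 - i;
            if (c == '\r' || c == '\n' || c == '\f') return 0;
            j += (c == '\\') ? 2 : 1;              // STRING_ESCAPE_SEQ consumes the next char
        }
        return 0;                                    // unterminated: no LISTVAL here
    }

    // Token types that reach the parser, in order; HIDDEN tokens are dropped.
    static List<String> visibleTokens(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int bestLen = 0;
            String bestType = null;
            for (String lit : LITERALS) {            // literals are tried first, so they win ties
                if (s.startsWith(lit, i) && lit.length() > bestLen) {
                    bestLen = lit.length();
                    bestType = "'" + lit + "'";
                }
            }
            int str = shortString(s, i);
            if (str > bestLen) {                     // LISTVAL only wins with a strictly longer match
                bestLen = str;
                bestType = "LISTVAL";
            }
            if (bestLen == 0) {                      // UNKNOWN_CHAR: any single character
                bestLen = 1;
                bestType = "UNKNOWN_CHAR";
            }
            if (!bestType.equals("UNKNOWN_CHAR")) out.add(bestType);  // UNKNOWN_CHAR -> channel(HIDDEN)
            i += bestLen;
        }
        return out;
    }
}
```

For `def get_version(fname='mccabe.py'):` the only tokens left on the default channel are `'='` and `LISTVAL`. `file_input` accepts those only inside `setupRequires` or `testRequires`. `ignore` cannot absorb them either, because the `UNKNOWN_CHAR` it refers to sits on the hidden channel and never reaches the parser. That is consistent with the parse failing at that `=`.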

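I looked at your grammar and simplified it quite a bit. I took out all the Python-esque whitespace handling and just treated whitespace as whitespace. As you asked in the question, this grammar parses the following input, handling one value per line, single quotes, double quotes and so on: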

setup(
    setup_requires=['pytest-runner'], 
    tests_require=['pytest', 
    'unittest2', 
    "test_3" ], 
)

And here is the much simplified grammar:

grammar SetupPy ; 
setupDeclaration : 'setup' '(' method ')' EOF; 
method : setupRequires testRequires ; 
setupRequires : 'setup_requires' '=' '[' LISTVAL* (',' LISTVAL)* ']' ',' ; 
testRequires : 'tests_require' '=' '[' LISTVAL* (',' LISTVAL)* ']' ',' ; 
WS: [ \t\n\r]+ -> skip ; 
LISTVAL : SHORT_STRING ; 
fragment SHORT_STRING 
: '\'' (STRING_ESCAPE_SEQ | ~[\\\r\n\f'])* '\'' 
| '"' (STRING_ESCAPE_SEQ | ~[\\\r\n\f"])* '"' 
; 
fragment STRING_ESCAPE_SEQ 
: '\\' . 
| '\\' 
;

Oh, and here is the lexer output, showing that the tokens are being assigned correctly:

[@0,0:4='setup',<'setup'>,1:0] 
[@1,5:5='(',<'('>,1:5] 
[@2,12:25='setup_requires',<'setup_requires'>,2:4] 
[@3,26:26='=',<'='>,2:18] 
[@4,27:27='[',<'['>,2:19] 
[@5,28:42=''pytest-runner'',<LISTVAL>,2:20] 
[@6,43:43=']',<']'>,2:35] 
[@7,44:44=',',<','>,2:36] 
[@8,51:63='tests_require',<'tests_require'>,3:4] 
[@9,64:64='=',<'='>,3:17] 
[@10,65:65='[',<'['>,3:18] 
[@11,66:73=''pytest'',<LISTVAL>,3:19] 
[@12,74:74=',',<','>,3:27] 
[@13,79:89=''unittest2'',<LISTVAL>,4:1] 
[@14,90:90=',',<','>,4:12] 
[@15,95:102='"test_3"',<LISTVAL>,5:1] 
[@16,104:104=']',<']'>,5:10] 
[@17,105:105=',',<','>,5:11] 
[@18,108:108=')',<')'>,6:0] 
[@19,109:108='<EOF>',<EOF>,6:1]

Now you should be able to follow a simple ANTLR visitor or listener pattern and do your thing with your LISTVAL tokens. I hope this meets your needs; it certainly parses your test input.
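One detail matters once you are in the listener. A LISTVAL's text, for example from `TerminalNode.getText()`, still includes its quotes, as the dump above shows with `'pytest-runner'`. Below is a minimal sketch of turning that text into the bare value. It assumes only the common backslash escapes need translating. `ListValues.unquote` is a hypothetical helper, not part of the ANTLR runtime.

```java
// Hypothetical helper: converts the raw text of a LISTVAL token
// (e.g. 'pytest-runner' or "unittest2") into the bare string value.
final class ListValues {
    private ListValues() {}

    static String unquote(String tokenText) {
        if (tokenText.length() < 2) {
            throw new IllegalArgumentException("not a quoted string: " + tokenText);
        }
        char quote = tokenText.charAt(0);
        if ((quote != '\'' && quote != '"') || tokenText.charAt(tokenText.length() - 1) != quote) {
            throw new IllegalArgumentException("not a quoted string: " + tokenText);
        }
        String body = tokenText.substring(1, tokenText.length() - 1);
        StringBuilder out = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\' || i + 1 == body.length()) {   // ordinary character (or trailing backslash)
                out.append(c);
                continue;
            }
            char next = body.charAt(++i);
            switch (next) {
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 't': out.append('\t'); break;
                case '\\': case '\'': case '"': out.append(next); break;
                case '\n': break;                          // backslash-newline: line continuation
                default: out.append('\\').append(next);   // unknown escape kept verbatim, like Python
            }
        }
        return out.toString();
    }
}
```

In an `exitSetupRequires` or `exitTestRequires` callback you would apply this to each LISTVAL node's text and collect the results into whatever structure your program needs.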


2017-07-16 21:39:43 TomServo

And maybe an upvote on this too? Thanks; we both know how hard rep is to come by on these slow tags. :) – TomServo
