How can I process this text file and parse out what I need?

I'm trying to parse the output of Python's doctest module and store it in an HTML file.

I have output similar to this:

********************************************************************** 
File "example.py", line 16, in __main__.factorial 
Failed example: 
    [factorial(n) for n in range(6)] 
Expected: 
    [0, 1, 2, 6, 24, 120] 
Got: 
    [1, 1, 2, 6, 24, 120] 
********************************************************************** 
File "example.py", line 20, in __main__.factorial 
Failed example: 
    factorial(30) 
Expected: 
    25252859812191058636308480000000L 
Got: 
    265252859812191058636308480000000L 
********************************************************************** 
1 items had failures: 
    2 of 8 in __main__.factorial 
***Test Failed*** 2 failures.

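Each failure is preceded by a line of asterisks, which separates one test failure from the next.

What I'd like to do is strip out the filename and method that failed, as well as the expected and actual results. Then I'd like to create an HTML document from this (or store it in a text file and do a second round of parsing).

How can I do this using just Python, or some combination of UNIX shell utilities?

EDIT: I came up with the following shell script, which matches each block the way I'd like, but I'm not sure how to redirect each sed match to its own file.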

python example.py | sed -n '/.*/,/^\**$/p' > `mktemp error.XXX`

Source

2009-08-07 samoz

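If you strip out the file, method, expected and actual results, what is left? – juanjux

So far I've only been able to grab the entire block at once, not the individual fields, so I've been having trouble parsing it into separate chunks. – samoz

Here is a quick and dirty script that parses the output into tuples with the relevant information: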

import sys
import re

# A line made up only of asterisks separates one failure from the next.
stars_re = re.compile(r'^[*]+$', re.MULTILINE)
file_line_re = re.compile(r'^File "(.*?)", line (\d*), in (.*)$')

doctest_output = sys.stdin.read()
# Drop the text before the first separator and the summary after the last one.
chunks = stars_re.split(doctest_output)[1:-1]

for chunk in chunks:
    chunk_lines = chunk.strip().splitlines()
    m = file_line_re.match(chunk_lines[0])

    file, line, module = m.groups()
    # Assumes each of the three sections is exactly one line long.
    failed_example = chunk_lines[2].strip()
    expected = chunk_lines[4].strip()
    got = chunk_lines[6].strip()

    print((file, line, module, failed_example, expected, got))
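The script above takes exactly one line for each of the example, expected and got sections. If a doctest prints several lines, a small state machine is one way around that. This is only a sketch: `parse_failures`, `SECTIONS` and `SAMPLE` are names made up for the illustration, and it does not treat "Exception raised:" or "Expected nothing" blocks specially.

```python
import re

# Separator lines consist only of asterisks (trailing blanks tolerated).
STARS = re.compile(r"^\*+[ \t]*$", re.MULTILINE)
HEADER = re.compile(r'^File "(.*?)", line (\d+), in (.*)$')
# Unindented section labels in a failure block, mapped to record keys.
SECTIONS = {"Failed example:": "example", "Expected:": "expected", "Got:": "got"}

SAMPLE = """\
**********************************************************************
File "example.py", line 16, in __main__.factorial
Failed example:
    [factorial(n) for n in range(6)]
Expected:
    [0, 1, 2, 6, 24, 120]
Got:
    [1, 1, 2, 6, 24, 120]
**********************************************************************
1 items had failures:
   1 of   8 in __main__.factorial
***Test Failed*** 1 failures.
"""


def parse_failures(text):
    """Return one dict per failure block found in doctest's report text."""
    failures = []
    for chunk in STARS.split(text):
        lines = chunk.strip().splitlines()
        if not lines:
            continue
        m = HEADER.match(lines[0].strip())
        if m is None:
            continue  # text before the first block, or the trailing summary
        record = {"file": m.group(1), "line": int(m.group(2)),
                  "name": m.group(3), "example": [], "expected": [], "got": []}
        current = None
        for line in lines[1:]:
            key = SECTIONS.get(line.strip())
            if key is not None and not line.startswith(" "):
                current = key  # a new section starts here
            elif current is not None:
                record[current].append(line.strip())
        for key in ("example", "expected", "got"):
            record[key] = "\n".join(record[key])
        failures.append(record)
    return failures


failures = parse_failures(SAMPLE)
```

Each record keeps the file, line number and test name from the header line, and everything indented under a label is collected until the next label.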

Source

2009-08-07 21:11:11

You can write a Python program to pick this apart, but a better option may be to change doctest so that it produces the report you want in the first place. From the documentation for doctest.DocTestRunner:

    ... the display output
    can be also customized by subclassing DocTestRunner, and
    overriding the methods `report_start`, `report_success`,
    `report_unexpected_exception`, and `report_failure`.
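A minimal sketch of that idea, assuming you only care about ordinary failures: the subclass below overrides `report_failure` and stores the fields instead of printing the starred report. `CollectingRunner`, `collect_failures`, `SAMPLE_DOCTESTS` and the file name `sample.py` are made up for the illustration. The doctests are fed in as a plain string through `DocTestParser.get_doctest` so that the example stands on its own.

```python
import doctest
import math

# Two doctests as plain text; the second one is deliberately wrong.
SAMPLE_DOCTESTS = """
>>> math.factorial(3)
6
>>> math.factorial(4)
25
"""


class CollectingRunner(doctest.DocTestRunner):
    """Record each failure as a dict instead of writing the usual report."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.records = []

    def report_failure(self, out, test, example, got):
        # Line numbers can be unknown, so only compute them when available.
        line = None
        if test.lineno is not None and example.lineno is not None:
            line = test.lineno + example.lineno + 1
        self.records.append({
            "file": test.filename,
            "line": line,
            "name": test.name,
            "example": example.source.strip(),
            "expected": example.want.strip(),
            "got": got.strip(),
        })


def collect_failures(text, name="sample", filename="sample.py"):
    """Run the doctests found in `text` and return the recorded failures."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, {"math": math}, name, filename, 0)
    runner = CollectingRunner(verbose=False)
    runner.run(test)
    return runner.records


records = collect_failures(SAMPLE_DOCTESTS)
```

Unexpected exceptions go through `report_unexpected_exception`, which this sketch leaves untouched, so those would still be printed in the default format.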

Source

2009-08-07 21:10:35

I will definitely look into this! – samoz

I wrote a quick parser in pyparsing to do it.

from pyparsing import *

# Sample doctest output to parse.
sample = """
**********************************************************************
File "example.py", line 16, in __main__.factorial
Failed example:
    [factorial(n) for n in range(6)]
Expected:
    [0, 1, 2, 6, 24, 120]
Got:
    [1, 1, 2, 6, 24, 120]
**********************************************************************
File "example.py", line 20, in __main__.factorial
Failed example:
    factorial(30)
Expected:
    25252859812191058636308480000000L
Got:
    265252859812191058636308480000000L
**********************************************************************
"""

quote = Literal('"').suppress()
comma = Literal(',').suppress()
in_ = Keyword('in').suppress()
# One failure block: the header line, then label/value line pairs.
# The label lines ("Failed example:", "Expected:", "Got:") are suppressed.
block = OneOrMore("**").suppress() + \
     Keyword("File").suppress() + \
     quote + Word(alphanums + ".") + quote + \
     comma + Keyword("line").suppress() + Word(nums) + comma + \
     in_ + Word(alphanums + "._") + \
     LineStart() + restOfLine.suppress() + \
     LineStart() + restOfLine + \
     LineStart() + restOfLine.suppress() + \
     LineStart() + restOfLine + \
     LineStart() + restOfLine.suppress() + \
     LineStart() + restOfLine

all_blocks = OneOrMore(Group(block))

result = all_blocks.parseString(sample)

for section in result:
    print(section)

This prints:

['example.py', '16', '__main__.factorial', ' [factorial(n) for n in range(6)]', ' [0, 1, 2, 6, 24, 120]', ' [1, 1, 2, 6, 24, 120]'] 
['example.py', '20', '__main__.factorial', ' factorial(30)', ' 25252859812191058636308480000000L', ' 265252859812191058636308480000000L']

Source

2009-08-07 21:43:23

Very nice work! – samoz

Why are there three quote marks before and after the string? Sorry, my Python isn't really that good – samoz

Triple quotes are for strings that span multiple lines. –
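To illustrate the point about triple quotes, here is a tiny sketch (the variable names are arbitrary): the literal below runs over four physical lines, and the line breaks become part of the string.

```python
# A triple-quoted literal may span several lines; the newlines are kept.
report = """Expected:
    [0, 1, 2, 6, 24, 120]
Got:
    [1, 1, 2, 6, 24, 120]"""

lines = report.splitlines()
print(len(lines))
```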

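This is probably one of the least elegant Python scripts I've ever written, but it should give you a framework for doing what you want without resorting to UNIX utilities or separate scripts to create the HTML. It's untested, but it should only need minor tweaking to work.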

import os

# Create a list of all files in the current directory.
dirList = os.listdir('.')

# Make sure the output directory exists.
if not os.path.isdir('processed'):
    os.makedirs('processed')

for thisFile in dirList:
    # Ignore anything that isn't a .txt file.
    if not thisFile.endswith(".txt"):
        continue

    # Read in the text, then split it into a list of lines.
    infile = open(thisFile, 'r')
    rawText = infile.read()
    infile.close()
    yourList = rawText.split('\n')

    compiledText = ''

    for i in yourList:
        # Clunky way of deciding whether the current line
        # should be included in compiledText.
        if i.startswith("*****"):
            compiledText += "\n\n--- New Report ---\n"
        elif i.startswith(("File", "Fail", "Expe", "Got", " ")):
            compiledText += i + '\n'

    # Escape characters that are special in HTML.
    escaped = compiledText.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')

    # Insert your HTML template below.
    htmlText = '<html>...\n <body> \n <pre>' + escaped + '</pre></body>... </html>'

    # Write out to file.
    outfile = open('processed/' + thisFile + '.html', 'w')
    outfile.write(htmlText)
    outfile.close()
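If the failures have already been split into fields, for example as (file, line, name, example, expected, got) tuples, the HTML step could look like the sketch below. `failures_to_html` and the table layout are assumptions made for the illustration. `html.escape` is from the Python 3 standard library and keeps characters such as `<` in the doctest output from breaking the markup.

```python
import html


def failures_to_html(rows, title="doctest failures"):
    """Render (file, line, name, example, expected, got) tuples as an HTML table."""
    headers = ("File", "Line", "Test", "Failed example", "Expected", "Got")
    parts = [
        "<html>",
        "<head><title>%s</title></head>" % html.escape(title),
        "<body>",
        '<table border="1">',
        "<tr>" + "".join("<th>%s</th>" % h for h in headers) + "</tr>",
    ]
    for row in rows:
        # Escape every value so doctest output cannot inject markup.
        cells = "".join(
            "<td><pre>%s</pre></td>" % html.escape(str(value)) for value in row
        )
        parts.append("<tr>" + cells + "</tr>")
    parts += ["</table>", "</body>", "</html>"]
    return "\n".join(parts)


page = failures_to_html([
    ("example.py", "16", "__main__.factorial", "f(1) < 2", "True", "False"),
])
```

The returned string can then be written to a file with an ordinary `open(..., "w")`.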

Source

2009-08-07 22:28:11 Sean
