Python - Extract the text of multiple strings from multiple files

Python gurus, I need to extract all the text from each List line through its URL line; a sample of the pattern is shown below. I would also like the script to loop through all the files in a folder.

..... 
..... 
<List>Product Line</List> 
<URL>http://teamspace.abb.com/sites/Product</URL> 
... 
... 
<List>Contact Number</List> 
<URL>https://teamspace.abb.com/sites/Contact</URL> 
.... 
....

Expected output:

<List>Product Line</List> 
<URL>http://teamspace.abb.com/sites/Product</URL> 
<List>Contact Number</List> 
<URL>https://teamspace.abb.com/sites/Contact</URL>

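I have developed a script that extracts every line containing the List keyword and loops through all the files in a folder, but I can't get it to include the URL lines. Any help would be greatly appreciated.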

import os

# defining location of parent folder (raw strings keep the backslashes literal)
BASE_DIRECTORY = r'C:\D_Drive\Projects\Test'
output_file = open(r'C:\D_Drive\Projects\Test\Output.txt', 'w')
output = {}
file_list = []

# scanning through sub folders
for (dirpath, dirnames, filenames) in os.walk(BASE_DIRECTORY):
    for f in filenames:
        if 'xml' in str(f):
            e = os.path.join(str(dirpath), str(f))
            file_list.append(e)

# collecting the lines that contain '<List>' from each file
for f in file_list:
    print(f)
    txtfile = open(f, 'r')
    output[f] = []
    for line in txtfile:
        if '<List>' in line:
            output[f].append(line)
tabs = []
for tab in output:
    tabs.append(tab)

# writing the results, one block per file, in sorted order
tabs.sort()
for tab in tabs:
    output_file.write(tab + '\n')
    output_file.write('\n')
    for row in output[tab]:
        output_file.write(row + '')
    output_file.write('\n')
    output_file.write('----------------------------------------------------------\n')

raw_input()  # keeps the console open (Python 2; use input() on Python 3)

Sample file

Source

2017-06-17 user1902849

The input and expected output look the same.

tgt = ('URL', 'List')
with open('file') as f:
    print(list(filter(lambda line: any(e in line for e in tgt), f)))

or something similar. Try to improve your question. – fferri

Why reinvent the wheel? Use an XML parser such as [xml tree](https://docs.python.org/2/library/xml.etree.elementtree.html). – dawg

Please fix your indentation. –

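Your attempt is mostly right; the only change needed is to create an iterator for the file. You could also use ElementTree or Beautiful Soup, but understanding iteration like this will also work for files that are not XML or HTML.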

txtfile = iter(open(f, 'r'))  # change here
output[f] = []
for line in txtfile:
    if '<List>' in line:
        output[f].append(line)
        output[f].append(next(txtfile))  # and here
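
A side note on the snippet above: a file object returned by open() is already its own iterator, so the iter() call is harmless rather than strictly required, and the line doing the real work is next(txtfile). One edge case is that next() raises StopIteration when a <List> line happens to be the last line of the file. What follows is a minimal Python 3 sketch of the same idea, assuming each <URL> line directly follows its <List> line; the function name extract_list_url_lines is hypothetical and does not come from the thread.

```python
import io


def extract_list_url_lines(lines):
    """Return each '<List>' line together with the line that follows it.

    `lines` may be any iterable of strings, such as an open text file.
    """
    it = iter(lines)  # no-op for file objects, needed for plain lists
    found = []
    for line in it:
        if '<List>' in line:
            found.append(line)
            # A default of None avoids StopIteration when '<List>' is the last line.
            following = next(it, None)
            if following is not None:
                found.append(following)
    return found


# Usage with an in-memory file standing in for one of the XML files.
sample = io.StringIO(
    "<Other>x</Other>\n"
    "<List>Product Line</List>\n"
    "<URL>http://teamspace.abb.com/sites/Product</URL>\n"
    "<Other>y</Other>\n"
    "<List>Contact Number</List>\n"
)
for row in extract_list_url_lines(sample):
    print(row, end='')
```

Passing a default to next() trades an exception for a silently missing URL line, which seems the gentler failure mode when scanning many files in one run.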

Source

2017-06-17 15:11:47

Excellent! Thank you. – user1902849

Try it with xml.etree.ElementTree:

import xml.etree.ElementTree as ET

tree = ET.parse('Product_Workflow.xml')
root = tree.getroot()
with open('Output.txt', 'w') as opfile:
    for l, u in zip(root.iter('List'), root.iter('URL')):
        opfile.write(ET.tostring(l).strip())
        opfile.write('\n')
        opfile.write(ET.tostring(u).strip())
        opfile.write('\n')

Output.txt will then contain:

<List>Emove</List> 
<URL>http://teamspace.abb.com/sites/Product</URL> 
<List>Asset_KWT</List> 
<URL>https://teamspace.slb.com/sites/Contact</URL>
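
On Python 3, ET.tostring returns bytes by default, so the result cannot be written to a text-mode file without passing encoding='unicode'. Here is a minimal sketch under that assumption. The sample document, its <Workflow> and <Item> wrapper elements, and the function name list_url_pairs are made up for illustration, since the real structure of Product_Workflow.xml is not shown in the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; only <List> and <URL> come from the question.
SAMPLE = """<Workflow>
  <Item>
    <List>Product Line</List>
    <URL>http://teamspace.abb.com/sites/Product</URL>
  </Item>
  <Item>
    <List>Contact Number</List>
    <URL>https://teamspace.abb.com/sites/Contact</URL>
  </Item>
</Workflow>"""


def list_url_pairs(root):
    """Serialize every <List>/<URL> pair, in document order, as text lines."""
    lines = []
    # zip pairs the n-th <List> with the n-th <URL>; this assumes they alternate.
    for lst, url in zip(root.iter('List'), root.iter('URL')):
        # encoding='unicode' makes tostring return str rather than bytes.
        lines.append(ET.tostring(lst, encoding='unicode').strip())
        lines.append(ET.tostring(url, encoding='unicode').strip())
    return lines


root = ET.fromstring(SAMPLE)
print('\n'.join(list_url_pairs(root)))
```

If some <List> element had no matching <URL>, zip would pair elements incorrectly or drop the tail, so the positional pairing is only as reliable as the alternation in the input.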

Source

2017-06-17 15:06:46

Thanks for the information. I'll look into the XML element methods. – user1902849

You can use filter or a list comprehension, like so:

tgt = ('URL', 'List')
with open('/tmp/file') as f:
    print([line for line in f if any(e in line for e in tgt)])

Either one prints:

[' <List>Product Line</List>\n', ' <URL>http://teamspace.abb.com/sites/Product</URL>\n', ' <List>Contact Number</List>\n', ' <URL>https://teamspace.abb.com/sites/Contact</URL>\n']
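
To cover the folder-wide part of the question, the same kind of filter can be applied to every file found by os.walk. This is a rough Python 3 sketch, not code from the thread: the helper name collect_matching_lines is hypothetical, and matching on the .xml suffix (rather than 'xml' anywhere in the file name) is an assumption about which files are intended.

```python
import os
import tempfile

TARGETS = ('<List>', '<URL>')


def collect_matching_lines(base_directory, targets=TARGETS):
    """Map each .xml file under base_directory to its lines containing a target."""
    results = {}
    for dirpath, _dirnames, filenames in os.walk(base_directory):
        for name in filenames:
            if not name.lower().endswith('.xml'):
                continue
            path = os.path.join(dirpath, name)
            with open(path, 'r') as handle:
                results[path] = [
                    line for line in handle
                    if any(t in line for t in targets)
                ]
    return results


# Usage: build a throwaway folder with one file, then scan it.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, 'sample.xml'), 'w') as handle:
        handle.write('<A>skip</A>\n<List>Product Line</List>\n'
                     '<URL>http://teamspace.abb.com/sites/Product</URL>\n')
    for path, rows in sorted(collect_matching_lines(folder).items()):
        print(os.path.basename(path))
        print(''.join(rows), end='')
```

Unlike the approach that reads the line after each <List>, this one does not depend on the <URL> line being adjacent, but it also keeps any <URL> line that has no preceding <List>.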

Source

2017-06-17 15:50:07 dawg

Thanks for the comment; I'll take a look. – user1902849
