How can I extract the information on the lines between two headers in a text file? I'm a Python beginner, and I'm trying to do this with the following non-working code:

with open('toysystem.txt','r') as f:
    start = '<Keywords>'
    end = '</Keywords>'
    i = 0
    lines = f.readlines()
    for line in lines:
        if line == start:
            keywords = lines[i+1]
        i += 1

For reference, the text file looks like this:

<Keywords> 
GTO 
</Keywords>

Any ideas about what might be wrong with the code? Or perhaps a different way to approach this problem?

Thank you!

Source

2017-06-03 pennypeat

Lines read from a file keep the newline character at the end, so we should probably strip them before comparing. Also, the f object is an iterator, so there is no need to call its readlines method here.
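
To see why the comparison in the question never succeeds, here is a minimal sketch. It uses io.StringIO as a stand-in for the opened file (an assumption made only to keep the example self-contained), holding the same three lines as toysystem.txt.

```python
import io

# Stand-in for open('toysystem.txt'); the contents mirror the sample file.
f = io.StringIO('<Keywords>\nGTO\n</Keywords>\n')

first_line = next(f)  # file objects are iterators, so next() works
raw_match = first_line == '<Keywords>'
stripped_match = first_line.rstrip() == '<Keywords>'

print(repr(first_line))  # '<Keywords>\n'
print(raw_match)         # False: the trailing newline is part of the line
print(stripped_match)    # True once trailing whitespace is removed
```

The same holds for a real file opened in text mode: each line keeps its line ending until it is stripped.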

So we can write something like

with open('toysystem.txt', 'r') as f:
    start = '<Keywords>'
    end = '</Keywords>'
    keywords = []
    # Skip ahead to the start header.
    for line in f:
        if line.rstrip() == start:
            break
    # Collect lines until the end header is reached.
    for line in f:
        if line.rstrip() == end:
            break
        keywords.append(line)

which gives us

>>> keywords 
['GTO\n']

If you also don't need the newline at the end of each keyword, strip it too:

with open('toysystem.txt', 'r') as f:
    start = '<Keywords>'
    end = '</Keywords>'
    keywords = []
    for line in f:
        if line.rstrip() == start:
            break
    for line in f:
        if line.rstrip() == end:
            break
        keywords.append(line.rstrip())

gives

>>> keywords
['GTO']

However, in this case it is better to create a generator of stripped lines, such as

with open('toysystem.txt', 'r') as f:
    start = '<Keywords>'
    end = '</Keywords>'
    keywords = []
    stripped_lines = (line.rstrip() for line in f)
    for line in stripped_lines:
        if line == start:
            break
    for line in stripped_lines:
        if line == end:
            break
        keywords.append(line)

which does the same thing.
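
The two loops work because both draw from the same iterator: the first loop stops right at the start header, so the second one resumes on the following line. A minimal sketch of the same idea wrapped in a function, where the name extract_between and the io.StringIO stand-in for the file are assumptions made for illustration:

```python
import io


def extract_between(lines, start, end):
    """Return the stripped lines found between the start and end headers."""
    stripped_lines = (line.rstrip() for line in lines)
    # Skip everything up to and including the start header.
    for line in stripped_lines:
        if line == start:
            break
    # The generator remembers its position, so this loop continues
    # with the first line after the start header.
    found = []
    for line in stripped_lines:
        if line == end:
            break
        found.append(line)
    return found


sample = io.StringIO('intro\n<Keywords>\nGTO\nSTO\n</Keywords>\noutro\n')
keywords = extract_between(sample, '<Keywords>', '</Keywords>')
print(keywords)  # ['GTO', 'STO']
```

If the start header never appears, the first loop exhausts the generator and the function returns an empty list.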

Finally, if you need the lines themselves in later parts of your script, you can combine readlines with a generator of stripped lines:

with open('toysystem.txt', 'r') as f:
    start = '<Keywords>'
    end = '</Keywords>'
    keywords = []
    lines = f.readlines()
    # The generator already strips each line, so no further rstrip() is needed.
    stripped_lines = (line.rstrip() for line in lines)
    for line in stripped_lines:
        if line == start:
            break
    for line in stripped_lines:
        if line == end:
            break
        keywords.append(line)

which gives us

>>> lines 
['<Keywords>\n', 'GTO\n', '</Keywords>\n'] 
>>> keywords 
['GTO']
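
Keeping the list matters because a file object is consumed by iteration, whereas the list returned by readlines can be traversed again. A small sketch, again assuming io.StringIO as a stand-in for the real file:

```python
import io

text = '<Keywords>\nGTO\n</Keywords>\n'

# Iterating the file object directly consumes it.
f = io.StringIO(text)
first_pass = [line.rstrip() for line in f]
second_pass = [line.rstrip() for line in f]  # nothing left to read

# readlines() returns a list, which can be traversed as often as needed.
g = io.StringIO(text)
lines = g.readlines()
stripped_once = [line.rstrip() for line in lines]
stripped_again = [line.rstrip() for line in lines]

print(first_pass)                       # ['<Keywords>', 'GTO', '</Keywords>']
print(second_pass)                      # []
print(stripped_again == stripped_once)  # True
```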

Further reading:

file objects (including the file iterator),

iterators,

list comprehension,

generator expression

Source

2017-06-03 05:06:01

Use Python's re module instead and parse the file with a regular expression!

import re
with open('toysystem.txt','r') as f:
    contents = f.read()
    # Finds every match in the file and returns a list of the values captured
    # inside the (). You can extend the expression according to your needs.
    # re.DOTALL lets a captured block span several lines.
    keywords = re.findall(r'<Keywords>\s*(.*?)\s*</Keywords>', contents, re.DOTALL)
    print(keywords)

For the file above, this prints

['GTO']

For more about regular expressions in Python, check Tutorialspoint, which covers both Python 3 and Python 2.
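
Because re.findall returns every non-overlapping match, the same kind of pattern also covers files with several blocks. A hedged sketch, assuming a made-up two-block input string in place of the real file, and using re.DOTALL so that a block may span several lines:

```python
import re

# Hypothetical contents with two <Keywords> blocks (not from the question).
contents = (
    '<Keywords>\nGTO\n</Keywords>\n'
    'other text\n'
    '<Keywords>\nSTO\n6-31G\n</Keywords>\n'
)

# (.*?) is non-greedy, so each match stops at the nearest closing tag.
pattern = re.compile(r'<Keywords>\s*(.*?)\s*</Keywords>', re.DOTALL)
blocks = pattern.findall(contents)

# Split each captured block into its individual lines.
keywords = [block.splitlines() for block in blocks]
print(keywords)  # [['GTO'], ['STO', '6-31G']]
```

Reading the whole file with f.read() is fine for small inputs like this one; for very large files the line-by-line approaches avoid holding everything in memory.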

Source

2017-06-03 17:55:11
