Can't get table header elements in Python

I have a variable containing an HTML table element, obtained like this:

However, when I print the headers variable:

headers are: ['\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n 
     ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n 
', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n 
     ', '\n  ', '\n  ']

How can I extract the headers correctly?


2016-04-26 octavian

I can't reproduce the problem using [this code](github.com/har07/c693eac57c79c2896881f9b6e2de2202). Can you post simplified, complete code that reproduces the problem? – har07

Try this:

from lxml import html 

HTML_CODE = """<table class="list"> 
     <tr> 
     <th>Date(s)</th> 
     <th>Sport</th> 
     <th>Event</th> 
     <th>Location</th> 
     </tr> 
     <tr> 
     <td>Jan 18-31</td> 
     <td>Tennis</td> 
     <td><a href="tennis-grand-slam/australian-open/index.htm">Australia Open</a></td> 
     <td>Melbourne, Australia</td> 
     </tr> 
</table>""" 

tree = html.fromstring(HTML_CODE) 
headers = tree.xpath('//table[@class="list"]/tr/th/text()') 
print("Headers are: {}".format(', '.join(headers)))

Output:

Headers are: Date(s), Sport, Event, Location
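
The whitespace-only list in the question is not reproduced by the snippet above, so the cause on the real page is unknown. One possibility, offered only as an assumption, is that the header text sits inside child elements of each th, in which case th/text() sees only the whitespace around them. A minimal sketch under that assumption (NESTED_HTML is hypothetical markup, not the asker's page):

```python
from lxml import html

# Hypothetical markup: header text wrapped in <a> tags, which is one way
# th/text() can end up returning only whitespace strings.
NESTED_HTML = """<table class="list">
  <tr>
    <th>
      <a href="#">Date(s)</a>
    </th>
    <th>
      <a href="#">Sport</a>
    </th>
  </tr>
</table>"""

tree = html.fromstring(NESTED_HTML)

# Direct text nodes of each <th>: only the whitespace around the <a> tags.
raw = tree.xpath('//table[@class="list"]/tr/th/text()')

# text_content() gathers text from all descendants; strip() trims the padding.
headers = [th.text_content().strip()
           for th in tree.xpath('//table[@class="list"]/tr/th')]

print("Headers are: {}".format(', '.join(headers)))
```

If the headers on the real page are plain text inside th, this changes nothing and the original query is enough.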


2016-04-26 13:14:42

If you are working with the table, and the assumption is that it is the only one, and you just want the headers:

table[0].xpath("//th/text()")

Or, if you only need the table's headers and don't plan to use the table for anything else:

Both will give you:

['Date(s)', 'Sport', 'Event', 'Location']
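
One caveat about the single-table assumption: in lxml, an XPath that starts with // is evaluated from the document root even when it is called on a single element, while .// is relative to that element. The sketch below uses a hypothetical two-table page (PAGE and the "other" class are made up for illustration) to show the difference:

```python
from lxml import html

# Hypothetical page with two tables, to show the difference in scope.
PAGE = """<div>
  <table class="list"><tr><th>Date(s)</th><th>Sport</th></tr></table>
  <table class="other"><tr><th>Rank</th></tr></table>
</div>"""

tree = html.fromstring(PAGE)
table = tree.xpath('//table[@class="list"]')

# '//th' starts from the document root, even when called on one table.
all_headers = table[0].xpath('//th/text()')

# './/th' is relative to the element the query is called on.
list_headers = table[0].xpath('.//th/text()')

print(all_headers)
print(list_headers)
```

With only one table on the page the two queries return the same list, which is why the assumption matters.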


2016-04-26 14:33:13
