Using BeautifulSoup to fetch a row by its header

Using BeautifulSoup, I am scraping a table whose rows each pair a header with a value, using code like this:

from collections import namedtuple

BlendInfo = namedtuple('BlendInfo', ['brand', 'type', 'contents', 'flavoring'])
stats_rows = soup.find('table', id='stats').find_all('tr')
# Each field is read from a fixed row position
bi = BlendInfo(brand=stats_rows[1].td.get_text(),
               type=stats_rows[2].td.get_text(),
               contents=stats_rows[3].td.get_text(),
               flavoring=stats_rows[4].td.get_text())

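But, as expected, it fails with an index out of bounds error (or returns badly scrambled data) when the table is ordered differently (type before brand) or when some of the rows are missing (no contents).

Is there a better way to do something like: give me the data from the row whose header contains the string 'brand'?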

Asked by Lin, 2017-11-16

Definitely possible. Check this out:

from bs4 import BeautifulSoup 

html_content=''' 
<table class="info" id="stats"> 
<tbody> 
    <tr> 
    <th> Brand </th> 
    <td> 2 Guys Smoke Shop </td> 
    </tr> 
    <tr> 
    <th> Blend Type </th> 
    <td> Aromatic </td> 
    </tr> 
    <tr> 
    <th> Contents </th> 
    <td> Black Cavendish, Virginia </td> 
    </tr> 
    <tr> 
    <th> Flavoring </th> 
    <td> Other/Misc </td> 
    </tr> 
</tbody> 
</table> 
''' 
soup = BeautifulSoup(html_content,"lxml") 
for item in soup.find_all(class_='info')[0].find_all("th"):
    # strip=True removes the padding around the cell text
    header = item.get_text(strip=True)
    value = item.find_next_sibling("td").get_text(strip=True)
    print(header, value)

Output:

Brand 2 Guys Smoke Shop 
Blend Type Aromatic 
Contents Black Cavendish, Virginia 
Flavoring Other/Misc
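
The question asks for the data from the row whose header contains a given string. A minimal sketch of that lookup, built on the same `find_next_sibling` idea, might look like the following. The helper name `value_for_header`, the shortened sample table, and the use of the built-in `html.parser` are assumptions made for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Shortened sample: rows are reordered and the Contents row is missing
html_content = """
<table class="info" id="stats">
  <tr><th> Blend Type </th><td> Aromatic </td></tr>
  <tr><th> Brand </th><td> 2 Guys Smoke Shop </td></tr>
  <tr><th> Flavoring </th><td> Other/Misc </td></tr>
</table>
"""


def value_for_header(soup, label):
    """Return the <td> text of the first row whose <th> contains `label`.

    The match is case-insensitive; None is returned when no row matches.
    (Hypothetical helper, named here for illustration only.)
    """
    table = soup.find("table", id="stats")
    if table is None:
        return None
    for th in table.find_all("th"):
        if label.lower() in th.get_text(strip=True).lower():
            td = th.find_next_sibling("td")
            return td.get_text(strip=True) if td else None
    return None


soup = BeautifulSoup(html_content, "html.parser")
print(value_for_header(soup, "brand"))     # 2 Guys Smoke Shop
print(value_for_header(soup, "contents"))  # None (row is missing)
```

Because the lookup goes by header text rather than row position, a reordered table or a missing row yields the right value or `None` instead of an `IndexError`.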


Answered by SIM, 2017-11-16 13:31:43

This would build a dict for you:

from bs4 import BeautifulSoup

# Header labels (lowercased) that we want to keep
valid_headers = ['brand', 'blend type', 'contents', 'flavoring']

t = """<table class="info" id="stats">
<tbody>
    <tr>
    <th> Brand </th>
    <td> 2 Guys Smoke Shop </td>
    </tr>
    <tr>
    <th> Blend Type </th>
    <td> Aromatic </td>
    </tr>
    <tr>
    <th> Contents </th>
    <td> Black Cavendish, Virginia </td>
    </tr>
    <tr>
    <th> Flavoring </th>
    <td> Other/Misc </td>
    </tr>
</tbody>
</table>"""

bs = BeautifulSoup(t, "html.parser")

results = {}
for row in bs.find_all('tr'):
    th = row.find('th')
    td = row.find('td')
    if th is None or td is None:
        continue  # skip rows without a header/value pair
    header = th.get_text(strip=True)
    if header.lower() in valid_headers:
        results[header] = td.get_text(strip=True)

print(results)
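
To get from such a dict back to the `BlendInfo` namedtuple used in the question, one option is to map header text to field names and default any missing row to `None`. The sketch below assumes a mapping named `HEADER_TO_FIELD` and a helper named `parse_blend_info`, both invented here for illustration, and a sample table with the type row before the brand row and no contents row.

```python
from collections import namedtuple

from bs4 import BeautifulSoup

BlendInfo = namedtuple('BlendInfo', ['brand', 'type', 'contents', 'flavoring'])

# Assumed mapping from lowercased header text to BlendInfo field names
HEADER_TO_FIELD = {
    'brand': 'brand',
    'blend type': 'type',
    'contents': 'contents',
    'flavoring': 'flavoring',
}


def parse_blend_info(html):
    """Build a BlendInfo from the stats table, independent of row order."""
    soup = BeautifulSoup(html, 'html.parser')
    fields = dict.fromkeys(BlendInfo._fields)  # every field starts as None
    table = soup.find('table', id='stats')
    rows = table.find_all('tr') if table else []
    for row in rows:
        th, td = row.find('th'), row.find('td')
        if th is None or td is None:
            continue  # not a header/value row
        field = HEADER_TO_FIELD.get(th.get_text(strip=True).lower())
        if field:
            fields[field] = td.get_text(strip=True)
    return BlendInfo(**fields)


# Type comes before brand, and there is no Contents row
sample = """
<table class="info" id="stats">
  <tr><th> Blend Type </th><td> Aromatic </td></tr>
  <tr><th> Brand </th><td> 2 Guys Smoke Shop </td></tr>
  <tr><th> Flavoring </th><td> Other/Misc </td></tr>
</table>
"""

bi = parse_blend_info(sample)
print(bi)
```

Unrecognized headers are ignored, so extra rows on the page do not break the result.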

Answered by alexisdevarennes, 2017-11-16 13:31:43
