How can I read these cells from HTML code with Python web scraping?

I want to scrape exchange rate information from this website and then store it in a database. This is the part of the HTML I need:

<tbody> 
    <tr> 
     <td class="valute"><b>CHF</b></td> 
     <td class="valutename">svájci frank</td> 
     <td class="unit">1</td> 
     <td class="value">284,38</td> 
    </tr> 
    <tr> 
     <td class="valute"><b>EUR</b></td> 
     <td class="valutename">euro</td> 
     <td class="unit">1</td> 
     <td class="value">308,54</td> 
    </tr> 
    <tr> 
     <td class="valute"><b>USD</b></td> 
     <td class="valutename">USA dollár</td> 
     <td class="unit">1</td> 
     <td class="value">273,94</td> 
    </tr> 
</tbody>

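The page is https://www.mnb.hu/arfolyamok, and the HTML above is the part I need. I wrote some code for this, but something is wrong with it. How can I fix it, and what do I need to change? I only need the "valute", "valutename", "unit" and "value" data.

I am working with Python 2.7.13 on Windows 7. The error message is: "There's an error in your program: unindent does not match any outer indentation level".

Here is the code: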

import csv 
import requests 
from BeautifulSoup import BeautifulSoup 

url = 'https://www.mnb.hu/arfolyamok' 
response = requests.get(url) 
html = response.content 

soup = BeautifulSoup(html) 
table = soup.find('tbody', attrs={'class': 'stripe'}) 

table = str(soup) 
table = table.split("<tbody>") 

list_of_rows = [] 
for row in table[1].findAll('tr')[1:]: 
    list_of_cells = [] 
    for cell in row.findAll('td'): 
     text = cell.text.replace('&nbsp;', '') 
     list_of_cells.append(text) 
    list_of_rows.append(list_of_cells) 

print list_of_rows 

outfile = open("./inmates.csv", "wb") 
writer = csv.writer(outfile) 
writer.writerow(["Pénznem", "Devizanév", "Egység", "Forintban kifejezett érték"]) 
writer.writerows(list_of_rows)

Source

2017-06-08 tardos93

Well, you obviously have some indentation problems around the for loop. You need to use the same amount of spaces throughout. – 098799

Python indentation needs to be a multiple of four spaces. You should fix this manually or (preferably) use a Python code formatter such as [autopep8](https://stackoverflow.com/questions/14328406/tool-to-convert-python-code-to-be-pep8-compliant). – Hat

@Hat Actually, you can use as many spaces as you like; what matters is that you apply them consistently. – 098799
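The point made in the comments above can be checked without running a scraper at all. The following is a minimal sketch, not part of the original thread: the helper name check and the three sample snippets are made up for illustration. It compiles small source strings and reports the indentation error, if any.

```python
# Three versions of the same loop; only the indentation differs.
four_spaces = (
    "for row in [1, 2]:\n"
    "    cells = []\n"
    "    cells.append(row)\n"
)
two_spaces = (
    "for row in [1, 2]:\n"
    "  cells = []\n"
    "  cells.append(row)\n"
)
inconsistent = (
    "for row in [1, 2]:\n"
    "        cells = []\n"
    "    cells.append(row)\n"  # dedents to a level that was never opened
)


def check(source):
    """Return None if the source compiles, otherwise the error message."""
    try:
        compile(source, "<example>", "exec")
        return None
    except IndentationError as exc:
        return exc.msg


for label, source in [
    ("four spaces", four_spaces),
    ("two spaces", two_spaces),
    ("inconsistent", inconsistent),
]:
    print(label, "->", check(source))
```

Both consistent versions compile, whatever the width, while the third one is rejected with the same kind of "unindent" message as in the question.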

You have a spacing problem in your code between line 18 (for cell in row.findAll('td'):) and line 20 (list_of_cells.append(text)). Here is the fixed code:

import csv 
import requests 
from bs4 import BeautifulSoup 

url = 'https://www.mnb.hu/arfolyamok' 
response = requests.get(url) 
html = response.content 

soup = BeautifulSoup(html) 
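# Assumption: the rates table is the first <tbody>; the sample HTML shows no class on it.
table = soup.find('tbody')

# Keep `table` as a parsed tag (a plain str has no findAll), and keep every row,
# since a <tbody> has no header row to skip.
list_of_rows = []
for row in table.findAll('tr'):
    list_of_cells = []
    for cell in row.findAll('td'):
        text = cell.text.replace('&nbsp;', '').encode('utf-8')  # Python 2 csv needs byte strings
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)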

print list_of_rows 

outfile = open("./inmates.csv", "wb") 
writer = csv.writer(outfile) 
writer.writerow(["Pénznem", "Devizanév", "Egység", "Forintban kifejezett érték"]) 
writer.writerows(list_of_rows)

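However, after running this code you will get a character encoding error: "SyntaxError: Non-ASCII character '\xc3' in file testoasd.py on line 27, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details"

How do you fix this? Simple enough: add the encoding declaration # -*- coding: utf-8 -*- at the top of your code (line 1). Python 2 assumes ASCII source files unless told otherwise, and the Hungarian column names contain non-ASCII characters, so this should fix it.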
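To see what the declaration actually controls, here is a small sketch written for Python 3 (where UTF-8 is already the default, so the original error does not reproduce directly). It compiles the same raw source bytes under two different declarations. The helper run_source and the variable names are made up for illustration.

```python
def run_source(source_bytes):
    """Compile and execute raw source bytes, returning the resulting namespace."""
    namespace = {}
    exec(compile(source_bytes, "<example>", "exec"), namespace)
    return namespace


# The bytes \xc3\xa9 are the UTF-8 encoding of the letter "é" in "Pénznem".
body = b"name = 'P\xc3\xa9nznem'\n"

as_utf8 = run_source(b"# -*- coding: utf-8 -*-\n" + body)
as_latin1 = run_source(b"# -*- coding: latin-1 -*-\n" + body)

print(as_utf8["name"])    # decoded as intended
print(as_latin1["name"])  # same bytes, misread under the wrong declaration
```

The declaration only tells the interpreter how to decode the bytes of the file, so it has to match the encoding your editor actually saves in.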

Edit: I just noticed that you are using BeautifulSoup the wrong way and importing it incorrectly. I corrected the import to from bs4 import BeautifulSoup. Also, when you use BeautifulSoup you should specify a parser. So,

soup = BeautifulSoup(html)

becomes:

soup = BeautifulSoup(html, "html.parser")
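Putting the pieces together, here is a minimal sketch (Python 3, bs4 installed) that pulls the four wanted fields out of the HTML fragment from the question. It works on a hard-coded copy of that fragment rather than the live page, whose markup may differ or may be filled in by JavaScript. The names SAMPLE_HTML, FIELDS and parse_rates are made up for illustration.

```python
from bs4 import BeautifulSoup

# Copy of the fragment from the question, wrapped in <table> for completeness.
SAMPLE_HTML = """
<table><tbody>
 <tr>
  <td class="valute"><b>CHF</b></td>
  <td class="valutename">svájci frank</td>
  <td class="unit">1</td>
  <td class="value">284,38</td>
 </tr>
 <tr>
  <td class="valute"><b>EUR</b></td>
  <td class="valutename">euro</td>
  <td class="unit">1</td>
  <td class="value">308,54</td>
 </tr>
 <tr>
  <td class="valute"><b>USD</b></td>
  <td class="valutename">USA dollár</td>
  <td class="unit">1</td>
  <td class="value">273,94</td>
 </tr>
</tbody></table>
"""

# The four cell classes the question asks for, in output order.
FIELDS = ("valute", "valutename", "unit", "value")


def parse_rates(html):
    """Return one list of cell texts per row that contains all four fields."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [tr.find("td", class_=name) for name in FIELDS]
        # Skip rows (headers, separators) that lack any of the wanted cells.
        if all(cell is not None for cell in cells):
            rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


rows = parse_rates(SAMPLE_HTML)
for row in rows:
    print(row)
```

Selecting cells by class instead of by position keeps the result stable if the page adds extra columns. To run it against the site, pass requests.get(url).content to parse_rates instead of SAMPLE_HTML.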

Source

2017-07-13 13:33:24 Xonshiz
