2017-06-22 8 views
1

I used BeautifulSoup and soup.findAll to reach the relevant information, but I want to remove one value inside a <TR> tag (between <TR>...</TR>). How can I do that? How do I remove an element from web-scraped data in Python 2.7?

. 
. 
. 

soup = BeautifulSoup(x, 'lxml') 

tab6col = soup.findAll('table', { "class" : "tab6col" }) 

Here is my HTML code:

[<table border="0" class="tab6col" id="pm">\n<tr><td>\xa0</td><td align="right" class="contentword"><b>2015. \xe9v</b></td><td align="right" class="contentword"><b>2014. \xe9v</b></td><td align="right" class="contentword"><b>2013. \xe9v</b></td><td align="right" class="contentword"><b>2012. \xe9v</b></td><td align="right" class="contentword"><b>2011. \xe9v</b></td></tr><tr><td class="contentword"><b>Besz\xe1mol\xe1si id\xf5szak</b></td><td align="right" class="contentword"><span class="pm_idoszak">2015.01.01. - 2015.12.31.</span></td><td align="right" class="contentword"><span class="pm_idoszak">2014.01.01. - 2014.12.31.</span></td><td align="right" class="contentword"><span class="pm_idoszak">2013.12.30. - 2013.12.31.</span></td><td align="right" class="contentword"><span class="pm_idoszak">Nincs adat.</span></td><td align="right" class="contentword"><span class="pm_idoszak">Nincs adat.</span></td></tr><tr><td>\xa0</td><td align="right" class="contentword">eFt</td><td align="right" class="contentword">eFt</td><td align="right" class="contentword">eFt</td><td align="right" class="contentword">eFt</td><td align="right" class="contentword">eFt</td></tr><tr><td class="contentword">\xc9rt\xe9kes\xedt\xe9s nett\xf3 \xe1rbev\xe9tele</td><td align="right" class="numberc"></td><td align="right" class="numberc"></td><td align="right" class="numberc"></td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Bev\xe9telek</td><td align="right" class="numberc">2 873 821</td><td align="right" class="numberc">3 162 742</td><td align="right" class="numberc">9 194</td><td align="right" class="numberc"></td><td align="right" class="numberc"></td></tr><tr><td class="contentword">\xdczemi eredm\xe9ny</td><td align="right" class="numberc">81 937</td><td align="right" class="numberc">-181 850</td><td align="right" class="numberc">1 755</td><td align="right" class="numberc">Nincs adat.</td><td align="right" 
class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Ad\xf3z\xe1s el\xf5tti eredm\xe9ny</td><td align="right" class="numberc">-192 778</td><td align="right" class="numberc">-169 476</td><td align="right" class="numberc">1 755</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">M\xe9rleg szerinti eredm\xe9ny</td><td align="right" class="numberc">-124 099</td><td align="right" class="numberc">0</td><td align="right" class="numberc">1 421</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Ad\xf3zott eredm\xe9ny</td><td align="right" class="numberc">-192 778</td><td align="right" class="numberc">-169 476</td><td align="right" class="numberc">1 579</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Eszk\xf6z\xf6k \xf6sszesen</td><td align="right" class="numberc">37 820 881</td><td align="right" class="numberc">40 695 842</td><td align="right" class="numberc">36 992 091</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Befektetett eszk\xf6z\xf6k</td><td align="right" class="numberc">18 668 826</td><td align="right" class="numberc">18 525 063</td><td align="right" class="numberc">16 925 711</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Forg\xf3eszk\xf6z\xf6k</td><td align="right" class="numberc">19 008 587</td><td align="right" class="numberc">21 877 275</td><td align="right" class="numberc">19 792 420</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">P\xe9nzeszk\xf6z\xf6k</td><td align="right" class="numberc">947 015</td><td align="right" 
class="numberc">1 056 101</td><td align="right" class="numberc">1 307 515</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Akt\xedv id\xf5beli elhat\xe1rol\xe1sok</td><td align="right" class="numberc">143 468</td><td align="right" class="numberc">293 504</td><td align="right" class="numberc">273 960</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Saj\xe1t t\xf5ke</td><td align="right" class="numberc">2 141 319</td><td align="right" class="numberc">2 184 079</td><td align="right" class="numberc">2 353 554</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">C\xe9ltartal\xe9kok</td><td align="right" class="numberc">29 656</td><td align="right" class="numberc">148 652</td><td align="right" class="numberc">18 960</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">K\xf6telezetts\xe9gek</td><td align="right" class="numberc">35 541 531</td><td align="right" class="numberc">38 059 399</td><td align="right" class="numberc">34 233 518</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">R\xf6vid lej\xe1rat\xfa k\xf6telezetts\xe9gek</td><td align="right" class="numberc">30 519 491</td><td align="right" class="numberc">30 426 014</td><td align="right" class="numberc">26 394 088</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Hossz\xfa lej\xe1rat\xfa k\xf6telezetts\xe9gek</td><td align="right" class="numberc">5 022 040</td><td align="right" class="numberc">7 633 385</td><td align="right" class="numberc">7 839 430</td><td align="right" class="numberc">Nincs 
adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Passz\xedv id\xf5beli elhat\xe1rol\xe1sok</td><td align="right" class="numberc">108 375</td><td align="right" class="numberc">303 712</td><td align="right" class="numberc">386 059</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword" colspan="6"><b>P\xe9nz\xfcgyi mutat\xf3k</b></td></tr><tr><td class="contentword">Elad\xf3sodotts\xe1g foka <span onmouseout="remove_hint();" onmouseover="show_hint(this, '&lt;span style=&quot;color: red; font-weight: bold;&quot;&gt;Elad\xf3sodotts\xe1g foka&lt;/span&gt; (K\xf6telezetts\xe9gek/Eszk\xf6z\xf6k \xf6sszesen)&lt;br&gt;&lt;i&gt;Megmutatja, hogy az eszk\xf6z \xe1llom\xe1ny milyen m\xe9rt\xe9kben van megterhelve k\xf6telezetts\xe9gv\xe1llal\xe1ssal. Min\xe9l kisebb a mutat\xf3 \xe9rt\xe9ke, ann\xe1l jobb a c\xe9g meg\xedt\xe9l\xe9se.&lt;/i&gt;');" style="cursor: pointer; color: red; font-family: InformationLogo, Webdings;">i</span></td><td align="right" class="numberc"></td><td align="right" class="numberc"></td><td align="right" class="numberc"></td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Elad\xf3sodotts\xe1g m\xe9rt\xe9ke - Bonit\xe1s <span onmouseout="remove_hint();" onmouseover="show_hint(this, '&lt;span style=&quot;color: red; font-weight: bold;&quot;&gt;Elad\xf3sodotts\xe1g m\xe9rt\xe9ke - Bonit\xe1s&lt;/span&gt; (K\xf6telezetts\xe9gek/Saj\xe1t t\xf5ke)&lt;br&gt;&lt;i&gt;Azt mutatja, hogy a saj\xe1t forr\xe1sok a k\xf6telezetts\xe9gek h\xe1ny sz\xe1zal\xe9k\xe1t fedezik. 
Pozit\xedv a c\xe9g meg\xedt\xe9l\xe9se, ha a mutat\xf3 \xe9rt\xe9ke tart\xf3san (j\xf3val) 1 alatt van.&lt;/i&gt;');" style="cursor: pointer; color: red; font-family: InformationLogo, Webdings;">i</span></td><td align="right" class="numberc"></td><td align="right" class="numberc"></td><td align="right" class="numberc"></td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">\xc1rbev\xe9tel ar\xe1nyos eredm\xe9ny % <span onmouseout="remove_hint();" onmouseover="show_hint(this, '&lt;span style=&quot;color: red; font-weight: bold;&quot;&gt;\xc1rbev\xe9tel ar\xe1nyos eredm\xe9ny %&lt;/span&gt; (Ad\xf3zott eredm\xe9ny/ Nett\xf3 \xe1rbev\xe9tel)\xd7100&lt;br&gt;&lt;i&gt;A mutat\xf3 az \xe1rbev\xe9tel hat\xe9konys\xe1g\xe1t fejezi ki \xfagy, hogy az \xe1rbev\xe9tel nyeres\xe9gtartalm\xe1t sz\xe1zal\xe9kban szeml\xe9lteti. A c\xe9g meg\xedt\xe9l\xe9se ann\xe1l pozit\xedvabb, min\xe9l magasabb a sz\xe1zal\xe9k.&lt;/i&gt;');" style="cursor: pointer; color: red; font-family: InformationLogo, Webdings;">i</span></td><td align="right" class="numberc"></td><td align="right" class="numberc"></td><td align="right" class="numberc"></td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Likvidit\xe1si gyorsr\xe1ta <span onmouseout="remove_hint();" onmouseover="show_hint(this, '&lt;span style=&quot;color: red; font-weight: bold;&quot;&gt;Likvidit\xe1si gyorsr\xe1ta&lt;/span&gt; ((Forg\xf3eszk\xf6z\xf6k-K\xe9szletek)/R\xf6vid lej.k\xf6telezetts\xe9gek)&lt;br&gt;&lt;i&gt;Azt fejezi ki, hogy az egy \xe9v alatt p\xe9nzz\xe9 tehet\xf5 k\xe9szletek n\xe9lk\xfcli forg\xf3eszk\xf6z\xf6k milyen ar\xe1nyban k\xe9pesek az egy \xe9ven bel\xfcl esed\xe9kes k\xf6telezetts\xe9gek fedez\xe9s\xe9re, azaz milyen a c\xe9g r\xf6vid t\xe1v\xfa fizet\xf5k\xe9pess\xe9ge.&lt;br&gt;A c\xe9g meg\xedt\xe9l\xe9se akkor pozit\xedv, ha ez az ar\xe1ny 
egyre n\xf6vekv\xf5, ami az azonnali fizet\xf5k\xe9pess\xe9g javul\xe1s\xe1t jelzi.&lt;/i&gt;');" style="cursor: pointer; color: red; font-family: InformationLogo, Webdings;">i</span></td><td align="right" class="numberc"></td><td align="right" class="numberc"></td><td align="right" class="numberc"></td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Saj\xe1t t\xf5ke ar\xe1nya <span onmouseout="remove_hint();" onmouseover="show_hint(this, '&lt;span style=&quot;color: red; font-weight: bold;&quot;&gt;Saj\xe1t t\xf5ke ar\xe1nya &lt;/span&gt; (Saj\xe1t t\xf5ke/Forr\xe1sok)');" style="cursor: pointer; color: red; font-family: InformationLogo, Webdings;">i</span></td><td align="right" class="numberc">0,06</td><td align="right" class="numberc">0,05</td><td align="right" class="numberc">0,06</td><td align="right" class="numberc"></td><td align="right" class="numberc"></td></tr><tr><td class="contentword">Eszk\xf6zar\xe1nyos nyeres\xe9g <span onmouseout="remove_hint();" onmouseover="show_hint(this, '&lt;span style=&quot;color: red; font-weight: bold;&quot;&gt;Eszk\xf6zar\xe1nyos nyeres\xe9g &lt;/span&gt; (Ad\xf3zott eredm\xe9ny/Eszk\xf6z\xf6k)');" style="cursor: pointer; color: red; font-family: InformationLogo, Webdings;">i</span></td><td align="right" class="numberc">-0,01</td><td align="right" class="numberc">0,00</td><td align="right" class="numberc">0,00</td><td align="right" class="numberc"></td><td align="right" class="numberc"></td></tr><tr><td class="contentword">Bev\xe9telar\xe1nyos eredm\xe9ny <span onmouseout="remove_hint();" onmouseover="show_hint(this, '&lt;span style=&quot;color: red; font-weight: bold;&quot;&gt;Bev\xe9telar\xe1nyos eredm\xe9ny &lt;/span&gt; (Ad\xf3zott eredm\xe9ny/Bev\xe9telek)');" style="cursor: pointer; color: red; font-family: InformationLogo, Webdings;">i</span></td><td align="right" class="numberc">-0,07</td><td align="right" 
class="numberc">-0,05</td><td align="right" class="numberc">0,17</td><td align="right" class="numberc"></td><td align="right" class="numberc"></td></tr><tr><td class="contentword">Saj\xe1t t\xf5ke ar\xe1nyos nyeres\xe9g <span onmouseout="remove_hint();" onmouseover="show_hint(this, '&lt;span style=&quot;color: red; font-weight: bold;&quot;&gt;Saj\xe1t t\xf5ke ar\xe1nyos nyeres\xe9g &lt;/span&gt; (Ad\xf3zott eredm\xe9ny/Saj\xe1t t\xf5ke)');" style="cursor: pointer; color: red; font-family: InformationLogo, Webdings;">i</span></td><td align="right" class="numberc">-0,09</td><td align="right" class="numberc">-0,08</td><td align="right" class="numberc">0,00</td><td align="right" class="numberc"></td><td align="right" class="numberc"></td></tr><tr><td class="contentword" colspan="6"><b>L\xe9tsz\xe1m:</b> \xa0 136 f\xf5</td>\n</tr></table>]

and I want to remove this value from the table:

<tr><td class="contentword" colspan="6"><b>P\xe9nz\xfcgyi mutat\xf3k</b></td></tr>

My full code:

import urllib2
import unicodecsv as csv
import os
import sys
import io
import time
import datetime
import pandas as pd
from bs4 import BeautifulSoup
import MySQLdb

def to_2d(l,n):
    return [l[i:i+n] for i in range(0, len(l), n)]

filename=r'output.csv'

resultcsv=open(filename,"wb")
output=csv.writer(resultcsv, delimiter=';',quotechar = '"', quoting=csv.QUOTE_NONNUMERIC, encoding='latin-1')

f = open('opten2.txt', 'r')
x = f.read()

soup = BeautifulSoup(x, 'lxml')

tab6col = soup.find('table', { "class" : "tab6col" })

datatable=[]
for record in tab6col.findAll('tr'):
    for data in record.findAll('td'):
        datatable.append(data.text.encode('latin-1'))

td = datatable.find("td", text="P\xe9nz\xfcgyi mutat\xf3k")
td.decompose()

maindatatable = to_2d(datatable, 6)
print maindatatable
output.writerows(maindatatable)

resultcsv.close()

+1

Sorry, what exactly do you want to remove? The table? – obskyr

+0

Please show a sample, or state the input and the desired output. –

+0

I used bsoup to get the table, but I want to remove the value between these TR tags. I will update my question and insert the full HTML code so it is easier to understand. – tardos93

Answers

1

What you need is decompose(). Find the tag that contains that text and call decompose() on it to remove it from the tree.

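soup = BeautifulSoup(x, "lxml")
tab6col = soup.find("table", { "class" : "tab6col" })
# Unicode literal so the search text matches the parsed (unicode) document under Python 2
row = tab6col.find("tr", text=u"P\xe9nz\xfcgyi mutat\xf3k")
row.decompose()  # removes the whole <tr> from the tree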
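
A minimal, self-contained sketch of the same idea, assuming a tiny made-up table instead of the real page and the built-in html.parser instead of lxml. Rather than relying on a text search, it compares each row's text with the heading and decomposes the matching <tr>. The variable names and the sample HTML are illustrative only.

```python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

# Hypothetical three-row table; the middle row is the heading to drop.
html = (
    u'<table class="tab6col">'
    u'<tr><td>Bev\xe9telek</td><td>2 873 821</td></tr>'
    u'<tr><td colspan="6"><b>P\xe9nz\xfcgyi mutat\xf3k</b></td></tr>'
    u'<tr><td>Saj\xe1t t\xf5ke</td><td>2 141 319</td></tr>'
    u'</table>'
)

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "tab6col"})

# find_all returns a list, so removing rows while looping over it is safe.
for row in table.find_all("tr"):
    if row.get_text(strip=True) == u"P\xe9nz\xfcgyi mutat\xf3k":
        row.decompose()  # detach the row from the tree and destroy it

# Collect the remaining cell texts, one list per row.
remaining = [[td.get_text(strip=True) for td in row.find_all("td")]
             for row in table.find_all("tr")]
```

Note that decompose() has to be called on a tag that is still part of the parsed tree. A plain Python list of cell strings, such as datatable in the question, has no such method.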

EDIT

Try this.

import urllib2 
import unicodecsv as csv 
import os 
import sys 
import io 
import time 
import datetime 
import pandas as pd 
from bs4 import BeautifulSoup 
import MySQLdb 

filename=r'output.csv' 

resultcsv=open(filename,"wb") 
output=csv.writer(resultcsv, delimiter=';',quotechar = '"', quoting=csv.QUOTE_NONNUMERIC, encoding='latin-1') 

f = open('opten2.txt', 'r') 
x = f.read() 
f.close() 

soup = BeautifulSoup(x, 'lxml') 
tab6col = soup.find('table', { "class" : "tab6col" }) 

datatable=[] 
for record in tab6col.find_all('tr'): 
    temp_data = [] 
    for data in record.find_all('td'): 
        temp_data.append(data.text.encode('latin-1'))
    datatable.append(temp_data) 

output.writerows(datatable) 

resultcsv.close() 
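
Why collecting one list per row helps can be sketched without any parsing at all. The example below assumes hypothetical cell texts with 3 columns (the real table has 6) and reuses the to_2d helper from the question. Flattening and re-chunking lets the single-cell heading shift every later column, whereas per-row lists make that row easy to filter out.

```python
def to_2d(flat, n):
    """Split a flat list into consecutive chunks of n items."""
    return [flat[i:i + n] for i in range(0, len(flat), n)]

# Hypothetical cell texts: two 3-column rows with a 1-cell heading row between.
rows = [
    ["Revenue", "10", "20"],
    ["Financial indicators"],
    ["Equity ratio", "0,06", "0,05"],
]

# Flattening and re-chunking: the heading cell pushes later cells out of place.
flat = [cell for row in rows for cell in row]
chunked = to_2d(flat, 3)

# One list per <tr>: drop any row that does not have the full column count.
cleaned = [row for row in rows if len(row) == 3]
```

Filtering on the cell count is only one option. It would also drop any other row that spans all columns, such as the final headcount row in the table above.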
+0

So should I use it like this? 'td = tab6col.find("P\xe9nz\xfcgyi mutat\xf3k", align=False) td.decompose()' – tardos93

+0

@tardos93 No, not like that. Do you want to remove every 'td' tag that has no align attribute? –

+0

Of course. I only want to remove this one piece of data in its td tag. – tardos93
