Removing elements with Python 3

I am scraping data from the web and trying to remove all elements that have the tag "div" and the class "notes module", like the HTML below:

 <div class="notes module" role="complementary"> 
    <h3 class="heading">Notes:</h3> 
    <ul class="associations"> 
     <li> 
      Translation into Русский available: 
      <a href="/works/494195">Два-два-один Браво Бейкер</a> by <a rel="author" href="https://stackoverflow.com/users/dzenka/pseuds/dzenka">dzenka</a>, <a rel="author" href="https://stackoverflow.com/users/La_Ardilla/pseuds/La_Ardilla">La_Ardilla</a> 
     </li> 
    </ul> 
    <blockquote class="userstuff"> 
     <p> 
    <i>Warnings: numerous references to and glancing depictions of combat, injury, murder, and mutilation of the dead; deaths of minor and major original characters. Numerous explicit depictions of sex between two men.</i> 
</p> 
    </blockquote> 
    <p class="jump">(See the end of the work for <a href="#children">other works inspired by this one</a>.)</p> 
</div>

The source is here: view-source:http://archiveofourown.org/works/180121?view_full_work=true

I am also struggling just to find and print the elements I want to remove. So far I have:

import urllib.request, urllib.parse, urllib.error 
from lxml import html 
from bs4 import BeautifulSoup 

url = 'http://archiveofourown.org/works/180121?view_full_work=true' 
html = urllib.request.urlopen(url).read() 
soup = BeautifulSoup(html, 'lxml') 
removals = soup.find_all('div', {'id':'notes module'}) 
for match in removals: 
    match.decompose()

However, removals comes back as an empty list. How can I select the entire div element shown above, so that I can find all such elements in the html and remove them?

Thank you.

2017-12-23 SBlack

The divs you are trying to find have class="notes module", but your code is looking for divs with id="notes module". Change this line:

removals = soup.find_all('div', {'id':'notes module'})

to this:

removals = soup.find_all('div', {'class':'notes module'})
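To make the id/class difference concrete, here is a minimal sketch that runs offline. SAMPLE_HTML is a hypothetical, shortened stand-in for the page's markup (it is not the real page), and the built-in "html.parser" is used only so the snippet has no lxml dependency. It also shows a CSS selector as an alternative, since passing the string "notes module" as a class filter matches that exact attribute value.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the real page (assumption for illustration).
SAMPLE_HTML = """
<div id="workskin">
  <div class="notes module" role="complementary">
    <h3 class="heading">Notes:</h3>
    <p class="jump">(See the end of the work.)</p>
  </div>
  <div class="module notes"><p>Same classes, reverse order.</p></div>
  <div class="userstuff"><p>Story text to keep.</p></div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The original lookup: no div has id="notes module", so nothing is found.
by_id = soup.find_all("div", {"id": "notes module"})

# The corrected lookup: matches divs whose class attribute is "notes module".
by_class = soup.find_all("div", {"class": "notes module"})

# CSS selector: matches divs carrying both classes, in any order.
by_selector = soup.select("div.notes.module")

print(len(by_id), len(by_class), len(by_selector))

# Remove every matched div (and everything inside it) from the tree.
for match in by_selector:
    match.decompose()

remaining_divs = soup.find_all("div")
print([div.get("id") or div.get("class") for div in remaining_divs])
```

If the class lookup still returns nothing on the live page, printing part of the downloaded HTML and searching it for "notes module" is a quick way to check whether the response actually contains those divs.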

2017-12-23 17:08:36

Thank you. I am still getting an empty list. – SBlack

Give this a try. It will kick out all of the divs found under class='wrapper' on that web page.

import requests
from bs4 import BeautifulSoup

response = requests.get('http://archiveofourown.org/works/180121?view_full_work=true')
soup = BeautifulSoup(response.text, 'lxml')
for item in soup.select(".wrapper"):
    # item("div") is shorthand for item.find_all("div")
    for elem in item("div"):
        elem.extract()  # detach each nested div from the tree
    print(item)
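The loop above removes every div nested inside each .wrapper element, not only the "notes module" ones. The following sketch shows that effect on a small inline document; SAMPLE_HTML is a hypothetical stand-in (an assumption, not the real page), and "html.parser" is used so the snippet runs without lxml or network access.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup (assumption for illustration only).
SAMPLE_HTML = """
<div class="wrapper">
  <h2>Title</h2>
  <div class="notes module"><p>Note text.</p></div>
  <div class="userstuff"><p>Story text.</p></div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
wrapper = soup.select_one(".wrapper")

# find_all() returns a list built up front, so detaching tags while
# looping over it is safe. extract() removes the tag from the tree
# and returns it, so the removed pieces can still be inspected.
removed = [elem.extract() for elem in wrapper.find_all("div")]
removed_classes = [" ".join(elem.get("class", [])) for elem in removed]

# Names of the tags still left inside the wrapper.
remaining_tags = [tag.name for tag in wrapper.find_all(True)]

print(removed_classes)
print(remaining_tags)
```

Here the story div is removed together with the notes div, so when only the notes should go, a narrower selector such as "div.notes.module" is probably the better fit.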

2017-12-23 20:15:19 SIM
