Selecting text data using Beautiful Soup

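I'm trying to select text data from the HTML below using Python and Beautiful Soup, but I'm running into a problem. Each <p> starts with a title inside a <b> tag, and I need the rest of the data without that title. For example, the first entry is "Assessment Type", but I only need "Capacity curve".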

modelinginfo = soup.find("div", {"id": "genInfo"})  # this is my raw data
rows = modelinginfo.findChildren(['p'])  # this is the data displayed below
for row in rows:
    print(row)
    print('\n')
    cells = row.findChildren('p')
    for cell in cells:
        value = cell.string
        print("The value in this cell is %s" % value)


[<p><b>Assessment Type: </b>Capacity curve</p>, 
<p><b>Name: </b>Borzi et al (2008) - Capacity-Xdir 4Storeys InfilledFrame NonSismicallyDesigned</p>, 
<p><b>Category: </b>Structure specific - Building</p>, 
<p><b>Taxonomy: </b>CR/LFINF+DNO/HEX:4 (GEM)</p>, 
<p><b>Reference: </b>The influence of infill panels on vulnerability curves for RC buildings (Borzi B., Crowley H., Pinho R., 2008) - Proceedings of the 14th World Conference on Earthquake Engineering, Beijing, China</p>, 
<p><b>Web Link: </b><a href="http://www.iitk.ac.in/nicee/wcee/article/14_09-01-0111.PDF" style="color:blue" target="_blank"> http://www.iitk.ac.in/nicee/wcee/article/14_09-01-0111.PDF</a></p>, 
<p><b>Methodology: </b>Analytical</p>, 
<p><b>General Comments: </b>Sample Data: A 4-storey building designed according to the 1992 Italian design code (DM, 1992), considering gravity loads only, and the Decreto Ministeriale 1996 (DM, 1996) when considering seismic action (the seismically designed building has been designed assuming a lateral force equal to 10% of the seismic weight, c=10%, and with a triangular distribution shape). 

The Y axis in the capacity curve represent the collapse multiplier: Base shear resistance over seismic weight.</p>, 
<p><b>Geographical Applicability: </b> Italy</p>]

Source

2016-05-12 Corncobpipe

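Can you add a link to the site? –

Perhaps you could use a conditional to pull out the whole string, then split it and discard the parts you don't need. –

The site is password-protected, so a link wouldn't help. Dot_Py, can you explain how to do that? I'm fairly new to Python, so it's a bit hard for me to see how. – Corncobpipe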
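The split-and-discard idea from the comment above could be sketched roughly as follows. This is only an illustration, not the commenter's actual code: `split_label` is a hypothetical helper name, and the sample strings stand in for what `row.get_text()` would be expected to return for the <p> elements in the question. It splits at the first colon only, so values that contain colons themselves (such as the taxonomy string) stay intact.

```python
def split_label(text):
    """Split 'Label: value' into (label, value) at the first colon."""
    label, sep, value = text.partition(":")
    if not sep:
        # No colon found: treat the whole string as the value.
        return "", text.strip()
    return label.strip(), value.strip()


# Sample strings standing in for row.get_text() on the <p> elements above.
samples = [
    "Assessment Type: Capacity curve",
    "Taxonomy: CR/LFINF+DNO/HEX:4 (GEM)",
    "Geographical Applicability:  Italy",
]

for text in samples:
    label, value = split_label(text)
    print("%s -> %s" % (label, value))
```

This relies on every title ending in a colon, which holds for the data shown in the question but is an assumption about the rest of the page.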

1) You can iterate over the children of each p and print everything except the b tag:

for cell in cells:
    for element in cell.children:
        if element.name != 'b':
            print("The value in this cell is %s" % element)
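For anyone who wants to try option 1 without access to the page, here is a self-contained sketch. The HTML is a shortened copy of the data in the question (the `style` and `target` attributes on the link are dropped), and the `values` list and the `isinstance` check are my own additions for illustration. It walks the <p> elements directly, because the <p> tags shown contain no nested <p>.

```python
from bs4 import BeautifulSoup, Tag

# Shortened sample of the HTML shown in the question.
html = """
<div id="genInfo">
<p><b>Assessment Type: </b>Capacity curve</p>
<p><b>Methodology: </b>Analytical</p>
<p><b>Web Link: </b><a href="http://www.iitk.ac.in/nicee/wcee/article/14_09-01-0111.PDF"> http://www.iitk.ac.in/nicee/wcee/article/14_09-01-0111.PDF</a></p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
modelinginfo = soup.find("div", {"id": "genInfo"})

values = []
for row in modelinginfo.find_all("p"):
    parts = []
    for element in row.children:
        if isinstance(element, Tag):
            if element.name == "b":
                continue  # skip the title
            # Other tags (such as <a>) contribute their text content.
            parts.append(element.get_text())
        else:
            # Plain text nodes are used as they are.
            parts.append(str(element))
    values.append("".join(parts).strip())

for value in values:
    print("The value in this cell is %s" % value)
```

Joining the parts per row keeps one value per <p>, even when the value is wrapped in another tag such as the link.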

2) You can use the extract() method to get rid of the b tag you don't need:

for cell in cells:
    if cell.b:
        # remove "b" tag
        cell.b.extract()
    print("The value in this cell is %s" % cell)
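A self-contained variant of option 2 might look like the sketch below. The HTML is again a shortened copy of the data in the question, and collecting the pairs into a `result` dictionary is my own addition. Since `extract()` returns the tag it removes, the title can still be read from the return value before the remaining text is taken from the <p>.

```python
from bs4 import BeautifulSoup

# Shortened sample of the HTML shown in the question.
html = """<div id="genInfo">
<p><b>Assessment Type: </b>Capacity curve</p>
<p><b>Geographical Applicability: </b> Italy</p>
</div>"""

soup = BeautifulSoup(html, "html.parser")

result = {}
for row in soup.find("div", {"id": "genInfo"}).find_all("p"):
    if row.b:
        # extract() detaches the <b> tag from the tree and returns it.
        label = row.b.extract().get_text().strip().rstrip(":")
        # With the title gone, the remaining text is the value.
        result[label] = row.get_text().strip()

print(result)
```

Note that `extract()` modifies the parsed tree in place, so the <b> tags are no longer in `soup` afterwards; option 1 leaves the tree untouched.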

Source

2016-05-12 20:44:17 arma

1) worked like a charm! Thank you! – Corncobpipe

A simpler solution to the problem: https://stackoverflow.com/a/4995283/4854931 – Alex78191
