What is the ideal way to work with XML data when parsing HTML in Python with Beautiful Soup?


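I am parsing HTML with the BeautifulSoup library on Python 2.7. I can get as far as building the "soup", but I do not know how to extract the data I need from it.

In the example below, I extract every number inside the span tags and add them up. Is there a better way?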

XML data: http://python-data.dr-chuck.net/comments_324255.html

CODE:

import urllib2 
from BeautifulSoup import * 
import re 

url = 'http://python-data.dr-chuck.net/comments_324255.html' 
html = urllib2.urlopen(url).read() 
soup = BeautifulSoup(html) 
spans = soup('span') 
lis = list() 
span_str = str(spans) 
sp = re.findall('([0-9]+)', span_str) 
count = 0 
for i in sp: 
    count = count + int(i) 
print('Sum:', count)


2017-01-19 Ethan ZHOU

Read more of the BeautifulSoup docs; they cover many useful functions. – furas

You do not need a regular expression for this:

from bs4 import BeautifulSoup 
from requests import get 

url = 'http://python-data.dr-chuck.net/comments_324255.html' 
html = get(url).text 
soup = BeautifulSoup(html, 'lxml') 

count = sum(int(n.text) for n in soup.findAll('span'))
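
To try the same idea without a network connection, here is a minimal sketch. The SAMPLE_HTML string and the sum_spans helper are made up for illustration (the real page may be structured differently), and it uses the built-in "html.parser" so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Made-up sample markup (an assumption; the real page may be laid out differently).
SAMPLE_HTML = """
<table>
  <tr><td>Alice</td><td><span class="comments">97</span></td></tr>
  <tr><td>Bob</td><td><span class="comments">3</span></td></tr>
  <tr><td>Carol</td><td><span class="comments">20</span></td></tr>
</table>
"""


def sum_spans(html):
    """Return the sum of the integers found in every <span> tag."""
    # "html.parser" ships with Python, so no extra parser has to be installed.
    soup = BeautifulSoup(html, "html.parser")
    return sum(int(span.text) for span in soup.find_all("span"))


total = sum_spans(SAMPLE_HTML)
print("Sum:", total)  # Sum: 120
```

find_all is the current spelling of findAll in bs4; both should return the same tags here.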


2017-01-19 14:15:48

Thank you, this is simpler than my code. By the way, what is happening on line 4, html = get(url).text, and on line 8, int(n.text)? Is '.text' a built-in method?

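'.text' gives you the plain text. It is not a method but an attribute, which is why there are no parentheses. The requests response object and the BeautifulSoup tag objects each have their own '.text'; they just happen to share a name.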
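
A small sketch of the difference on the Beautiful Soup side, using a one-line made-up snippet rather than the real page. The get_text() method is the call form of the same thing, and the requests response exposes its own .text attribute in a similar way.

```python
from bs4 import BeautifulSoup

# One-tag sample document (an assumption, not the real page).
soup = BeautifulSoup('<span class="comments">42</span>', "html.parser")
span = soup.find("span")

# .text is read like a plain attribute: no parentheses.
print(span.text)

# get_text() is the method form and returns the same string here.
print(span.get_text())

# The value is a string, so it has to be converted before doing arithmetic.
print(int(span.text) + 1)
```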

import requests, bs4 
r = requests.get("http://python-data.dr-chuck.net/comments_324255.html") 
soup = bs4.BeautifulSoup(r.text, 'lxml') 

sum(int(span.text) for span in soup.find_all(class_="comments"))

Output:


2017-01-19 14:16:21

Thank you too. One more beginner question: why is there an underscore in (class_="comments")? Is it a variable, or just a convention?
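
On the underscore: class is a reserved word in Python, so it cannot be used as a keyword argument name, and Beautiful Soup accepts class_ instead. The trailing underscore is the usual Python convention for avoiding such clashes. A minimal sketch with a made-up two-span snippet (not the real page), which also shows the attrs dictionary form as an alternative:

```python
import keyword

from bs4 import BeautifulSoup

# Made-up markup with two different classes (an assumption for illustration).
html = '<span class="comments">5</span><span class="other">7</span>'
soup = BeautifulSoup(html, "html.parser")

# "class" is reserved by Python, so find_all(class="comments") would be a SyntaxError.
print(keyword.iskeyword("class"))

# Two ways to filter on the HTML class attribute.
by_keyword = soup.find_all(class_="comments")
by_dict = soup.find_all(attrs={"class": "comments"})

# Both calls select the same single tag.
print([int(tag.text) for tag in by_keyword])
print(list(by_keyword) == list(by_dict))
```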
