Pulling text from "Navigable strings" and "Tags" in Beautiful Soup

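I am trying to parse the Rotten Tomatoes website, which puts the critic score inside its own tag and appends the "%" separately. I followed a few suggestions such as find_all('span',text="true"), but the Python 3.5.1 shell returned an error. I then tried to find the direct children of the Beautiful Soup object criticscore, but got the same error. Please tell me where I went wrong.

Here is my Python code: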

def get_rating(address): 
    """pull ratings numbers from rotten tomatoes""" 
    RTaddress = urllib.request.urlopen(address) 
    tomatoe = BeautifulSoup(RTaddress, "lxml") 
    for criticscore in tomatoe.find('span', class_=['meter-value superPageFontColor']): 
     print(''.join(criticscore.find_all('span', recursive=False))) #print the Tomatometer

Also, here is the markup on Rotten Tomatoes that I am interested in:

<div class="critic-score meter"> 
         <a href="#contentReviews" class="unstyled articleLink" id="tomato_meter_link"> 
          <span class="meter-tomato icon big medium-xs certified_fresh pull-left"></span> 
          <span class="meter-value superPageFontColor"><span>96</span>%</span> 
         </a> 
        </div>

2016-11-22 st4rgut

The problem line is this one:

for criticscore in tomatoe.find('span', class_=['meter-value superPageFontColor']):

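Here you locate a single element via find() and then iterate over its children, which can be text nodes as well as other elements (this is what BeautifulSoup does when you iterate over an element).

You probably meant to use find_all() instead of find():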

for criticscore in tomatoe.find_all('span', class_=['meter-value superPageFontColor']):
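To make the difference concrete, here is a minimal sketch. It assumes the markup from the question is parsed from a string with the built-in html.parser (instead of being fetched with urllib and parsed with lxml), and it matches on the single class meter-value for brevity; those choices are mine, not the asker's.

```python
from bs4 import BeautifulSoup

# Markup copied from the question (assumption: parsed from a string, not fetched).
html = """
<div class="critic-score meter">
  <a href="#contentReviews" class="unstyled articleLink" id="tomato_meter_link">
    <span class="meter-tomato icon big medium-xs certified_fresh pull-left"></span>
    <span class="meter-value superPageFontColor"><span>96</span>%</span>
  </a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns ONE Tag; iterating over it walks its direct children,
# which here are a Tag (<span>96</span>) and a text node ("%").
outer = soup.find("span", class_="meter-value")
kinds = [type(child).__name__ for child in outer]
print(kinds)  # ['Tag', 'NavigableString']

# find_all() returns a list of matching Tags, so each loop item is a Tag
# that supports a further find_all(..., recursive=False).
scores = soup.find_all("span", class_="meter-value")
for criticscore in scores:
    inner = criticscore.find_all("span", recursive=False)
    # str.join needs strings, so take the text of each Tag first.
    print("".join(tag.get_text() for tag in inner))  # 96
```

Note that join() over the Tag objects themselves would fail, because str.join only accepts strings; calling get_text() on each tag sidesteps that.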

Alternatively, you can use a single CSS selector:

for criticscore in tomatoe.select('span.meter-value > span'): 
    print(criticscore.get_text())

where > means a direct parent-child relationship (this is the equivalent of your recursive=False).
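The effect of the child combinator can be checked with a small sketch. The markup below is hypothetical: it is the span from the question with one extra nested span added, purely to show what > excludes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the question's span plus an extra nested level.
html = (
    '<span class="meter-value superPageFontColor">'
    '<span>96<span>nested</span></span>%'
    '</span>'
)
soup = BeautifulSoup(html, "html.parser")

# "A > B": B must be a direct child of A.
direct = soup.select("span.meter-value > span")
# "A B": B may be any descendant of A.
anywhere = soup.select("span.meter-value span")
print(len(direct), len(anywhere))  # 1 2

# The same distinction expressed with find_all() and recursive=False.
outer = soup.find("span", class_="meter-value")
print(len(outer.find_all("span", recursive=False)))  # 1
print(len(outer.find_all("span")))                   # 2
```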

2016-11-22 18:41:07 alecxe

I was using find_all, but Python printed out another rating with the same class that I did not need. Is there a way to print only the first rating with find_all(), or to print the rating and the percent sign with find()? – st4rgut

Thanks, I used your CSS selector and printed the first element of the list. – st4rgut
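For reference, both variants asked about in the comments can be sketched as follows. The two-score markup is a made-up stand-in for a page that repeats the same class (the second value, 89, is invented for illustration), and select_one() is used here as a shortcut for taking the first element of the list that select() returns.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment: two scores sharing the same class.
html = """
<span class="meter-value superPageFontColor"><span>96</span>%</span>
<span class="meter-value superPageFontColor"><span>89</span>%</span>
"""
soup = BeautifulSoup(html, "html.parser")

# First rating only: select_one() returns the first match of the selector.
first_number = soup.select_one("span.meter-value > span").get_text()
print(first_number)  # 96

# Rating plus percent sign: get_text() on the outer span joins the text
# of the inner <span> and the trailing "%" text node.
first_with_percent = soup.find("span", class_="meter-value").get_text()
print(first_with_percent)  # 96%
```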
