2016-06-27 26 views
1

Extracting src from a BeautifulSoup tag

I am using BeautifulSoup to scrape Newegg for product names, descriptions, prices, and images. I have the following bs4.element.Tag and want to extract the "src" link from it. Here is my tag:

df = <a class="itemImage" href="http://www.newegg.com/Product/Product.aspx?Item=N82E16875169194&amp;cm_re=Samsung_edge-_-75-169-194-_-Product" id="img_75-169-194" title='Samsung Galaxy S7 Edge Dual SIM Unlocked Smart Phone, Dual Edge 5.5" AMOLED Display, black Color, 32GB Storage 4GB RAM International Version - No US Warranty'>\n<img alt='Samsung Galaxy S7 Edge Dual SIM Unlocked Smart Phone, Dual Edge 5.5" AMOLED Display, black Color, 32GB Storage 4GB RAM International Version - No US Warranty' src="http://images10.newegg.com/ProductImageCompressAll200/75-169-194-04.jpg" title='Samsung Galaxy S7 Edge Dual SIM Unlocked Smart Phone, Dual Edge 5.5" AMOLED Display, black Color, 32GB Storage 4GB RAM International Version - No US Warranty'/>\n</a> 

How can I extract the following from this tag?

src="http://images10.newegg.com/ProductImageCompressAll200/75-169-194-04.jpg" 

I tried:

df.attrs['src'] 

but I got a KeyError.

Answers

1

The src attribute belongs to the nested img tag, not to the a tag, so look it up on the img:

from bs4 import BeautifulSoup 
tag = """<a class="itemImage" href="http://www.newegg.com/Product/Product.aspx?Item=N82E16875169194&amp;cm_re=Samsung_edge-_-75-169-194-_-Product" id="img_75-169-194" title='Samsung Galaxy S7 Edge Dual SIM Unlocked Smart Phone, Dual Edge 5.5" AMOLED Display, black Color, 32GB Storage 4GB RAM International Version - No US Warranty'>\n<img alt='Samsung Galaxy S7 Edge Dual SIM Unlocked Smart Phone, Dual Edge 5.5" AMOLED Display, black Color, 32GB Storage 4GB RAM International Version - No US Warranty' src="http://images10.newegg.com/ProductImageCompressAll200/75-169-194-04.jpg" title='Samsung Galaxy S7 Edge Dual SIM Unlocked Smart Phone, Dual Edge 5.5" AMOLED Display, black Color, 32GB Storage 4GB RAM International Version - No US Warranty'/>\n</a>""" 

soup = BeautifulSoup(tag,"lxml") 

src = soup.img["src"] 

Which gives you:

http://images10.newegg.com/ProductImageCompressAll200/75-169-194-04.jpg 
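
If you already hold the a element as a bs4.element.Tag (as in the question), there should be no need to re-parse the markup: the nested img can be reached from the tag itself. The following is a minimal sketch, assuming the built-in "html.parser" backend and a shortened, made-up stand-in for the Newegg markup; `extract_src` is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the markup in the question (illustrative only).
html = (
    '<a class="itemImage" href="http://example.com/p">'
    '<img alt="phone" src="http://example.com/p.jpg"/></a>'
)


def extract_src(tag):
    """Return the src of the first <img> inside `tag`, or None if absent."""
    img = tag.find("img")   # search the descendants of the given tag
    if img is None:
        return None
    return img.get("src")   # .get() returns None instead of raising KeyError


soup = BeautifulSoup(html, "html.parser")
link = soup.find("a", class_="itemImage")  # a bs4.element.Tag, like `df` in the question

print(extract_src(link))
```

Using .get() rather than indexing means a product without an image yields None instead of a KeyError, which tends to be easier to handle when looping over many scraped items.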
-1

Try using a regular expression in Python. Reference:
https://docs.python.org/2/library/re.html

import re 
s = """ 
    <a class="itemImage" href="http://www.newegg.com/Product/Product.aspx?Item=N82E16875169194&amp;cm_re=Samsung_edge-_-75-169-194-_-Product" id="img_75-169-194" title='Samsung Galaxy S7 Edge Dual SIM Unlocked Smart Phone, Dual Edge 5.5" AMOLED Display, black Color, 32GB Storage 4GB RAM International Version - No US Warranty'>\n<img alt='Samsung Galaxy S7 Edge Dual SIM Unlocked Smart Phone, Dual Edge 5.5" AMOLED Display, black Color, 32GB Storage 4GB RAM International Version - No US Warranty' src="http://images10.newegg.com/ProductImageCompressAll200/75-169-194-04.jpg" title='Samsung Galaxy S7 Edge Dual SIM Unlocked Smart Phone, Dual Edge 5.5" AMOLED Display, black Color, 32GB Storage 4GB RAM International Version - No US Warranty'/>\n</a> 
    """ 
src_list = re.findall(r"src=[^\s]*", s)  # raw string so that \s is not treated as a string escape

Output:

src_list = ['src="http://images10.newegg.com/ProductImageCompressAll200/75-169-194-04.jpg"'] 
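
Note that the pattern above returns the attribute name and the quotes along with the URL. If only the URL is wanted, a capture group can isolate it. This is a rough sketch on shortened, made-up markup; `SRC_PATTERN` is a name chosen here for illustration. Regular expressions are brittle on HTML (this one would also match inside an attribute such as data-src), so an HTML parser is usually the safer choice.

```python
import re

# Shortened stand-in for the markup in the question (illustrative only).
html = (
    '<a href="http://example.com/p">'
    '<img alt="phone" src="http://example.com/p.jpg" title="phone"/></a>'
)

# Group 1 captures the opening quote, group 2 the attribute value;
# \1 requires the same quote character to close the value.
SRC_PATTERN = re.compile(r'''src=(["'])(.*?)\1''')

urls = [match.group(2) for match in SRC_PATTERN.finditer(html)]
print(urls)
```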