
Navigating with BeautifulSoup4

I need to extract pieces of text ("325" and "550") from snippets like the one below. How do I go about this using Python 3.6.0, bs4 and urllib? The extracted data is then appended to a CSV file.

<div class="a-row a-spacing-none"> 
    <a class="a-link-normal a-text-normal" href="https://www.amazon.in/Game-Thrones-Song-Ice-Fire/dp/0007428545"> 
     <span class="a-size-small a-color-secondary"> 
     </span> 

     <span class="a-size-base a-color-price s-price a-text-bold"> 

      <span class="currencyINR">   
      </span> 
     325 
     </span> 

    </a> 
    <span class="a-letter-space"> 
    </span> 

    <span aria-label='Suggested Retail Price: &lt;span class="currencyINR"&gt;&amp;nbsp;&amp;nbsp;&lt;/span&gt;550' class="a-size-small a-color-secondary a-text-strike"> 
     <span class="currencyINR">  
     </span> 
    550 
    </span> 

</div> 

I tried the following code, but I can't get rid of the span tags that come along with the price text.

from urllib.request import urlopen as uReq 
from bs4 import BeautifulSoup as soup 


my_url = 'https://www.amazon.in/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=a+song+of+ice+and+fire' 
# opening up connection, grabbing the page

uClient = uReq(my_url) 
page_html = uClient.read() 
uClient.close() 


# html parsing 
page_soup = soup(page_html, "html.parser") 


# grabs each product 
containers = page_soup.findAll("div", {"class":"s-item-container"}) 
contain = containers[0] 
price = contain.findAll("span", {"class":"a-size-base a-color-price s-price a-text-bold"}) 
current_price = price[0].text.strip() 
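For reference, here is a minimal, runnable reduction of the same lookup against just the fragment posted above (the html string is a trimmed copy of that markup, so nothing is fetched from the live page):

from bs4 import BeautifulSoup

# trimmed copy of the fragment from the question
html = '''
<div class="a-row a-spacing-none">
    <span class="a-size-base a-color-price s-price a-text-bold">
        <span class="currencyINR"> </span>
        325
    </span>
    <span class="a-size-small a-color-secondary a-text-strike">
        <span class="currencyINR"> </span>
        550
    </span>
</div>
'''

contain = BeautifulSoup(html, "html.parser")
price = contain.findAll("span", {"class": "a-size-base a-color-price s-price a-text-bold"})

# .text concatenates the whitespace-only text of the nested currencyINR span
# with "325", so stripping the combined string leaves just the number
print(price[0].text.strip())   # 325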

Answers


For a start, you can select all the span elements that have the currencyINR class:

currency = contain.find('span', attrs={"class":"currencyINR"}) 

price = currency.nextSibling.strip() 
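For example, against a trimmed copy of the fragment from the question (a minimal sketch; the html string stands in for the real page), the text node that directly follows each currencyINR span is the price, so iterating over every match yields both numbers:

from bs4 import BeautifulSoup

# trimmed copy of the fragment from the question
html = '''
<div class="a-row a-spacing-none">
    <span class="a-size-base a-color-price s-price a-text-bold">
        <span class="currencyINR"> </span> 325
    </span>
    <span class="a-size-small a-color-secondary a-text-strike">
        <span class="currencyINR"> </span> 550
    </span>
</div>
'''

contain = BeautifulSoup(html, "html.parser")

# each price is the text node immediately after its currencyINR span
for currency in contain.findAll("span", {"class": "currencyINR"}):
    print(currency.nextSibling.strip())
# 325
# 550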

I worked this out myself later on; it turned out not to be as hard as I had expected. Here is a working solution:

from urllib.request import urlopen as uReq 
from bs4 import BeautifulSoup as soup 


my_url = "https://www.amazon.in/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=a+song+of+ice+and+fire" 


# opening up connection, grabbing the page 
uClient = uReq(my_url) 
page_html = uClient.read() 
uClient.close() 


# html parsing 
page_soup = soup(page_html, "html.parser") 


# grabs each product 
containers = page_soup.findAll("div", {"class":"s-item-container"}) 


# Creates New File: 
fileName = r"H:\WEBSCRAPER\Result\Products.csv"  # raw string so the backslashes are not treated as escapes
headers = "Product Name, Current Price, Original Price\n" 

f = open(fileName, "w") 
f.write(headers) 


errorMsg = "Error! Not Found" 
# obtains the data 
for contain in containers:
    # product title
    try:
        title = contain.h2.text
    except AttributeError:
        title = errorMsg
    # current selling price
    try:
        priceCurrent = contain.findAll("span", {"class":"a-size-base a-color-price s-price a-text-bold"})
        CurrentSP = priceCurrent[0].text.strip()
    except IndexError:
        CurrentSP = errorMsg
    # suggested retail price (struck-through)
    try:
        priceSuggested = contain.findAll("span", {"class":"a-size-small a-color-secondary a-text-strike"})
        SuggestedSP = priceSuggested[0].text.strip()
    except IndexError:
        SuggestedSP = errorMsg

    print("title: " + title)
    print("CurrentSP: " + CurrentSP)
    print("SuggestedSP: " + SuggestedSP)

    f.write(title.replace(",", "|") + "," + CurrentSP.replace(",", "") + "," + SuggestedSP.replace(",", "") + "\n")

f.close() 
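Side note: the replace(",", "|") and replace(",", "") calls are only there to keep commas in titles and prices from breaking the columns. An alternative sketch (not part of the original answer, and reusing containers and errorMsg from the code above) is the standard-library csv module, which quotes fields automatically:

import csv

# csv.writer quotes fields as needed, so commas in titles or prices stay intact
with open(r"H:\WEBSCRAPER\Result\Products.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Product Name", "Current Price", "Original Price"])

    for contain in containers:
        title = contain.h2.text if contain.h2 else errorMsg
        current = contain.find("span", {"class": "a-size-base a-color-price s-price a-text-bold"})
        suggested = contain.find("span", {"class": "a-size-small a-color-secondary a-text-strike"})

        writer.writerow([
            title,
            current.text.strip() if current else errorMsg,
            suggested.text.strip() if suggested else errorMsg,
        ])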