2017-08-30 6 views
1

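Python scraper only returns one item (classifieds page)

Hello everyone. I am relatively new to Python, and I wrote a script to scrape one of my country's classifieds pages. So far the script only seems to grab one item, which is really frustrating because I have been trying to fix it for a week. If someone could take a look and explain what I should do here, I would appreciate it. Thanks in advance to anyone who can help!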

from urllib.request import urlopen as uReq 
from bs4 import BeautifulSoup as soup 

my_url = 'http://www.clasificadosonline.com/UDMiscListingID.asp?MiscCat=75' 

# opening ip connection, grabbing the page 
uClient = uReq(my_url) 
page_html = uClient.read() 
uClient.close() 

#HTML PARSER 
page_soup = soup(page_html, "html5lib") # switched from "html.parser" to "html5lib" because html.parser was mishandling the closing form tag

containers = page_soup.findAll("form",{"name":"listing"}) 

#testing variables 
tags = containers[0].findAll("a", {"class":"Tahoma16Blacknounder"}) 
tagx = tags[0].text.strip() 

filename = "products.csv" 
f = open(filename, "w") 

headers = "names, prices, city, product_condition\n" 

f.write(headers) 

for container in containers:
    #holds the names of the classifieds
    names_container = container.findAll("a", {"class":"Tahoma16Blacknounder"})
    names = names_container[0].text.strip() # comment here later

    #the span class"Tahoma14BrownNound" seems to hold the prices
    #container.findAll("span", {"class":"Tahoma14BrownNound"})
    #the span class
    prices_container = container.findAll("span", {"class":"Tahoma14BrownNound"})
    prices = prices_container[0].text # comment here later

    #holds the city of use of the products
    city_container = container.findAll("font", {"class":"tahoma14hbluenoUnder"})
    city = city_container[0].text.strip() # comment here later

    #holds the states of use of the products
    product_condition_container = container.findAll("span", {"class":"style14 style15 style16"})
    product_condition = product_condition_container[0].text # comment here later

    print("names: " + names)
    print("prices: " + prices)
    print("city: " + city)
    print("product_condition: " + product_condition)

    f.write(names.replace(",", "|") + "," + prices + "," + city + "," + product_condition + "\n")

f.close() 

Answer

0

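I looked at the structure of the site: you are missing the step that parses the table inside the form. The page appears to have only one form named "listing", so a loop over the forms runs just once. The individual listings are the table rows inside that form, and those rows are what you need to iterate over.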
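
To illustrate the difference on a toy page, here is a minimal sketch. The markup in `SAMPLE_HTML` is hypothetical and only mimics the layout assumed above (one form, several rows); it is not copied from the real site, and it uses the built-in "html.parser" so that it runs without a network connection.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not the real page): one form, many rows.
SAMPLE_HTML = """
<form name="listing">
  <table>
    <tr valign="middle"><td><a class="Tahoma16Blacknounder">Bike</a></td></tr>
    <tr valign="middle"><td><a class="Tahoma16Blacknounder">Guitar</a></td></tr>
    <tr valign="middle"><td>banner row without a listing link</td></tr>
  </table>
</form>
"""

page_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# There is a single matching form, so a loop over `forms` runs only once.
forms = page_soup.find_all("form", {"name": "listing"})

# The listings are the rows inside that form.
rows = forms[0].find_all("tr", {"valign": "middle"})

names = []
for row in rows:
    link = row.find("a", {"class": "Tahoma16Blacknounder"})
    if link is not None:  # skip rows that do not contain a listing link
        names.append(link.text.strip())

print(len(forms), len(rows), names)
```

The full script applies the same idea to the real page.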

from urllib.request import urlopen as uReq 
from bs4 import BeautifulSoup as soup 

my_url = 'http://www.clasificadosonline.com/UDMiscListingID.asp?MiscCat=75' 

# opening ip connection, grabbing the page 
uClient = uReq(my_url) 
page_html = uClient.read() 
uClient.close() 

#HTML PARSER 
page_soup = soup(page_html, "html5lib") # switched from "html.parser" to "html5lib" because html.parser was mishandling the closing form tag

containers = page_soup.findAll("form",{"name":"listing"}) 

#testing variables 
tags = containers[0].findAll("a", {"class":"Tahoma16Blacknounder"}) 
tagx = tags[0].text.strip() 

filename = "products.csv" 
f = open(filename, "w") 

headers = "names, prices, city, product_condition\n" 

f.write(headers) 

# the listings are the table rows inside the single "listing" form
tr = containers[0].findAll('tr', {"valign":"middle"})

for container in tr:

    # only rows that contain a listing link are actual classifieds
    if len(container.findAll("a", {"class":"Tahoma16Blacknounder"})) > 0:
        #holds the names of the classifieds
        names_container = container.findAll("a", {"class":"Tahoma16Blacknounder"})
        names = names_container[0].text.strip() # comment here later

        #the span class"Tahoma14BrownNound" seems to hold the prices
        #container.findAll("span", {"class":"Tahoma14BrownNound"})
        #the span class
        prices_container = container.findAll("span", {"class":"Tahoma14BrownNound"})
        prices = prices_container[0].text if len(prices_container) > 0 else ''

        #holds the city of use of the products
        city_container = container.findAll("font", {"class":"tahoma14hbluenoUnder"})
        city = city_container[0].text.strip() # comment here later

        #holds the states of use of the products
        product_condition_container = container.findAll("span", {"class":"style14 style15 style16"})
        product_condition = product_condition_container[0].text # comment here later

        print("names: " + names)
        print("prices: " + prices)
        print("city: " + city)
        print("product_condition: " + product_condition)

        # write one CSV line per listing, inside the if so every row is saved
        f.write(names.replace(",", "|") + "," + prices + "," + city + "," + product_condition + "\n")

f.close() 
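
As an optional refinement, the script above joins fields by hand and replaces commas in names with "|". A possible alternative, sketched here under assumptions, is to let the standard `csv` module handle quoting and to guard every lookup the same way the prices lookup is guarded. The helper names `first_text` and `write_rows` and the sample values are hypothetical, and `SimpleNamespace` objects stand in for BeautifulSoup tags so the sketch runs on its own.

```python
import csv
import io
from types import SimpleNamespace


def first_text(elements, default=""):
    """Return the stripped .text of the first element, or default if the list is empty."""
    return elements[0].text.strip() if elements else default


def write_rows(out, rows):
    """Write a header plus the given rows; csv.writer quotes fields containing commas."""
    writer = csv.writer(out)
    writer.writerow(["names", "prices", "city", "product_condition"])
    writer.writerows(rows)


# Stand-ins for findAll() results (hypothetical sample values).
found = [SimpleNamespace(text="  Guitar, like new ")]
missing = []  # e.g. a listing with no price element

buffer = io.StringIO()
write_rows(buffer, [(first_text(found), first_text(missing), "San Juan", "Used")])
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, the same function could be called with a file opened as `open("products.csv", "w", newline="", encoding="utf-8")`.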
+0

Thank you, I really appreciate it. –

+0

Great! Please mark this as the answer! Thanks! – chad
