2017-11-09 12 views

Web scraping multiple sites - Python

I have the code below, which works very well thanks to everyone's help. I searched for a related thread that answers my question but could not find one, so here goes.

How can I add multiple sites to this code and have the results written out properly to a csv file?

Here are some of the sites I would like to add (there are more than three extra ones). Thank you.

'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28205-self-storage/1796?PID=PSLocalSearch&CID=1341&CHID=LL'

'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28215-self-storage/2079?PID=PSLocalSearch&CID=1341&CHID=LL'

'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28213-self-storage/2441?PID=PSLocalSearch&CID=1341&CHID=LL'

Here is the code:

from urllib.request import urlopen as uReq 
from bs4 import BeautifulSoup as soup 


#setting my_url to the website
my_url = 'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28206-self-storage/2334?lat=35.23552&lng=-80.83296&clp=1&sp=Charlotte|35.2270869|-80.8431267&ismi=1'

#Opening up connection, grabbing the page 
uClient = uReq(my_url) 

#naming uClient to page_html 
page_html = uClient.read() 

#closing uClient 
uClient.close() 

#this does my html parsing 
page_soup = soup(page_html, "html.parser") 

#setting container to capture where the actual info is using inspect element 
#grabs each product 
containers = page_soup.findAll("li",{"class":"srp_res_row plp"}) 
store_locator = page_soup.findAll("div", {"itemprop":"address"}) 

filename = "product.csv" 
f = open(filename, "w") 

headers = "unit_size, size_dim1, unit_type, online_price, reg_price, street_address, store_city\n"

f.write(headers) 

for container in containers:
    #the store address is the same for every unit on the page
    for store_location in store_locator:
        street_address = store_location.findAll("span", {"itemprop":"streetAddress"})
        store_city = store_location.findAll("span", {"itemprop":"addressLocality"})
    title_container = container.div.div
    unit_size = title_container.text
    size_dim = container.findAll("div", {"class":"srp_label srp_font_14"})
    unit_container = container.li
    unit_type = unit_container.text
    online_price = container.findAll("div", {"class":"srp_label alt-price"})
    reg_price = container.findAll("div", {"class":"reg-price"})

    #zip stops at its shortest input; item[0] is the first character of unit_size
    for item in zip(unit_size, size_dim, unit_container, online_price, reg_price, street_address, store_city):
        csv = item[0] + "," + item[1].text + "," + item[2] + "," + item[3].text + "," + item[4].text + "," + item[5].text + "," + item[6].text + "\n"
        f.write(csv)

#closing the output file so everything is flushed to disk
f.close()

Here is the HTML script:

<li class="srp_res_row plp">
    <div class="srp_res_clm srp_clm160">
     <div class="srp_label plp">Small</div>
     <div class="srp_v-space_3"></div>
     <div class="srp_label srp_font_14" style="padding-left: 5px;">5' x 10'</div>
     <div class="srp_v-space_3"></div>
    </div>
    <div class="srp_res_clm srp_clm120">
     <ul class="srp_list">
      <li>Outside unit/Drive-up access</li>
     </ul>
    </div>
    <div class="srp_res_clm srp_clm90">
     <div class="srp_label">$1<span class="srp_label_symbol">†</span></div>
     <div class="srp_v-space_10">1st Month</div>
    </div>
    <div class="srp_res_clm srp_clm90">
     <div class="srp_label alt-price">$56/mo.</div>
     <div class="online-special">Online Special<span class="srp_label_symbol">†</span></div>
     <div class="srp_v-space_15"></div>
     <div class="reg-price">$70 In-store</div>
    </div>
    <div class="srp_res_clm srp_clm100 srp_vcenter"><a class="srp_continue unit-no-deposit" data-deposit-amount="0" data-deposit-days="0" data-features="Outside unit/Drive-up access" data-marketing-size="5x10" data-ppk="altproduct_price" data-promotionid="132" data-siteid="2334" data-size-description="5' x 10'" data-sizeid="613573" data-wc2-unit="false" href="/ReservationDetails.aspx?st=2334&amp;sz=613573&amp;key=[rnd]&amp;location=&amp;plp=1&amp;rk=&amp;ismi=1&amp;sp=Charlotte%7c35.2270869%7c-80.8431267&amp;clp=1"><img alt="Continue" src="/images/srp-cont-new-80.png" style="width: 80px; height: 32px"/></a></div>
</li>
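
The selectors can be checked against this row offline, without requesting the live page. The following is only a minimal sketch: it assumes bs4 is installed, uses a trimmed copy of the row above, and the names `ROW_HTML`, `parse_row`, and `text_of` are made up for illustration and are not part of the original code.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the <li> row shown above (spacer divs and the
# "Continue" link are left out to keep the example short).
ROW_HTML = """
<li class="srp_res_row plp">
  <div class="srp_res_clm srp_clm160">
    <div class="srp_label plp">Small</div>
    <div class="srp_label srp_font_14" style="padding-left: 5px;">5' x 10'</div>
  </div>
  <div class="srp_res_clm srp_clm120">
    <ul class="srp_list"><li>Outside unit/Drive-up access</li></ul>
  </div>
  <div class="srp_res_clm srp_clm90">
    <div class="srp_label alt-price">$56/mo.</div>
    <div class="reg-price">$70 In-store</div>
  </div>
</li>
"""


def parse_row(row):
    """Return the five unit fields of one <li class="srp_res_row plp"> row."""

    def text_of(selector):
        # select_one returns None when nothing matches, so fall back to ""
        node = row.select_one(selector)
        return node.get_text(strip=True) if node is not None else ""

    return {
        "unit_size": text_of("div.srp_label.plp"),
        "size_dim": text_of("div.srp_label.srp_font_14"),
        "unit_type": text_of("ul.srp_list li"),
        "online_price": text_of("div.alt-price"),
        "reg_price": text_of("div.reg-price"),
    }


soup = BeautifulSoup(ROW_HTML, "html.parser")
rows = [parse_row(li) for li in soup.select("li.srp_res_row.plp")]
print(rows[0])
```

Unlike the `zip` loop in the code above, this returns the full size label ("Small") rather than its first character, and a missing element yields an empty string instead of silently dropping the row.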

You can store the URLs in a list, loop over each URL, scrape it, and save the results to the CSV. – Ali

@Ali - Thank you for the quick reply. Could you show me how to do this? –

See the answer below. – Ali

Answer

Code:

from urllib.request import urlopen as uReq 
from bs4 import BeautifulSoup as soup 

# list of the site URLs to scrape
urls = ['https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28206-self-storage/2334?lat=35.23552&lng=-80.83296&clp=1&sp=Charlotte|35.2270869|-80.8431267&ismi=1' 
    , 'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28205-self-storage/1796?PID=PSLocalSearch&CID=1341&CHID=LL' 
    , 'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28215-self-storage/2079?PID=PSLocalSearch&CID=1341&CHID=LL' 
    , 'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28213-self-storage/2441?PID=PSLocalSearch&CID=1341&CHID=LL'] 

filename = "product.csv"
# "w" mode already truncates an existing file, so no separate clearing step is needed
f = open(filename, "w")
num = 0 

headers = "unit_size, size_dim1, unit_type, online_price, reg_price, street_address, store_city\n" 

f.write(headers) 

for my_url in urls: 
    # Opening up connection, grabbing the page 
    uClient = uReq(my_url) 

    # naming uClient to page_html 
    page_html = uClient.read() 

    # closing uClient 
    uClient.close() 

    # this does my html parsing 
    page_soup = soup(page_html, "html.parser") 

    # setting container to capture where the actual info is using inspect element 
    # grabs each product 
    containers = page_soup.findAll("li", {"class": "srp_res_row plp"}) 
    store_locator = page_soup.findAll("div", {"itemprop": "address"}) 

    # marker row so each site's units can be told apart in the file
    f.write("website " + str(num) + ": \n")
    for container in containers:
        for store_location in store_locator:
            street_address = store_location.findAll("span", {"itemprop": "streetAddress"})
            store_city = store_location.findAll("span", {"itemprop": "addressLocality"})
            title_container = container.div.div
            unit_size = title_container.text
            size_dim = container.findAll("div", {"class": "srp_label srp_font_14"})
            unit_container = container.li
            unit_type = unit_container.text
            online_price = container.findAll("div", {"class": "srp_label alt-price"})
            reg_price = container.findAll("div", {"class": "reg-price"})

        # zip stops at its shortest input; item[0] is the first character of unit_size
        for item in zip(unit_size, size_dim, unit_container, online_price, reg_price, street_address, store_city):
            csv = item[0] + "," + item[1].text + "," + item[2] + "," + item[3].text + "," + item[4].text + "," + item[5].text + "," + item[6].text + "\n"
            f.write(csv)
    num += 1

# closing the output file so everything is flushed to disk
f.close()

Output (contents of product.csv):

unit_size, size_dim1, unit_type, online_price, reg_price, street_address, store_city 
website 0: 
S,5' x 10',Outside unit/Drive-up access,$55/mo.,$68 In-store,1001 N Tryon St,Charlotte 
M,5' x 15',Outside unit/Drive-up access,$68/mo.,$84 In-store,1001 N Tryon St,Charlotte 
M,10' x 10',Outside unit/Drive-up access,$101/mo.,$126 In-store,1001 N Tryon St,Charlotte 
L,10' x 15',Outside unit/Drive-up access,$154/mo.,$187 In-store,1001 N Tryon St,Charlotte 
L,10' x 25',Outside unit/Drive-up access,$167/mo.,$208 In-store,1001 N Tryon St,Charlotte 
L,10' x 20',Outside unit/Drive-up access,$172/mo.,$209 In-store,1001 N Tryon St,Charlotte 
L,15' x 20',Outside unit/Drive-up access,$193/mo.,$241 In-store,1001 N Tryon St,Charlotte 
website 1: 
S,5' x 5',Outside unit/Drive-up access,$50/mo.,$60 In-store,3710 Monroe Road,Charlotte 
S,5' x 10',Outside unit/Drive-up access,$53/mo.,$66 In-store,3710 Monroe Road,Charlotte 
S,10' x 5',Outside unit/Drive-up access,$55/mo.,$68 In-store,3710 Monroe Road,Charlotte 
M,10' x 10',Outside unit/Drive-up access,$97/mo.,$118 In-store,3710 Monroe Road,Charlotte 
L,10' x 15',Outside unit/Drive-up access,$100/mo.,$124 In-store,3710 Monroe Road,Charlotte 
L,10' x 20',Outside unit/Drive-up access,$128/mo.,$159 In-store,3710 Monroe Road,Charlotte 
M,10' x 10',Climate Controlled,$129/mo.,$157 In-store,3710 Monroe Road,Charlotte 
L,20' x 30',Outside unit/Drive-up access,$292/mo.,$356 In-store,3710 Monroe Road,Charlotte 
website 2: 
S,5' x 10',Outside unit/Drive-up access,$36/mo.,$45 In-store,5301 N Sharon Amity Rd,Charlotte 
S,10' x 5',Outside unit/Drive-up access,$36/mo.,$45 In-store,5301 N Sharon Amity Rd,Charlotte 
S,5' x 5',Outside unit/Drive-up access,$42/mo.,$53 In-store,5301 N Sharon Amity Rd,Charlotte 
M,10' x 10',Outside unit/Drive-up access,$80/mo.,$99 In-store,5301 N Sharon Amity Rd,Charlotte 
L,10' x 15',Outside unit/Drive-up access,$87/mo.,$108 In-store,5301 N Sharon Amity Rd,Charlotte 
L,10' x 20',Outside unit/Drive-up access,$100/mo.,$124 In-store,5301 N Sharon Amity Rd,Charlotte 
L,20' x 10',Outside unit/Drive-up access,$100/mo.,$125 In-store,5301 N Sharon Amity Rd,Charlotte 
M,10' x 10',Climate Controlled,$112/mo.,$139 In-store,5301 N Sharon Amity Rd,Charlotte 
L,10' x 25',Outside unit/Drive-up access,$121/mo.,$153 In-store,5301 N Sharon Amity Rd,Charlotte 
L,20' x 10',Climate Controlled,$123/mo.,$153 In-store,5301 N Sharon Amity Rd,Charlotte 
L,20' x 20',Outside unit/Drive-up access,$135/mo.,$168 In-store,5301 N Sharon Amity Rd,Charlotte 
website 3: 
S,3' x 3',Inside unit/1st Floor,$17/mo.,$22 In-store,4730 N Tryon St,Charlotte 
S,5' x 5',Outside unit/Drive-up access,$35/mo.,$43 In-store,4730 N Tryon St,Charlotte 
S,5' x 10',Outside unit/Drive-up access,$39/mo.,$49 In-store,4730 N Tryon St,Charlotte 
S,10' x 5',Outside unit/Drive-up access,$40/mo.,$50 In-store,4730 N Tryon St,Charlotte 
M,5' x 15',Outside unit/Drive-up access,$65/mo.,$81 In-store,4730 N Tryon St,Charlotte 
M,20' x 5',Outside unit/Drive-up access,$65/mo.,$81 In-store,4730 N Tryon St,Charlotte 
M,10' x 10',Outside unit/Drive-up access,$66/mo.,$82 In-store,4730 N Tryon St,Charlotte 
L,10' x 15',Outside unit/Drive-up access,$84/mo.,$105 In-store,4730 N Tryon St,Charlotte 
L,10' x 20',Outside unit/Drive-up access,$136/mo.,$169 In-store,4730 N Tryon St,Charlotte 
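
The "website N:" marker rows in the output above have a different shape from the data rows, which can get in the way when the file is loaded elsewhere. One possible variation, sketched here under assumptions, is to put the site label in its own column and let the standard `csv` module handle quoting. The names `FIELDS`, `write_units`, and `sample` are illustrative and not part of the answer; the sample rows are copied from the output above.

```python
import csv
import io

# Column names: the answer's columns plus a leading "site" column (an assumption).
FIELDS = ["site", "unit_size", "size_dim", "unit_type",
          "online_price", "reg_price", "street_address", "store_city"]


def write_units(out, units_by_site):
    """Write one CSV row per unit, tagging each row with its site label."""
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    for site, units in units_by_site.items():
        for unit in units:
            # csv.writer quotes any field that contains a comma or a quote
            writer.writerow([site, *unit])


# Two rows taken from the output above, keyed by the same "website N" labels.
sample = {
    "website 0": [
        ("S", "5' x 10'", "Outside unit/Drive-up access",
         "$55/mo.", "$68 In-store", "1001 N Tryon St", "Charlotte"),
    ],
    "website 1": [
        ("S", "5' x 5'", "Outside unit/Drive-up access",
         "$50/mo.", "$60 In-store", "3710 Monroe Road", "Charlotte"),
    ],
}

buffer = io.StringIO()
write_units(buffer, sample)
print(buffer.getvalue())
```

When writing to a real file instead of `io.StringIO`, the `csv` documentation recommends opening it with `newline=""`, for example `open("product.csv", "w", newline="")`.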

@Ali - Thank you! –
