Scraping multiple web pages with Python

from bs4 import BeautifulSoup
import time
import urllib.request  # a bare "import urllib" does not guarantee urllib.request is loaded

class scrap(object):
    def __init__(self):
        self.urls = ['https://www.onthemarket.com/for-sale/property/wigan/',
                     'https://www.onthemarket.com/for-sale/property/wigan/?page=1',
                     'https://www.onthemarket.com/for-sale/property/wigan/?page=2',
                     'https://www.onthemarket.com/for-sale/property/wigan/?page=3',
                     'https://www.onthemarket.com/for-sale/property/wigan/?page=4',
                     'https://www.onthemarket.com/for-sale/property/wigan/?page=6']
        self.telephones = []

    def extract_info(self):
        for link in self.urls:
            data = urllib.request.urlopen(link).read()
            soup = BeautifulSoup(data, "lxml")
            for tel in soup.findAll("span", {"class": "call"}):
                self.telephones.append(tel.text.strip())
            time.sleep(1)
        return self.telephones

to = scrap()
print(to.extract_info())

What is the problem? The code gets stuck after the second website. I need to extract the phone numbers from every web page in the list self.urls.


2017-12-04 FootAdministration
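The URLs in `self.urls` differ only in the `page` query parameter, so the list can also be built from page numbers. A minimal sketch; `BASE_URL` and `page_urls` are hypothetical names, and the page numbers are copied from the question's list (which has no `page=5`).

```python
from urllib.parse import urlencode

# Hypothetical constant: the listing URL used in the question.
BASE_URL = "https://www.onthemarket.com/for-sale/property/wigan/"

def page_urls(pages, base=BASE_URL):
    """Return the base listing URL followed by one '?page=N' URL per page number."""
    return [base] + [f"{base}?{urlencode({'page': p})}" for p in pages]

# Same page numbers as the question's list.
urls = page_urls([1, 2, 3, 4, 6])
```

This reproduces the six URLs in the question, and changing the page range becomes a one-argument edit.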

If you are getting any errors, please post them. I tried your code as well. – csharpcoder

Everything works fine for me. [Finished in 9.3s] – ventik

There are no errors. The Python shell is busy, but nothing is returned. I'm using Spyder with Python 3.6. I have waited more than 5 minutes and nothing happens. – FootAdministration
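The question's `urlopen(link)` call passes no timeout, so if the server never answers, the call can block indefinitely with no error, which fits the "nothing happens" symptom. A minimal sketch of a stdlib-only fetch that sends a browser-like `User-Agent` and gives up after a timeout, assuming (as the answer below does) that the site treats the default urllib client differently. `HEADERS`, `TIMEOUT_SECONDS`, `build_request`, and `fetch` are hypothetical names; the 10-second value is arbitrary.

```python
import socket
import urllib.error
import urllib.request

# Assumption: a generic browser-like User-Agent is enough for this site.
HEADERS = {"User-Agent": "Mozilla/5.0"}
TIMEOUT_SECONDS = 10  # hypothetical value; tune for the site

def build_request(url):
    """Return a urllib Request carrying the browser-like User-Agent header."""
    return urllib.request.Request(url, headers=HEADERS)

def fetch(url, timeout=TIMEOUT_SECONDS):
    """Download url; print a message and return None instead of hanging forever."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            return resp.read()
    except (urllib.error.URLError, socket.timeout) as exc:
        # Report which page failed so a stalled request is visible.
        print(f"failed to fetch {url}: {exc}")
        return None
```

With this in place of `urllib.request.urlopen(link).read()`, a page that stalls produces a message after the timeout instead of leaving the shell silent.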

Add headers to your request parameters. Try this:

from bs4 import BeautifulSoup
import requests, time

class scrape(object):

    def __init__(self):
        self.urls = ['https://www.onthemarket.com/for-sale/property/wigan/',
                     'https://www.onthemarket.com/for-sale/property/wigan/?page=1',
                     'https://www.onthemarket.com/for-sale/property/wigan/?page=2',
                     'https://www.onthemarket.com/for-sale/property/wigan/?page=3',
                     'https://www.onthemarket.com/for-sale/property/wigan/?page=4',
                     'https://www.onthemarket.com/for-sale/property/wigan/?page=6']
        self.telephones = []

    def extract_info(self):
        for link in self.urls:
            data = requests.get(link, headers={"User-Agent": "Mozilla/5.0"})  # the User-Agent header should do the trick
            soup = BeautifulSoup(data.text, "lxml")
            for tel in soup.find_all("span", {"class": "call"}):
                self.telephones.append(tel.text.strip())
            time.sleep(1)
        return self.telephones

crawl = scrape()
print(crawl.extract_info())


2017-12-04 10:53:08 SIM
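If installing `requests` and `lxml` is not an option, the extraction step `soup.find_all("span", {"class": "call"})` can be approximated with the standard library's `html.parser`. A minimal sketch, not a drop-in replacement for BeautifulSoup: `CallSpanParser` and `extract_telephones` are hypothetical names, and like BeautifulSoup's class matching it accepts a span whose class list merely contains `call`.

```python
from html.parser import HTMLParser

class CallSpanParser(HTMLParser):
    """Collect the stripped text of every <span> whose class list contains "call"."""

    def __init__(self):
        super().__init__()
        self.telephones = []
        self._depth = 0     # > 0 while inside a matching span (counts nested spans)
        self._buffer = []   # text pieces of the current matching span

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            self._depth += 1  # a span nested inside a match
            return
        classes = (dict(attrs).get("class") or "").split()
        if "call" in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag != "span" or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.telephones.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)

def extract_telephones(html):
    """Return the text of all span.call elements in an HTML string."""
    parser = CallSpanParser()
    parser.feed(html)
    parser.close()
    return parser.telephones
```

For example, with made-up markup, `extract_telephones('<span class="btn call"> 0123 456789 </span>')` returns `['0123 456789']`. The page HTML would still need to be fetched with a User-Agent header, as in the answer.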

As in your case, I found that two of the sites worked; for me the rest returned an empty list. But after putting headers in the request parameters, it works perfectly, @FootAdministration. – SIM

Thanks Shahin, that worked for me! Great answer! Have a nice day! – FootAdministration
