Python scraper throws TimeoutError and WinError while running

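When I run my Python script, it scrapes one or two pages and then suddenly fails with: [TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond]. I noticed that the website is very slow to display its content. In any case, I hope there is a workaround. Thanks in advance.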

import requests
from lxml import html

def Startpoint(mpage):
    link = "http://www.austrade.gov.au/"
    # A for loop always advances to the next page, so a page that keeps
    # failing is skipped instead of being retried forever.
    for leaf in range(1, mpage + 1):
        address = "http://www.austrade.gov.au/suppliersearch.aspx?smode=AND&ind=Agribusiness%7c%7cArts+%26+Recreation%7c%7cBuilding+%26+Construction%7c%7cBusiness+%26+Other+Services%7c%7cConsumer+Goods%2c+Non-Food%7c%7cDefence%2c+Security+%26+Safety%7c%7cEducation+%26+Training%7c%7cEnvironment+%26+Energy%7c%7cFinance+%26+Insurance%7c%7cFood+%26+Beverage%7c%7cGovernment%7c%7cHealth%2c+Biotechnology+%26+Wellbeing%7c%7cICT%7c%7cManufacturing+(Other)%7c%7cMining%7c%7cTourism+%26+Hospitality%7c%7cTransport&folderid=1736&pg=" + str(leaf)
        try:
            page = requests.get(address, timeout=30)
        except requests.exceptions.RequestException as ex:
            # RequestException is the base class for timeouts and connection errors
            print('request failed:', type(ex).__name__)
            continue
        # Reuse the response fetched above; a second get() without a timeout
        # would bypass the error handling.
        tree = html.fromstring(page.text)
        titles = tree.xpath('//a[@class="Name"]')
        for title in titles:
            href = link + title.xpath('./@href')[0]
            Endpoint(href)

def Endpoint(address):
    try:
        page = requests.get(address, timeout=30)
    except requests.exceptions.RequestException as ex:
        print('request failed:', type(ex).__name__)
        return
    tree = html.fromstring(page.text)
    titles = tree.xpath('//div[@class="contact-details block dark"]')
    for title in titles:
        # Take the first text node of each paragraph, or None if it is missing
        first = title.xpath('.//p[1]/text()')
        third = title.xpath('.//p[3]/text()')
        Name = first[0] if first else None
        Name1 = third[0] if third else None
        Metco = (Name, Name1)
        print(Metco)

Startpoint(10)

2017-04-26 SIM

You can catch the timeout exception and continue the execution of your script:

try:
    page = requests.get(address, timeout=30)  # set the max timeout, e.g. 30 sec
except requests.exceptions.ReadTimeout:
    print('timed out')
except Exception as ex:
    print(type(ex).__name__)
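
Since the site is described as slow, a failed request may well succeed on a second or third try. Below is a minimal sketch of a retry helper with exponential backoff. The name `retry_call` and its parameters are hypothetical and not part of `requests`; it only uses the standard library and catches the built-in `TimeoutError` by default.

```python
import time


def retry_call(func, attempts=3, delay=1.0, exceptions=(TimeoutError,), sleep=time.sleep):
    """Call func() until it succeeds or `attempts` tries have failed.

    Waits delay, 2*delay, 4*delay, ... seconds between tries and re-raises
    the last exception if every try fails. (Hypothetical helper.)
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of tries: let the caller see the error
            sleep(delay * 2 ** (attempt - 1))  # exponential backoff
```

With `requests`, one way to use it would be `page = retry_call(lambda: requests.get(address, timeout=30), exceptions=(requests.exceptions.RequestException,))`, wrapped in the same try/except as above so that a page that still fails after the last attempt can be skipped.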

2017-04-26 15:46:28

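Thanks, sir t.m.adam, for your answer. Here is the full code. I'm not very good at coding, especially when it comes to proper indentation. I tried what you suggested, but I couldn't follow your instructions because of my lack of skill. For your consideration, I changed the code in my post following your answer, but I get an indentation error on the first "continue" line. There are probably a lot of mistakes on my part. Thanks – SIM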

I'll edit your code –

Done, just accept the edit and copy it –
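
For readers who hit the same indentation error on `continue`: the statement is only valid inside a loop, and it has to be indented one level deeper than the `except` line it belongs to. The sketch below shows that layout in isolation. `scrape_pages` and the `fetch` callable are hypothetical stand-ins for the real request code, so the example runs without network access.

```python
def scrape_pages(fetch, last_page):
    """Fetch pages 1..last_page, skipping any page whose fetch times out."""
    results = []
    for page_number in range(1, last_page + 1):
        try:
            body = fetch(page_number)
        except TimeoutError:
            print('timed out on page', page_number)
            continue  # indented under `except`, which is itself inside the `for`
        results.append((page_number, body))
    return results
```

Every block uses the same four-space step here; mixing four- and five-space indents within one block, as in the originally posted code, is a common cause of `IndentationError`.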
