Trouble getting to the next page

When I run my function to collect links from a particular site, it gets the links from the first page, but instead of moving on to the next page it raises the error shown below.

Crawler:

import requests 
from lxml import html 

def Startpoint(mpage): 
    page=4 
    while page<=mpage: 
     address = "https://www.katalystbusiness.co.nz/business-profiles/bindex"+str(page)+".html" 
     tail="https://www.katalystbusiness.co.nz/business-profiles/" 
     page = requests.get(address) 
     tree = html.fromstring(page.text) 
     titles = tree.xpath('//p/a/@href') 
     for title in titles: 
      if "bindex" not in title: 
       if "cdn-cgi" not in title: 
        print(tail + title) 


    page+=1 

Startpoint(5)

Error message:

Traceback (most recent call last): 
    File "C:\Users\ar\AppData\Local\Programs\Python\Python35-32\New.py", line 19, in <module> 
    Startpoint(5) 
    File "C:\Users\ar\AppData\Local\Programs\Python\Python35-32\New.py", line 6, in Startpoint 
    while page<=mpage: 
TypeError: unorderable types: Response() <= int()

Source

2017-04-21 SIM

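You are assigning the result of requests.get(address) to page. On the next pass through the loop, Python then cannot compare a requests.Response object with an int. Just give the response its own name, such as response, instead of reusing page. You also have an indentation error on the last line of the function: page+=1 has to be indented to the same level as the rest of the while body.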

import requests
from lxml import html

def Startpoint(mpage):
    page = 4
    while page <= mpage:
        address = "https://www.katalystbusiness.co.nz/business-profiles/bindex" + str(page) + ".html"
        tail = "https://www.katalystbusiness.co.nz/business-profiles/"
        # Keep the HTTP response in its own name so the page counter stays an int.
        response = requests.get(address)
        tree = html.fromstring(response.text)
        titles = tree.xpath('//p/a/@href')
        for title in titles:
            # Skip the bindex pagination links and the cdn-cgi links.
            if "bindex" not in title and "cdn-cgi" not in title:
                print(tail + title)
        # Advance the counter inside the while loop.
        page += 1

Startpoint(5)
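
As a possible variation on the fix above, here is a minimal sketch that lets range() own the page counter, so there is no counter variable to overwrite and no page+=1 line to misindent. The helper names page_addresses and profile_links are hypothetical and not part of the original code. The sketch only builds and filters URLs and makes no network calls, so fetching with requests.get and parsing with html.fromstring would still be done as in the code above.

```python
BASE = "https://www.katalystbusiness.co.nz/business-profiles/"


def page_addresses(first_page, last_page):
    """Build the index-page URLs for first_page..last_page inclusive.

    range() produces the page numbers, so no loop counter can be
    overwritten by accident. (Hypothetical helper, for illustration.)
    """
    return [BASE + "bindex" + str(number) + ".html"
            for number in range(first_page, last_page + 1)]


def profile_links(hrefs):
    """Keep only profile links and prefix them with BASE.

    Applies the same filter as the original loop: drop any href that
    contains "bindex" or "cdn-cgi". (Hypothetical helper.)
    """
    return [BASE + href for href in hrefs
            if "bindex" not in href and "cdn-cgi" not in href]
```

With these helpers, the crawl loop could become "for address in page_addresses(4, mpage):", with the list returned by tree.xpath('//p/a/@href') passed to profile_links.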

Source

2017-04-21 17:27:23 bernie

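Thanks for your sharp response. It works like magic. I will accept your answer as soon as this site lets me. Thanks again. – SIM

You're very welcome! Happy coding to you. – bernie

That was quite a legendary mistake, but my head was spinning. That is why coding should not be done in a rush. Thanks again, bernie. – SIM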

You are overwriting the page variable on this line: page = requests.get(address)

When execution gets back to while page<=mpage: on the second iteration, it tries to compare page (which is now a response object) with mpage (an integer).
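
The mechanism described above can be reproduced without any network access. This is only an illustrative sketch: FakeResponse is a hypothetical stand-in for requests.Response, and run_loop_with_overwrite is a made-up name. The exact wording of the TypeError message differs between Python versions, so the sketch reports only the exception type.

```python
class FakeResponse:
    """Hypothetical stand-in for requests.Response, so no network call is needed."""


def run_loop_with_overwrite(mpage):
    """Repeat the question's mistake and report how the loop ends."""
    page = 4
    try:
        while page <= mpage:       # first check compares int with int
            page = FakeResponse()  # the counter is overwritten here
        return "loop finished"
    except TypeError as exc:
        # The second check compared a FakeResponse with an int.
        return type(exc).__name__
```

Calling run_loop_with_overwrite(5) hits the failing comparison on the second check of the loop condition, just as in the traceback from the question.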

Also, page+=1 needs to be inside the while loop.

Source

2017-04-21 17:27:39 Stacktrace
