urllib.error.HTTPError: HTTP Error 403: Forbidden


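When scraping certain pages I get the error "urllib.error.HTTPError: HTTP Error 403: Forbidden". I understand that the fix is to add something like hdr = {'User-Agent': 'Mozilla/5.0'} to the request headers, as described in this question: urllib.error.HTTPError: HTTP Error 403: Forbidden

However, I can't get it to work when the URLs I'm trying to scrape are stored in a separate source file. How and where can I add the User-Agent to the code below?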

from bs4 import BeautifulSoup 
import urllib.request as urllib2 
import time 

list_open = open("source-urls.txt") 
read_list = list_open.read() 
line_in_list = read_list.split("\n") 

i = 0 
for url in line_in_list: 
    soup = BeautifulSoup(urllib2.urlopen(url).read(), 'html.parser') 
    name = soup.find(attrs={'class': "name"}) 
    description = soup.find(attrs={'class': "description"}) 
    for text in description:
        print(name.get_text(), ';', description.get_text())
    # time.sleep(5)
    i += 1

Thanks :)


2017-01-07 Espen

Have you read the `urllib` documentation? Or have you tried something more user-friendly, like [`requests`](http://docs.python-requests.org/ja/master/)? – MattDMo

Yes, but I still can't get it to work. I added the variable `hdr = {'User-Agent': 'Mozilla/5.0'}` and changed the soup line to `soup = BeautifulSoup(urllib2.urlopen(url, headers=hdr).read(), 'html.parser')`, but Python complains about an unexpected argument at the word `headers`. Any ideas? Thanks – Espen

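You didn't read my comment. **1.** Read the [relevant documentation](https://docs.python.org/2/library/urllib2.html#urllib2.urlopen) before asking a question. In this case, the function has no `headers` parameter. **2.** As I said, and as the [docs](https://docs.python.org/2/library/urllib2.html) also say, you should use `requests` instead. The only reason `requests` isn't in the std lib is that it is still under active development and the maintainers don't want to depend on Python's release schedule. Use it. Your life will be easier. – MattDMo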
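
For anyone who wants to stay with the standard library: `urlopen` does not take a `headers` argument, but it does accept a `urllib.request.Request` object, and `Request` has a `headers` parameter. The following is a minimal sketch under that assumption. The helper names `build_request` and `fetch` are hypothetical and not from the thread, and the `Mozilla/5.0` value is simply the one quoted in the question; a given site may still refuse it.

```python
import urllib.request

# Header value quoted in the question; assumed, not guaranteed to satisfy every site.
hdr = {"User-Agent": "Mozilla/5.0"}


def build_request(url):
    """Wrap a URL in a Request object that carries the custom headers."""
    return urllib.request.Request(url, headers=hdr)


def fetch(url):
    """Open the prepared Request and return the raw response body as bytes."""
    with urllib.request.urlopen(build_request(url)) as resp:
        return resp.read()
```

In the loop from the question, this would mean passing `fetch(url)` to `BeautifulSoup` in place of `urllib2.urlopen(url).read()`.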

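You can achieve the same thing using requests:

import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent; some servers answer 403 to the default client string
hdrs = {'User-Agent': 'Mozilla/5.0 (X11 Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'}

# One URL per line; skip blank lines
with open("source-urls.txt") as list_open:
    line_in_list = [line.strip() for line in list_open if line.strip()]

for url in line_in_list:
    resp = requests.get(url, headers=hdrs)
    soup = BeautifulSoup(resp.content, 'html.parser')
    name = soup.find(attrs={'class': "name"})
    description = soup.find(attrs={'class': "description"})
    # Print once per page, and only when both elements were found
    if name is not None and description is not None:
        print(name.get_text(), ';', description.get_text())
    # time.sleep(5)

Hope it helps!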
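
Since the URLs come from a file, a single 403 will otherwise stop the whole run. Below is a rough sketch of one way to record failures and keep going. It is written against the standard library's `urllib.error.HTTPError`; the names `fetch_all` and `fetch_one` are hypothetical, and `fetch_one` stands for whatever function you use to download a single URL.

```python
import urllib.error


def fetch_all(urls, fetch_one):
    """Call fetch_one on every URL.

    Returns (pages, failures): pages maps each URL to the downloaded body,
    failures maps each URL that raised HTTPError to its status code.
    """
    pages, failures = {}, {}
    for url in urls:
        try:
            pages[url] = fetch_one(url)
        except urllib.error.HTTPError as err:
            # For example 403 Forbidden: note the code and move on to the next URL
            failures[url] = err.code
    return pages, failures
```

With `requests`, the analogous check would be calling `resp.raise_for_status()` and catching `requests.HTTPError`, or simply inspecting `resp.status_code` before parsing.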


2017-01-08 05:18:36

You saved my day, thanks! – Espen
