403 Forbidden when accessing some URLs with urllib2 in Python

I am trying to download data from the NSE India website. The data is a zip file that I process after downloading. I have the sample code below, which downloads files for dates after 2016.

import os
import urllib2

def start_download():
    directory = 'data'
    # Browser-like headers sent with the request.
    hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) '
                         'Chrome/23.0.1271.64 Safari/537.11',
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
           'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
           'Accept-Encoding': 'none',
           'Accept-Language': 'en-US,en;q=0.8',
           'Connection': 'keep-alive'}
    try:
        # This URL (January 2000) fails with 403 Forbidden:
        #req = urllib2.Request("https://www.nseindia.com/content/historical/EQUITIES//2000/JAN/cm01JAN2000bhav.csv.zip", headers=hdr)
        req = urllib2.Request("https://www.nseindia.com/content/historical/EQUITIES//2017/NOV/cm03NOV2017bhav.csv.zip", headers=hdr)
        file_url = urllib2.urlopen(req)
        try:
            # Create the target directory on first use.
            if not os.path.exists(directory):
                os.makedirs(directory)
            file_name_obj = open(os.path.join(directory, "hello.zip"), 'wb')
            file_name_obj.write(file_url.read())
            file_name_obj.close()
        except IOError, e:
            print e
    except Exception, e:
        print e

When I use the URL "https://www.nseindia.com/content/historical/EQUITIES//2017/NOV/cm03NOV2017bhav.csv.zip", the data downloads. I also tried the Postman client, and it downloads there too.

When I use the following URL: https://www.nseindia.com/content/historical/EQUITIES//2000/JAN/cm01JAN2000bhav.csv.zip, I get a 403 Forbidden error, both in Postman and in the code. Pasting this link into the Chrome browser fails as well. However, when I go through the link on this page, "https://www.nseindia.com/products/content/equities/equities/archieve_eq.htm", and enter Bhavcopy as the report and 1 January 2000 as the date, the *.csv.zip file downloads successfully.

How can I resolve this 403 Forbidden error for the URL that is commented out in the sample code?

Source

2017-11-04 Ajay Tanpure

You need to adjust your headers. Here is an example of how to do that, and how to write the downloaded file using Python:

from urllib.request import Request, urlopen
import shutil

link = 'https://www.nseindia.com/content/historical/EQUITIES//2017/NOV/cm03NOV2017bhav.csv.zip'
# Note: urllib does not decompress response bodies itself.
header = {
    'Accept-Encoding': 'gzip, deflate, sdch, br',
    'Accept-Language': 'fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4',
    'Host': 'www.nseindia.com',
    'Referer': 'https://www.nseindia.com/',
    'User-Agent': 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/53.0.2785.143 Chrome/53.0.2785.143 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest'
}

def download_file(link, file_name, length):
    try:
        req = Request(link, headers=header)
        # Open the connection first so a failed request leaves no empty file behind.
        with urlopen(req, timeout=3) as response, open(file_name, 'wb') as writer:
            # Stream the body to disk in chunks of `length` bytes.
            shutil.copyfileobj(response, writer, length)
    except Exception as e:
        print('File cannot be downloaded:', e)
    else:
        # Report success only when no exception was raised.
        print('File downloaded with success!')

file_name = 'new_file.zip'
length = 1024
download_file(link, file_name, length)
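
Because the question is specifically about a 403 response, it may help to tell an HTTP rejection apart from other failures instead of catching a generic Exception. The following is a minimal sketch, not part of the original answer: `fetch_bytes` and its `opener` parameter are hypothetical names introduced here for illustration (the `opener` argument only exists so the function can be exercised without network access), and whether a given header set is accepted by the server is not something this sketch can guarantee.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_bytes(url, headers, opener=urlopen, timeout=10):
    """Return (status, body); body is None when the server answers with an HTTP error.

    `opener` is a hypothetical injection point (defaults to urllib's urlopen)
    so that the function can be tried out without touching the network.
    """
    req = Request(url, headers=headers)
    try:
        with opener(req, timeout=timeout) as response:
            return response.status, response.read()
    except HTTPError as err:
        # HTTPError carries the status code, e.g. 403 for Forbidden.
        return err.code, None
```

A caller could check for a status of 403 and retry with a different header set, such as one that includes the Referer header used in the answer above. Other failures (DNS errors, timeouts) are deliberately left to propagate.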

Finally, you can check that the file downloaded with this method has the same SHA1 sum as the file downloaded with your browser.

File downloaded using Chrome:

> sha1sum cm03NOV2017bhav.csv.zip 
daff49646d183636f590db6cbf32c93896179cb2 cm03NOV2017bhav.csv.zip

File downloaded using Python:

> sha1sum new_file.zip 
daff49646d183636f590db6cbf32c93896179cb2 new_file.zip
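
The same comparison can be done from Python where the sha1sum command is not available. This is a small sketch using the standard hashlib module; the function names `sha1_of_file` and `same_content` are made up for this example.

```python
import hashlib


def sha1_of_file(path, chunk_size=64 * 1024):
    """Return the hexadecimal SHA1 digest of the file at `path`."""
    digest = hashlib.sha1()
    with open(path, 'rb') as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """Return True when both files have the same SHA1 digest."""
    return sha1_of_file(path_a) == sha1_of_file(path_b)
```

For the two files above, `same_content('cm03NOV2017bhav.csv.zip', 'new_file.zip')` would be expected to return True, given that their sha1sum outputs match.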

Source

2017-11-04 07:05:39
