Error 404 with Scrapy, but not with urllib2

I want to scrape the following website, https://www.shopee.sg, using the Scrapy shell:

~$ scrapy shell https://www.shopee.sg

However, I got a 404 error:

[s] request <GET https://www.shopee.sg>
[s] response <404 https://shopee.sg/>

urllib2, on the other hand, can open the same URL:

import urllib2 
response = urllib2.urlopen('https://www.shopee.sg') 
print len(response.read())

This works and prints the length of the response body.

Source: user1836529, 2017-10-20

The website appears to inspect the User-Agent string and block Scrapy. For example, if you set the USER_AGENT setting to a Chrome user agent string, it works:

scrapy shell -s USER_AGENT="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.52 Safari/537.36" "https://www.shopee.sg"
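
To avoid passing -s on every run, the same setting can also be stored in the project's settings file. This is a minimal sketch, assuming a standard Scrapy project with a settings.py module; the value is simply the Chrome string from the command above, and any current browser user agent should serve the same purpose.

```python
# settings.py (assumed location: the settings module of the Scrapy project)
# USER_AGENT is a standard Scrapy setting. The value below is the Chrome
# string from the command above, split across two literals for readability.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/28.0.1500.52 Safari/537.36"
)
```

With this in place, running scrapy shell "https://www.shopee.sg" from inside the project directory should pick up the user agent automatically. Whether the site keeps accepting this particular string is not guaranteed.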

Source: 2017-10-21 07:15:46
