I understand this error is raised because there is no URL to request, but I cannot work out why. Image scraper: urllib2.URLError: <urlopen error no host given>

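My code is a 4chan image scraper. It works without problems on every board except "wg", the Wallpapers/General board. For some reason, on this board only, it does not move on to the next page to scrape images. Instead it returns the error "urllib2.URLError".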



##@author klorox 

from bs4 import BeautifulSoup 
import requests 
import re 
import urllib2 
import os 
import collections 


# Gather our HTML source code from the pages 
def get_soup(url,header): 
    return BeautifulSoup(urllib2.urlopen(urllib2.Request(url, headers=header)), 'lxml') 

# Main logic function, we use this to re-iterate through the pages 
def main(url): 
    image_name = "image" 
    print url 
    header = {'User-Agent': 'Mozilla/5.0'} 
    r = requests.get(url) 
    html_content = r.text 
    soup = BeautifulSoup(html_content, 'lxml') 
    anchors = soup.findAll('a') 
    links = [a['href'] for a in anchors if a.has_attr('href')] 

# Returns the href values collected above from the a anchors; these contain our image links 
    def get_anchors(links): 
        return links 

# Gather the raw links 
    raw_links = get_anchors(links) 

# Parse out any duplicate links 
    def get_duplicates(arr): 
        # A set keeps exactly one copy of each link 
        return list(set(arr)) 

# Define our list of new links and call the function to parse out duplicates 
    new_elements = get_duplicates(raw_links) 

# Get the image links from the raw links, request each one, then write it to a folder. 
    def get_img(): 
        for element in new_elements: 
            if ".jpg" in str(element) or '.png' in str(element) or '.gif' in str(element): 
                retries = 0 
                passed = False 
                while retries < 3: 
                    try: 
                        # Links without a scheme (e.g. "//host/path") get "http:" prepended 
                        if "https:" not in element and "http:" not in element: 
                            element = "http:" + element 
                        raw_img = urllib2.urlopen(element).read() 
                        cntr = len([i for i in os.listdir(dirr) if image_name in i]) + 1 
                        print("Saving img: " + str(cntr) + " :  " + str(element) + " to: " + dirr) 
                        with open(dirr + image_name + "_" + str(cntr) + ".jpg", 'wb') as f: 
                            f.write(raw_img) 
                        passed = True 
                        break 
                    except urllib2.URLError as e: 
                        retries += 1 
                        print "Failed on", element, "(Retrying", retries, ")" 
                if not passed: 
                    print "Failed on ", element, "skipping..." 

# Call our image writing function 
    get_img() 

# Ask the user which board they would like to use 
print """Boards: [a/b/c/d/e/f/g/gif/h/hr/k/m/o/p/r/s/t/u/v/vg/vr/w/wg] [i/ic] [r9k] [s4s] [cm/hm/lgbt/y] [3/aco/adv/an/asp/biz/cgl/ck/co/diy/fa/fit/gd/hc/his/int/jp/lit/mlp/mu/n/news/out/po/pol/qst/sci/soc/sp/tg/toy/trv/tv/vp/wsg/wsr/x]""" 
print "\n" 
board = raw_input("Enter the board letter (Example: b, p, w): ") 
dirr = raw_input("Enter the working directory (USE DOUBLE SLASHES): (Example: C:\\\Users\\\Username\\\Desktop\\\Folder\\: ") 
# Define our starting page number 
page = 2 

# The first page has no page number in its URL 
url = "http://boards.4chan.org/" + board + "/" 
main(url) 

# After the first page, this loop changes the url for each page and calls our main function again each time. 
while page <= 10: 
    url = "http://boards.4chan.org/" + board + "/" + str(page) + "/" 
    print("Page: " + str(page)) 
    main(url) 
    page = page + 1 



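The solution was to use a try/except block, check whether each link uses http or https, and redirect the URL appropriately. The error was probably caused by the server's protection against mass requests (this is only a guess).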
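
The scheme check described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the exact fix that was applied: `normalize_link` is a hypothetical helper, `i.example.org` is a placeholder host, and the standard library's `urljoin` and `urlparse` stand in for the string test used in the script. It also shows one way a URL without a host can arise: prepending "http:" to a relative path such as "/wg/2.jpg" yields "http:/wg/2.jpg", which has an empty host component.

```python
try:
    from urllib.parse import urljoin, urlparse  # Python 3
except ImportError:
    from urlparse import urljoin, urlparse  # Python 2, as used by the script


def normalize_link(href, page_url):
    """Resolve an href found on page_url into an absolute http(s) URL.

    Hypothetical helper: returns None when the result has no usable
    scheme or no host, so the caller can skip the link.
    """
    absolute = urljoin(page_url, href)
    parts = urlparse(absolute)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    return absolute


page = "http://boards.4chan.org/wg/2/"
samples = [
    "//i.example.org/wg/123.jpg",   # protocol-relative link (placeholder host)
    "/wg/thread/1/456.png",         # path relative to the site root
    "https://i.example.org/a.gif",  # already absolute
    "javascript:void(0);",          # not a fetchable link
]
for href in samples:
    print("%s -> %s" % (href, normalize_link(href, page)))

# Simple prefixing turns a relative path into a URL with no host
print(repr(urlparse("http:" + "/wg/2.jpg").netloc))
```

Links for which the helper returns None can be skipped before calling `urllib2.urlopen`, while the try/except with retries still covers errors raised during the download itself.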
