I used part of this code to fetch images from imgur when the URL does not end in an image extension such as .png/.jpg. However, I am getting the error below. Please take a look and suggest a fix: image_url = soup.select('.image a')[0]['href'] IndexError: list index out of range
import datetime
import praw
import re
import urllib
import requests
from bs4 import BeautifulSoup
sub = 'dog'
imgurUrlPattern = re.compile(r'(http://i.imgur.com/(.*))(\?.*)?')
r = praw.Reddit(user_agent = "download all images from a subreddit",
                user_site = "lamiastella")
already_done = []
#checkWords = ['i.imgur.com', 'jpg', 'png',]
check_words = ['jpg', 'png']
subreddit = r.get_subreddit(sub)
for submission in subreddit.get_hot(limit=10000):
    is_image = any(string in submission.url for string in check_words)
    print '[LOG] Getting url: ' + submission.url
    if submission.id not in already_done and is_image:
        if submission.url.endswith('/'):
            modified_url = submission.url[:len(submission.url)-1]
            try:
                urllib.urlretrieve(modified_url, '/home/jalal/computer_vision/image_retrieval/images/' + datetime.datetime.now().strftime('%y-%m-%d-%s') + modified_url[-4:])
            except IOError:
                pass
        else:
            try:
                urllib.urlretrieve(submission.url, '/home/jalal/computer_vision/image_retrieval/images/' + datetime.datetime.now().strftime('%y-%m-%d-%s') + submission.url[-4:])
            except IOError:
                pass
        already_done.append(submission.id)
        print '[LOG] Done Getting ' + submission.url
        print('{0}: {1}'.format('submission id is', submission.id))
    elif 'http://imgur.com/' in submission.url:
        # This is an Imgur page with a single image.
        html_source = requests.get(submission.url).text # download the image's page
        soup = BeautifulSoup(html_source, "lxml")
        image_url = soup.select('.image a')[0]['href']
        if image_url.startswith('//'):
            # if no schema is supplied in the url, prepend 'http:' to it
            image_url = 'http:' + image_url
        image_id = image_url[image_url.rfind('/') + 1:image_url.rfind('.')]
        urllib.urlretrieve(image_url, '/home/jalal/computer_vision/image_retrieval/images/' + datetime.datetime.now().strftime('%y-%m-%d-%s') + image_url[-4:])
The error is as follows:
[LOG] Getting url: http://imgur.com/a/yOLjm
Traceback (most recent call last):
  File "download_images.py", line 43, in <module>
    image_url = soup.select('.image a')[0]['href']
IndexError: list index out of range
Use `print(soup.select('.image a'))` - you probably get an empty list, so you can't get the `[0]` element. – furas
You need `.image img` and `src` - `soup.select('.image img')[0]['src']` – furas
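The two comments above can be combined into a guarded lookup, so that a page where the selector matches nothing is skipped instead of raising IndexError. This is only a minimal sketch: `first_attr` and `normalize_scheme` are hypothetical helper names, not part of BeautifulSoup, and whether `.image img` matches anything depends on the markup imgur serves for a given page (an album URL such as http://imgur.com/a/yOLjm may be laid out differently from a single-image page).

```python
def first_attr(matches, attr):
    """Return `attr` of the first match, or None when nothing matched.

    `matches` is meant to be the list returned by soup.select(...);
    any sequence of objects with a dict-like .get() works too.
    """
    if not matches:
        # Empty list: the selector matched nothing, so there is no [0].
        return None
    return matches[0].get(attr)


def normalize_scheme(url):
    """Prepend 'http:' to scheme-less URLs such as '//i.imgur.com/x.jpg'."""
    return 'http:' + url if url.startswith('//') else url


# Intended use inside the `elif` branch of the loop (assumption):
#     image_url = first_attr(soup.select('.image img'), 'src')
#     if image_url is None:
#         continue  # nothing matched on this page; skip it
#     image_url = normalize_scheme(image_url)
```

Returning None rather than raising leaves it to the loop to decide whether to skip the submission or log it for later inspection.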