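Extract images from an HTML file using the Python standard library

I'm trying to write a script that parses an HTML file, finds all the images, and saves them to a separate folder. How can I do this using only the libraries that come with a Python 3 install? This is the script I currently have: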

import datetime
import os

import requests

# zendesk, credentials and language are not defined in this excerpt
date = datetime.date.today()
backup_path = os.path.join(str(date), language)
if not os.path.exists(backup_path):
    os.makedirs(backup_path)

log = []

endpoint = zendesk + '/api/v2/help_center/en-us/articles.json'
while endpoint:
    response = requests.get(endpoint, auth=credentials)
    if response.status_code != 200:
        print('Failed to retrieve articles with error {}'.format(response.status_code))
        exit()
    data = response.json()

    for article in data['articles']:
        # Skip articles that have no body to save
        if article['body'] is None:
            continue
        title = '<h1>' + article['title'] + '</h1>'
        filename = '{id}.html'.format(id=article['id'])
        with open(os.path.join(backup_path, filename), mode='w', encoding='utf-8') as f:
            f.write(title + '\n' + article['body'])

        print('{id} copied!'.format(id=article['id']))

        log.append((filename, article['title'], article['author_id']))

    # Follow pagination; a falsy next_page ends the loop
    endpoint = data['next_page']

This is a script I found on the Zendesk forums; it basically backs up our articles on Zendesk.
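Since the question asks for the standard library only, one possible approach is `html.parser.HTMLParser`, which ships with Python 3. Below is a minimal sketch of that idea. The class name `ImageSourceParser`, the helper `find_image_sources`, and the sample HTML are made up for illustration and are not part of the original script.

```python
from html.parser import HTMLParser


class ImageSourceParser(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_image_sources(html):
    """Return the src values of all <img> tags found in an HTML string."""
    parser = ImageSourceParser()
    parser.feed(html)
    parser.close()
    return parser.sources


# Hypothetical sample input, only for demonstration
sample = (
    '<h1>Title</h1><p><img src="/a.png" alt="a">'
    '<img alt="no source"><img src="https://example.com/b.jpg"/></p>'
)
print(find_image_sources(sample))  # ['/a.png', 'https://example.com/b.jpg']
```

Inside the backup loop, something like `find_image_sources(article['body'])` would presumably give the image URLs for each article, assuming the bodies contain ordinary `img` tags.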

Source

2017-06-09 humbleCoder

Have you tried using Beautiful Soup? – JakeD

It doesn't look like you've shared your complete code, but you'll want to replace requests with [urllib](https://docs.python.org/3/library/urllib.request.html#module-urllib.request) – etemple1
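To illustrate the suggestion above, here is a rough sketch of fetching JSON with `urllib.request` instead of requests. It assumes that `credentials` in the question is a (username, password) pair used for HTTP Basic authentication, which the excerpt does not confirm. The helper names `build_request` and `fetch_json` are hypothetical.

```python
import base64
import json
import urllib.request


def build_request(url, username, password):
    """Build a GET request carrying an HTTP Basic Authorization header."""
    raw = '{}:{}'.format(username, password).encode('utf-8')
    token = base64.b64encode(raw).decode('ascii')
    return urllib.request.Request(url, headers={'Authorization': 'Basic ' + token})


def fetch_json(url, username, password):
    """Fetch url and decode its JSON body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so there is
    no status_code check as there is with requests.
    """
    request = build_request(url, username, password)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode('utf-8'))
```

With this in place, `data = fetch_json(endpoint, *credentials)` could stand in for the `requests.get(...)` and `response.json()` pair, again assuming `credentials` is a two-item tuple.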

Try using Beautiful Soup to find all the img nodes, then use urllib to fetch the image for each node.

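import urllib.request

from bs4 import BeautifulSoup

# Note: response.text holds the raw HTML
soup = BeautifulSoup(response.text, "html.parser")

# Get the src attribute of every <img> tag that has one
img_source = [x["src"] for x in soup.find_all("img", src=True)]

# Download the images; urlretrieve needs absolute URLs and
# returns a (local_filename, headers) tuple for each one
images = [urllib.request.urlretrieve(x) for x in img_source]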

You'll need to add error handling and adapt it a little to your page, but the idea stays the same.
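As one way the error handling and the "save to a separate folder" step might look, here is a hedged sketch that uses only the standard library. The function name `save_images` and its parameters are hypothetical. It assumes relative `src` values should be resolved against the URL of the page they came from.

```python
import os
import urllib.error
import urllib.parse
import urllib.request


def save_images(img_sources, base_url, folder):
    """Download each image URL into folder and return the saved file paths."""
    os.makedirs(folder, exist_ok=True)
    saved = []
    for index, src in enumerate(img_sources):
        # Resolve relative paths such as "/img/a.png" against the page URL
        url = urllib.parse.urljoin(base_url, src)
        # Take the file name from the URL path, with a fallback if it is empty
        name = os.path.basename(urllib.parse.urlparse(url).path)
        target = os.path.join(folder, name or 'image_{}'.format(index))
        try:
            urllib.request.urlretrieve(url, target)
        except (urllib.error.URLError, ValueError) as error:
            # URLError covers HTTP and connection failures; ValueError covers
            # strings that urllib does not recognise as a URL
            print('Skipping {}: {}'.format(url, error))
            continue
        saved.append(target)
    return saved
```

Two images with the same file name would overwrite each other in this sketch, so a real script may want to prefix names with the article id or a counter.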

Source

2017-06-09 21:07:06 Ding

BeautifulSoup and urllib come bundled with the Anaconda Python 3 install. – Ding

Then maybe use re to extract the tags? I'm not familiar with regular expressions though, sorry. – Ding
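For reference, a rough sketch of what the `re` idea could look like is below. The pattern and the function name `find_image_sources_with_re` are illustrative assumptions. Regular expressions are fragile on real-world HTML: this one only handles quoted `src` values and can be fooled by unusual markup, so treat it as a quick hack rather than a robust parser.

```python
import re

# Match an <img ...> tag and capture a quoted src attribute value.
# \s before "src" avoids matching attributes such as data-src.
IMG_SRC_PATTERN = re.compile(
    r'<img\b[^>]*?\ssrc\s*=\s*["\']([^"\']+)["\']',
    re.IGNORECASE,
)


def find_image_sources_with_re(html):
    """Return the quoted src values of <img> tags found in an HTML string."""
    return IMG_SRC_PATTERN.findall(html)
```

Calling `find_image_sources_with_re(article['body'])` would return a list of `src` strings, which may still be relative URLs.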

Thank you! I'll give it a try – humbleCoder
