How can I extract the text content of an article page in Python 3?

I tried the following:

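import urllib.request  # plain "import urllib" does not reliably load the request submodule

link = 'https://automatetheboringstuff.com/chapter7/'
f = urllib.request.urlopen(link)
myfile = f.read()  # raw bytes of the HTML document
print(myfile)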

However, this returns the page source (the HTML markup) rather than just the text content.
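`f.read()` returns the raw bytes of the HTML document, so all the markup comes along with the text. One way to strip it without third-party packages is a small `html.parser.HTMLParser` subclass that keeps only the character data and skips the contents of `<script>` and `<style>`. This is a minimal sketch: `TextExtractor`, `html_to_text` and `fetch_text` are hypothetical names, and falling back to UTF-8 when the server declares no charset is an assumption.

```python
import urllib.request
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML document, skipping <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside <script>/<style>
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the text of an HTML string, one text fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered data
    return "\n".join(parser.chunks)


def fetch_text(url: str) -> str:
    """Download a page and return its visible text (assumes an HTML response)."""
    with urllib.request.urlopen(url) as f:
        raw = f.read()  # bytes of the HTML source, as in the question
        # Assumption: fall back to UTF-8 if the server declares no charset
        charset = f.headers.get_content_charset() or "utf-8"
    return html_to_text(raw.decode(charset, errors="replace"))
```

For example, `print(fetch_text('https://automatetheboringstuff.com/chapter7/'))` prints the text of the whole page, navigation included; restricting it to the chapter body still requires selecting the right element, which is what the Beautiful Soup answer does.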


2017-04-19 Fashinated

You need BeautifulSoup for that. And shouldn't it be urllib.request.urlopen(link)? – bhansa

If you only want the chapter text, I'd go with Beautiful Soup. In your case:

import requests
from bs4 import BeautifulSoup

res = requests.get('https://automatetheboringstuff.com/chapter7/')
soup = BeautifulSoup(res.text, 'html.parser')
# Select the <div class="book"> element and print only its text, without tags
print(soup.find('div', { "class" : "book" }).text)
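If installing `requests` and `bs4` is not an option, a rough standard-library approximation of `soup.find('div', {"class": "book"}).text` is sketched here. It is a swapped-in technique, not the answerer's code: `ClassTextExtractor` and `text_of_first` are hypothetical names, and the sketch assumes reasonably well-formed markup in which every non-void element is explicitly closed, something Beautiful Soup does not require.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text inside the first <tag class="..."> element,
    roughly what soup.find(tag, {"class": cls}).text returns."""

    # Void elements never have an end tag, so they must not affect depth counting
    VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}

    def __init__(self, tag: str, cls: str):
        super().__init__()
        self.tag = tag
        self.cls = cls
        self._depth = 0     # nesting depth inside the target element; 0 = outside
        self._done = False  # stop after the first match, like find()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        if self._depth:
            self._depth += 1  # any element nested inside the target
        elif not self._done and tag == self.tag:
            # "class" may hold several space-separated names, or have no value
            classes = (dict(attrs).get("class") or "").split()
            if self.cls in classes:
                self._depth = 1

    def handle_endtag(self, tag):
        if self._depth and tag not in self.VOID_TAGS:
            self._depth -= 1
            if self._depth == 0:
                self._done = True

    def handle_data(self, data):
        if self._depth:
            self.chunks.append(data)


def text_of_first(html: str, tag: str, cls: str) -> str:
    """Concatenated text of the first matching element, or "" if none matches."""
    parser = ClassTextExtractor(tag, cls)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)
```

The HTML string can come from the question's `urlopen(...).read()` after decoding the bytes, e.g. `text_of_first(myfile.decode('utf-8'), 'div', 'book')`, assuming the page is UTF-8.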


2017-04-19 13:32:59 shomel
