Stop python script process?

First, let me explain my source: I am writing a simple Python script that searches every page of a website and collects the text of specific HTML tags. My code:

import urllib.request
from bs4 import BeautifulSoup

lineline = urllib.request.urlopen("http://www.test-site.com")
lineliner = lineline.read()
allsoupurl = BeautifulSoup(lineliner, "html.parser")
allhtmllisturl = allsoupurl.find_all("h1", class_="title")

print(allhtmllisturl)

OK, this code works very well and prints all available h1 tags with the class title. The result is:

[<h1 class="title>title-1</h1>"] 
[<h1 class="title>title-2</h1>"] 
[<h1 class="title>title-3</h1>"] 
[<h1 class="title>title-4</h1>"]

But if I change the code like this:

import urllib.request
from bs4 import BeautifulSoup

lineline = urllib.request.urlopen("http://www.test-site.com")
lineliner = lineline.read()
allsoupurl = BeautifulSoup(lineliner, "html.parser")
allhtmllisturl = allsoupurl.find_all("h1", class_="title")

# Print only the text of each matching tag
for h1 in allhtmllisturl:
    print(h1.get_text())

The result shows only the first available h1 tag, then the script ends without showing the remaining tags. The result is:

title-1

What is the problem?

Thanks

2017-04-26 Ys Ys

The " is closed after </h1>. Shouldn't this be '<h1 class="title">title-1</h1>'? –

I cannot reproduce the problem on my local setup (python3, beautifulsoup 4.5.3). Could you provide the versions of python and beautifulsoup you are using? – Catalin
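One way to collect the version information requested above is sketched below. It relies only on the standard library (importlib.metadata, Python 3.8+); the helper name installed_version is hypothetical, and the distribution name "beautifulsoup4" is assumed to be the one under which the library was installed.

```python
import sys
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a distribution, or None if absent.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Interpreter version, e.g. the first token of sys.version
print("python:", sys.version.split()[0])
# Assumes the library was installed under the name "beautifulsoup4"
print("beautifulsoup4:", installed_version("beautifulsoup4"))
```

Posting both lines of output in the question makes it easier for others to try to reproduce the behaviour.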

Inside find_all(), pass the attributes with attrs={}. Also, what are the type of the element (allhtmllisturl) and allhtmllisturl.shape?

import urllib.request
from bs4 import BeautifulSoup

lineline = urllib.request.urlopen("http://www.test-site.com")
lineliner = lineline.read()
allsoupurl = BeautifulSoup(lineliner, "html.parser")
# attrs takes a dict, so the key and value are separated by a colon
allhtmllisturl = allsoupurl.find_all("h1", attrs={"class": "title"})

for h1 in allhtmllisturl:
    print(h1.get_text())
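If the loop still prints a single title, it may help to check independently how many matching tags the page markup really contains. The following is only a rough cross-check that uses the standard library's html.parser instead of BeautifulSoup; TitleCollector, collect_titles and SAMPLE_HTML are made-up names, and the sample markup is an assumption standing in for the real page.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Collect the text of every <h1> whose class list contains "title"."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._inside = False  # True while inside a matching <h1>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h1" and "title" in classes:
            self._inside = True
            self.titles.append("")

    def handle_data(self, data):
        if self._inside:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == "h1":
            self._inside = False


def collect_titles(html):
    """Parse an HTML string and return the collected title texts."""
    parser = TitleCollector()
    parser.feed(html)
    parser.close()
    return parser.titles


# Assumed sample markup; replace with the decoded page source to test a real page
SAMPLE_HTML = """
<h1 class="title">title-1</h1>
<h1 class="title">title-2</h1>
<h1 class="other">skipped</h1>
<h1 class="title">title-3</h1>
"""

for text in collect_titles(SAMPLE_HTML):
    print(text)
```

If this cross-check finds several titles on the real page while the BeautifulSoup loop prints only one, the markup is probably fine and the installed library or environment becomes the more likely suspect.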

2017-04-26 06:18:57

If this solved your problem, please accept the answer. –

Thanks, but it did not solve the problem - @nishant-kumar –
