Python and BeautifulSoup

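Hello, I am trying to parse the HTML from this website with Python and BeautifulSoup, but it takes forever for soup to load the entire HTML (about 17 seconds to print it to the terminal). I realise this may simply be down to the website itself (other directories seem to load instantly), but here is my code just in case: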

import urllib2 
from bs4 import BeautifulSoup 

url1 = 'http://www.ukpets.co.uk/ukp/?sf=1716769780&rtn=temp87_224_76_126_at_1456&display_profile=&section=Commercial&sub=Search_&rws=&method=search&tb=comdir1_8&class=comdir1_8&search_form=on&rf=coname&st=Food' 
soup = BeautifulSoup(urllib2.urlopen(url1), 'lxml') 
print soup

So my question is: is there another parser that can get this job done faster, or something I can use together with bs?

P.S. I also tried Selenium.

Source

2016-11-02 Nikita Maximov

Just a suggestion: try requests, http://docs.python-requests.org/en/master/ – MooingRawr
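A minimal sketch of what the requests suggestion above could look like, assuming both requests and bs4 are installed. The function name `fetch_soup` and its `timeout` and `parser` parameters are hypothetical, chosen only for illustration. Note that switching HTTP libraries will not necessarily make a slow server respond any faster.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, timeout=30, parser='lxml'):
    """Download url with requests and parse the body with BeautifulSoup.

    fetch_soup, timeout and parser are illustrative names, not part of
    either library.
    """
    # timeout stops the call from hanging forever on an unresponsive server
    response = requests.get(url, timeout=timeout)
    # raise an exception on HTTP 4xx/5xx instead of parsing an error page
    response.raise_for_status()
    # pass raw bytes so BeautifulSoup can detect the encoding itself
    return BeautifulSoup(response.content, parser)
```

Calling `fetch_soup(url1)` with the URL from the question would return the same kind of soup object as the original code.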

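Separate the two operations: load the HTML into a file, then parse it. Then use timeit to measure how long each operation takes. If the first one accounts for all of the time, there is no point in trying to optimize the second. –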
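The advice above can be sketched roughly as follows. This is only an illustration: it uses `timeit.default_timer` to take timestamps rather than the full `timeit` machinery, and the helper name `time_stages` and its `fetch` and `parse` arguments are hypothetical.

```python
from timeit import default_timer


def time_stages(fetch, parse):
    """Run fetch() and then parse(html), timing each stage separately.

    Returns (result, fetch_seconds, parse_seconds).
    time_stages is an illustrative helper, not a library function.
    """
    start = default_timer()
    html = fetch()            # stage 1: download (or read) the HTML
    fetched = default_timer()
    result = parse(html)      # stage 2: build the parse tree
    parsed = default_timer()
    return result, fetched - start, parsed - fetched


# Possible wiring for the page in the question (left commented out
# because it needs network access, bs4 and lxml):
# from urllib.request import urlopen
# from bs4 import BeautifulSoup
# soup, t_fetch, t_parse = time_stages(
#     lambda: urlopen(url1).read(),
#     lambda html: BeautifulSoup(html, 'lxml'),
# )
# print('fetch: %.2fs, parse: %.2fs' % (t_fetch, t_parse))
```

If the fetch time dominates, the delay comes from the server or the network, and changing the parser is unlikely to help.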

If the problem is loading the page (performing the HTTP request itself), the parser is irrelevant. –

I don't know what your problem is, but this sequence of statements ran in the blink of an eye on my old computer. You could try doing it this way.

>>> from bs4 import BeautifulSoup 
>>> from urllib.request import urlopen 
>>> URL = 'http://www.ukpets.co.uk/ukp/?sf=1716769780&rtn=temp87_224_76_126_at_1456&display_profile=&section=Commercial&sub=Search_&rws=&method=search&tb=comdir1_8&class=comdir1_8&search_form=on&rf=coname&st=Food' 
>>> HTML = urlopen(URL)
>>> soup = BeautifulSoup(HTML)
C:\Python34\lib\site-packages\bs4\__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently. 

To get rid of this warning, change this: 

BeautifulSoup([your markup]) 

to this: 

BeautifulSoup([your markup], "lxml") 

    markup_type=markup_type))

Source

2016-11-02 16:57:17
