2017-02-22 19 views

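Scraping a JavaScript web page with BeautifulSoup and Python

I am trying to scrape content from the website http://www.jobs.ch. The result should be a script in which I can specify a job term, for example business analyst, and get all jobs with that title. I assume I need a multi-step approach: first collect all matching links from every results page, save them, and then extract the job descriptions later.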

Is there a way to achieve this? Or, since the site is built with react.js, do I also need to use Selenium?

Here is the start of my script:

from bs4 import BeautifulSoup 
import urllib2 

jobsFile = urllib2.urlopen("http://www.jobs.ch/en/vacancies/?term=business+analyst") 
jobsHtml = jobsFile.read() 
jobsFile.close() 

soup = BeautifulSoup(jobsHtml) 
jobsAll = soup.find_all("a") 
for links in soup.find_all('a'): 
    print (links.get('href')) 

Output from the console:

python jobplatform.py 
/Library/Python/2.7/site-packages/bs4/__init__.py:181: UserWarning: 
No parser was explicitly specified, so I'm using the best available 
HTML parser for this system ("lxml"). This usually isn't a problem, 
but if you run this code on another system, or in a different virtual 
environment, it may use a different parser and behave differently. 

The code that caused this warning is on line 8 of the file 
jobplatform.py. To get rid of this warning, change code that looks 
like this: 

BeautifulSoup([your markup]) 

to this: 

BeautifulSoup([your markup], "lxml") 

markup_type=markup_type)) 
None 
/en/ 
/en/login/ 
/en/register/ 
/en/vacancies/ 
/en/companies/ 
http://www.jobs.ch/en/sucheBerater.php 
http://www.jobs.ch/en/tipps 
http://www.jobs.ch/en/ecom/ 
/de/stellenangebote/?term=business 
/fr/offres-emplois/?term=business 
/en/vacancies/?term=business 
/en/vacancies/ 
None 
None 
None 
None 
/en/vacancies/?page=1&term=business&web-results=1 
None 
None 
/en/companies/79912-bayer-business-services-gmbh/ 
/en/vacancies/detail/7376115/?source=vacancy_search 
/en/companies/79912-bayer-business-services-gmbh/ 
/en/companies/48196-kotra-korea-business-center/ 
/en/vacancies/detail/7397077/?source=vacancy_search 
/en/companies/48196-kotra-korea-business-center/ 
/en/companies/66172-diwisa-distillerie-willisau-sa/ 
/en/vacancies/detail/7363589/?source=vacancy_search 
/en/companies/66172-diwisa-distillerie-willisau-sa/ 
/en/companies/2859-paul-scherrer-institut/ 
/en/vacancies/detail/7359642/?source=vacancy_search 
/en/companies/2859-paul-scherrer-institut/ 
/en/companies/49314-pit-offices-gmbh/ 
/en/vacancies/detail/7344672/?source=vacancy_search 
/en/companies/49314-pit-offices-gmbh/ 
/en/companies/27786-zuehlke-engineering-ag/ 
/en/vacancies/detail/7176356/?source=vacancy_search 
/en/companies/27786-zuehlke-engineering-ag/ 
/en/companies/1802-six-payment-services-ag/ 
/en/vacancies/detail/7396870/?source=vacancy_search 
/en/companies/1802-six-payment-services-ag/ 
/en/companies/49420-mettler-toledo-gruppe/ 
/en/vacancies/detail/7384998/?source=vacancy_search 
/en/companies/49420-mettler-toledo-gruppe/ 
/en/companies/16414-partners-group/ 
/en/vacancies/detail/7279253/?source=vacancy_search 
/en/companies/16414-partners-group/ 
/en/companies/4005-johnson-johnson/ 
/en/vacancies/detail/7397184/?source=vacancy_search 
/en/companies/4005-johnson-johnson/ 
/en/companies/44340-amgen/ 
/en/vacancies/detail/7359993/?source=vacancy_search 
/en/companies/44340-amgen/ 
/en/companies/1802-six-payment-services-ag/ 
/en/vacancies/detail/7357631/?source=vacancy_search 
/en/companies/1802-six-payment-services-ag/ 
/en/companies/16649-fritschi-unternehmensberatung-gmbh/ 
/en/vacancies/detail/7369054/?source=vacancy_search 
/en/companies/16649-fritschi-unternehmensberatung-gmbh/ 
/en/companies/19002-hays-schweiz-ag/ 
/en/vacancies/detail/7389632/?source=vacancy_search 
/en/companies/19002-hays-schweiz-ag/ 
/en/companies/5977-canon-schweiz-ag/ 
/en/vacancies/detail/7236919/?source=vacancy_search 
/en/companies/5977-canon-schweiz-ag/ 
/en/companies/40039-vorwerk-international-strecker-co/ 
/en/vacancies/detail/7374142/?source=vacancy_search 
/en/companies/40039-vorwerk-international-strecker-co/ 
/en/companies/2263-zuercher-kantonalbank/ 
/en/vacancies/detail/7299359/?source=vacancy_search 
/en/companies/2263-zuercher-kantonalbank/ 
/en/companies/10673-accenture/ 
/en/vacancies/detail/6664788/?source=vacancy_search 
/en/companies/10673-accenture/ 
/en/companies/38308-addexpert-gmbh/ 
/en/vacancies/detail/7386047/?source=vacancy_search 
/en/companies/38308-addexpert-gmbh/ 
/en/companies/1802-six-swiss-exchange-ag/ 
/en/vacancies/detail/7357633/?source=vacancy_search 
/en/companies/1802-six-swiss-exchange-ag/ 
/en/vacancies/?page=1&term=business 
/en/vacancies/?page=2&term=business 
/en/vacancies/?page=3&term=business 
/en/vacancies/?page=4&term=business 
/en/vacancies/?page=5&term=business 
/en/vacancies/?page=6&term=business 
/en/vacancies/?page=124&term=business 
/en/vacancies/?page=2&term=business 
None 
http://jobcloud.ch/c/en/about/ 
http://jobcloud.ch/c/en/about/ 
http://jobcloud.ch/c/en/about/team/ 
http://jobcloud.ch/c/en/we-are-jobcloud/ 
None 
http://www.jobs.ch/en/newest.php 
http://www.jobs.ch/en/info.php?info=agb 
http://www.jobs.ch/en/info.php?info=pp 
None 
http://jobcloud.ch/c/en/products/international-recruiting/ 
/en/ 
http://www.jobs.ch/en/sitemap.php 
http://jobcloud.ch/c/en/about/contact/ 
http://jobcloud.ch/ 
http://www.facebook.com/jobs.ch 
http://twitter.com/jobs_ch 
http://www.xing.com/company/jobcloudag 
http://www.youtube.com/jobspunktch 
http://plus.google.com/113239437813300663024/ 
http://www.flickr.com/photos/jobsag 

Possible duplicate of [Web-scraping JavaScript page with Python](http://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python)

Answer


As @Teemu Risikko mentioned in the comments, you can use dryscrape or Selenium. Here is a solution using dryscrape:

from bs4 import BeautifulSoup
import dryscrape

my_url = "http://www.jobs.ch/en/vacancies/?term=business+analyst"

# Load the page in a dryscrape session so its JavaScript runs first
session = dryscrape.Session()
session.visit(my_url)
response = session.body()

# Name the parser explicitly to avoid the UserWarning shown in the question
soup = BeautifulSoup(response, "lxml")
for link in soup.find_all("a"):
    print(link.get("href"))

The solution is very simple with dryscrape, but installing the package can be difficult (use Qt <= 5.5)...
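For the multi-step approach described in the question, the collected hrefs still need to be narrowed down to the job postings. The following is a minimal sketch, not a tested scraper for the live site. It assumes Python 3, and it assumes that vacancy links look like `/en/vacancies/detail/<id>/` and that results pages accept `page` and `term` query parameters, both inferred only from the output printed in the question. The names `BASE_URL`, `DETAIL_LINK`, `search_page_url`, and `detail_links` are hypothetical helpers introduced for illustration.

```python
import re
from urllib.parse import urlencode, urljoin

BASE_URL = "http://www.jobs.ch"
# Assumed pattern, inferred from the hrefs printed in the question
DETAIL_LINK = re.compile(r"^/en/vacancies/detail/\d+/")


def search_page_url(term, page):
    """Build a results-page URL shaped like the pagination links in the output."""
    query = urlencode({"page": page, "term": term})
    return urljoin(BASE_URL, "/en/vacancies/?" + query)


def detail_links(hrefs):
    """Keep unique vacancy detail links, in order, as absolute URLs."""
    seen = set()
    result = []
    for href in hrefs:
        # <a> tags without an href come back as None from link.get("href")
        if href is None or not DETAIL_LINK.match(href):
            continue
        url = urljoin(BASE_URL, href)
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


# A few hrefs copied from the question's output
sample = [
    None,
    "/en/companies/79912-bayer-business-services-gmbh/",
    "/en/vacancies/detail/7376115/?source=vacancy_search",
    "/en/vacancies/detail/7376115/?source=vacancy_search",
    "/en/vacancies/?page=2&term=business",
]
print(detail_links(sample))
print(search_page_url("business analyst", 2))
```

One way to use this would be to loop over the page numbers, load each URL from `search_page_url` in the rendering session, pass the hrefs through `detail_links`, and save the resulting list before fetching the job descriptions in a second pass.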
