2016-05-25 9 views
3

Cannot get the element at the desired XPath using BeautifulSoup

I have started web scraping and am using BeautifulSoup (Python) for the job. I would like to get the property data from a sample web page for testing. My code starts as follows:

import requests 
from bs4 import BeautifulSoup as Soup 

page = "http://www.zillow.com/homedetails/1630-Amalfi-Dr-Pacific-Palisades-CA-90272/20546602_zpid/" 
response = requests.get(page) 
soup = Soup(response.text) 

# Now I would like to get the sale price of the apartment.
# The element in the HTML DOM is as follows:
# <span class="" id="yui_3_18_1_1_1464168312477_3548">$12,895,000<span class="value-suffix"></span></span> 
# The XPath of the element is //*[@id="yui_3_18_1_1_1464168312477_3548"]

# I wrote the following code:
value = soup.select('span#yui_3_18_1_1_1464168312477_3548') 
print(value)

I get no result. What am I doing wrong?

+0

I opened the web page in a browser, opened the page source, and searched for "yui_3_18_1_1_1464168312477_3548". There were zero matches. Are you sure this web page has a span with this ID? –

+0

It is not in the source; it is generated dynamically. –

+0

Well, I am not very proficient at web scraping; this is my first time. So my question is: if I want the sale price and the address of the property, how can I get that information? – Arefe

Answers

3

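You are looking at the source in the browser console, which is not the same as the source returned by requests. The span id="yui_3_18_1_1_1464170172533_3087" is generated dynamically, so you would need something like selenium to see it.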
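The mismatch can be reproduced offline. The sketch below is only an illustration: STATIC_HTML is a made-up stand-in for the HTML that requests returns (the real page is far larger), and html.parser is used here just to avoid depending on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the static HTML returned by requests.
# The span carries no id, because the id is added later by JavaScript.
STATIC_HTML = """
<div>
  <span class="">$12,895,000<span class="value-suffix"></span></span>
</div>
"""

soup = BeautifulSoup(STATIC_HTML, "html.parser")

# Selecting by the id seen in the browser finds nothing in the static source.
by_id = soup.select("span#yui_3_18_1_1_1464168312477_3548")
print(by_id)

# A plain text search of the source confirms the id is absent.
id_in_source = "yui_3_18_1_1_1464168312477_3548" in STATIC_HTML
print(id_in_source)
```

An empty result from select() does not mean the selector is wrong; it can simply mean the element does not exist in the HTML that was downloaded.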

Unfortunately, the id is also unique to each visit, so we cannot rely on it. What is consistent is the parent div, so we can use a CSS selector to get the first span inside the parent with the classes main-row home-summary-row:

In [4]: from selenium import webdriver 
In [5]: dr = webdriver.PhantomJS() 

In [6]: dr.get("http://www.zillow.com/homedetails/1630-Amalfi-Dr-Pacific-Palisades-CA-90272/20546602_zpid/") 
In [7]: span = dr.find_element_by_css_selector('div.main-row.home-summary-row span') 
In [8]: print(span.text) 
$12,895,000 

I used phantomjs for headless browsing, but you can use Firefox or Chrome if you prefer. All the information is at the link.

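Actually, looking at the source again, we can do this with bs4 after all. The id is the only thing that is generated dynamically, so if we forget about the id we can still get the price: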

In [26]: soup.select_one("div.main-row.home-summary-row span").text 
Out[26]: u'$12,895,000' 
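As a self-contained sketch of the same idea, the snippet below selects by the stable parent classes instead of the id and guards against the element being absent. SAMPLE_HTML and the helper name first_summary_span_text are hypothetical, introduced only for illustration; html.parser is used so the example runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment shaped like the markup described above.
SAMPLE_HTML = """
<div class="main-row home-summary-row">
  <span class="" id="yui_3_18_1_1_1464168312477_3548">$12,895,000<span class="value-suffix"></span></span>
</div>
"""


def first_summary_span_text(html):
    """Return the text of the first span under the summary row, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # select_one returns the first match, or None when nothing matches.
    span = soup.select_one("div.main-row.home-summary-row span")
    return span.get_text(strip=True) if span is not None else None


price_text = first_summary_span_text(SAMPLE_HTML)
missing = first_summary_span_text("<div><span>no summary row</span></div>")
print(price_text)
print(missing)
```

Checking for None before reading the text avoids an AttributeError if the page layout changes and the selector stops matching.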

There is also a nicer way: use the meta tags to get a lot of the information:

import requests 
from bs4 import BeautifulSoup as Soup 

page = "http://www.zillow.com/homedetails/1630-Amalfi-Dr-Pacific-Palisades-CA-90272/20546602_zpid/" 
response = requests.get(page) 
soup = Soup(response.text,"lxml") 
metas = soup.select("meta") 

Now, if we look at what the meta tags return:

from pprint import pprint as pp 

pp(metas) 

[<meta content="on" http-equiv="x-dns-prefetch-control"/>, 
<meta charset="unicode-escape"/>, 
<meta content="View 31 photos of this $12,895,000, 7 bed, 10.0 bath, 10500 sqft single family home located at 1630 Amalfi Dr, Pacific Palisades, CA 90272 built in 2015. MLS # 16-103696." name="description"/>, 
<meta content="Zillow, Inc." name="author"/>, 
<meta content="Copyright (c) 2006-2014 Zillow, Inc." name="Copyright"/>, 
<meta content="none" name="msapplication-config"/>, 
<meta content="ALL" name="ROBOTS"/>, 
<meta content="NOYDIR" name="ROBOTS"/>, 
<meta content="NOODP" name="ROBOTS"/>, 
<meta content="yes" name="apple-mobile-web-app-capable"/>, 
<meta content="black-translucent" name="apple-mobile-web-app-status-bar-style"/>, 
<meta content="telephone=no" name="format-detection"/>, 
<meta content="#3366b8" name="msapplication-TileColor"/>, 
<meta content="http://www.zillowstatic.com/static/images/logos/zillow-logo-win8-tile.png" name="msapplication-TileImage"/>, 
<meta content="/8Me6HBNZX/rt2n5/y1Lo3ZIrkcvkTBimqviTDiurR4=" name="verify-v1"/>, 
<meta content="7cb4abe457d82ae8" name="y_key"/>, 
<meta content="width=device-width, height=device-height, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0, user-scalable=no" name="viewport"/>, 
<meta content="Zillow Real Estate, Rentals, and Mortgage" itemprop="name"/>, 
<meta content="The most trafficked website about home sales and rentals, with real estate values for almost every U.S. home. 1,000,000 listings that you won't find on MLS." itemprop="description"/>, 
<meta content="http://www.zillowstatic.com/static/images/social/share_thumbnail.png" itemprop="image"/>, 
<meta content="691f1bfccade71b5-c065751219a379dd-g64cedb67f5ea020a-a" name="google-translate-customization"/>, 
<meta content="202692,878610170,662000799,100001769907023,10716009,769244502,10716649,503322863" property="fb:admins"/>, 
<meta content="172285552816089" property="fb:app_id"/>, 
<meta content="zillow_fb:home" property="og:type"/>, 
<meta content="1630 Amalfi Dr, Pacific Palisades, CA 90272" property="og:zillow_fb:address"/>, 
<meta content="7" property="zillow_fb:beds"/>, 
<meta content="10" property="zillow_fb:baths"/>, 
<meta content='For sale: $12,895,000. Stunning brand new Contemporary Cape Cod Estate in Palisades Riviera by Huntington Estate Homes w/ 7 beds, 10 baths, + office in 10,500 sq ft on an 18,590 sq ft lot. Soaring ceilings, magnificent chandelier, &amp; floating staircase create a grand entrance w/ glass wine cellar, formal living &amp; dining rooms. Floor plan flows openly between gourmet kitchen, family room, &amp; patio with a set of disappearing Fleetwood Pocket doors. Fireplaces in living, family, &amp; master suite add warmth to the contemporary feel, &amp; detailed wood paneling &amp; coffered ceilings enhance quality of design throughout. Master suite opens completely to sweeping ocean views &amp; private patio. Lower level feats. Old Hollywood style theater w/130" screen, surround sound, stadium seats, floor-to-ceiling suede panels, exercise pool, spa, gym, office, guest beds, open air patio, &amp; elevator access to take you from floor to floor. Perfect for entertaining - outdoor BBQ, seating, &amp; saltwater pool/spa complete this elegant estate.' property="zillow_fb:description"/>, 
<meta content="http://www.zillow.com/homedetails/1630-Amalfi-Dr-Pacific-Palisades-CA-90272/20546602_zpid/" property="og:url"/>, 
<meta content="Pacific Palisades Home For Sale" property="og:title"/>, 
<meta content="http://photos2.zillowstatic.com/p_d/IS5ypcj39edbdc1000000000.jpg" property="og:image"/>, 
<meta content='For sale: $12,895,000. Stunning brand new Contemporary Cape Cod Estate in Palisades Riviera by Huntington Estate Homes w/ 7 beds, 10 baths, + office in 10,500 sq ft on an 18,590 sq ft lot. Soaring ceilings, magnificent chandelier, &amp; floating staircase create a grand entrance w/ glass wine cellar, formal living &amp; dining rooms. Floor plan flows openly between gourmet kitchen, family room, &amp; patio with a set of disappearing Fleetwood Pocket doors. Fireplaces in living, family, &amp; master suite add warmth to the contemporary feel, &amp; detailed wood paneling &amp; coffered ceilings enhance quality of design throughout. Master suite opens completely to sweeping ocean views &amp; private patio. Lower level feats. Old Hollywood style theater w/130" screen, surround sound, stadium seats, floor-to-ceiling suede panels, exercise pool, spa, gym, office, guest beds, open air patio, &amp; elevator access to take you from floor to floor. Perfect for entertaining - outdoor BBQ, seating, &amp; saltwater pool/spa complete this elegant estate.' property="og:description"/>, 
<meta content="https://videos.zillowstatic.com/production/07a58eebcafbfe833b92f17945131f2e251b5fe5/mp4_600k_landscape_z1/mp4_600k_landscape_z1.mp4" property="og:video"/>, 
<meta content="https://videos.zillowstatic.com/production/07a58eebcafbfe833b92f17945131f2e251b5fe5/mp4_600k_landscape_z1/mp4_600k_landscape_z1.mp4" property="og:video:secure_url"/>, 
<meta content="640" property="og:video:width"/>, 
<meta content="video/mp4" property="og:video:type"/>, 
<meta content="360" property="og:video:height"/>, 
<meta content="238648973530.apps.googleusercontent.com" name="google-signin-clientid"/>, 
<meta content="https://www.googleapis.com/auth/plus.login https://www.googleapis.com/auth/plus.profile.emails.read" name="google-signin-scope"/>, 
<meta content="http://zillow.com" name="google-signin-cookiepolicy"/>, 
<meta content="summary_large_image" name="twitter:card"/>, 
<meta content="@Zillow" name="twitter:site"/>, 
<meta content="@Zillow" name="twitter:creator"/>, 
<meta content="1630 Amalfi Dr" name="twitter:title"/>, 
<meta content="Stunning brand new Contemporary Cape Cod Estate in Palisades Riviera by Huntington Estate Homes w/ 7 beds, 10 baths, + office in 10,500 sq ft on an 18,590 sq ft lot. Soaring ceilings, magnificent chandelier, &amp;amp; floating staircase create a grand entrance w/ glass wine cellar, formal living &amp;amp; dining rooms. Floor plan flows openly between gourmet kitchen, family room, &amp;amp; patio with a set of disappearing Fleetwood Pocket doors. Fireplaces in living, family, &amp;amp; master suite add warmth to the contemporary feel, &amp;amp; detailed wood paneling &amp;amp; coffered ceilings enhance quality of design throughout. Master suite opens completely to sweeping ocean views &amp;amp; private patio. Lower level feats. Old Hollywood style theater w/130&amp;quot; screen, surround sound, stadium seats, floor-to-ceiling suede panels, exercise pool, spa, gym, office, guest beds, open air patio, &amp;amp; elevator access to take you from floor to floor. Perfect for entertaining - outdoor BBQ, seating, &amp;amp; saltwater pool/spa complete this elegant estate." name="twitter:description"/>, 
<meta content="http://photos2.zillowstatic.com/p_d/IS5ypcj39edbdc1000000000.jpg" name="twitter:image"/>, 
<meta content="1630 Amalfi Dr, Pacific Palisades, CA 90272" itemprop="name"/>, 
<meta content="USD" itemprop="priceCurrency"/>, 
<meta content="$12,895,000" itemprop="price"/>, 
<meta content="34.060605" itemprop="latitude"/>, 
<meta content="-118.501625" itemprop="longitude"/>] 

We can pull out information such as the price using the attributes:

In [22]: soup = Soup(response.text,"lxml") 

In [23]: soup.select_one("meta[itemprop=price]")["content"] 
Out[23]: '$12,895,000' 

In [24]: soup.select_one('meta[name="twitter:description"]')["content"] 
Out[24]: 'Stunning brand new Contemporary Cape Cod Estate in Palisades Riviera by Huntington Estate Homes w/ 7 beds, 10 baths, + office in 10,500 sq ft on an 18,590 sq ft lot. Soaring ceilings, magnificent chandelier, &amp; floating staircase create a grand entrance w/ glass wine cellar, formal living &amp; dining rooms. Floor plan flows openly between gourmet kitchen, family room, &amp; patio with a set of disappearing Fleetwood Pocket doors. Fireplaces in living, family, &amp; master suite add warmth to the contemporary feel, &amp; detailed wood paneling &amp; coffered ceilings enhance quality of design throughout. Master suite opens completely to sweeping ocean views &amp; private patio. Lower level feats. Old Hollywood style theater w/130&quot; screen, surround sound, stadium seats, floor-to-ceiling suede panels, exercise pool, spa, gym, office, guest beds, open air patio, &amp; elevator access to take you from floor to floor. Perfect for entertaining - outdoor BBQ, seating, &amp; saltwater pool/spa complete this elegant estate.' 
In [27]: soup.select_one("meta[itemprop=latitude]")["content"] 
Out[27]: '34.060605' 
In [28]: soup.select_one("meta[itemprop=longitude]")["content"] 
Out[28]: '-118.501625' 
In [29]: soup.select_one('meta[property="og:zillow_fb:address"]')["content"] 
Out[29]: '1630 Amalfi Dr, Pacific Palisades, CA 90272' 
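The lookups above can be gathered into one small helper that returns the sale price and address together. This is a sketch under stated assumptions: SAMPLE_HEAD holds a handful of meta tags copied from the output above in place of a live request, and SELECTORS, extract_listing, and price_to_int are hypothetical names chosen for illustration. html.parser is used so the example runs without lxml.

```python
from bs4 import BeautifulSoup

# A few meta tags copied from the output above; the real page has many more.
SAMPLE_HEAD = """
<head>
<meta content="1630 Amalfi Dr, Pacific Palisades, CA 90272" property="og:zillow_fb:address"/>
<meta content="USD" itemprop="priceCurrency"/>
<meta content="$12,895,000" itemprop="price"/>
<meta content="34.060605" itemprop="latitude"/>
<meta content="-118.501625" itemprop="longitude"/>
</head>
"""

# Field name -> CSS selector. Attribute values are quoted so that values
# containing colons are valid selector syntax.
SELECTORS = {
    "price": 'meta[itemprop="price"]',
    "address": 'meta[property="og:zillow_fb:address"]',
    "latitude": 'meta[itemprop="latitude"]',
    "longitude": 'meta[itemprop="longitude"]',
}


def extract_listing(html):
    """Return a dict of the content attributes; missing tags map to None."""
    soup = BeautifulSoup(html, "html.parser")
    listing = {}
    for field, selector in SELECTORS.items():
        tag = soup.select_one(selector)
        listing[field] = tag.get("content") if tag is not None else None
    return listing


def price_to_int(price_text):
    """Convert a string such as '$12,895,000' to an integer number of dollars."""
    return int(price_text.replace("$", "").replace(",", ""))


listing = extract_listing(SAMPLE_HEAD)
print(listing)
print(price_to_int(listing["price"]))
```

To run this against the live page, pass response.text from the requests call shown earlier to extract_listing; whether the page still exposes these meta tags is not something this sketch can guarantee.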
+0

Could you write some sample code for me? I will accept your answer. I would like to get the sale price and the address of the property. – Arefe

+0

I am working on it right now. To make things more fun, the id is unique for each call, so I need to find a way around that. –

+0

This is really helpful; I greatly appreciate this answer. – Arefe
