2016-07-19 13 views
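XPath on a requests response returns an empty list

I am trying to learn web scraping, and I need to get all the URLs on this page: http://www.99acres.com/rent-property-in-chennai-ffid

First I need to sort the entries by most recent, so I replicate the getresults_ajax POST request in my code. The XPath returns valid results in Chrome's console, but in my code I get an empty list.

I know that replicating requests can be cumbersome and that I could use Selenium with PhantomJS to scrape dynamic pages, but I need to sort the content, and getting the data out of the response is proving tricky.

My code: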

import requests
from lxml import html

d = {
    'src': 'SORTING_date_d', 
    'static_search': 'true',  
    '': 'undefined', 
    'sortby': 'date_d', 
    'lstAcnId': '8930791340597402', 
    'encrypted_input': 'UiB8IFFTIHwgUiB8IzIjICB8IGNoZW5uYWkgIzMjfCAgfCBDUDMyIzIyIyB8IDI1MTU3NTg2IHwgIHwgMzIgfCM1IyAgfCBSICM0MCN8ICA=', 
    'lstAcn': 'SEARCH', 
    'is_ajax': '1' 
} 

h = { 
    'Referrer': 'http://www.99acres.com/rent-property-in-chennai-ffid?orig_property_type=R&search_type=QS&search_location=CP32&pageid=QS&keyword_orig=chennai' 
} 

req = requests.post(url = 'http://www.99acres.com/do/quicksearch/getresults_ajax', data = d, headers = h) 
r = html.fromstring(req.text) 

#print('test 1' + str(req.text)) 

prices = r.xpath('//div[@title = "View property details"]') 

print('test %d' % len(prices)) 
# driver = webdriver.PhantomJS(executable_path = R'C:\Python27\selenium\webdriver\phantomjs-2.1.1-windows\bin\phantomjs.exe') 

for price in prices: 
    print('price is this ' + str(price)) 

Answer

If you print req.text, you can see that the response is JSON:

{"html_ysf":" <div class=\"srp-ysfWrap boxSize\">\n\n\n\n  <diV. etc............. 
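This also explains the empty list in the question. A minimal sketch, assuming a heavily shortened stand-in for the response body (the `sample_body` string and its `listing` text are made up for illustration): inside the JSON text every attribute quote is escaped as `\"`, so the `title` attribute that lxml sees should never equal `View property details` until the JSON has been decoded.

```python
import json
from lxml import html

XPATH = '//div[@title = "View property details"]'

# Hypothetical, shortened stand-in for req.text; the real body is far larger.
sample_body = json.dumps(
    {"html2": '<div title="View property details">listing</div>'}
)

# Parsing the raw JSON text: the quotes are still escaped (\"), so the
# title attribute does not match the XPath predicate.
raw_matches = html.fromstring(sample_body).xpath(XPATH)

# Decoding the JSON first yields real HTML, which the same XPath matches.
decoded_matches = html.fromstring(json.loads(sample_body)["html2"]).xpath(XPATH)

print(len(raw_matches), len(decoded_matches))
```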

So to get what you want, decode the JSON and extract the HTML you are interested in using the html2 key:

req = requests.post(url='http://www.99acres.com/do/quicksearch/getresults_ajax', data=d, headers=h) 
r = html.fromstring((req.json()["html2"])) 
prices = r.xpath('//div[@title = "View property details"]') 
print('test %d' % len(prices)) 
for price in prices:
    print('price is this ' + str(price))
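If it is not obvious which key holds the listings, the decoded JSON can be inspected first. A rough sketch, using a made-up, shortened stand-in for the response body; `html_keys` is a hypothetical helper, not part of requests or lxml:

```python
import json

# Hypothetical, shortened stand-in for req.text; only the key names
# html_ysf and html2 come from the real response shown in this answer.
sample_body = json.dumps({
    "html_ysf": '<div class="srp-ysfWrap boxSize"></div>',
    "html2": '<div title="View property details">listing</div>',
    "is_ajax": "1",  # made-up non-HTML entry, for contrast
})

def html_keys(body):
    """Return the top-level keys whose values are strings containing markup."""
    payload = json.loads(body)
    return sorted(
        key for key, value in payload.items()
        if isinstance(value, str) and "<" in value
    )

print(html_keys(sample_body))
```

With the real response, `html_keys(req.text)` would list the candidate keys to try.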

Each price is a div element, so running:

for price in prices: 
     print(html.tostring(price)) 

gives output like the following:

b'<div data-propid="Q26021619" data-pgid="QS" class="srpWrap " title="View property details" data-fsl="N">\n\t\t<input id="ajxPDFlg" type="hidden" value="najx">\n  <input id="dataSRPCLKTRK" type="hidden" value="ON">\n  <i class="uiIcon pLatinum"></i>\t\t<div class="wrapttl">\n\t\t\t<div class="_srpttl srpttl fwn wdthFix480 lf">\n    <b class="WebRupee f14 mr5"> &#8377;</b>    <b id="rs_Q26021619">18,000</b>\n    <a data-proppos="\'\'" id="desc_Q26021619" class="b wWrap" target="_blank" title="2 BHK, Residential Apartment for rent in Choolaimedu" href="/2-bhk-bedroom-apartment-flat-for-rent-in-choolaimedu-chennai-central-1000-sq-ft-spid-Q26021619" data-fsl="N">2 BHK, Residential Apartment for rent in Choolaimedu</a>   </div>\n   <i class="uline" data-maplatlngzm="13.06709,80.2195432,11" data-iwdesc=" Residential Apartment for rent in Choolaimedu" data-ttlurl="http://www.99acres.com/2-bhk-bedroom-apartment-flat-for-rent-in-choolaimedu-chennai-central-1000-sq-ft-spid-Q26021619" data-price="18,000," data-area="Super built-up ,1000,Sq.Ft." 
data-bedrm="2" data-bldname="On Request" title="View Map"><i class="uiIcon imap"></i><i class="ml_5 f13 vmid hverU">Map</i></i>   <div class="clr"></div>\n\t\t</div>\n  \n    \n\t\t<div class="srpDetail">\n\t\t\t<div class="srpImg rel">\n    <img class="imgBoxSrp lazy" alt="2 BHK, Residential Apartment for rent in Choolaimedu" width="208" height="150" data-original="http://static.99acres.com/images/srpimages/noproperty-new.png" src="http://static.99acres.com/images/i0.gif"><div class="imgCap" data-clk-json=\'{"sno":-1,"ids":"0;732;","phType":"PROP","index":0,"text":"Sri Sakthi Real Estate","classLabel":"Dealer","profileId":"1122559","bedroomNum":"2","src":"SRP"}\'><a class="trackVamRos" vamacttype="Locality_Video_Count" vamactsrc="RENT_SRP" data-trkctgry="CLICK_LOCALITY_VIDEO_LINK" data-blid="732" href="#" data-clk-json=\'{"vtag":"LOC","sno":-1,"tab":4,"ids":"0;732;","phType":"PROP","entity":"locimages","subtab":"LVIDEO","text":"Sri Sakthi Real Estate","classLabel":"Dealer","profileId":"1122559","bedroomNum":"2","src":"SRP"}\'>1 Locality Video</a><div class="clr"></div></div>\t\t\t</div>\n\t\t\t<div class="srpDataWrap"><span>Super built-up Area : <b>1000 Sq.Ft. 
</b></span><div class="clr pdt8"></div><span class="doElip">Society : <bclass>On Request</bclass></span><div class="sep clr mt3imp"></div><span><span>Highlights:&#160; </span> <span>On Rent&#160;</span><span> <span>/&#160;</span> 1 to 5 years old&#160;</span><span> <span>/&#160;</span> Unfurnished&#160;</span><span> <span>/&#160;</span> 2nd Floor (out of 3)&#160;</span></span><div class="sep clr"></div>\t\t\t\t<div class="lf f12 wBr">\n\t\t\t\t\t<b>Description :</b> \n     Near gandhi road\nGood locality, Calm atmosphere\nCall for more details\t\t\t\t</div>\n                 <div class="rel clr">\n      <div class="lf mt13 mr13">Features: </div>\n      <div class="iconDiv fc_icons fcInit" attr="4,5,24,">\n      <i class="i4" value="Reserved Parking">&#160;</i><i class="i5" value="Feng Shui/Vaastu Compliant">&#160;</i><i class="i24" value="Water Storage">&#160;</i>      </div>\n       \n      <div class="LyrIcon clkEvntStp top0imp"></div>\n    </div>\n     \t\t\t</div>\n   <div class="clr p5"></div>\n   <div class="lf f13 hm10 mb5">Dealer : <a data-pid="1122559" class="hverU blkImp srpTplTrck" title="Sri Sakthi Real Estate , Chennai Central" target="_blank" href="/sri-sakthi-real-estate-chennai-central-drid-1122559">Sri Sakthi Real Estate</a>       &#160;&#160;&#160;&#160;Posted : Today      \n     </div> \n    \t\t</div>\n  <div class="clr"></div>\n   <div data-srptrk="ntrck" class="srpAction m10 mt5">\n  \t\t<a data-mxid="" data-apid="1122559" data-mc="N" data-rc="R" data-cl="Dealer" data-pgid="QS" href="javascript:void(0);" class="srpBlue f13 mr10 lf cntClk" title="Send E-mail &amp; SMS"> Contact Dealer <i>FREE</i></a><a data-pgid="QS" data-src="listing rank" data-lst="P" data-sms="RGVhciBBRERfQlVZRVJOQU1FX0hFUkUsIHlvdSBtYXkgY29udGFjdCBCYWJ1IGF0ICs5MS05Nzg5MDc0NzQxIGZvciBJTlIgMTggSyAxMDAwIFNxLiBGdC4gRmxhdCBpbiBDaG9vbGFpbWVkdS4=" data-trksrc="listing rank" data-ttc="" href="javascript:void(0);" class="srpWhite f13 mr10 lf vpn" id="viewphnoQ26021619" title="View 
Phone Number">View Phone Number</a><div data-src="listing rank" id="prop_Q26021619" class="sl_container blkImp f15 lf mt5 mr10"><span class="sl_star_empty_container" title="Shortlist this property"><i class="lf uiIcon sl_star_empty"></i><span class="lf m5">Shortlist</span></span></div>\t <div class="lf mt5 rptLtng" data-cl="A" data-md="R" data-pid="1122559" data-proptype="1" data-photocount="0" data-rescom="R">\n\t\t<div class="row dwnSrp"> \n\t\t<i class="spdpIcn repot_acu"></i> \n \t\t<a class="f13 b delCh blLink">Report problem with listing</a>\n\t </div>\n\t </div>\n      </div>\n    <div class="abs verifyLbl ViconPosSrp">\n   <div id="tooltipSociety" class="infoTip2 fwn f13 ital r5 hide VlyrPosSrp">\n    Learn about our verification process <a id="verify_process_info" class="blLink uLine" href="javascript:void(0)" style="text-decoration:underline">here</a>.\n     <i class="ver-arrow-down abs" style="left: 80px; bottom: -12px;"></i>\n   </div>\n   <i class="uiIcon verified mt8"></i>\n  </div>\n  \t\t<div class="clr pdt10"></div>\n </div>  \n\n' 
b'<div data-propid="X22163381" data-pgid="QS" class="srpWrap " title="View property details" data-fsl="N">\n\t\t<input id="ajxPDFlg" type="hidden" value="najx">\n  <input id="dataSRPCLKTRK" type="hidden" value="ON">\n  <i class="uiIcon pLatinum"></i>\t\t<div class="wrapttl">\n\t\t\t<div class="_srpttl srpttl fwn wdthFix480 lf">\n    <b class="WebRupee f14 mr5"> &#8377;</b>    <b id="rs_X22163381">22,000</b>\n    <a data-proppos="\'\'" id="desc_X22163381" class="b wWrap" target="_blank" title="2 BHK, Residential Apartment for rent in Choolaimedu" href="/2-bhk-bedroom-apartment-flat-for-rent-in-choolaimedu-chennai-central-1000-sq-ft-r2-spid-X22163381" data-fsl="N">2 BHK, Residential Apartment for rent in Choolaimedu</a>   </div>\n   <i class="uline" data-maplatlngzm="13.0673818,80.2213615,11" data-iwdesc=" Residential Apartment for rent in Choolaimedu" data-ttlurl="http://www.99acres.com/2-bhk-bedroom-apartment-flat-for-rent-in-choolaimedu-chennai-central-1000-sq-ft-r2-spid-X22163381" data-price="22,000, @ &lt;span class=WebRupee&gt;&#8377; &lt;/span&gt;22/ Sq.Ft." data-area="Built-up ,1000,Sq.Ft." 
data-bedrm="2" data-bldname="On Request" title="View Map"><i class="uiIcon imap"></i><i class="ml_5 f13 vmid hverU">Map</i></i>   <div class="clr"></div>\n\t\t</div>\n  \n    \n\t\t<div class="srpDetail">\n\t\t\t<div class="srpImg rel">\n    <img class="imgBoxSrp lazy" alt="2 BHK, Residential Apartment for rent in Choolaimedu" width="208" height="150" data-original="http://static.99acres.com/images/srpimages/noproperty-new.png" src="http://static.99acres.com/images/i0.gif"><div class="imgCap" data-clk-json=\'{"sno":-1,"ids":"0;732;","phType":"PROP","index":0,"text":"Sri Sakthi Real Estate","classLabel":"Dealer","profileId":"1122559","bedroomNum":"2","src":"SRP"}\'><a class="trackVamRos" vamacttype="Locality_Video_Count" vamactsrc="RENT_SRP" data-trkctgry="CLICK_LOCALITY_VIDEO_LINK" data-blid="732" href="#" data-clk-json=\'{"vtag":"LOC","sno":-1,"tab":4,"ids":"0;732;","phType":"PROP","entity":"locimages","subtab":"LVIDEO","text":"Sri Sakthi Real Estate","classLabel":"Dealer","profileId":"1122559","bedroomNum":"2","src":"SRP"}\'>1 Locality Video</a><div class="clr"></div></div>\t\t\t</div>\n\t\t\t<div class="srpDataWrap"><span>Built-up Area : <b>1000 Sq.Ft. 
</b></span><div class="clr pdt8"></div><span class="doElip">Society : <bclass>On Request</bclass></span><div class="sep clr mt3imp"></div><span><span>Highlights:&#160; </span> <span>On Rent&#160;</span><span> <span>/&#160;</span> 1 to 5 years old&#160;</span><span> <span>/&#160;</span> Furnished&#160;</span><span> <span>/&#160;</span> 1st Floor (out of 4)&#160;</span></span><div class="sep clr"></div>\t\t\t\t<div class="lf f12 wBr">\n\t\t\t\t\t<b>Description :</b> \n     2bhk house on rent in choolaimedu , Gill nagar area with all nessesary facilties.\t\t\t\t</div>\n             \t\t\t</div>\n   <div class="clr p5"></div>\n   <div class="lf f13 hm10 mb5">Dealer : <a data-pid="1122559" class="hverU blkImp srpTplTrck" title="Sri Sakthi Real Estate , Chennai Central" target="_blank" href="/sri-sakthi-real-estate-chennai-central-drid-1122559">Sri Sakthi Real Estate</a>       &#160;&#160;&#160;&#160;Posted : Today      \n     </div> \n    \t\t</div>\n  <div class="clr"></div>\n   <div data-srptrk="ntrck" class="srpAction m10 mt5">\n  \t\t<a data-mxid="" data-apid="1122559" data-mc="N" data-rc="R" data-cl="Dealer" data-pgid="QS" href="javascript:void(0);" class="srpBlue f13 mr10 lf cntClk" title="Send E-mail &amp; SMS"> Contact Dealer <i>FREE</i></a><a data-pgid="QS" data-src="listing rank" data-lst="P" data-sms="RGVhciBBRERfQlVZRVJOQU1FX0hFUkUsIHlvdSBtYXkgY29udGFjdCBCYWJ1IGF0ICs5MS05Nzg5MDc0NzQxIGZvciBJTlIgMjIgSyAxMDAwIFNxLiBGdC4gRmxhdCBpbiBDaG9vbGFpbWVkdS4=" data-trksrc="listing rank" data-ttc="" href="javascript:void(0);" class="srpWhite f13 mr10 lf vpn" id="viewphnoX22163381" title="View Phone Number">View Phone Number</a><div data-src="listing rank" id="prop_X22163381" class="sl_container blkImp f15 lf mt5 mr10"><span class="sl_star_empty_container" title="Shortlist this property"><i class="lf uiIcon sl_star_empty"></i><span class="lf m5">Shortlist</span></span></div>\t <div class="lf mt5 rptLtng" data-cl="A" data-md="R" data-pid="1122559" data-proptype="1" 
data-photocount="0" data-rescom="R">\n\t\t<div class="row dwnSrp"> \n\t\t<i class="spdpIcn repot_acu"></i> \n \t\t<a class="f13 b delCh blLink">Report problem with listing</a>\n\t </div>\n\t </div>\n      </div>\n    <div class="abs verifyLbl ViconPosSrp">\n   <div id="tooltipSociety" class="infoTip2 fwn f13 ital r5 hide VlyrPosSrp">\n    Learn about our verification process <a id="verify_process_info" class="blLink uLine" href="javascript:void(0)" style="text-decoration:underline">here</a>.\n     <i class="ver-arrow-down abs" style="left: 80px; bottom: -12px;"></i>\n   </div>\n   <i class="uiIcon verified mt8"></i>\n  </div>\n  \t\t<div class="clr pdt10"></div>\n </div>  \n\n' 

So you just need to extract whatever you need from each element.
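For instance, since the question asks for the URLs, the listing ID, price and detail-page link can be pulled out of each div. This is only a sketch: `fragment` is a trimmed copy of the two listings printed above, `BASE_URL` and `extract_listings` are names chosen here for illustration, and the `rs_` and `desc_` id prefixes are assumed from that output and may not hold for every listing.

```python
from urllib.parse import urljoin
from lxml import html

BASE_URL = "http://www.99acres.com/"  # assumed base for the relative hrefs

# Trimmed copy of the two listings from the output above.
fragment = """
<div data-propid="Q26021619" class="srpWrap " title="View property details">
  <b id="rs_Q26021619">18,000</b>
  <a id="desc_Q26021619" href="/2-bhk-bedroom-apartment-flat-for-rent-in-choolaimedu-chennai-central-1000-sq-ft-spid-Q26021619">2 BHK, Residential Apartment for rent in Choolaimedu</a>
</div>
<div data-propid="X22163381" class="srpWrap " title="View property details">
  <b id="rs_X22163381">22,000</b>
  <a id="desc_X22163381" href="/2-bhk-bedroom-apartment-flat-for-rent-in-choolaimedu-chennai-central-1000-sq-ft-r2-spid-X22163381">2 BHK, Residential Apartment for rent in Choolaimedu</a>
</div>
"""

def extract_listings(markup):
    """Return (property id, price text, absolute URL) for each listing div."""
    root = html.fromstring(markup)
    rows = []
    for listing in root.xpath('//div[@title = "View property details"]'):
        # string(...) returns the text of the first matching <b>, or '' if none.
        price = listing.xpath('string(.//b[starts-with(@id, "rs_")])').strip()
        hrefs = listing.xpath('.//a[starts-with(@id, "desc_")]/@href')
        url = urljoin(BASE_URL, hrefs[0]) if hrefs else None
        rows.append((listing.get("data-propid"), price, url))
    return rows

for row in extract_listings(fragment):
    print(row)
```

Against the live site, the same function could be called as `extract_listings(req.json()["html2"])`.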
