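Web scraping using Beautiful Soup with unique div tags
I'm just getting started with web scraping, and I want to pull data from the site VRBO.com. The problem is that the div tags are unique, and I don't know how to iterate over them to get at their contents. So far I have only learned how to grab div tags that all share the same class and iterate over those. Below is a sample of what I'm trying to grab.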
<div data-spu="vrbo-460440-1043551" data-listingid="321.460440.1043551" class="simple-hit listing-spu-vrbo-460440-1043551 js-hit favorite-container preview-listing-vrbo-460440-1043551 container" data-original-title="" title="">
<div class="js-hitContent simple-hit__content row">
<div id="trigger-vrbo-460440-1043551" class="hit-thumbnail-container simple-hit__image-block col-lg-4 col-md-4 col-sm-4 col-xs-4">
<div class="hit-thumbnail-wrappers-container primary">
<div class="hit-thumbnail-wrappers">
<div class="hit-thumbnail-wrapper">
<div id="trigger-vrbo-460440-1043551" class="js-listingImage hit-thumbnail" style="background-image: url('https://odis.homeaway.com/odis/listing/080de0b1-f483-43f1-824e-bbb971e283bb.c9.jpg'); touch-action: pan-y; user-select: none; -webkit-user-drag: none; -webkit-tap-highlight-color: rgba(0, 0, 0, 0);">
<a href="/460440" class="hit-url listing-url js-hitLink" target="_blank"></a>
<div class="previous js-previousPhoto" style="touch-action: pan-y; user-select: none; -webkit-user-drag: none; -webkit-tap-highlight-color: rgba(0, 0, 0, 0);">
<div class="arrow-container arrow-container-prev">
<i class="icon-chevron-left hit-icon-left"></i>
</div>
</div>
<div class="next js-nextPhoto" style="touch-action: pan-y; user-select: none; -webkit-user-drag: none; -webkit-tap-highlight-color: rgba(0, 0, 0, 0);">
<div class="arrow-container arrow-container-next">
<i class="icon-chevron-right hit-icon-right"></i>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="hit-thumbnail-wrappers-container secondary">
<div class="hit-thumbnail-wrappers">
<div class="hit-thumbnail-wrapper">
<div id="trigger-vrbo-460440-1043551" class="js-listingImage hit-thumbnail" style="background-image: url(https://odis.homeaway.com/odis/listing/2ddd546c-6b04-4bce-8682-fc258ea08130.c3.jpg)">
<a href="/460440" class="hit-url listing-url js-hitLink" target="_blank"></a>
</div>
</div>
<div class="secondary-hit-thumbnail-separator"></div>
<div class="hit-thumbnail-wrapper">
<div id="trigger-vrbo-460440-1043551" class="js-listingImage hit-thumbnail" style="background-image: url(https://odis.homeaway.com/odis/listing/d5c83016-b9e4-4668-a7ef-235624ec2af6.c3.jpg)">
<a href="/460440" class="hit-url listing-url js-hitLink" target="_blank"></a>
</div>
</div>
</div>
</div>
<div class="cta-elements clearfix">
<span class=" js-preventHitClick favorite favorite--simple"><div id="fav-vrbo-460440-1043551" data-spu="vrbo-460440-1043551" tabindex="0" class="favorite-button js-favoriteButtonView">
<div class="favorite-icon favorite-icon--simple">
<svg width="18" height="15" viewBox="0 0 18 15" xmlns="http://www.w3.org/2000/svg"><title>Favorite</title><path d="M15.13 8.578l-4.949 4.948c-.78.78-2.044.778-2.822 0L2.41 8.578a4.365 4.365 0 1 1 6.36-5.943A4.353 4.353 0 0 1 12.175 1a4.365 4.365 0 0 1 2.953 7.578z" stroke="#5C6368" fill="#FFF" fill-rule="evenodd"></path></svg>
</div>
<ul class="js-favoritesMenu"></ul>
</div>
</span>
</div>
</div>
<div class="simple-hit__info-block col-lg-8 col-md-8 col-sm-8 col-xs-8 simple-hit--has-premier-badge">
<div class="viewed-urgency-row viewed-urgency-row--simple row">
<div class="viewed-urgency-col col-lg-12 col-md-12 col-sm-12">
<div class="viewed-urgency ">
<!--?xml version="1.0" encoding="UTF-8" standalone="no"?-->
<svg width="14px" height="14px" class="svg-eye" viewBox="0 0 50 50" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="Page-1" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd">
<g class="svg-eye-path" fill="#000000">
<path d="M25,6 C11.2060547,6 0,21.625 0,24.75 C0,27.875 11.2060547,43.5 25,43.5 C38.8183594,43.5 50,27.875 50,24.75 C50,21.625 38.8183594,6 25,6 L25,6 Z M25,40.375 C13.6962891,40.375 4.17480469,27.6552734 3.17382812,24.75 C4.17480469,21.8447266 13.6962891,9.125 25,9.125 C36.3037109,9.125 45.8496094,21.8447266 46.8261719,24.75 C45.8496094,27.6552734 36.3037109,40.375 25,40.375 L25,40.375 Z M25,15.375 C19.8242188,15.375 15.625,19.5742188 15.625,24.75 C15.625,29.9257812 19.8242188,34.125 25,34.125 C30.1757812,34.125 34.375,29.9257812 34.375,24.75 C34.375,19.5742188 30.1757812,15.375 25,15.375 L25,15.375 Z M25,32.5625 C20.703125,32.5625 17.1875,29.0712891 17.1875,24.75 C17.1875,20.4287109 20.703125,16.9375 25,16.9375 C29.3212891,16.9375 32.8125,20.4287109 32.8125,24.75 C32.8125,29.0712891 29.3212891,32.5625 25,32.5625 L25,32.5625 Z" id="eye"></path>
</g>
</g>
</svg>
<span class="viewed-urgency--message">Viewed 8 times in the last 48 hours</span>
</div>
<span class="premier-badge js-premierBadge" data-placement="left" data-toggle="tooltip" data-centered="true" title="" data-original-title="This VRBO partner has demonstrated and is committed to fast response times, the best rates, and a great guest experience.">
<span class="premier-badge__text">Premier Partner</span>
</span>
</div>
</div>
<div class="headline-row row hidden-xs">
<div class="headline-col col-lg-12 col-md-12 col-sm-12">
<h3 class="hit-headline">
<a href="/460440" class="hit-url js-hitLink visited" target="_blank">
Our Cabin Is A Quiet Refuge In This Busy World.
</a>
</h3>
</div>
</div>
<div class="serp-badge-container banner-row row hidden-xs banner-row--large">
<div class="banner-col col-lg-12 col-md-12 col-sm-12">
<div class="banner-item">
<span class="listing-type">Cabin</span>
<span class="listing-id">#460440</span>
</div>
<div class="banner-item">
<div class="min-stay">
3 night min stay
</div>
</div>
</div>
</div>
<div class="accommodations-row row">
<div class="accommodations-col col-lg-12 col-md-12 col-sm-12">
<div class="accommodations-col--large hidden-xs">
<ul class="list-unstyled accommodations gtNSHitV2DesktopTst2 gtSERPSqFootageTst">
<li class="accommodation accommodation--simple bbs-sleeps">
<div class="bd-bth-slps-label">Sleeps <span class="bd-bth-slps-count">6</span></div>
<div class="bd-bth-slps-count">6</div>
</li>
<li class="accommodation accommodation--simple">
<div class="bd-bth-slps-label">bedrooms</div>
<div class="bd-bth-slps-count">2</div>
</li>
<li class="accommodation accommodation--simple bbs-baths">
<div class="bd-bth-slps-label">bathrooms</div>
<div class="bd-bth-slps-count">2</div>
</li>
<li class="accommodation accommodation--simple bbs-half-baths hidden-sm hidden-xs">
<div class="bd-bth-slps-label">HF BA</div>
<div class="bd-bth-slps-count">1</div>
</li>
<li class="accommodation accommodation--simple bbs-area">
<div class="bd-bth-slps-label">sq. ft. </div>
<div class="bd-bth-slps-count">1800</div>
</li>
</ul>
</div>
<div class="accommodations-col--small hidden-sm hidden-md hidden-lg">
<ul class="list-unstyled accommodations gtSERPSqFootageTst">
<li class="accommodation accommodation--simple">
<span class="bd-bth-slps-count">2</span> BR
</li>
<li class="accommodation accommodation--simple bbs-baths">
<span class="bd-bth-slps-count">2</span> BA
</li>
<li class="accommodation accommodation--simple bbs-half-baths">
<span class="bd-bth-slps-count">1</span> HF BA
</li>
<li class="accommodation accommodation--simple bbs-sleeps">
Sleeps <span class="bd-bth-slps-count">6</span>
</li>
<li class="accommodation accommodation--simple bbs-area">
<div class="bd-bth-slps-label">sq. ft. </div>
<div class="bd-bth-slps-count">1800</div>
</li>
</ul>
</div>
</div>
</div>
<div class="serp-badge-container banner-row row hidden-sm hidden-md hidden-lg banner-row--small">
<div class="banner-col col-lg-12 col-md-12 col-sm-12">
<div class="banner-item">
<span class="listing-type">Cabin</span>
<span class="listing-id">#460440</span>
</div>
<div class="banner-item">
<div class="min-stay">
3 night min stay
</div>
</div>
</div>
</div>
<div class="simple-hit__price-rating-row simple-hit__price-rating-row--large row">
<div class="price-rating-col price-rating-col--left col-sm-6 col-md-6 col-lg-6 col-xs-6 price-rating-col--superlative" }}="">
<div class="hit-rating hit-rating--hasRating hit-rating--hasReviewCount">
<div class="stab-hit-superlative-simple">
<div class="superlative">
<span class="comment">Exceptional!</span>
<span class="numeric">5/5</span>
<br>
<span class="superlative-review visible-lg-block">(41 reviews)</span>
</div>
</div>
<div class="price-overlay">
<div class="rate">
<a href="/460440" class="price js-hitLink" target="_blank"><span class="currency">$</span>185</a>
</div>
<div class="period">
avg/night
</div>
</div>
</div>
</div>
<div class="rating-col rating-col--right col-sm-6 col-md-6 col-lg-6 col-xs-6 rating-col--superlative" }}="">
<div class="rating-content">
<div class="rating rating-5 "></div>
<span class="review-count">41</span>
</div>
</div>
</div>
</div>
</div>
</div>
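I know this is only one container of data, but in the HTML clip above the first div tag is
<div data-spu="vrbo-460440.....>
and it is unique to each container. I want to iterate over each container and get the title, cost, and so on, but I don't see anything I can pass to BS4's findAll. There is a
<div class="js-hits">
tag, but it doesn't capture all the containers in the group. I am adding this from the results of a conditional search.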
<div class="item-container ">
<!--product image-->
<a href="https://www.newegg.com/Product/Product.aspx?Item=N82E16814487338" class="item-img">
<img src="https://images10.newegg.com/NeweggImage/ProductImageCompressAll300/14-487-338-V01.jpg?ex=2" title="EVGA GeForce GTX 1080 Ti FTW3 GAMING, 11G-P4-6696-KR, 11GB GDDR5X, iCX Technology - 9 Thermal Sensors & RGB LED G/P/M" alt="EVGA GeForce GTX 1080 Ti FTW3 GAMING, 11G-P4-6696-KR, 11GB GDDR5X, iCX Technology - 9 Thermal Sensors & RGB LED G/P/M"
is-retina="true" width="240" height="180">
</a>
<div class="item-info">
<!--brand info-->
<div class="item-branding">
<a href="https://www.newegg.com/EVGA/BrandStore/ID-1402" class="item-brand">
<img src="//images10.newegg.com/Brandimage_70x28//Brand1402.gif" title="EVGA" alt="EVGA">
</a>
<!--rating info-->
<a title="Rating + 5" href="https://www.newegg.com/Product/Product.aspx?Item=N82E16814487338&SortField=0&SummaryType=0&PageSize=10&SelectedRating=-1&VideoOnlyMark=False&ignorebbr=1&IsFeedbackTab=true#scrollFullInfo" class="item-rating"><i class="rating rating-5"></i><span class="item-rating-num">(167)</span></a>
</div>
<!--description info-->
<a href="https://www.newegg.com/Product/Product.aspx?Item=N82E16814487338" class="item-title" title="View Details"><i class="icon-premier icon-premier-xsm"></i>EVGA GeForce GTX 1080 Ti FTW3 GAMING, 11G-P4-6696-KR, 11GB GDDR5X, iCX Technology - 9 Thermal Sensors & RGB LED G/P/M</a>
<!--promption info-->
<p class="item-promo"><i class="item-promo-icon"></i>Includes Destiny 2 PC game with purchase, limited offer</p>
<!--feature-->
<ul class="item-features">
<li><strong>Core Clock:</strong> 1569 MHz</li>
<li><strong>Max Resolution:</strong> 7680 x 4320</li>
<li><strong>DisplayPort:</strong> 3 x DisplayPort 1.4</li>
<li><strong>DVI:</strong> 1 x Dual-Link DVI-D</li>
<li><strong>Model #: </strong>11G-P4-6696-KR</li>
<li><strong>Item #: </strong>N82E16814487338</li>
<li><strong>Return Policy: </strong><a href="https://kb.newegg.com/Article/Index/12/3?id=1167#54" target="_blank" title="Extended Holiday Replacement-Only Return Policy(New Window)">Extended Holiday Replacement-Only Return Policy</a></li>
</ul>
<div class="item-action">
<!--price-->
<ul class="price has-label-membership ">
<li class="price-was">
</li>
<li class="price-map">
</li>
<li class="price-current">
<span class="price-current-label">
<a class="membership-info membership-popup" name="membership" style="display: inline" data-neg-popid="MembershipPopup" href="javascript:void(0);"><span class="membership-icon"></span><span style="display: none">|</span></a>
</span>$<strong>799</strong><sup>.99</sup> <a href="https://www.newegg.com/Product/Product.aspx?Item=N82E16814487338&buyingoptions=New" class="price-current-num">(17 Offers)</a>
<span class="price-current-range">
<abbr title="to">–</abbr>
</span>
</li>
<li class="price-save ">
<span class="price-save-endtime price-save-endtime-current"><strong>Sale Ends in 3 Days (Thu)</strong></span>
<span class="price-save-endtime price-save-endtime-another" style="display:none;"><strong>Sale Ends in 11/30</strong></span>
</li>
<li class="price-note">
</li>
<li class="price-ship">
Free Shipping
</li>
</ul>
<!--egg point-->
<!--financing-->
<!--button-->
<div class="item-operate ">
<div class="item-button-area">
<button type="button" title="View Details" class="btn btn-mini " onclick="Javascript:Biz.ProductList.Item.add('https://www.newegg.com/Product/Product.aspx?Item=N82E16814487338');">View Details <i class="fa fa-caret-right"></i></button>
</div>
<!--compare-->
<div class="item-compare-box">
<label class="form-checkbox">
<input id="CompareItem_14-487-338" autocomplete="off" neg-itemnumber="14-487-338" type="checkbox" name="CompareItem" value="CompareItem_14-487-338">
<span class="form-checkbox-title">Compare</span>
</label>
</div>
<script type="text/javascript">
Biz.Product.CompareConfig.compareItems.push("14-487-338");
var itemThumbs = new Object();
itemThumbs.itemNumber = "14-487-338";
itemThumbs.imageUrl = "//images10.newegg.com/ProductImageCompressAll35/14-487-338-V01.jpg";
Biz.Product.CompareConfig.Thumbs.push(itemThumbs);
</script>
</div>
</div>
</div>
</div>
The above is what I learned on, from newegg.com. There, every item had a div with the same class, one such div for each individual item I wanted information about.
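For reference, the Newegg pattern described above can be sketched roughly as follows. This is only an illustration under some assumptions: `NEWEGG_HTML` is a trimmed stand-in for the real page, the second item is made up, and `parse_newegg` is a hypothetical helper name. The only BS4 calls used are `find_all`, `find`, and `get_text`.

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for a Newegg results page (assumption: the real page
# repeats one <div class="item-container "> per product, as in the sample).
# The second item is hypothetical and exists only to show the iteration.
NEWEGG_HTML = """
<div class="item-container ">
  <a href="https://www.newegg.com/Product/Product.aspx?Item=N82E16814487338" class="item-title" title="View Details">EVGA GeForce GTX 1080 Ti FTW3 GAMING</a>
  <ul class="price has-label-membership ">
    <li class="price-current">$<strong>799</strong><sup>.99</sup></li>
    <li class="price-ship">Free Shipping</li>
  </ul>
</div>
<div class="item-container ">
  <a href="#" class="item-title" title="View Details">Hypothetical second card</a>
  <ul class="price ">
    <li class="price-current">$<strong>499</strong><sup>.99</sup></li>
    <li class="price-ship">$4.99 Shipping</li>
  </ul>
</div>
"""


def parse_newegg(html):
    """Return one dict per item-container div found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    # Every product shares the class "item-container", so one find_all call
    # returns all of them.
    for container in soup.find_all("div", class_="item-container"):
        title = container.find("a", class_="item-title").get_text(strip=True)
        price_li = container.find("li", class_="price-current")
        # Dollars live in <strong>, cents in <sup>.
        price = price_li.strong.get_text() + price_li.sup.get_text()
        shipping = container.find("li", class_="price-ship").get_text(strip=True)
        items.append({"title": title, "price": price, "shipping": shipping})
    return items


if __name__ == "__main__":
    for item in parse_newegg(NEWEGG_HTML):
        print(item)
```

The whole approach depends on the shared class name, which is why it does not carry over directly when the obvious identifier on each container is unique.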
I may not be making sense, so I have added a link to the site.
Thanks!
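What data are you trying to collect? –
We need to know exactly what you are trying to parse. – Ali
I'm trying to collect data on properties based on location, amenities, cost, and availability. I'm not sure I can make it any simpler, sorry. I added the sample I learned BS4 on (newegg.com). That website had a consistent HTML tag to iterate over with BS4: I keyed on the term "item-container". On the VRBO.com site, however, the HTML tag that defines each vacation property (container) is unique, and I don't know enough to capture it. As I'm still learning BS4, I don't know how to proceed with capturing the information. – spark706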
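One thing worth noting about the VRBO sample: although `data-spu` and several class tokens differ per listing, the outer div also carries shared class tokens (`simple-hit`, `js-hit`) and a `data-listingid` attribute. A minimal sketch of iterating on those instead, assuming every listing follows the same shape as the one posted, might look like this. `VRBO_HTML` is a trimmed stand-in, the second listing and its IDs are made up, and `parse_vrbo` and `first_text` are hypothetical helper names. It also assumes the listing markup is present in the HTML you actually download, which the post does not confirm.

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for the posted markup. The second listing is hypothetical
# (made-up IDs and values) and only shows that iteration finds more than one.
VRBO_HTML = """
<div class="js-hits">
  <div data-spu="vrbo-460440-1043551" data-listingid="321.460440.1043551" class="simple-hit listing-spu-vrbo-460440-1043551 js-hit favorite-container container">
    <div class="js-hitContent simple-hit__content row">
      <div id="fav-vrbo-460440-1043551" data-spu="vrbo-460440-1043551" class="favorite-button js-favoriteButtonView"></div>
      <h3 class="hit-headline">
        <a href="/460440" class="hit-url js-hitLink" target="_blank">
          Our Cabin Is A Quiet Refuge In This Busy World.
        </a>
      </h3>
      <span class="listing-type">Cabin</span>
      <span class="listing-id">#460440</span>
      <a href="/460440" class="price js-hitLink" target="_blank"><span class="currency">$</span>185</a>
      <span class="review-count">41</span>
    </div>
  </div>
  <div data-spu="vrbo-000000-0000000" data-listingid="321.000000.0000000" class="simple-hit listing-spu-vrbo-000000-0000000 js-hit favorite-container container">
    <div class="js-hitContent simple-hit__content row">
      <h3 class="hit-headline"><a href="/000000" class="hit-url js-hitLink">Hypothetical second listing</a></h3>
      <span class="listing-type">Condo</span>
      <span class="listing-id">#000000</span>
      <a href="/000000" class="price js-hitLink"><span class="currency">$</span>99</a>
    </div>
  </div>
</div>
"""


def first_text(node, name, class_name):
    """Text of the first matching descendant, or None if there is none."""
    found = node.find(name, class_=class_name)
    return found.get_text(strip=True) if found is not None else None


def parse_vrbo(html):
    """Return one dict per listing container found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    listings = []
    # class_ matches a single token of a multi-valued class attribute, so the
    # unique tokens do not matter. Requiring data-listingid as well skips
    # nested elements such as the favorite button, which also has data-spu.
    hits = soup.find_all("div", class_="simple-hit", attrs={"data-listingid": True})
    for hit in hits:
        listings.append({
            "spu": hit["data-spu"],
            "headline": first_text(hit, "h3", "hit-headline"),
            "type": first_text(hit, "span", "listing-type"),
            "listing_id": first_text(hit, "span", "listing-id"),
            "price": first_text(hit, "a", "price"),
            "reviews": first_text(hit, "span", "review-count"),
        })
    return listings


if __name__ == "__main__":
    for listing in parse_vrbo(VRBO_HTML):
        print(listing)
```

If `find_all` comes back empty on the real page, the listings may be added by JavaScript after the page loads, in which case the downloaded HTML would need to be checked before changing the selector.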