How to extract nested text with Scrapy?

I am trying to use Scrapy to extract the brand description paragraph from this page: http://us.asos.com/hope-and-ivy/hope-ivy-dotty-mesh-midi-dress-with-ruffle-detail/prd/8663409?clr=black&cid=2623&pgesize=36&pge=0&totalstyles=627&gridsize=3&gridrow=1&gridcolumn=1

The HTML element looks like this:

<div class="brand-description"> 
    <h4>Brand</h4> 
    <span>"Prom queens and wedding guests, claim the best-dressed title in " 
    <a href="/Women/A-To-Z-Of-Brands/Hope-And-Ivy/Cat/pgecategory.aspx?cid=21368"> 
     <strong>"Hope and Ivy's"</strong> 
    </a> 
    "occasion-ready collection. Shop its notice-me styles for hand-painted florals, Bardot necklines and figure-flattering pencil dresses." 
    </span> 
</div>

My desired result is:

Prom queens and wedding guests, claim the best-dressed title in Hope and Ivy's occasion-ready collection. Shop its notice-me styles for hand-painted florals, Bardot necklines and figure-flattering pencil dresses.

I tried this:

response.css("div.brand-description span::text").extract()

but the list of text I get back is:

['Prom queens and wedding guests, claim the best-dressed title in ', ' occasion-ready collection. Shop its notice-me styles for hand-painted florals, Bardot necklines and figure-flattering pencil dresses.']

It is missing "Hope and Ivy's", which sits inside the <strong> tag. My question: can I get the plain text without having to pay attention to the nested <a href> tag?

Source

2017-08-29 lliu05

Try this: //div[@class="brand-description"]/div

You may still need to do some post-processing, but this is probably the best you can do:

response.xpath('normalize-space(//div[@class="brand-description"]/span)').extract_first()

which will give you:

u'"Prom queens and wedding guests, claim the best-dressed title in " "Hope and Ivy\'s" "occasion-ready collection. Shop its notice-me styles for hand-painted florals, Bardot necklines and figure-flattering pencil dresses."'
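The post-processing mentioned above could look roughly like the sketch below. It assumes the double-quote characters really are part of the extracted string, as in the output shown, and that the description itself contains no quotes worth keeping. The helper name clean_description is made up for illustration.

```python
import re


def clean_description(raw):
    """Drop literal double quotes, then collapse runs of whitespace."""
    no_quotes = raw.replace('"', '')
    return re.sub(r'\s+', ' ', no_quotes).strip()


# The string returned by the normalize-space() query above.
raw = ('"Prom queens and wedding guests, claim the best-dressed title in " '
       '"Hope and Ivy\'s" "occasion-ready collection. Shop its notice-me styles '
       'for hand-painted florals, Bardot necklines and figure-flattering '
       'pencil dresses."')

cleaned = clean_description(raw)
print(cleaned)
```

If the page text can contain legitimate quotation marks, stripping every " would be too aggressive and a narrower rule would be needed.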

Source

2017-08-29 05:49:22
