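I would like to use Scrapy's ItemLoader to extract the first n characters of the text in an HTML element, where that text is spread across several child elements (every child whose text falls within the first n characters contributes to the result).
Here is an example setup:
Sample HTML: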
<div class="about-copy">
<p>Developers trust Stack Overflow to help solve coding problems
and use Stack Overflow Jobs to find job opportunities. We’re
committed to making the internet a better place, and our products
aim to enrich the lives of developers as they grow and mature in
their careers.
</p>
<a href='...'></a>
<p>Founded in 2008, Stack Overflow sees 40 million visitors each month
and is the flagship site of the Stack Exchange network, home to 150+
Q&A sites dedicated to niche topics.
</p>
</div>
Parser code:
def parse_details(self, response):
    ...
    l = ItemLoader(item=Entry(), response=response)
    # this is presumably the portion of the code that is to be modified
    l.add_css('f_brief_summary', 'div.about-copy::text')
    ...
Desired result:
Developers trust Stack Overflow to help solve coding problems
and use Stack Overflow Jobs to find job opportunities. We’re
committed to making the internet a better place, and our products
aim to enrich the lives of developers as they grow and mature in
their careers. Founded in 2008, Stack Overflow
Is there a one-step way to do this with ItemLoader, or do I need to do the parsing manually and then add the text to the ItemLoader object with the 'add_value' method?
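For illustration, here is a minimal sketch of one possible approach, not a confirmed answer. It assumes a small callable is used as a processor that joins the extracted text fragments, collapses runs of whitespace (an assumption, since the desired result does not say how line breaks should be handled), and keeps the first n characters. The name TakeFirstChars and the limit of 40 are hypothetical and not part of Scrapy. Note also that 'div.about-copy::text' matches only the text nodes directly inside the div, whereas 'div.about-copy ::text' (with a space) matches the text of its descendants too, which is probably what is wanted here.

```python
class TakeFirstChars:
    """Hypothetical processor: join text fragments and keep the first n characters."""

    def __init__(self, n):
        self.n = n

    def __call__(self, values):
        # Accept a single string as well as an iterable of text fragments.
        if isinstance(values, str):
            values = [values]
        # Join all fragments, then collapse every run of whitespace
        # (including newlines between elements) into a single space.
        text = " ".join(" ".join(values).split())
        # Truncate, and drop a trailing space left by the cut.
        return text[: self.n].rstrip()


if __name__ == "__main__":
    # Fragments similar to what a descendant ::text selector might yield.
    fragments = [
        "\n",
        "Developers trust Stack Overflow to help solve coding problems\n",
        "\n",
        "Founded in 2008, Stack Overflow sees 40 million visitors each month\n",
    ]
    print(TakeFirstChars(40)(fragments))

# Possible use inside the spider (assumes that add_css accepts extra
# processors after the selector; worth checking against the installed
# Scrapy version):
#
#   l.add_css('f_brief_summary', 'div.about-copy ::text', TakeFirstChars(300))
```

If the one-step form does not behave as hoped, the same callable can be applied by hand to response.css('div.about-copy ::text').getall() and the result passed to add_value.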