Removing/excluding non-breaking spaces from Scrapy results

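I'm currently trying to scrape a website for article prices, and I've run into a problem (after solving an earlier one, where the prices were generated dynamically).

I get the price and the article name without any trouble, but every second result for 'price' is '\xa0'. I tried removing it with 'normalize-space()', but to no avail.

My code: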

import scrapy
from scrapy import signals
from scrapy.http import TextResponse
from scrapy.xlib.pydispatch import dispatcher
from horni.items import HorniItem

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
from selenium.webdriver.common.keys import Keys

class mySpider(scrapy.Spider):
    name = "placeholder"
    allowed_domains = ["placeholder.com"]
    start_urls = ["https://www.placeholder.com"]

    def __init__(self):
        self.driver = webdriver.Chrome()
        # Close the browser when the spider finishes
        dispatcher.connect(self.spider_closed, signals.spider_closed)

    def spider_closed(self, spider):
        self.driver.close()

    def parse(self, response):
        # Render the page in Selenium so the dynamically generated prices exist
        self.driver.get("https://www.placeholder.com")
        response = TextResponse(url=self.driver.current_url, body=self.driver.page_source, encoding='utf-8')
        for post in response.xpath('//body'):
            item = HorniItem()
            item['article_name'] = post.xpath('//a[@class="title-link"]/span/text()').extract()
            item['price'] = post.xpath('//p[@class="display-price"]/span/text()').extract()
            yield item

Source

2016-06-24 rongon

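If you're interested, http://stackoverflow.com/a/33829869/2572383 has some details on the different whitespace characters and on how XPath's 'normalize-space()' and Python's 'strip()' treat them.

Can you show the HTML snippet you are applying '//p[@class="display-price"]/span/text()' to?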

\xa0 is the non-breaking space in Latin-1 (U+00A0 in Unicode). You can replace it with an ordinary space:

string = string.replace(u'\xa0', u' ')
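
As a rough illustration of why 'normalize-space()' did not help while a plain string replacement does: XPath 1.0 only treats space, tab, carriage return and line feed as whitespace, whereas Python 3 counts U+00A0 as whitespace for 'strip()'. The sketch below uses only the standard library; the sample value 'raw' is made up, since the real page content is not shown in the question.

```python
# Hypothetical sample value; the actual scraped text is not shown in the question.
raw = u'\xa019,99\xa0€'

# Replace every no-break space (U+00A0) with an ordinary space.
cleaned = raw.replace(u'\xa0', u' ')

# In Python 3, strip() with no arguments also trims U+00A0 at both ends,
# but it leaves no-break spaces inside the string untouched.
trimmed = raw.strip()

# A string that consists only of a no-break space becomes empty after strip().
only_nbsp = u'\xa0'.strip()
```

So a result that is nothing but '\xa0' can be detected by checking whether the stripped string is empty.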

Update:

You can apply the code as shown below. Try replacing your loop with this:

for post in response.xpath('//body'):
    item = HorniItem()
    item['article_name'] = post.xpath('//a[@class="title-link"]/span/text()').extract()
    item['price'] = post.xpath('//p[@class="display-price"]/span/text()').extract()
    # NOTE: extract() returns a list, so replace()/strip() have to be
    # applied per element (see the comments below)
    item['price'] = item['price'].replace(u'\xa0', u' ')
    if item['price'].strip():
        yield item

This replaces the character and yields the item only if the price is not empty.

Source

2016-06-24 10:01:37 cb0

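Thanks for your reply. However, I don't understand how to do this. Can you apply your code to item['price']? Or is there a way to exclude all non-breaking spaces from the Scrapy response? – rongon

I've updated my answer in response to your comment. – cb0

Thanks for your help. "if(item['price'].strip()):" didn't work for me because the element is a list. But you pointed me in the right direction. I used "x for x in item['prices'] if x != u'\xa0'". – rongon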
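
The list-based fix described in the last comment could look roughly like the sketch below. It assumes the extracted price is a list of strings, as 'extract()' returns; the helper name 'clean_prices' and the sample values are hypothetical and not part of the original thread.

```python
NBSP = u'\xa0'  # U+00A0, the no-break space


def clean_prices(values):
    """Return the strings from `values` with no-break spaces replaced,
    dropping entries that are empty once whitespace is stripped."""
    cleaned = []
    for value in values:
        text = value.replace(NBSP, u' ').strip()
        if text:
            cleaned.append(text)
    return cleaned


# Hypothetical extraction result in which every second entry is just '\xa0'.
extracted = [u'19,99', u'\xa0', u'5,49\xa0€', u'\xa0']
prices = clean_prices(extracted)
```

In the spider this would be used as item['price'] = clean_prices(item['price']) before yielding the item. Unlike a bare "x != u'\xa0'" filter, it also normalizes no-break spaces that sit inside an otherwise valid price.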
