Scrapy HtmlXPathSelector

I'm just trying out Scrapy and trying to get a basic spider working. I know this is probably just something I'm missing, but I've tried everything I can think of.

The error I get is:

line 11, in JustASpider 
    sites = hxs.select('//title/text()') 
NameError: name 'hxs' is not defined

My code is very basic at the moment, but I still can't seem to find where I'm going wrong. Thanks for the help!

from scrapy.spider import BaseSpider 
from scrapy.selector import HtmlXPathSelector 

class JustASpider(BaseSpider): 
    name = "google.com" 
    start_urls = ["http://www.google.com/search?hl=en&q=search"] 


    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//title/text()')
        for site in sites:
            print site.extract()


SPIDER = JustASpider()

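Source

2012-09-03 Keanan Koppenhaver

How are you running the spider? 'scrapy crawl google.com'? – Leo

There is nothing wrong with your code (although you no longer need to declare SPIDER). –

@Leo That's how I've been running it. –

I ended up removing the SPIDER call at the end and removing the for loop. There was only one title tag (as expected), and the loop seems to have been what was throwing it off. My working code is as follows: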

from scrapy.spider import BaseSpider 
from scrapy.selector import HtmlXPathSelector 

class JustASpider(BaseSpider): 
    name = "google.com" 
    start_urls = ["http://www.google.com/search?hl=en&q=search"] 


    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select('//title/text()')
        final = titles.extract()

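Source

2012-09-10 16:27:07

The code works, but it is better to give the spider a simple name such as "google" or "googleSpider" rather than "google.com" – parik

Make sure you are actually running the code you are showing.

Try deleting the *.pyc files in your project.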
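
A minimal sketch of that cleanup, assuming the whole project lives under one directory. The helper name `remove_pyc_files` and its `project_dir` argument are made up for illustration and are not part of Scrapy.

```python
import os


def remove_pyc_files(project_dir):
    """Delete every *.pyc file below project_dir and return the removed paths."""
    removed = []
    for root, _dirs, files in os.walk(project_dir):
        for filename in files:
            if filename.endswith(".pyc"):
                path = os.path.join(root, filename)
                os.remove(path)
                removed.append(path)
    return removed
```

Calling `remove_pyc_files(".")` from the project root forces Python to recompile every module from the current source on the next run, which rules out stale bytecode as the cause.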

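Source

2012-09-05 04:47:16 warvariuc

After deleting all the pyc files in the folder, I'm still getting the same error. If a dependency were missing, would I get an import error instead? –

Please check the indentation in your code. Are you mixing tabs with spaces? – warvariuc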
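
To see why an indentation slip can produce exactly this traceback (note the "in JustASpider" frame), here is a small sketch that needs no Scrapy. The class `Demo` and its stand-in values are hypothetical. The point is that a line which ends up at class-body indentation runs while the class is being defined, where the method's local variable `hxs` does not exist.

```python
# Source of a class whose last line was dedented to class-body level by mistake.
source = (
    "class Demo(object):\n"
    "    def parse(self, response):\n"
    "        hxs = response\n"
    "    sites = hxs\n"  # part of the class body, not of parse()
)

try:
    exec(source)
    message = None
except NameError as exc:
    # Raised at class definition time, before any spider is ever run.
    message = str(exc)

print(message)
```

With mixed tabs and spaces the editor can display both lines at the same depth while Python 2 places them at different levels, which is why the code can look correct on screen.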

I had a similar problem, NameError: name 'hxs' is not defined, and it turned out to be caused by spaces and tabs: my IDE was using tabs instead of spaces. You should check for that.
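
One way to check a file for this, sketched under the assumption that you only care about leading whitespace. The function names `indentation_styles` and `has_inconsistent_indentation` are invented for this example.

```python
def indentation_styles(source_text):
    """Map each indented line number (1-based) to 'tabs', 'spaces' or 'mixed'."""
    styles = {}
    for lineno, line in enumerate(source_text.splitlines(), 1):
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if not indent or not line.strip():
            continue  # skip unindented and blank lines
        if set(indent) == {"\t"}:
            styles[lineno] = "tabs"
        elif set(indent) == {" "}:
            styles[lineno] = "spaces"
        else:
            styles[lineno] = "mixed"
    return styles


def has_inconsistent_indentation(source_text):
    """True if the text indents with both tabs and spaces anywhere."""
    kinds = set(indentation_styles(source_text).values())
    return "mixed" in kinds or {"tabs", "spaces"} <= kinds
```

For example, `has_inconsistent_indentation(open("test.py").read())` reports whether a spider file mixes the two styles. Python 2 also offers the `-tt` interpreter flag and the standard `tabnanny` module for the same purpose.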

Source

2013-01-23 23:22:51

This works for me:

Use the command scrapy runspider <filename.py>

For example, save the file as test.py and run:

scrapy runspider test.py

Source

2013-08-19 15:01:00

Your code looks correct.

In recent versions of Scrapy, HtmlXPathSelector is deprecated. Use Selector instead.

Source

2014-02-14 05:14:58 dimka665

This is just a demo; of course, you will need to customize it!

#!/usr/bin/env python

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

class DmozSpider(BaseSpider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//ul/li')
        for site in sites:
            title = site.select('a/text()').extract()
            link = site.select('a/@href').extract()
            desc = site.select('text()').extract()
            print title, link, desc

Source

2014-06-21 19:20:27 user3672836

You should change

from scrapy.selector import HtmlXPathSelector

to

from scrapy.selector import Selector

and use hxs=Selector(response) instead.

Source

2015-04-26 05:38:32 neal

The code is written for a fairly old version of Scrapy. I suggest using the following instead:

from scrapy.spider import Spider
from scrapy.selector import Selector


class JustASpider(Spider):
    name = "googlespider"
    allowed_domains = ["google.com"]
    start_urls = ["http://www.google.com/search?hl=en&q=search"]

    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//title/text()').extract()
        print sites
        # for site in sites: (I don't know why you want to loop to extract the text in the title element)
        #     print site.extract()

I recommend using this code and hope it helps. I think here is a good example to follow.

Source

2015-09-04 06:28:46

I use Scrapy with BeautifulSoup4.0. For me, Soup is easier to read and understand. This is an option if you don't have to use HtmlXPathSelector. Hope this helps!

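import scrapy
from bs4 import BeautifulSoup


class LinkItem(scrapy.Item):
    # Hypothetical item with one field; the original imported its own Item.
    url = scrapy.Field()


# parse() is a method of your spider class.
def parse(self, response):
    soup = BeautifulSoup(response.body, 'html.parser')
    print 'Current url: %s' % response.url
    for link in soup.find_all('a'):
        if link.get('href') is not None:
            url = response.urljoin(link.get('href'))
            item = LinkItem()  # new item per link, so earlier ones are not overwritten
            item['url'] = url
            yield scrapy.Request(url, callback=self.parse)
            yield item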
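
The loop above boils down to two steps: find every `<a>` tag that has an href, then resolve that href against the page URL. The sketch below shows those two steps using only the Python 3 standard library, so it can be tried without Scrapy or BeautifulSoup. The class `LinkCollector`, the example.com URL, and the sample markup are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href is not None:
            # Same idea as response.urljoin(href): resolve against the page URL.
            self.urls.append(urljoin(self.base_url, href))


collector = LinkCollector("http://example.com/docs/index.html")
collector.feed(
    '<a href="page2.html">next</a> '
    '<a name="top">anchor without href</a> '
    '<a href="/about">about</a>'
)
print(collector.urls)
```

The anchor without an href is skipped, which mirrors the `is not None` check in the spider code.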

Source

2016-10-11 19:13:57 sarc360
