scrapy.selector.Selector.__init__()
expects a Response
object as first argument:
This is the error i'm getting from terminal, Spider error processing
は、ここに私のコードです。
あなたはHTTPレスポンスボディのためのセレクタを構築したい場合は、text=
引数を使用します。
$ scrapy shell https://docs.python.org/2/howto/curses.html
2016-11-21 11:05:34 [scrapy] INFO: Scrapy 1.2.1 started (bot: scrapybot)
(...)
2016-11-21 11:05:35 [scrapy] INFO: Spider opened
2016-11-21 11:05:35 [scrapy] DEBUG: Crawled (200) <GET https://docs.python.org/2/howto/curses.html> (referer: None)
(...)
>>>
>>> #
>>> # passing response.body (bytes) instead of a Response object fails
>>> #
>>> scrapy.Selector(response.body)
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/home/paul/.virtualenvs/scrapy12/local/lib/python2.7/site-packages/scrapy/selector/unified.py", line 67, in __init__
text = response.text
AttributeError: 'str' object has no attribute 'text'
>>>
>>> #
>>> # use text= argument to pass response body
>>> #
>>> scrapy.Selector(text=response.body)
<Selector xpath=None data=u'<html xmlns="http://www.w3.org/1999/xhtm'>
>>>
>>> scrapy.Selector(text=response.body).xpath('//div[@class="section"]')
[<Selector xpath='//div[@class="section"]' data=u'<div class="section" id="curses-programm'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="what-is-curses"'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="the-python-curs'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="starting-and-en'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="windows-and-pad'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="displaying-text'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="attributes-and-'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="user-input">\n<h'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="for-more-inform'>]
>>>
を簡単な方法は、直接応答オブジェクトを渡すことです:
>>> scrapy.Selector(response).xpath('//div[@class="section"]')
[<Selector xpath='//div[@class="section"]' data=u'<div class="section" id="curses-programm'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="what-is-curses"'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="the-python-curs'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="starting-and-en'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="windows-and-pad'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="displaying-text'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="attributes-and-'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="user-input">\n<h'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="for-more-inform'>]
とにもあなたの応答がHtmlResponse
またはXmlResponse
(通常はウェブスクレイピングの場合)であれば.xpath()
method on the response
instanceを使用することです(セレクタを作成する便利な方法です)
>>> response.xpath('//div[@class="section"]')
[<Selector xpath='//div[@class="section"]' data=u'<div class="section" id="curses-programm'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="what-is-curses"'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="the-python-curs'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="starting-and-en'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="windows-and-pad'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="displaying-text'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="attributes-and-'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="user-input">\n<h'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="for-more-inform'>]
コード貼り付けのインデントを修正できますか? –