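Scrapy CrawlSpider: can't follow specific links or parse them with a custom handler

I'm using Scrapy and following an example that only follows URLs matching certain regular expressions.
I'm not a Python developer, but I've tried a lot of techniques to get this working.
I'm using the sample URL from the Scrapy documentation, subclassing CrawlSpider and implementing the rules with LinkExtractor.
Now I want to use a custom parser for any URL that contains the word "friend".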
**Scrapy Python spider**
```python
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class MySpider(CrawlSpider):
    name = 'example'
    allowed_domains = ['quotes.toscrape.com']
    start_urls = ['http://quotes.toscrape.com']

    # Intended: send every URL containing "friends" to parse_custom
    rules = [
        Rule(LinkExtractor(allow='(friends)'), callback='parse_custom')
    ]

    # General handler: log the URL, then follow every link on the page
    def parse(self, response):
        self.logger.info('1111111111111 - Parsing General URL! %s', response.url)
        for href in response.css('a::attr(href)'):
            yield response.follow(href, callback=self.parse)

    def parse_custom(self, response):
        # I have never been able to get this callback to run
        self.logger.info('2222222222222 - Parsing CUSTOM URL! %s', response.url)
        for href in response.css('a::attr(href)'):
            yield response.follow(href, callback=self.parse)
```
Log output:

```
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/miracles/page/1/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/miracle/page/1/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/live/page/1/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/life/page/1/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/inspirational/page/1/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/choices/page/1/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/abilities/page/1/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/simile/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/miracles/page/1/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/miracle/page/1/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/live/page/1/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/life/page/1/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/inspirational/page/1/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/choices/page/1/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/abilities/page/1/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/simile/
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/truth/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/author/Marilyn-Monroe/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/friends/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/friendship/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/reading/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/books/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/humor/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/author/Jane-Austen/> (referer: http://quotes.toscrape.com)
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/truth/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/author/Marilyn-Monroe/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/friends/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/friendship/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/reading/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/books/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/tag/humor/
2017-07-30 10:45:59 [example] INFO: 1111111111111 - Parsing General URL! http://quotes.toscrape.com/author/Jane-Austen/
```
Thanks. I partly remember reading about this in the documentation.