2016-08-11

I followed https://wiki.apache.org/nutch/NutchTutorial, installed Nutch 1.12, and tried to integrate it with Solr 5.5.2. I installed Nutch using the steps in the tutorial. When I then ran the command below to integrate it with Solr, it threw the following exception: java.io.IOException: No FileSystem for scheme: http

bin/nutch index http://10.209.18.213:8983/solr crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/* -filter -normalize

Exception 

2016-08-11 09:18:40,076 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
2016-08-11 09:18:40,383 WARN segment.SegmentChecker - The input path at crawldb is not a segment... skipping 
2016-08-11 09:18:40,397 INFO segment.SegmentChecker - Segment dir is complete: crawl/segments/20160810110110. 
2016-08-11 09:18:40,403 INFO segment.SegmentChecker - Segment dir is complete: crawl/segments/20160810112551. 
2016-08-11 09:18:40,408 INFO segment.SegmentChecker - Segment dir is complete: crawl/segments/20160810112952. 
2016-08-11 09:18:40,409 INFO indexer.IndexingJob - Indexer: starting at 2016-08-11 09:18:40 
2016-08-11 09:18:40,415 INFO indexer.IndexingJob - Indexer: deleting gone documents: false 
2016-08-11 09:18:40,415 INFO indexer.IndexingJob - Indexer: URL filtering: true 
2016-08-11 09:18:40,415 INFO indexer.IndexingJob - Indexer: URL normalizing: true 
2016-08-11 09:18:40,672 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter 
2016-08-11 09:18:40,672 INFO indexer.IndexingJob - Active IndexWriters : 
SOLRIndexWriter 
     solr.server.url : URL of the SOLR instance 
     solr.zookeeper.hosts : URL of the Zookeeper quorum 
     solr.commit.size : buffer size when sending to SOLR (default 1000) 
     solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml) 
     solr.auth : use authentication (default false) 
     solr.auth.username : username for authentication 
     solr.auth.password : password for authentication 


2016-08-11 09:18:40,677 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: http://10.209.18.213:8983/solr 
2016-08-11 09:18:40,677 INFO indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb 
2016-08-11 09:18:40,677 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20160810110110 
2016-08-11 09:18:40,683 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20160810112551 
2016-08-11 09:18:40,684 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20160810112952 
2016-08-11 09:18:41,362 ERROR indexer.IndexingJob - Indexer: java.io.IOException: No FileSystem for scheme: http 
     at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2385) 
     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392) 
     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89) 
     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431) 
     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413) 
     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368) 
     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) 
     at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256) 
     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228) 
     at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45) 
     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304) 
     at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520) 
     at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512) 
     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394) 
     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) 
     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) 
     at java.security.AccessController.doPrivileged(Native Method) 
     at javax.security.auth.Subject.doAs(Subject.java:415) 
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) 
     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) 
     at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562) 
     at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557) 
     at java.security.AccessController.doPrivileged(Native Method) 
     at javax.security.auth.Subject.doAs(Subject.java:415) 
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) 
     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557) 
     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548) 
     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:833) 
     at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145) 
     at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228) 
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) 
     at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237) 

I have the same problem. Running the index command with these arguments:

bin/nutch index -Dsolr.server.url=http://.../solr crawldb/ -linkdb linkdb/ segments/* 

only displays the command-line help. Have you found a solution? – LucaoA

Answer


The tutorial still mentions the deprecated solrindex command. Running bin/nutch index without arguments prints the current usage:

bin/nutch index 
Usage: Indexer <crawldb> [-linkdb <linkdb>] [-params k1=v1&k2=v2...] (<segment> ... | -dir <segments>) [-noCommit] [-deleteGone] [-filter] [-normalize] [-addBinaryContent] [-base64] 
Active IndexWriters : 
SOLRIndexWriter 
     solr.server.url : URL of the SOLR instance 
     solr.zookeeper.hosts : URL of the Zookeeper quorum 
     solr.commit.size : buffer size when sending to SOLR (default 1000) 
     solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml) 
     solr.auth : use authentication (default false) 
     solr.auth.username : username for authentication 
     solr.auth.password : password for authentication 
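The usage line above takes `<crawldb>` as the first positional argument. The indexing log in the question reports `IndexerMapReduce: crawldb: http://10.209.18.213:8983/solr`. The bash sketch below uses a hypothetical helper, `classify_index_args`, which is not part of Nutch, to assign each argument a role according to that usage line. It is an illustration under that assumption, not Nutch's actual parser. Applied to the question's command, it puts the Solr URL in the crawldb slot and `crawl/crawldb/` among the segments. That is consistent with both the `No FileSystem for scheme: http` error and the `The input path at crawldb is not a segment... skipping` warning.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not Nutch code): assign roles to `bin/nutch index`
# arguments following the usage line printed by Nutch 1.12:
#   Indexer <crawldb> [-linkdb <linkdb>] [-params k1=v1&k2=v2...]
#           (<segment> ... | -dir <segments>) [-noCommit] [-deleteGone]
#           [-filter] [-normalize] [-addBinaryContent] [-base64]
# Generic Hadoop -D options are not Indexer arguments; this sketch skips them.
classify_index_args() {
  local crawldb="" linkdb="" segments=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -D) shift ;;                 # "-D key=value": also skip the value
      -D*) ;;                      # "-Dkey=value": skip
      -linkdb) shift; linkdb="$1" ;;
      -params) shift ;;            # skip its k1=v1&k2=v2 value
      -dir) shift; segments="$segments $1" ;;  # directory of segments, kept as-is
      -noCommit|-deleteGone|-filter|-normalize|-addBinaryContent|-base64) ;;
      *)
        if [ -z "$crawldb" ]; then
          crawldb="$1"             # first positional argument = crawldb
        else
          segments="$segments $1"  # every later positional = a segment
        fi
        ;;
    esac
    shift
  done
  printf 'crawldb=%s\n' "$crawldb"
  printf 'linkdb=%s\n' "$linkdb"
  printf 'segments=%s\n' "${segments# }"
}

# The question's command: the Solr URL is the first positional argument.
classify_index_args http://10.209.18.213:8983/solr crawl/crawldb/ \
  -linkdb crawl/linkdb/ 'crawl/segments/*' -filter -normalize

# The -D form from the comment: the URL is no longer positional.
classify_index_args -Dsolr.server.url=http://10.209.18.213:8983/solr \
  crawl/crawldb/ -linkdb crawl/linkdb/ 'crawl/segments/*' -filter -normalize
```

The segment glob is quoted only so that the sketch prints it literally. With the URL passed as `-Dsolr.server.url=...`, the sketch places `crawl/crawldb/` in the crawldb slot. This sketch does not show whether that change alone resolves the error.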