2016-12-29 2 views

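RuntimeException when running Nutch generate

I am new to Nutch. I installed Nutch 2.3.1 and configured it to use MongoDB. The inject step succeeded, but the generate step throws an exception (see below). Note: the error occurs with a seed file containing 60K URLs. When I tried again with 100 URLs, everything worked fine.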

What causes this error? Thanks!

2016-12-30 00:01:48,446 INFO crawl.GeneratorJob - GeneratorJob: starting at 2016-12-30 00:01:48 
2016-12-30 00:01:48,447 INFO crawl.GeneratorJob - GeneratorJob: Selecting best-scoring urls due for fetch. 
2016-12-30 00:01:48,447 INFO crawl.GeneratorJob - GeneratorJob: starting 
2016-12-30 00:01:48,448 INFO crawl.GeneratorJob - GeneratorJob: filtering: true 
2016-12-30 00:01:48,448 INFO crawl.GeneratorJob - GeneratorJob: normalizing: true 
2016-12-30 00:01:48,448 INFO crawl.GeneratorJob - GeneratorJob: topN: 100000 
2016-12-30 00:01:48,816 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
2016-12-30 00:01:48,857 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule 
2016-12-30 00:01:48,867 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000 
2016-12-30 00:01:48,867 INFO crawl.AbstractFetchSchedule - maxInterval=7776000 
2016-12-30 00:01:51,568 WARN conf.Configuration - file:/tmp/hadoop-mehdi/mapred/staging/mehdi1740651658/.staging/job_local1740651658_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 
2016-12-30 00:01:51,573 WARN conf.Configuration - file:/tmp/hadoop-mehdi/mapred/staging/mehdi1740651658/.staging/job_local1740651658_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 
2016-12-30 00:01:51,753 WARN conf.Configuration - file:/tmp/hadoop-mehdi/mapred/local/localRunner/mehdi/job_local1740651658_0001/job_local1740651658_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 
2016-12-30 00:01:51,760 WARN conf.Configuration - file:/tmp/hadoop-mehdi/mapred/local/localRunner/mehdi/job_local1740651658_0001/job_local1740651658_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 
2016-12-30 00:01:52,408 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule 
2016-12-30 00:01:52,408 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000 
2016-12-30 00:01:52,408 INFO crawl.AbstractFetchSchedule - maxInterval=7776000 
2016-12-30 00:01:52,591 INFO regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default 
2016-12-30 00:02:03,229 ERROR mapreduce.GoraRecordReader - Error reading Gora records: Read operation to server localhost:27017 failed on database nutch 
2016-12-30 00:02:04,607 WARN mapred.LocalJobRunner - job_local1740651658_0001 
java.lang.Exception: java.lang.RuntimeException: com.mongodb.MongoException$Network: Read operation to server localhost:27017 failed on database nutch 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) 
Caused by: java.lang.RuntimeException: com.mongodb.MongoException$Network: Read operation to server localhost:27017 failed on database nutch 
    at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:122) 
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533) 
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) 
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) 
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) 
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
Caused by: com.mongodb.MongoException$Network: Read operation to server localhost:27017 failed on database nutch 
    at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:298) 
    at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:269) 
    at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:235) 
    at com.mongodb.QueryResultIterator.getMore(QueryResultIterator.java:145) 
    at com.mongodb.QueryResultIterator.hasNext(QueryResultIterator.java:135) 
    at com.mongodb.DBCursor._hasNext(DBCursor.java:626) 
    at com.mongodb.DBCursor.hasNext(DBCursor.java:657) 
    at org.apache.gora.mongodb.query.MongoDBResult.nextInner(MongoDBResult.java:71) 
    at org.apache.gora.query.impl.ResultBase.next(ResultBase.java:111) 
    at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:118) 
    ... 12 more 
Caused by: java.io.EOFException 
    at org.bson.io.Bits.readFully(Bits.java:75) 
    at org.bson.io.Bits.readFully(Bits.java:50) 
    at org.bson.io.Bits.readFully(Bits.java:37) 
    at com.mongodb.Response.<init>(Response.java:42) 
    at com.mongodb.DBPort$1.execute(DBPort.java:164) 
    at com.mongodb.DBPort$1.execute(DBPort.java:158) 
    at com.mongodb.DBPort.doOperation(DBPort.java:187) 
    at com.mongodb.DBPort.call(DBPort.java:158) 
    at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:290) 
    ... 21 more 
2016-12-30 00:02:04,846 ERROR crawl.GeneratorJob - GeneratorJob: java.lang.RuntimeException: job failed: name=nutch-maven-1.0-SNAPSHOT.jar, jobid=job_local1740651658_0001 
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120) 
    at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:227) 
    at org.apache.nutch.crawl.GeneratorJob.generate(GeneratorJob.java:256) 
    at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:322) 
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) 
    at org.apache.nutch.crawl.GeneratorJob.main(GeneratorJob.java:330) 

Answer

I found that the problem comes from the MongoDB version. Nutch uses mongo-java-driver-2.13.1.jar, and I had installed MongoDB 3.4.1. I installed MongoDB 2.6.7 instead, and now it works fine. I will update the driver in Nutch and report back on whether it works with newer versions of MongoDB.
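
For anyone hitting a similar trace, it can help to read the "Caused by" chain from the innermost cause outward. In the log above, the innermost cause is java.io.EOFException, raised while the driver reads a reply (the connection ended before a full response arrived), and the failing call is QueryResultIterator.getMore, which fetches a further batch of results. That would be consistent with a small seed list working while a 60K list fails, although this is an interpretation of the trace and not something confirmed here. Below is a minimal sketch in Python for extracting that chain from a log. The function names cause_chain and root_cause are hypothetical helpers and are not part of Nutch, Gora, or the MongoDB driver.

```python
import re

# Matches lines such as "Caused by: java.io.EOFException" or
# "Caused by: com.mongodb.MongoException$Network: Read operation ... failed".
CAUSE_RE = re.compile(r"^Caused by:\s+([\w.$]+)(?::\s*(.*))?$")


def cause_chain(log_text):
    """Return (exception_class, message) pairs, outermost cause first."""
    chain = []
    for line in log_text.splitlines():
        match = CAUSE_RE.match(line.strip())
        if match:
            chain.append((match.group(1), (match.group(2) or "").strip()))
    return chain


def root_cause(log_text):
    """Return the innermost (last) cause, or None if the log has none."""
    chain = cause_chain(log_text)
    return chain[-1] if chain else None


# Excerpt copied from the trace in the question.
SAMPLE = """\
Caused by: java.lang.RuntimeException: com.mongodb.MongoException$Network: Read operation to server localhost:27017 failed on database nutch
    at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:122)
Caused by: com.mongodb.MongoException$Network: Read operation to server localhost:27017 failed on database nutch
    at com.mongodb.QueryResultIterator.getMore(QueryResultIterator.java:145)
Caused by: java.io.EOFException
    at org.bson.io.Bits.readFully(Bits.java:75)
"""

if __name__ == "__main__":
    for cls, msg in cause_chain(SAMPLE):
        print(cls, "-", msg or "(no message)")
```

Running it against a full hadoop.log works the same way: read the file into a string and pass it to cause_chain. The last entry is usually the most useful one to search for.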

Did the update succeed? – rzo
