私はcygwin64 2.874に基づいてWindows 2012サーバーにNutch 1.12をインストールしようとしています。 JavaとLinuxのスキルが限られているため、私はhttps://wiki.apache.org/nutch/NutchTutorial#Step-by-Step:_Seeding_the_crawldb_with_a_list_of_URLsにステップイントロダクションをしました。コマンドNutch on windows:エラーcrawl.Injector
bin/nutch inject crawl/crawldb urls
はwinutils.exeが見つかりませんでしたのでエラーが発生します。ここではHadoopのログは次のとおりです。
2016-07-01 09:22:25,660 ERROR util.Shell - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:326)
at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:432)
at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:478)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:64)
at org.apache.nutch.crawl.Injector.main(Injector.java:441)
2016-07-01 09:22:25,832 INFO crawl.Injector - Injector: starting at 2016-07-01 09:22:25
2016-07-01 09:22:25,832 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb
2016-07-01 09:22:25,832 INFO crawl.Injector - Injector: urlDir: urls
2016-07-01 09:22:25,832 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
2016-07-01 09:22:26,050 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-07-01 09:22:26,394 ERROR crawl.Injector - Injector: java.lang.NullPointerException
at java.lang.ProcessBuilder.start(Unknown Source)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:467)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:849)
at org.apache.hadoop.fs.FileSystem.createNewFile(FileSystem.java:1149)
at org.apache.nutch.util.LockUtil.createLockFile(LockUtil.java:58)
at org.apache.nutch.crawl.Injector.inject(Injector.java:357)
at org.apache.nutch.crawl.Injector.run(Injector.java:467)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.Injector.main(Injector.java:441)
私はここにいくつかのヒントを発見し、https://codeload.github.com/srccodes/hadoop-common-2.2.0-bin/zip/masterからwinutils.exeをダウンロードしました。私はサーバ上のフォルダを解凍し、環境変数NUTCH_OPTS = -Dhadoop.home.dir = [winutils_folder]を設定しました。今winutilsエラーは消えたが、Nutchの呼び出しが別のエラーで失敗します。
2016-07-01 13:24:09,697 INFO crawl.Injector - Injector: starting at 2016-07-01 13:24:09
2016-07-01 13:24:09,697 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb
2016-07-01 13:24:09,697 INFO crawl.Injector - Injector: urlDir: urls
2016-07-01 13:24:09,697 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
2016-07-01 13:24:09,885 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-07-01 13:24:10,307 ERROR crawl.Injector - Injector: org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:467)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:849)
at org.apache.hadoop.fs.FileSystem.createNewFile(FileSystem.java:1149)
at org.apache.nutch.util.LockUtil.createLockFile(LockUtil.java:58)
at org.apache.nutch.crawl.Injector.inject(Injector.java:357)
at org.apache.nutch.crawl.Injector.run(Injector.java:467)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.Injector.main(Injector.java:441)
(次の行を追加).bashrcにアップデートした後Hadoopのログは警告のみを示しています。
export JAVA_HOME='/cygdrive/c/Program Files/Java/jre1.8.0_92'
export PATH=$PATH:$JAVA_HOME/bin
しかし、Nutchは、まだエラーがスローされます:
Injector: starting at 2016-07-01 17:25:22
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:570)
at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:977)
at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:173)
at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:160)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:94)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:285)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:344)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:131)
at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at org.apache.nutch.crawl.Injector.inject(Injector.java:376)
at org.apache.nutch.crawl.Injector.run(Injector.java:467)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.Injector.main(Injector.java:441)
私が間違って設定されたり、それが窓/ cygwinのでNutchのを実行することはできませんすることができる何かヒントがありますか?
Javaのcygwin版があるので、一緒に働くjavaとcygwinはほとんど不可能な作業です。 – matzeri
@matzeri、ヒントをありがとう。私は、派生したJAVA_HOME環境変数をcygwin表記に変更し、.bashrcに追加しました。今度はエラーがなくなり、次のような警告が表示されます:WARN conf.Configuration - file:/tmp/hadoop-sesth/mapred/staging/sesth1247324371/.staging/job_local1247324371_0001/job.xml:最終パラメータを上書きしようとする試み:mapreduce.job .end-notification.max.retry.interval;無視する。 –