
I want to collect Kafka messages and store them in HDFS with Gobblin. When I run gobblin-mapreduce.sh, the script throws an exception: Gobblin Kafka - FileNotFoundException for the *** jar.

2017-10-19 11:49:18 CST ERROR [main] gobblin.runtime.AbstractJobLauncher 442 - Failed to launch and run job job_GobblinKafkaQuickStart_1508384954897: java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/Users/fanjun/plugin/gobblin-dist/lib/gobblin-api-0.9.0-642-g13a21ad.jar
java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/Users/fanjun/plugin/gobblin-dist/lib/gobblin-api-0.9.0-642-g13a21ad.jar
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1116)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1108)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1108)
	at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
	at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
	at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:99)
	at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
	at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
	at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
	at gobblin.runtime.mapreduce.MRJobLauncher.runWorkUnits(MRJobLauncher.java:230)
	at gobblin.runtime.AbstractJobLauncher.runWorkUnitStream(AbstractJobLauncher.java:570)
	at gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:417)
	at gobblin.runtime.mapreduce.CliMRJobLauncher.launchJob(CliMRJobLauncher.java:89)
	at gobblin.runtime.mapreduce.CliMRJobLauncher.run(CliMRJobLauncher.java:66)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at gobblin.runtime.mapreduce.CliMRJobLauncher.main(CliMRJobLauncher.java:111)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

The path /Users/fanjun/plugin/gobblin-dist/lib/gobblin-api-0.9.0-642-g13a21ad.jar is on my local disk, not in HDFS, so it cannot be found via an hdfs URI. Why does this script try to load gobblin-api.jar from HDFS rather than from the local disk?
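From the stack trace, the failure happens while the MapReduce client populates the distributed cache: Gobblin's MRJobLauncher registers its framework jars there, and with fs.uri pointing at hdfs://localhost:9000 the local lib path gets resolved against HDFS. A minimal workaround sketch, assuming it is acceptable to simply mirror the local lib directory to the identical path on HDFS:

# Mirror the local Gobblin lib directory to the same path on HDFS so the
# distributed-cache lookup shown in the stack trace finds the jars.
hdfs dfs -mkdir -p /Users/fanjun/plugin/gobblin-dist/lib
hdfs dfs -put /Users/fanjun/plugin/gobblin-dist/lib/*.jar /Users/fanjun/plugin/gobblin-dist/lib/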

Here is my job configuration file:

job.name=GobblinKafkaQuickStart
job.group=GobblinKafka
job.description=Gobblin quick start job for Kafka
job.lock.enabled=false

kafka.brokers=10.0.35.148:9092

source.class=gobblin.source.extractor.extract.kafka.KafkaSimpleSource
extract.namespace=gobblin.extract.kafka

writer.builder.class=gobblin.writer.SimpleDataWriterBuilder
writer.file.path.type=tablename
writer.destination.type=HDFS
writer.output.format=txt

data.publisher.type=gobblin.publisher.BaseDataPublisher

mr.job.max.mappers=1

metrics.reporting.file.enabled=true
metrics.log.dir=/gobblin-kafka/metrics
metrics.reporting.file.suffix=txt

bootstrap.with.offset=earliest

fs.uri=hdfs://localhost:9000
writer.fs.uri=hdfs://localhost:9000
state.store.fs.uri=hdfs://localhost:9000

mr.job.root.dir=/gobblin-kafka/working
state.store.dir=/gobblin-kafka/state-store
task.data.root.dir=/jobs/kafkaetl/gobblin/gobblin-kafka/task-data
data.publisher.final.dir=/gobblintest/job-output
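For reference, the job is launched roughly as follows. This is a sketch only: the kafka-to-hdfs.pull file name is assumed, and the flags should be checked against bin/gobblin-mapreduce.sh --help for the version in use.

# Hypothetical invocation (configuration file name assumed).
cd /Users/fanjun/plugin/gobblin-dist
bin/gobblin-mapreduce.sh --conf kafka-to-hdfs.pull --workdir /gobblin-kafka/working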

Answer


Have you considered using Kafka Connect (part of Apache Kafka) with the HDFS connector instead?
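For illustration, a minimal sketch of an HDFS sink running under Kafka Connect's standalone worker. It assumes Confluent's HDFS connector (io.confluent.connect.hdfs.HdfsSinkConnector) is available on the worker's classpath; the topic name and flush size are placeholder assumptions, and hdfs.url mirrors the fs.uri from the job config above.

# Write a minimal HDFS sink connector config (values are illustrative).
cat > hdfs-sink.properties <<'EOF'
name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=test_topic
hdfs.url=hdfs://localhost:9000
flush.size=100
EOF

# Run it with the standalone Connect worker shipped with Apache Kafka;
# config/connect-standalone.properties is the stock worker config.
bin/connect-standalone.sh config/connect-standalone.properties hdfs-sink.properties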


Thanks for the advice, I will look into it. I have since solved the problem: it was caused by YARN (my Hadoop is a pseudo-distributed deployment). – user1978965