2017-05-18

Run a Hadoop job without creating a jar file

I am a beginner with Hadoop and have worked through a few of the tutorial projects. Initially I did the projects in Python, where Hadoop Streaming lets me specify the mapper and reducer files separately without building a jar:

hadoop jar /usr/local/hadoop/hadoop-2.8.0/share/hadoop/tools/lib/hadoop-streaming-2.8.0.jar -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py -input input1 -output joboutput

Now I want to do the same thing in Java, but the only tutorials I could find all build a jar file first, and I could not find a way to debug the Java mapper and reducer code. Is there any way to test the code with some debug options?

Here are screenshots of where I got stuck.

Sample input file in csv

Mapper code

package SalesCountry; 

import java.io.IOException; 

import org.apache.hadoop.io.IntWritable; 
import org.apache.hadoop.io.LongWritable; 
import org.apache.hadoop.io.Text; 
import org.apache.hadoop.mapred.*; 

public class SalesMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> { 
    //private final static IntWritable one = new IntWritable(1); 

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException { 

     String valueString = value.toString(); 
     String[] SingleCountryData = valueString.split(","); 
     output.collect(new Text(SingleCountryData[7]), new IntWritable(Integer.parseInt(SingleCountryData[2]))); 
    } 
} 
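The call to Integer.parseInt is what fails on the CSV header row (the literal string "Price" cannot be parsed as an int, as the terminal output below shows). As a sketch, the row parsing could be pulled into a plain helper with a guard, independent of Hadoop; the class and method names here are hypothetical, and the field indices (2 = price, 7 = country) are taken from the mapper above:

```java
import java.util.Optional;

public class SalesRowParser {
    // Hypothetical helper: returns {country, price} for a valid data row,
    // or empty for the header row or malformed rows. Field indices match
    // the mapper above: index 2 = price, index 7 = country.
    static Optional<String[]> parse(String line) {
        String[] fields = line.split(",");
        if (fields.length < 8) {
            return Optional.empty(); // too few columns to be a data row
        }
        try {
            // trim() guards against stray whitespace around the number
            int price = Integer.parseInt(fields[2].trim());
            return Optional.of(new String[] { fields[7].trim(), String.valueOf(price) });
        } catch (NumberFormatException e) {
            return Optional.empty(); // e.g. the CSV header line ("Price")
        }
    }
}
```

Inside map(), an empty result would simply be skipped instead of failing the whole job.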

Reducer code

package SalesCountry; 

import java.io.IOException; 
import java.util.*; 

import org.apache.hadoop.io.IntWritable; 
import org.apache.hadoop.io.Text; 
import org.apache.hadoop.mapred.*; 

public class SalesCountryReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> { 

    public void reduce(Text t_key, Iterator<IntWritable> values, OutputCollector<Text,IntWritable> output, Reporter reporter) throws IOException { 
     Text key = t_key; 
     int salesForCountry = 0; 
     while (values.hasNext()) { 
      // replace type of value with the actual type of our value 
      IntWritable value = (IntWritable) values.next(); 
      salesForCountry += value.get(); 

     } 
     output.collect(key, new IntWritable(salesForCountry)); 
    } 
} 

Terminal output

$HADOOP_HOME/bin/hadoop jar TotalSalePerCountry.jar inputMapReduce mapreduce_output_sales 
17/05/18 12:52:47 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id 
17/05/18 12:52:47 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId= 
17/05/18 12:52:47 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 
17/05/18 12:52:47 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this. 
17/05/18 12:52:47 INFO mapred.FileInputFormat: Total input files to process : 1 
17/05/18 12:52:47 INFO mapreduce.JobSubmitter: number of splits:1 
17/05/18 12:52:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1862814770_0001 
17/05/18 12:52:47 INFO mapreduce.Job: The url to track the job: http://localhost:8080/ 
17/05/18 12:52:47 INFO mapred.LocalJobRunner: OutputCommitter set in config null 
17/05/18 12:52:47 INFO mapreduce.Job: Running job: job_local1862814770_0001 
17/05/18 12:52:47 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter 
17/05/18 12:52:47 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1 
17/05/18 12:52:47 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false 
17/05/18 12:52:47 INFO mapred.LocalJobRunner: Waiting for map tasks 
17/05/18 12:52:47 INFO mapred.LocalJobRunner: Starting task: attempt_local1862814770_0001_m_000000_0 
17/05/18 12:52:47 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1 
17/05/18 12:52:47 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false 
17/05/18 12:52:47 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ] 
17/05/18 12:52:47 INFO mapred.MapTask: Processing split: file:/home/deevita/MapReduceTutorial/inputMapReduce/SalesJan2009.csv:0+123638 
17/05/18 12:52:47 INFO mapred.MapTask: numReduceTasks: 1 
17/05/18 12:52:47 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584) 
17/05/18 12:52:47 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100 
17/05/18 12:52:47 INFO mapred.MapTask: soft limit at 83886080 
17/05/18 12:52:47 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600 
17/05/18 12:52:47 INFO mapred.MapTask: kvstart = 26214396; length = 6553600 
17/05/18 12:52:47 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer 
17/05/18 12:52:47 INFO mapred.LocalJobRunner: map task executor complete. 
17/05/18 12:52:47 WARN mapred.LocalJobRunner: job_local1862814770_0001 
java.lang.Exception: java.lang.NumberFormatException: For input string: "Price" 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:489) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:549) 
Caused by: java.lang.NumberFormatException: For input string: "Price" 
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) 
    at java.lang.Integer.parseInt(Integer.java:580) 
    at java.lang.Integer.parseInt(Integer.java:615) 
    at SalesCountry.SalesMapper.map(SalesMapper.java:17) 
    at SalesCountry.SalesMapper.map(SalesMapper.java:10) 
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) 
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270) 
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
17/05/18 12:52:48 INFO mapreduce.Job: Job job_local1862814770_0001 running in uber mode : false 
17/05/18 12:52:48 INFO mapreduce.Job: map 0% reduce 0% 
17/05/18 12:52:48 INFO mapreduce.Job: Job job_local1862814770_0001 failed with state FAILED due to: NA 
17/05/18 12:52:48 INFO mapreduce.Job: Counters: 0 
java.io.IOException: Job failed! 
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:873) 
    at SalesCountry.SalesCountryDriver.main(SalesCountryDriver.java:38) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:498) 
    at org.apache.hadoop.util.RunJar.run(RunJar.java:234) 
    at org.apache.hadoop.util.RunJar.main(RunJar.java:148) 
[email protected]:~/MapReduceTutorial$ $HADOOP_HOME/bin/hadoop jar TotalSalePerCountry.jar inputMapReduce mapreduce_output_sales 
17/05/18 16:15:12 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id 
17/05/18 16:15:12 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId= 
17/05/18 16:15:12 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/home/deevita/MapReduceTutorial/mapreduce_output_sales already exists 
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131) 
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:270) 
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:141) 
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341) 
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:422) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807) 
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338) 
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575) 
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:422) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807) 
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570) 
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561) 
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:870) 
    at SalesCountry.SalesCountryDriver.main(SalesCountryDriver.java:38) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:498) 
    at org.apache.hadoop.util.RunJar.run(RunJar.java:234) 
    at org.apache.hadoop.util.RunJar.main(RunJar.java:148) 
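The second failure in the log above is unrelated to the first: Hadoop refuses to write into an output directory that already exists. The usual remedy is to delete the directory (or choose a new name) before re-running. Since the log shows a local file: path, a plain rm works here; the HDFS variant is shown as a comment:

```shell
# Remove the previous run's output directory before resubmitting the job.
rm -rf mapreduce_output_sales
# If the output lived on HDFS instead, the equivalent would be:
# $HADOOP_HOME/bin/hadoop fs -rm -r mapreduce_output_sales
```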

I know that to compile and run this I have to build the jar every time, and that the NumberFormatException may come from whitespace (so I should trim first), but is there any way to run the MapReduce job without having to build a jar each time?

Answer

I would recommend writing unit tests for your job so that you can debug it without going through the whole jar/deploy cycle.

Here is an example using MRUnit. The Maven dependency:

<dependency> 
<groupId>org.apache.mrunit</groupId> 
<artifactId>mrunit</artifactId> 
<version>1.0.0</version> 
<classifier>hadoop1</classifier> 
<scope>test</scope> 
</dependency> 

The test:

import org.apache.hadoop.io.IntWritable; 
import org.apache.hadoop.io.LongWritable; 
import org.apache.hadoop.io.Text; 
import org.apache.hadoop.mrunit.MapDriver; 
import org.junit.Before; 
import org.junit.Test; 

public class HadoopTest { 
    MapDriver<LongWritable, Text, Text, IntWritable> mapDriver; 

@Before 
public void setUp() { 
    SalesMapper mapper = new SalesMapper(); 
    mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>(); 
    mapDriver.setMapper(mapper); 
} 

@Test 
public void testMapper() throws Exception { 
    mapDriver.withInput(new LongWritable(1), new Text("date,product,1200,Visa,carolina,baslidoni,england,UK")); 
    mapDriver.withOutput(new Text("UK"), new IntWritable(1200)); 
    mapDriver.runTest(); 
} 
} 