Computing the sum of tab-delimited input values with MapReduce

I am trying to use MapReduce to compute the sum of tab-delimited input values, grouped by label. The data looks like this:

1  5.0 4.0 6.0 
2  2.0 1.0 3.0 
1  3.0 4.0 8.0

The first column is the class label, so I need the output grouped by class label. For this input, the expected output is:

label 1: 30.0 
label 2: 6.0

Here is the code I tried. It produces the wrong output, and unexpected class labels show up in it.

public class Total { 

public static class Map extends Mapper<LongWritable, Text, Text, DoubleWritable> { 
    private final static DoubleWritable one = new DoubleWritable(); 
    private Text word = new Text(); 

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { 
     String line = value.toString(); 
     StringTokenizer tokenizer = new StringTokenizer(line); 
     word.set(tokenizer.nextToken()); 
     while (tokenizer.hasMoreTokens()) { 
      one.set(Double.valueOf(tokenizer.nextToken())); 
      context.write(word, one);           
     } 
    } 
}

public static class Reduce extends Reducer<Text, DoubleWritable, Text, DoubleWritable> { 
    private Text firstMsg = new Text(); 

    public void reduce(Text key, Iterable<DoubleWritable> values, Context context) 
     throws IOException, InterruptedException { 
     firstMsg.set("label " + key + ": Total"); 

     // Add up every value emitted for this label. 
     double sum = 0.0; 
     for (DoubleWritable val : values) { 
      sum += val.get(); 
     } 

     context.write(firstMsg, new DoubleWritable(sum)); 
    } 
} 
//void method implementation also exists 
}

Source

2016-12-07 Algo

Your goal is to get all values that share a key into the same reducer, so that you can sum the numbers there.

So, taking this

1  5.0 4.0 6.0 
2  2.0 1.0 3.0 
1  3.0 4.0 8.0

you essentially want to produce this

1  [(5.0 4.0 6.0), (3.0 4.0 8.0)] 
2  [(2.0 1.0 3.0)]

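So your map only needs to output the keys 1 and 2, each followed by the remaining values of its line as a single record. It does not need to emit many separate values per key.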
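To make the grouping concrete, here is a minimal plain-Java sketch that mimics what the shuffle phase would hand to the reducers for the sample input. It does not use Hadoop at all; the class name `ShuffleSketch` and the method `groupByLabel` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop: simulates map + shuffle in memory.
class ShuffleSketch {
    // Groups the values of each line under the line's label (its first token).
    static Map<String, List<String>> groupByLabel(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            if (!tokenizer.hasMoreTokens()) {
                continue; // skip blank lines
            }
            String label = tokenizer.nextToken();
            List<String> rest = new ArrayList<>();
            while (tokenizer.hasMoreTokens()) {
                rest.add(tokenizer.nextToken());
            }
            // One record per input line, appended to the list for its label.
            grouped.computeIfAbsent(label, k -> new ArrayList<>())
                   .add(String.join(" ", rest));
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
            "1\t5.0\t4.0\t6.0",
            "2\t2.0\t1.0\t3.0",
            "1\t3.0\t4.0\t8.0");
        System.out.println(groupByLabel(input));
    }
}
```

Each map entry corresponds to one reducer call: the key is the label, and the list holds one record per input line that carried that label.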

For this you can use Mapper<LongWritable, Text, Text, Text>. That in turn makes the reducer a Reducer<Text, Text, Text, DoubleWritable>, since it reads (Text, Text) pairs. The map method becomes:

private final Text word = new Text(); 

@Override 
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { 
    String line = value.toString(); 

    StringTokenizer tokenizer = new StringTokenizer(line); 
    if (!tokenizer.hasMoreTokens()) { 
     return; // skip blank lines 
    } 
    word.set("label " + tokenizer.nextToken()); 

    // Join the remaining tokens into one comma-separated string. 
    StringBuilder remainder = new StringBuilder(); 
    while (tokenizer.hasMoreTokens()) { 
     remainder.append(tokenizer.nextToken()).append(","); 
    } 
    if (remainder.length() > 0) { 
     remainder.setLength(remainder.length() - 1); // drop the trailing comma 
    } 
    context.write(word, new Text(remainder.toString())); 
}

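Then, in the reducer (with its input value type changed to Text), you have Iterable<Text> values, which is an iterable of comma-separated strings. You can split each string, parse the pieces as doubles, and keep a running total.

You don't need the firstMsg.set piece in the reducer. That can be done in the mapper.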
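The reducer-side parsing could look roughly like the sketch below. It is written as plain Java over strings so that it runs without Hadoop; `LabelSum` and `sumCsv` are hypothetical names, and the input format (comma-separated doubles, one string per input line) is assumed to match what the mapper above emits.

```java
import java.util.List;

// Hypothetical helper, not part of Hadoop: the summing logic a reducer would run.
class LabelSum {
    // Splits every comma-separated string and adds all parsed numbers to one total.
    static double sumCsv(Iterable<String> values) {
        double sum = 0.0;
        for (String csv : values) {
            for (String token : csv.split(",")) {
                String trimmed = token.trim();
                if (!trimmed.isEmpty()) { // tolerate empty records
                    sum += Double.parseDouble(trimmed);
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("label 1\t" + sumCsv(List.of("5.0,4.0,6.0", "3.0,4.0,8.0")));
        System.out.println("label 2\t" + sumCsv(List.of("2.0,1.0,3.0")));
    }
}
```

Inside a Reducer<Text, Text, Text, DoubleWritable>, the same loop would iterate over the Iterable<Text> values, call val.toString() on each element before splitting, and finish with context.write(key, new DoubleWritable(sum)).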

Source

2016-12-07 06:04:56
