Spark: creating an HFile for one rowKey with multiple columns

JavaRDD<String> hbaseFile = jsc.textFile(HDFS_MASTER + HBASE_FILE);
// sortByKey returns a new RDD, so chain it and write the sorted result.
JavaPairRDD<ImmutableBytesWritable, KeyValue> putJavaRDD = hbaseFile
    .mapToPair(line -> convertToKVCol1(line, COLUMN_AGE))
    .sortByKey(true);
putJavaRDD.saveAsNewAPIHadoopFile(stagingFolder, ImmutableBytesWritable.class, KeyValue.class, HFileOutputFormat2.class, conf);

// Parses one JSON line and emits a single KeyValue for the given column qualifier.
private static Tuple2<ImmutableBytesWritable, KeyValue> convertToKVCol1(String beanString, byte[] column) {
    InspurUserEntity inspurUserEntity = gson.fromJson(beanString, InspurUserEntity.class);
    // Row key layout: <department_level1>_<department_level2>_<id>
    String rowKey = inspurUserEntity.getDepartment_level1() + "_" + inspurUserEntity.getDepartment_level2() + "_" + inspurUserEntity.getId();
    return new Tuple2<>(new ImmutableBytesWritable(Bytes.toBytes(rowKey)),
      new KeyValue(Bytes.toBytes(rowKey), COLUMN_FAMILY, column, Bytes.toBytes(inspurUserEntity.getAge())));
}

The above is my code. It only works for a single column. How can I create an HFile that has multiple columns for one rowKey?

Source

2017-09-22 徐琮杰

In the declaration, you need to use an array instead of ImmutableBytesWritable.

Source

2017-09-22 07:00:07

Thanks for your help. I am new to MapReduce and Spark. Is there an example of how to use an array instead of ImmutableBytesWritable? Thanks a lot.

This is my code: return new Tuple2<>(new ImmutableBytesWritable(rowKeyBytes), new KeyValue(xxxx)); How would I use an array here?
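The answer does not spell out what it means by "an array". One possible reading is that each input record should produce several KeyValues (one per column) rather than a single one, which in Spark would typically be done with flatMapToPair instead of mapToPair. Because HFileOutputFormat2 expects cells in sorted order, the expanded cells would then need to be ordered by row key and, within a row, by column qualifier. The following is a minimal plain-Java sketch of that expand-then-sort idea only. It does not use Spark or HBase: the Cell class is a hypothetical stand-in for KeyValue, the "name" column is an assumed second column that is not in the question, and the sample records are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MultiColumnCells {

    // Hypothetical stand-in for an HBase KeyValue: row key, column qualifier, value.
    static final class Cell {
        final byte[] row;
        final byte[] qualifier;
        final byte[] value;

        Cell(String row, String qualifier, String value) {
            this.row = row.getBytes(StandardCharsets.UTF_8);
            this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
            this.value = value.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public String toString() {
            return new String(row, StandardCharsets.UTF_8) + "/"
                + new String(qualifier, StandardCharsets.UTF_8) + "="
                + new String(value, StandardCharsets.UTF_8);
        }
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Order cells by row key first, then by qualifier within the same row.
    static final Comparator<Cell> CELL_ORDER = (x, y) -> {
        int c = compareBytes(x.row, y.row);
        return c != 0 ? c : compareBytes(x.qualifier, y.qualifier);
    };

    // Expand one record into one cell per column (the "flatMap" step).
    // The "name" column is an assumption made for this illustration.
    static List<Cell> toCells(String rowKey, String age, String name) {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell(rowKey, "name", name));
        cells.add(new Cell(rowKey, "age", age));
        return cells;
    }

    // Expand every record {rowKey, age, name}, then sort all cells together.
    static List<Cell> buildSortedCells(String[][] records) {
        List<Cell> all = new ArrayList<>();
        for (String[] r : records) {
            all.addAll(toCells(r[0], r[1], r[2]));
        }
        all.sort(CELL_ORDER);
        return all;
    }

    public static void main(String[] args) {
        // Made-up sample records, deliberately out of order.
        String[][] records = {
            {"sales_east_2", "41", "Bo"},
            {"sales_east_1", "30", "Ann"}
        };
        for (Cell c : buildSortedCells(records)) {
            System.out.println(c);
        }
    }
}
```

In the Spark job from the question, the equivalent would presumably be a flatMapToPair that returns one Tuple2 per column for each line, followed by a sort on a key that combines row key and qualifier before calling saveAsNewAPIHadoopFile. That mapping is an assumption about the intended fix, not something the answer confirms.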
