2016-05-06 6 views
1

キーごとの平均を計算するアプリケーションを起動するとこのエラーが発生します。私はラムダexpresion(java8)とcombineBykey関数を使用します。 私は3つのレジスタ(key,timefloat)でファイルを読みました。私は、労働者の両方でJava 8を持っており、これが例外をスローするコードであるspark - > java.lang.ClassCastException:java.lang.invoke.SerializedLambdaのインスタンスを割り当てることができません

16/05/06 15:48:23 INFO DAGScheduler: ShuffleMapStage 0 (mapToPair at ProcesarFichero.java:115) failed in 3.774 s 
    16/05/06 15:48:23 INFO DAGScheduler: Job 0 failed: saveAsTextFile at ProcesarFichero.java:153, took 3.950483 s 
    Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 5, mcava-slave0): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.fun$1 of type org.apache.spark.api.java.function.Function in instance of org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1 
      at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133) 
      at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305) 
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2024) 
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) 
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) 
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) 
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018) 
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) 
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) 
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) 
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018) 
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) 
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) 
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) 
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018) 
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) 
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) 
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) 
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018) 
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) 
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) 
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) 
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373) 
      at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) 
      at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115) 
      at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64) 
      at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) 
      at org.apache.spark.scheduler.Task.run(Task.scala:89) 
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) 
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
      at java.lang.Thread.run(Thread.java:745) 

    Driver stacktrace: 
      at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431) 
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419) 
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418) 
      at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) 
      at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) 
      at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418) 
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799) 
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799) 
      at scala.Option.foreach(Option.scala:236) 
      at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799) 
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640) 
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599) 
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588) 
      at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) 
      at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620) 
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832) 
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845) 
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:1922) 
      at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1213) 
      at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1156) 
      at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1156) 
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) 
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) 
      at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) 
      at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1156) 
      at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:1060) 
      at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1026) 
      at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1026) 
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) 
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) 
      at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) 
      at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:1026) 
      at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply$mcV$sp(PairRDDFunctions.scala:952) 
      at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:952) 
      at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:952) 
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) 
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) 
      at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) 
      at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:951) 
      at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply$mcV$sp(RDD.scala:1443) 
      at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1422) 
      at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1422) 
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) 
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) 
      at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) 
      at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1422) 
      at org.apache.spark.api.java.JavaRDDLike$class.saveAsTextFile(JavaRDDLike.scala:507) 
      at org.apache.spark.api.java.AbstractJavaRDDLike.saveAsTextFile(JavaRDDLike.scala:46) 
      at com.baitic.mcava.spark.ProcesarFichero.main(ProcesarFichero.java:153) 
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
      at java.lang.reflect.Method.invoke(Method.java:498) 
      at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) 
      at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) 
      at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) 
      at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) 
      at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
    Caused by: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.fun$1 of type org.apache.spark.api.java.function.Function in inst 
    ance of org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1 
      at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133) 
      at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305) 
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2024) 
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) 
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) 
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) 
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018) 
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) 
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) 
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) 
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018) 
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) 
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) 
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) 
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018) 
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) 
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) 
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) 
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018) 
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) 
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) 
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) 
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373) 
      at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) 
      at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115) 
      at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64) 
      at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) 
      at org.apache.spark.scheduler.Task.run(Task.scala:89) 
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) 
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
      at java.lang.Thread.run(Thread.java:745) 

をマスター:

AvgCount initial = new AvgCount(0.0F, 0); 
    JavaPairRDD<String, AvgCount> avgCounts 
      = pairs.combineByKey((Float x) -> new AvgCount(x, 1), (AvgCount a, Float x) -> new AvgCount(a.total_+x,a.num_+1), 
        (AvgCount a, AvgCount b) ->new AvgCount(a.total_+b.total_,a.num_+b.num_)); 
    avgCounts.saveAsTextFile("hdfs://mcava-master:54310/srv/hadoop/data/spark/xxmedidasSensorca"); 
    } 

public static class AvgCount implements Serializable { 
     public AvgCount(Float total, int num) { 
      total_ = total; 
      num_ = num; 
     } 
     public Float total_; 
     public int num_; 
     public float avg() { 
      return total_/(float) num_; 
     } 
    } 

私はすべての依存関係を持つ脂肪jarファイルを配布するためにconf.setjars()関数を使用します。

+1

同じ例外が発生しましたファットjarに 'setJar()'メソッドをSpark設定に提供することで問題を解決してください。どのようにSparkマスタープロパティを設定しましたか?また、この問題に関する非常に良い答え[this](http://stackoverflow.com/a/28367602/1480446)を参照してください。 –

+0

私は通常、コンフィグレーション・コンテキストobjetc(.setjar(PATH))を使用してJavaコードでこの問題を構成します。または、サブミット・スクリプト(SPARKHOME/bin/submit .... --jar PATH)を起動するときに、私の英語のために申し訳ありません – Miren

+0

これは実際にうまくいくはずです。もう一つの落とし穴は、SparkとScalaの間のバージョンの不一致です。どのバージョンを使用していますか? –

答えて

2

私はsparkconfに.setJarsメソッドを使用していました。 jarファイルのパスが正しいことを確認してください。私はjarファイルへのパスが正しくないので、最終的にシステムプロパティからuser.dirを取得するためにデバッグしたときに、パスを修正して解決策を実行できたので修正するのに苦労しました