2016-10-07

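SparkException: Task not serializable (even after the class implements Serializable)

I am facing a "Task not serializable" problem. I checked other answers and made both the calling class and the called class Serializable. My code looks like this: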

public class MultiClassification implements Serializable {
    public static void main(String[] args) {
        ....
        JavaRDD<Tuple2<String, String>> pairRDD = someRDD.flatMap(
            new GetLabelFeature(.....));
    }
}

GetLabelFeature is also declared Serializable. The stack trace is:

 06 Oct 2016 12:51:20,307 WARN SerializationDebugger:92 - Exception in serialization debugger 
java.lang.reflect.InvocationTargetException 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:497) 
    at org.apache.spark.serializer.SerializationDebugger$ObjectStreamClassMethods$.getObjFieldValues$extension(SerializationDebugger.scala:248) 
    at org.apache.spark.serializer.SerializationDebugger$SerializationDebugger.visitSerializable(SerializationDebugger.scala:158) 
    at org.apache.spark.serializer.SerializationDebugger$SerializationDebugger.visit(SerializationDebugger.scala:107) 
    at org.apache.spark.serializer.SerializationDebugger$SerializationDebugger.visitSerializable(SerializationDebugger.scala:166) 
    at org.apache.spark.serializer.SerializationDebugger$SerializationDebugger.visit(SerializationDebugger.scala:107) 
    at org.apache.spark.serializer.SerializationDebugger$.find(SerializationDebugger.scala:66) 
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:41) 
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47) 
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:80) 
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164) 
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158) 
    at org.apache.spark.SparkContext.clean(SparkContext.scala:1636) 
    at org.apache.spark.rdd.RDD.flatMap(RDD.scala:295) 
    at org.apache.spark.api.java.JavaRDDLike$class.flatMap(JavaRDDLike.scala:123) 
    at org.apache.spark.api.java.AbstractJavaRDDLike.flatMap(JavaRDDLike.scala:46) 
    at com.infosys.iip.nlp.spark.MultiClassification.main(MultiClassification.java:92) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:497) 
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569) 
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166) 
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189) 
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110) 
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1 
    at java.io.ObjectStreamClass$FieldReflector.getObjFieldValues(ObjectStreamClass.java:2050) 
    at java.io.ObjectStreamClass.getObjFieldValues(ObjectStreamClass.java:1252) 
    ... 29 more 
Exception in thread "main" org.apache.spark.SparkException: Task not serializable 
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166) 
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158) 
    at org.apache.spark.SparkContext.clean(SparkContext.scala:1636) 
    at org.apache.spark.rdd.RDD.flatMap(RDD.scala:295) 
    at org.apache.spark.api.java.JavaRDDLike$class.flatMap(JavaRDDLike.scala:123) 
    at org.apache.spark.api.java.AbstractJavaRDDLike.flatMap(JavaRDDLike.scala:46) 
    at com.infosys.iip.nlp.spark.MultiClassification.main(MultiClassification.java:92) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:497) 
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569) 
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166) 
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189) 
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110) 
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
Caused by: java.io.NotSerializableException: edu.emory.mathcs.nlp.decode.NLPDecoder 
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184) 
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) 
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) 
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) 
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) 
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) 
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) 
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) 
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) 
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) 
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44) 
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:80) 
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164) 
    ... 15 more 

What part of 'java.io.NotSerializableException: edu.emory.mathcs.nlp.decode.NLPDecoder' don't you understand? – EJP


@EJP Sorry, I was being a noob. – insomniac

Answers


Does PMISentimentLexiconBuilder, the superclass in the declaration below, use NLPDecoder?

public class GetLabelFeature extends PMISentimentLexiconBuilder<String> 
    implements FlatMapFunction< String, Tuple2<String, String>> , Serializable { 
... 
public Iterable<Tuple2<String, String>> call(String row) throws Exception {...} 
} 

Or does your class GetLabelFeature use it directly?

NLPDecoder is not serializable, so it cannot be a field of an object that has to be serialized.
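To see why marking the outer class Serializable is not enough, here is a minimal sketch that reproduces the failure with plain Java serialization, without Spark. `FakeDecoder` and `BrokenFunction` are hypothetical stand-ins for NLPDecoder and GetLabelFeature; they are not the real classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Entry point: tries to serialize an object and reports what blocked it.
class SerializationFailureDemo {
    // Returns the message of the NotSerializableException (by default the
    // offending class name), or null if serialization succeeded.
    static String findBlockingClass(Object candidate) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(findBlockingClass(new BrokenFunction()));
    }
}

// Hypothetical stand-in for NLPDecoder: does NOT implement Serializable.
class FakeDecoder {
    String decode(String text) {
        return text.toUpperCase();
    }
}

// Serializable itself, but its non-transient field is not,
// so writing the whole object graph fails.
class BrokenFunction implements Serializable {
    private final FakeDecoder decoder = new FakeDecoder();

    String call(String row) {
        return decoder.decode(row);
    }
}
```

The reported message should name the stand-in decoder class, which mirrors the `Caused by: java.io.NotSerializableException` line in the stack trace.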

You have two options:

  1. Add the transient keyword to the field that holds the NLPDecoder, and initialize it again after deserialization.
  2. Do not use a field; create the NLPDecoder inside the function instead.

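I don't know how long it takes to initialize an NLPDecoder. If it is quick, you can use approach 2, which is simple; if it takes a long time, use approach 1.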
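The two approaches can be sketched roughly as follows. This is only an illustration with plain Java serialization standing in for what Spark does to the function object; `SlowDecoder`, `LazyDecoderFunction`, `LocalDecoderFunction` and `roundTrip` are hypothetical names, and how the real NLPDecoder is constructed is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientFieldDemo {
    // Serializes and deserializes a value, roughly as a function object
    // is shipped from the driver to an executor.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new LazyDecoderFunction()).call("task"));  // prints TASK
        System.out.println(roundTrip(new LocalDecoderFunction()).call("task")); // prints TASK
    }
}

// Hypothetical stand-in for NLPDecoder: not Serializable.
class SlowDecoder {
    String decode(String text) {
        return text.toUpperCase();
    }
}

// Approach 1: transient field, rebuilt lazily after deserialization.
class LazyDecoderFunction implements Serializable {
    // transient fields are skipped by Java serialization and are null in the copy
    private transient SlowDecoder decoder;

    private SlowDecoder decoder() {
        if (decoder == null) {
            decoder = new SlowDecoder(); // built once per deserialized copy
        }
        return decoder;
    }

    String call(String row) {
        return decoder().decode(row);
    }
}

// Approach 2: no field at all; a new decoder is created on every call.
class LocalDecoderFunction implements Serializable {
    String call(String row) {
        return new SlowDecoder().decode(row);
    }
}
```

With approach 1 the construction cost is paid once per deserialized copy of the function rather than once per record, which is why it suits a decoder that is expensive to build.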
