I want to use my custom transformer together with StandardScaler in Spark (Java), and I have a question about transformSchema():
VectorizerTransformer vectorizerTransformer = new VectorizerTransformer(field.getName());
pipelineStages.add(vectorizerTransformer);
StandardScaler scaler = new StandardScaler()
    .setInputCol(vectorizerTransformer.getOutputColumn())
    .setOutputCol(field.getName() + "_norm")
    .setWithStd(true)
    .setWithMean(true);
pipelineStages.add(scaler);
However, when I run:
PipelineModel pipelineModel = pipeline.fit(dframe);
I get this exception:
Exception in thread "main" java.lang.IllegalArgumentException: Field "trans_vector" does not exist.
at org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:228)
at org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:228)
at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
at scala.collection.AbstractMap.getOrElse(Map.scala:59)
at org.apache.spark.sql.types.StructType.apply(StructType.scala:227)
at org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:40)
at org.apache.spark.ml.feature.StandardScalerParams$class.validateAndTransformSchema(StandardScaler.scala:68)
at org.apache.spark.ml.feature.StandardScaler.validateAndTransformSchema(StandardScaler.scala:88)
at org.apache.spark.ml.feature.StandardScaler.transformSchema(StandardScaler.scala:124)
at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:180)
at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:180)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:180)
at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:70)
at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:132)
at org.sparkexample.PipelineExample.main(PipelineExample.java:90)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Here, the field named in the exception is the output field of VectorizerTransformer. In VectorizerTransformer I have the following code:
@Override
public StructType transformSchema(StructType arg0) {
    return arg0;
}
I believe the problem is here, so I need to write something in that method, but what exactly? The transformer itself simply maps one column to another:
trans -> trans_vector
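
One possible direction, as a minimal sketch rather than a confirmed fix: the stack trace shows Pipeline.fit calling transformSchema on each stage in turn (Pipeline.transformSchema, then StandardScaler.transformSchema), so a transformSchema that returns arg0 unchanged never declares trans_vector to the next stage. The sketch below appends the output column to the incoming schema. It assumes Spark 2.x (where SQLDataTypes.VectorType() is the public vector column type), that the output really is an ML vector, and that it is never null; the class name SchemaSketch, the helper withVectorColumn, and the StringType used for trans are hypothetical and only there for illustration.

```java
import org.apache.spark.ml.linalg.SQLDataTypes;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class SchemaSketch {

    // Hypothetical helper: returns a copy of the input schema with one
    // extra vector-typed column appended. StructType is immutable, so
    // add() returns a new StructType and leaves the input untouched.
    static StructType withVectorColumn(StructType input, String outputCol) {
        // Assumption: the column is non-nullable; pass true if it can be null.
        return input.add(outputCol, SQLDataTypes.VectorType(), false);
    }

    public static void main(String[] args) {
        // Assumption: "trans" is a string column; the real type is not
        // stated in the question.
        StructType input = new StructType().add("trans", DataTypes.StringType, true);
        StructType output = withVectorColumn(input, "trans_vector");
        System.out.println(output.treeString());
    }
}
```

Inside VectorizerTransformer, transformSchema could then return something like withVectorColumn(arg0, getOutputColumn()) instead of arg0. The transform() method still has to produce a column with that exact name and vector type, otherwise the declared schema and the actual data will disagree.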