Spark MLlib classification input format using Java

How can I convert a list of DTOs into a Spark ML input dataset?

I have a DTO:

public class MachineLearningDTO implements Serializable {
    private double label;
    private double[] features;

    public MachineLearningDTO() {
    }

    public MachineLearningDTO(double label, double[] features) {
        this.label = label;
        this.features = features;
    }

    public double getLabel() {
        return label;
    }

    public void setLabel(double label) {
        this.label = label;
    }

    public double[] getFeatures() {
        return features;
    }

    public void setFeatures(double[] features) {
        this.features = features;
    }
}

and the following code:

Dataset<MachineLearningDTO> mlInputDataSet = spark.createDataset(mlInputData, Encoders.bean(MachineLearningDTO.class)); 
LogisticRegression logisticRegression = new LogisticRegression(); 
LogisticRegressionModel model = logisticRegression.fit(MLUtils.convertMatrixColumnsToML(mlInputDataSet));

After running this code, I get:

java.lang.IllegalArgumentException: requirement failed: Column features must be of type org.apache.spark.ml.linalg.VectorUDT but was actually ArrayType(DoubleType,false).

If I change it to org.apache.spark.ml.linalg.VectorUDT with this code:

VectorUDT vectorUDT = new VectorUDT(); 
vectorUDT.serialize(Vectors.dense(......));

then I get:

java.lang.UnsupportedOperationException: Cannot infer type for class org.apache.spark.ml.linalg.VectorUDT because it is not bean-compliant
    at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$serializerFor(JavaTypeInference.scala:437)

Source

2017-06-13 Maksym

I figured it out. In case someone else gets stuck on this too: I wrote a simple converter, and it works:

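import java.util.List;
import java.util.stream.Collectors;
import org.apache.spark.ml.linalg.VectorUDT;
import org.apache.spark.ml.linalg.Vectors;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Assumes `spark` is a SparkSession field of the enclosing class.
private Dataset<Row> convertToMlInputFormat(List<MachineLearningDTO> data) {
    // getLabel() already returns a double, so it is passed through unchanged.
    List<Row> rowData = data.stream()
        .map(dto -> RowFactory.create(dto.getLabel(), Vectors.dense(dto.getFeatures())))
        .collect(Collectors.toList());
    StructType schema = new StructType(new StructField[]{
        new StructField("label", DataTypes.DoubleType, false, Metadata.empty()),
        new StructField("features", new VectorUDT(), false, Metadata.empty())
    });
    return spark.createDataFrame(rowData, schema);
}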

Source

2017-06-13 14:51:34 Maksym
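
For anyone who wants to try the converter approach end to end, here is a minimal sketch. It assumes Spark SQL and Spark ML are on the classpath and that a local SparkSession is acceptable. The class name `MlInputExample`, the trimmed-down `Sample` class standing in for the DTO, and the sample values are all made up for illustration and are not from the original post.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

import org.apache.spark.ml.classification.LogisticRegression;
import org.apache.spark.ml.classification.LogisticRegressionModel;
import org.apache.spark.ml.linalg.VectorUDT;
import org.apache.spark.ml.linalg.Vectors;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class MlInputExample {

    // Hypothetical stand-in for the DTO: a label plus a raw double[] of features.
    static final class Sample {
        final double label;
        final double[] features;

        Sample(double label, double[] features) {
            this.label = label;
            this.features = features;
        }
    }

    // Builds rows of (Double, Vector) and attaches an explicit schema, so the
    // "features" column is a VectorUDT rather than an array of doubles.
    static Dataset<Row> toMlInput(SparkSession spark, List<Sample> data) {
        List<Row> rows = data.stream()
            .map(s -> RowFactory.create(s.label, Vectors.dense(s.features)))
            .collect(Collectors.toList());
        StructType schema = new StructType(new StructField[]{
            new StructField("label", DataTypes.DoubleType, false, Metadata.empty()),
            new StructField("features", new VectorUDT(), false, Metadata.empty())
        });
        return spark.createDataFrame(rows, schema);
    }

    public static void main(String[] args) {
        // Local session for experimentation only (an assumption of this sketch).
        SparkSession spark = SparkSession.builder()
            .appName("MlInputExample")
            .master("local[*]")
            .getOrCreate();
        try {
            List<Sample> data = Arrays.asList(
                new Sample(0.0, new double[]{0.0, 1.1}),
                new Sample(1.0, new double[]{2.0, 1.0}),
                new Sample(0.0, new double[]{0.1, 1.2}),
                new Sample(1.0, new double[]{2.2, 0.9}));

            Dataset<Row> training = toMlInput(spark, data);
            training.printSchema();

            // Fit on the converted data; no MLUtils conversion step is used here.
            LogisticRegressionModel model =
                new LogisticRegression().setMaxIter(10).fit(training);
            System.out.println("coefficients=" + model.coefficients()
                + " intercept=" + model.intercept());
        } finally {
            spark.stop();
        }
    }
}
```

The session is passed into `toMlInput` as a parameter rather than read from a field, which keeps the helper static and easy to call from a test.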
