ストラクチャードストリーミングは、寄木張りファイルをhadoopに書き込みます

構造化ストリーミングの結果を寄木張りファイルに書き込むことができました。問題は、それらのファイルがローカルファイルシステムにあることです。そして今、それらをhadoopファイルシステムに書きたいと思います。それを行う方法はありますか？ストラクチャードストリーミングは、寄木張りファイルをhadoopに書き込みます

StreamingQuery query = result //.orderBy("window") 
      .repartition(1) 
      .writeStream() 
      .outputMode(OutputMode.Append()) 
      .format("parquet") 
      .option("checkpointLocation", "hdfs://localhost:19000/data/checkpoints") 
      .start("hdfs://localhost:19000/data/total");

私はこのコードを使用しますが、それは言う：

Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://localhost:19000/data/checkpoints/metadata, expected: file:/// 
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:649) 
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:82) 
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:606) 
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824) 
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601) 
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421) 
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426) 
at org.apache.spark.sql.execution.streaming.StreamMetadata$.read(StreamMetadata.scala:51) 
at org.apache.spark.sql.execution.streaming.StreamExecution.<init>(StreamExecution.scala:100) 
at org.apache.spark.sql.streaming.StreamingQueryManager.createQuery(StreamingQueryManager.scala:232) 
at org.apache.spark.sql.streaming.StreamingQueryManager.startQuery(StreamingQueryManager.scala:269) 
at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:262) 
at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:206)

おかげ

出典

2017-03-01 taniGroup

これは既知の問題です：https://issues.apache.org/jira/browse/SPARK-19407

は、次のリリースで修正される予定です。回避策として--conf spark.hadoop.fs.defaultFS=hdfs://localhost:19000を使用して、デフォルトのファイルシステムをs3に設定できます。

出典

2017-03-01 05:31:53 zsxwing

それは動作し、私が使用：SparkSession.builder（） .appName（ "データ処理スパーク"）管理組織のビー玉（ "[2]ローカル"）の.config（ "spark.hadoop.fs.defaultFS"、 "hdfs：// localhost：19000"） .getOrCreate（）; – taniGroup

ええ、それは同じです。 – zsxwing

ストラクチャードストリーミングは、寄木張りファイルをhadoopに書き込みます

答えて

関連する問題