Unfortunately, I have another problem with Scala and Spark SQL: assertion failed: No predefined schema found, and no Parquet data files found.
Exception in thread "main" java.lang.AssertionError: assertion failed: No predefined schema found, and no Parquet data files or summary files found under file:/user/hive/warehouse/products/bc223562-ee45-42a6-b9a0-05635efb3e59.parquet.
I am using a Cloudera VM (in a VirtualBox environment): the machine provides a single-node cluster with Cloudera Manager and the Cloudera environment installed, running services such as Spark, Hive, and Impala. The problem is this:
I am trying to test Scala with Spark SQL, but I get an error that I cannot solve. This is my code:
package org.test.spark

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

object TestSelectAlgorithm {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("TestSelectAlgorithm")
      .setMaster("local")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._
    import sqlContext._

    // Read the Parquet file and expose it as a temporary table.
    val parquetFile = sqlContext.read.parquet("/user/hive/warehouse/products/bc223562-ee45-42a6-b9a0-05635efb3e59.parquet")
    parquetFile.registerTempTable("products")
    val result = sqlContext.sql("select * from default.products")
    parquetFile.show()
  }
}
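(An aside on the snippet: registerTempTable("products") registers a temporary table named products, but the query then selects from default.products, which Spark would look up in the Hive default database rather than among temp tables. Querying the temp table directly would look like the line below; a minor point, since the failure reported here happens earlier, at the read.)

    val result = sqlContext.sql("select * from products")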
The Parquet file path is:
/user/hive/warehouse/products/bc223562-ee45-42a6-b9a0-05635efb3e59.parquet
but the error tells me:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/07/01 01:31:34 INFO SparkContext: Running Spark version 1.6.0
16/07/01 01:31:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/07/01 01:31:35 INFO SecurityManager: Changing view acls to: cloudera
16/07/01 01:31:35 INFO SecurityManager: Changing modify acls to: cloudera
16/07/01 01:31:35 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cloudera); users with modify permissions: Set(cloudera)
16/07/01 01:31:36 INFO Utils: Successfully started service 'sparkDriver' on port 57073.
16/07/01 01:31:37 INFO Slf4jLogger: Slf4jLogger started
16/07/01 01:31:37 INFO Remoting: Starting remoting
16/07/01 01:31:38 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:36679]
16/07/01 01:31:38 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 36679.
16/07/01 01:31:38 INFO SparkEnv: Registering MapOutputTracker
16/07/01 01:31:38 INFO SparkEnv: Registering BlockManagerMaster
16/07/01 01:31:38 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-1ad66510-ad8f-4239-b4bf-1410135c84f5
16/07/01 01:31:38 INFO MemoryStore: MemoryStore started with capacity 1619.3 MB
16/07/01 01:31:38 INFO SparkEnv: Registering OutputCommitCoordinator
16/07/01 01:31:38 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/07/01 01:31:38 INFO SparkUI: Started SparkUI at http://10.0.2.15:4040
16/07/01 01:31:39 INFO Executor: Starting executor ID driver on host localhost
16/07/01 01:31:39 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45098.
16/07/01 01:31:39 INFO NettyBlockTransferService: Server created on 45098
16/07/01 01:31:39 INFO BlockManagerMaster: Trying to register BlockManager
16/07/01 01:31:39 INFO BlockManagerMasterEndpoint: Registering block manager localhost:45098 with 1619.3 MB RAM, BlockManagerId(driver, localhost, 45098)
16/07/01 01:31:39 INFO BlockManagerMaster: Registered BlockManager
16/07/01 01:31:40 INFO ParquetRelation: Listing file:/user/hive/warehouse/products/bc223562-ee45-42a6-b9a0-05635efb3e59.parquet on driver
Exception in thread "main" java.lang.AssertionError: assertion failed: No predefined schema found, and no Parquet data files or summary files found under file:/user/hive/warehouse/products/bc223562-ee45-42a6-b9a0-05635efb3e59.parquet.
at scala.Predef$.assert(Predef.scala:179)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$MetadataCache.org$apache$spark$sql$execution$datasources$parquet$ParquetRelation$MetadataCache$$readSchema(ParquetRelation.scala:512)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$MetadataCache$$anonfun$12.apply(ParquetRelation.scala:421)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$MetadataCache$$anonfun$12.apply(ParquetRelation.scala:421)
at scala.Option.orElse(Option.scala:257)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$MetadataCache.refresh(ParquetRelation.scala:421)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation.org$apache$spark$sql$execution$datasources$parquet$ParquetRelation$$metadataCache$lzycompute(ParquetRelation.scala:145)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation.org$apache$spark$sql$execution$datasources$parquet$ParquetRelation$$metadataCache(ParquetRelation.scala:143)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$6.apply(ParquetRelation.scala:202)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$6.apply(ParquetRelation.scala:202)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation.dataSchema(ParquetRelation.scala:202)
at org.apache.spark.sql.sources.HadoopFsRelation.schema$lzycompute(interfaces.scala:636)
at org.apache.spark.sql.sources.HadoopFsRelation.schema(interfaces.scala:635)
at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37)
at org.apache.spark.sql.SQLContext.baseRelationToDataFrame(SQLContext.scala:442)
at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:316)
at org.test.spark.TestSelectAlgorithm$.main(TestSelectAlgorithm.scala:20)
at org.test.spark.TestSelectAlgorithm.main(TestSelectAlgorithm.scala)
16/07/01 01:31:40 INFO SparkContext: Invoking stop() from shutdown hook
16/07/01 01:31:40 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4040
16/07/01 01:31:40 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/07/01 01:31:40 INFO MemoryStore: MemoryStore cleared
16/07/01 01:31:40 INFO BlockManager: BlockManager stopped
16/07/01 01:31:40 INFO BlockManagerMaster: BlockManagerMaster stopped
16/07/01 01:31:40 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/07/01 01:31:40 INFO SparkContext: Successfully stopped SparkContext
16/07/01 01:31:40 INFO ShutdownHookManager: Shutdown hook called
16/07/01 01:31:40 INFO ShutdownHookManager: Deleting directory /tmp/spark-2e652280-6b19-4bc5-b686-49e1fba5f7e8
First of all, I am sure the path to the Parquet file is correct, but I still get: No predefined schema found.
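(For reference: the log line "Listing file:/user/hive/warehouse/... on driver" suggests Spark resolved the unqualified path against the local filesystem, not HDFS; with master "local", fs.defaultFS is often file:///. Below is a minimal sketch for checking where the path actually resolves and for forcing the HDFS scheme explicitly; the namenode address quickstart.cloudera:8020 is an assumption based on Cloudera QuickStart VM defaults, so adjust it to your setup.)

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    val hadoopConf = new Configuration()
    val p = new Path("/user/hive/warehouse/products/bc223562-ee45-42a6-b9a0-05635efb3e59.parquet")
    // FileSystem.get with no scheme uses fs.defaultFS; in local mode this is
    // typically file:///, which would explain the "file:/user/..." in the log.
    val fs = FileSystem.get(hadoopConf)
    println(s"resolves to: ${fs.makeQualified(p)}, exists: ${fs.exists(p)}")

    // Forcing the HDFS scheme explicitly (assumption: QuickStart VM namenode address):
    val parquetFromHdfs = sqlContext.read
      .parquet("hdfs://quickstart.cloudera:8020/user/hive/warehouse/products/bc223562-ee45-42a6-b9a0-05635efb3e59.parquet")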
Can anyone help me? On the web, and more precisely on stackoverflow.com, I found some posts... but they did not help me!
Could you verify your Parquet file using Hive or some Parquet reader apart from Spark? – WoodChopper
Sorry, I don't understand: the file is on HDFS in Parquet format... – Alessandro
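(Along the lines of WoodChopper's suggestion, here is a sketch for inspecting a Parquet footer outside of Spark SQL, assuming the parquet-hadoop library bundled with Spark 1.6 is on the classpath. If the file is valid Parquet, this prints its schema; if not, it fails with a clearer message.)

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.parquet.format.converter.ParquetMetadataConverter
    import org.apache.parquet.hadoop.ParquetFileReader

    // Read only the file footer, which contains the Parquet schema.
    val footer = ParquetFileReader.readFooter(
      new Configuration(),
      new Path("/user/hive/warehouse/products/bc223562-ee45-42a6-b9a0-05635efb3e59.parquet"),
      ParquetMetadataConverter.NO_FILTER)
    println(footer.getFileMetaData.getSchema)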