Pivot a Spark multilevel Dataset

I have a Dataset in Spark with this schema:

root 
|-- from: struct (nullable = false) 
| |-- id: string (nullable = true) 
| |-- name: string (nullable = true) 
| |-- tags: string (nullable = true) 
|-- v1: struct (nullable = false) 
| |-- id: string (nullable = true) 
| |-- name: string (nullable = true) 
| |-- tags: string (nullable = true) 
|-- v2: struct (nullable = false) 
| |-- id: string (nullable = true) 
| |-- name: string (nullable = true) 
| |-- tags: string (nullable = true) 
|-- v3: struct (nullable = false) 
| |-- id: string (nullable = true) 
| |-- name: string (nullable = true) 
| |-- tags: string (nullable = true) 
|-- to: struct (nullable = false) 
| |-- id: string (nullable = true) 
| |-- name: string (nullable = true) 
| |-- tags: string (nullable = true)

How can I build a table with only the three columns id, name, and tags from this Dataset in Scala?

Source

2017-06-08 Ruslan Dautov

Just combine all columns into an array, explode it, and select all nested fields:

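import org.apache.spark.sql.functions.{array, col, explode}
// Needed for toDF; assumes a SparkSession named `spark`, as in spark-shell
import spark.implicits._

case class Vertex(id: String, name: String, tags: String)

// One sample row with five struct columns of identical type
val df = Seq((
    Vertex("1", "from", "a"), Vertex("2", "V1", "b"), Vertex("3", "V2", "c"),
    Vertex("4", "v3", "d"), Vertex("5", "to", "e")
)).toDF("from", "v1", "v2", "v3", "to")

// Pack all struct columns into one array, explode it into rows, then expand the struct fields
df.select(explode(array(df.columns.map(col): _*)).alias("col")).select("col.*")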

This gives the following result:

+---+----+----+
| id|name|tags|
+---+----+----+
|  1|from|   a|
|  2|  V1|   b|
|  3|  V2|   c|
|  4|  v3|   d|
|  5|  to|   e|
+---+----+----+
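
If you also need to know which column each row came from, the same array and explode approach can carry the column name along. The following is only a minimal sketch, assuming a local SparkSession; the names `VertexFlattening`, `flattenWithSource`, and the extra `source` column are hypothetical and not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

object VertexFlattening {
  case class Vertex(id: String, name: String, tags: String)

  // Hypothetical helper: turns every struct column into one row and
  // records the originating column name in an extra "source" column.
  // Assumes all columns of df are structs with the same fields.
  def flattenWithSource(df: DataFrame): DataFrame = {
    // Wrap each struct together with the name of its column
    val tagged = df.columns.map { c =>
      struct(lit(c).alias("source"), col(c).alias("vertex"))
    }
    df.select(explode(array(tagged: _*)).alias("col"))
      .select(col("col.source"), col("col.vertex")) // columns: source, vertex
      .select("source", "vertex.*")                 // columns: source, id, name, tags
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("vertex-flattening")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((
      Vertex("1", "from", "a"), Vertex("2", "V1", "b"), Vertex("3", "V2", "c"),
      Vertex("4", "v3", "d"), Vertex("5", "to", "e")
    )).toDF("from", "v1", "v2", "v3", "to")

    flattenWithSource(df).show()
    spark.stop()
  }
}
```

Note that `array` needs every element to have the same type, so this only works while all five struct columns share the same fields.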

Source

2017-06-08 11:49:33 user6910411

It works, thank you!
