1
私は、json行からなるDataset<String> ds
を持っています。Json Stringの配列をSpark 2.2.0の特定の列のデータセットに変換するにはどうすればよいですか?
サンプルJSON行
[
"{"name": "foo", "address": {"state": "CA", "country": "USA"}, "docs":[{"subject": "english", "year": 2016}]}",
"{"name": "bar", "address": {"state": "OH", "country": "USA"}, "docs":[{"subject": "math", "year": 2017}]}"
]
ds.printSchema()(これは、データセット内の1行分の一例である)
root
|-- value: string (nullable = true)
は、今は次のデータセットに変換しますSpark 2.2.0を使用して
name | address | docs
----------------------------------------------------------------------------------
"foo" | {"state": "CA", "country": "USA"} | [{"subject": "english", "year": 2016}]
"bar" | {"state": "OH", "country": "USA"} | [{"subject": "math", "year": 2017}]
JavしかしScalaは限りのJava APIここ
で利用可能な機能があるとしても結構です、私がこれまで試したどの
val df = Seq("""["{"name": "foo", "address": {"state": "CA", "country": "USA"}, "docs":[{"subject": "english", "year": 2016}]}", "{"name": "bar", "address": {"state": "OH", "country": "USA"}, "docs":[{"subject": "math", "year": 2017}]}" ]""").toDF
df.show(偽)
|value |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|["{"name": "foo", "address": {"state": "CA", "country": "USA"}, "docs":[{"subject": "english", "year": 2016}]}", "{"name": "bar", "address": {"state": "OH", "country": "USA"}, "docs":[{"subject": "math", "year": 2017}]}" ]|