2017-10-07 3 views

I have a Dataset<String> ds that consists of JSON rows. How can I convert an array of JSON strings into a Dataset with specific columns in Spark 2.2.0?

Sample JSON row:

[ 
    "{"name": "foo", "address": {"state": "CA", "country": "USA"}, "docs":[{"subject": "english", "year": 2016}]}", 
    "{"name": "bar", "address": {"state": "OH", "country": "USA"}, "docs":[{"subject": "math", "year": 2017}]}" 

] 

ds.printSchema() (the sample above is just an example of one row in the dataset):

root 
|-- value: string (nullable = true) 

Now I want to convert it to the following Dataset using Spark 2.2.0:

name |    address    | docs 
---------------------------------------------------------------------------------- 
"foo" | {"state": "CA", "country": "USA"} | [{"subject": "english", "year": 2016}] 
"bar" | {"state": "OH", "country": "USA"} | [{"subject": "math", "year": 2017}] 

Java is preferred, but Scala is also fine as long as the functions are available in the Java API.

Here is what I have tried so far:

val df = Seq("""["{"name": "foo", "address": {"state": "CA", "country": "USA"}, "docs":[{"subject": "english", "year": 2016}]}", "{"name": "bar", "address": {"state": "OH", "country": "USA"}, "docs":[{"subject": "math", "year": 2017}]}" ]""").toDF 

df.show(false)

|value                                                      | 
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 
|["{"name": "foo", "address": {"state": "CA", "country": "USA"}, "docs":[{"subject": "english", "year": 2016}]}", "{"name": "bar", "address": {"state": "OH", "country": "USA"}, "docs":[{"subject": "math", "year": 2017}]}" ]| 

Answers


Here is a Java workaround that I found. I hope it helps.

Create a bean class (TempBean in my case):

import java.util.List;
import java.util.Map;

// Plain Java bean; Spark reads the columns from the public getters.
public class TempBean {
    private String name;
    private Map<String, String> address;
    private List<Map<String, String>> docs;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public Map<String, String> getAddress() {
        return address;
    }

    public void setAddress(Map<String, String> address) {
        this.address = address;
    }

    public List<Map<String, String>> getDocs() {
        return docs;
    }

    public void setDocs(List<Map<String, String>> docs) {
        this.docs = docs;
    }
}

Create the DataFrame once tempList has been populated by the parsing code further down:

Dataset<Row> dff = spark.createDataFrame(tempList, TempBean.class); 

Use the following code, with the imports listed at its top, to populate tempList:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import com.fasterxml.jackson.core.JsonParseException;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.JsonMappingException;
import com.fasterxml.jackson.databind.ObjectMapper;

ObjectMapper mapper = new ObjectMapper();
List<String> dfList = ds.collectAsList(); // using your Dataset<String>; this pulls every row to the driver
List<TempBean> tempList = new ArrayList<TempBean>();
try {
    for (String json : dfList) {
        // Each row is read as a JSON array of objects.
        List<Map<String, Object>> mapList =
            mapper.readValue(json, new TypeReference<List<Map<String, Object>>>() {});
        for (Map<String, Object> map : mapList) {
            TempBean temp = new TempBean();
            temp.setName(map.get("name").toString());
            // Unchecked casts: the values keep whatever type Jackson produced.
            temp.setAddress((Map<String, String>) map.get("address"));
            temp.setDocs((List<Map<String, String>>) map.get("docs"));
            tempList.add(temp);
        }
    }
} catch (JsonParseException e) {
    e.printStackTrace();
} catch (JsonMappingException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
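
One caveat about the casts to Map<String, String> and List<Map<String, String>> in the code above: they are unchecked, so a numeric value such as "year": 2016 stays an Integer inside a map that is declared to hold strings, and reading it back as a String in plain Java throws a ClassCastException. How Spark treats such values when it builds the rows is not something this answer verifies. A minimal sketch of a conversion helper follows, assuming you want every value stored as a real String; the class name StringMaps and its two methods are hypothetical and are not part of Spark or Jackson.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Spark or Jackson.
public class StringMaps {

    // Copies a map and converts every value to its String form (null stays null).
    public static Map<String, String> toStringMap(Map<String, ?> source) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, ?> entry : source.entrySet()) {
            Object value = entry.getValue();
            result.put(entry.getKey(), value == null ? null : value.toString());
        }
        return result;
    }

    // Applies toStringMap to every element of a list of maps.
    public static List<Map<String, String>> toStringMaps(List<? extends Map<String, ?>> source) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, ?> map : source) {
            result.add(toStringMap(map));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("subject", "english");
        doc.put("year", 2016); // a JSON number arrives as an Integer, not a String
        System.out.println(toStringMap(doc)); // prints {subject=english, year=2016}
    }
}
```

With this helper, the two setter calls could become temp.setAddress(StringMaps.toStringMap((Map<String, Object>) map.get("address"))) and temp.setDocs(StringMaps.toStringMaps((List<Map<String, Object>>) map.get("docs"))). The casts to Map<String, Object> are still unchecked, but they match the types that the TypeReference in the parsing code asks Jackson to produce.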

Show the DataFrame:

dff.show(false); 
+--------------------------------+---------------------------------------+----+ 
|address       |docs         |name| 
+--------------------------------+---------------------------------------+----+ 
|Map(state -> CA, country -> USA)|[Map(subject -> english, year -> 2016)]|foo | 
|Map(state -> OH, country -> USA)|[Map(subject -> math, year -> 2017)] |bar | 
+--------------------------------+---------------------------------------+----+ 

Print the schema:

dff.printSchema(); 
root 
|-- address: map (nullable = true) 
| |-- key: string 
| |-- value: string (valueContainsNull = true) 
|-- docs: array (nullable = true) 
| |-- element: map (containsNull = true) 
| | |-- key: string 
| | |-- value: string (valueContainsNull = true) 
|-- name: string (nullable = true) 