BigIntを含むRDDをSpark Dataframeに変換する

こんにちは、私はspark 1.6.3で作業しています。私はそこにいくつかのBigIntスカラ型を持つrddを持っています。それをスパークのデータフレームに変換するにはどうすればよいですか？データフレームを作成する前に型をキャストできますか？BigIntを含むRDDをSpark Dataframeに変換する

マイRDD：プリントアウト

Array[(BigInt, String, String, BigInt, BigInt, BigInt, BigInt, List[String])] = Array((14183197,Browse,3393626f-98e3-4973-8d38-6b2fb17454b5_27331247X28X6839X1506087469573,80161702,8702170626376335,59,527780275219,List(NavigationLevel, Session)), (14183197,Browse,3393626f-98e3-4973-8d38-6b2fb17454b5_27331247X28X6839X1506087469573,80161356,8702171157207449,72,527780278061,List(StartPlay, Action, Session)))

：

(14183197,Browse,3393626f-98e3-4973-8d38-6b2fb17454b5_27331247X28X6839X1506087469573,80161356,8702171157207449,72,527780278061,List(StartPlay, Action, Session)) 
(14183197,Browse,3393626f-98e3-4973-8d38-6b2fb17454b5_27331247X28X6839X1506087469573,80161702,8702170626376335,59,527780275219,List(NavigationLevel, Session))

私は、スキーマ・オブジェクトを作成するために疲れてきました。

val schema = StructType(Array(
    StructField("trackId", LongType, true), 
    StructField("location", StringType, true), 
    StructField("listId", StringType, true), 
    StructField("videoId", LongType, true), 
    StructField("id", LongType, true), 
    StructField("sequence", LongType, true), 
    StructField("time", LongType, true), 
    StructField("type", ArrayType(StringType), true) 
))

私はval df = sqlContext.createDataFrame(rdd, schema)をしようとした場合、私は

error: overloaded method value createDataFrame with alternatives: 
    (data: java.util.List[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and> 
    (rdd: org.apache.spark.api.java.JavaRDD[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and> 
    (rdd: org.apache.spark.rdd.RDD[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and> 
    (rows: java.util.List[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame <and> 
    (rowRDD: org.apache.spark.api.java.JavaRDD[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame <and> 
    (rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame 
cannot be applied to (org.apache.spark.rdd.RDD[(BigInt, String, String, BigInt, BigInt, BigInt, BigInt, scala.collection.immutable.List[String])], org.apache.spark.sql.types.StructType)

このエラーを取得したり、私がval df = sc.parallelize(rdd.toSeq).toDFをしようと、私は次のエラーを取得します。

error: value toSeq is not a member of org.apache.spark.rdd.RDD[(BigInt, String, String, BigInt, BigInt, BigInt, BigInt, List[String])]

すべてのヘルプは、スキーマだけRDD[Row]で使用することができます

出典

2017-10-12 ukbaz

認識されます。ここではリフレクションを使用します。

sqlContext.createDataFrame(rdd)

あなたはまたone of the supported typesに変更BigInt（BigDecimal？）またはuse binary encoder for this fieldを持っています。

出典

2017-10-12 15:48:30 user8371915

ご意見ありがとうございます。私は 'java.lang.UnsupportedOperationException：型scala.BigIntのスキーマはサポートされていません。 – ukbaz

BigIntを含むRDDをSpark Dataframeに変換する

答えて

関連する問題