私はSpark 2を使用していますが、unix_timestampまたはSparkのバージョンに関係しない次の結果が表示されます。データを確認してください。
import org.apache.spark.sql.functions.unix_timestamp
val df2 = sc.parallelize(Seq(
(10, "date", "20170312020200"), (10, "date", "20170312050200"))
).toDF("id ", "somthing ", "datee")
df2.show()
val df3=df2.withColumn("datee", unix_timestamp($"datee", "yyyyMMddHHmmss").cast("timestamp"))
df3.show()
+---+---------+--------------+
|id |somthing | datee|
+---+---------+--------------+
| 10| date|20170312020200|
| 10| date|20170312050200|
+---+---------+--------------+
+---+---------+-------------------+
|id |somthing | datee|
+---+---------+-------------------+
| 10| date|2017-03-12 02:02:00|
| 10| date|2017-03-12 05:02:00|
+---+---------+-------------------+
import org.apache.spark.sql.functions.unix_timestamp
df2: org.apache.spark.sql.DataFrame = [id : int, somthing : string ... 1 more field]
df3: org.apache.spark.sql.DataFrame = [id : int, somthing : string ... 1 more field]
私のローカルシステムはGMT +5.30で、サーバーはEDT上にあります。 – Himanshu