I need to implement the following scenarios on a DataFrame using Spark with Scala: a Spark DataFrame groupBy with a complex case-statement derivation.
Scenario-1: If the "KEY" exists only once, take its "TYPE_VAL" as is.
E.g.: KEY=66 exists once, so take TYPE_VAL=100.
Scenario-2: If the "KEY" exists more than once and all of its rows have the same TYPE_VAL, take that TYPE_VAL once.
E.g.: both rows for KEY=68 have TYPE_VAL=23, so TYPE_VAL=23.
Scenario-3: If the "KEY" exists more than once with different TYPE_VALs, take the repeating TYPE_VAL once and subtract each of the other TYPE_VALs from it.
E.g.: for KEY=67, TYPE_VAL=10 exists twice, so subtract 2 and 4 from 10; finally TYPE_VAL=4.
I tried using groupBy on the key, but I was not able to derive all of the scenarios (one possible approach is sketched after the expected output below).
// Sample input values
val values = List(
  List("66", "100"),
  List("67", "10"), List("67", "10"), List("67", "2"), List("67", "4"),
  List("68", "23"), List("68", "23")
).map(x => (x(0), x(1)))
import spark.implicits._
// Create a DataFrame from the pairs
val df1 = values.toDF("KEY","TYPE_VAL")
df1.show(false)
+---+--------+
|KEY|TYPE_VAL|
+---+--------+
|66 |100     |
|67 |10      |
|67 |10      |
|67 |2       |
|67 |4       |
|68 |23      |
|68 |23      |
+---+--------+
Expected output:
df2.show(false)
+---+--------+
|KEY|TYPE_VAL|
+---+--------+
|66 |100     | -------> [single row, so 100]
|67 |4       | -------> [four rows, two of which are the same and the rest are different, so (10 - 2 - 4) = 4]
|68 |23      | -------> [two rows with the same value, so 23]
+---+--------+
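
One way to cover all three scenarios in a single pass is to first collapse duplicate (KEY, TYPE_VAL) pairs while counting how often each value occurs, then treat the value that repeats (or the only value, when the KEY has a single distinct one) as the base and subtract every other distinct value from it. Below is a minimal sketch of that idea; it assumes, as in the sample data, that each KEY has at most one repeating TYPE_VAL, and the names dfNum, counted, and byCnt are my own:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Cast to int so the subtraction is numeric, not string-based
val dfNum = df1.withColumn("TYPE_VAL", col("TYPE_VAL").cast("int"))

// Step 1: collapse duplicate (KEY, TYPE_VAL) pairs, keeping an occurrence count.
// Scenario 2 is handled here: two identical rows become one row with count = 2.
val counted = dfNum.groupBy("KEY", "TYPE_VAL").count()

// Step 2: within each KEY, rank values so the repeating one (highest count) comes first
val byCnt = Window.partitionBy("KEY").orderBy(col("count").desc)

// Step 3: the top-ranked value is kept as the base (scenarios 1 and 2);
// every remaining distinct value is subtracted from it (scenario 3)
val df2 = counted
  .withColumn("rn", row_number().over(byCnt))
  .groupBy("KEY")
  .agg(sum(when(col("rn") === 1, col("TYPE_VAL"))
             .otherwise(-col("TYPE_VAL"))).as("TYPE_VAL"))

df2.orderBy("KEY").show(false)

For the sample input this yields 100, 4, and 23 for keys 66, 67, and 68. One caveat: row_number() breaks ties arbitrarily, so if a KEY ever had several distinct values that all occur exactly once, which one becomes the base would be nondeterministic; a secondary orderBy clause on the window would be needed to pin that down.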