Here's a quick suggestion, I hope it helps:
import org.apache.spark.sql.functions.{col, udf}
// Assumes a SparkSession named `spark` (as in spark-shell); needed for .toDS / .toDF
import spark.implicits._

// Half-open age interval: lowerBound is included, upperBound is not
case class AgeRange(lowerBound: Int, upperBound: Int) {
  def contains(value: Int): Boolean = value >= lowerBound && value < upperBound
}

val rangeList = List(-1, 12, 17, 24, 34, 44, 54, 64, 100, 1000)
// Pair consecutive bounds: AgeRange(-1, 12), AgeRange(12, 17), ...
val ranges = rangeList.sliding(2).map(list => AgeRange(list(0), list(1))).toList

val dataset = Seq("-1", "12", "18", "28", "38", "46").toDS

// First range containing the value, or None if no range matches
def findRange(value: Int, ageRanges: List[AgeRange]): Option[AgeRange] =
  ageRanges.find(_.contains(value))

// With a UDF
def myUdf(ageRanges: List[AgeRange]) = udf { i: Int =>
  findRange(i, ageRanges)
}
val result1 = dataset.toDF("age").withColumn("age_range", myUdf(ranges)(col("age").cast("int")))

// With map
val result2 = dataset.map { i: String =>
  (i, findRange(i.toInt, ranges))
}.toDF("age", "age_range")
With the result:
result1: org.apache.spark.sql.DataFrame = [age: string, age_range: struct<lowerBound: int, upperBound: int>]
result2: org.apache.spark.sql.DataFrame = [age: string, age_range: struct<lowerBound: int, upperBound: int>]
+---+---------+
|age|age_range|
+---+---------+
| -1| [-1,12]|
| 12| [12,17]|
| 18| [17,24]|
| 28| [24,34]|
| 38| [34,44]|
| 46| [44,54]|
+---+---------+
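The range lookup itself doesn't depend on Spark, so it can be checked in plain Scala before wiring it into a UDF or `map`. The sketch below reuses the same bounds and `findRange` logic; the `AgeRangeLookup` object is a hypothetical wrapper added only for illustration. Because the upper bound is exclusive, an age of 1000 (or anything below -1) matches no range and yields `None`, which shows up as `null` in the `age_range` column on the Spark side.

```scala
// Plain-Scala sketch (no Spark needed) of the same lookup logic.
final case class AgeRange(lowerBound: Int, upperBound: Int) {
  // Half-open interval: lowerBound is included, upperBound is not.
  def contains(value: Int): Boolean = value >= lowerBound && value < upperBound
}

// Hypothetical wrapper object, for illustration only.
object AgeRangeLookup {
  val rangeList: List[Int] = List(-1, 12, 17, 24, 34, 44, 54, 64, 100, 1000)

  // sliding(2) pairs each bound with the next one: (-1,12), (12,17), ...
  val ranges: List[AgeRange] =
    rangeList.sliding(2).collect { case List(lo, hi) => AgeRange(lo, hi) }.toList

  // First range containing the value, or None if it falls outside all ranges.
  def findRange(value: Int, ageRanges: List[AgeRange]): Option[AgeRange] =
    ageRanges.find(_.contains(value))

  def main(args: Array[String]): Unit =
    Seq(-1, 12, 18, 46, 1000).foreach { age =>
      println(s"$age -> ${findRange(age, ranges)}")
    }
}
```

Using `collect` with a `List(lo, hi)` pattern avoids indexing into the window, and it silently skips a trailing window with fewer than two elements instead of throwing.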
Thanks Daniel!!! ... It worked for me!!! – Bhavesh