私はこれをやってしまった:
val baseData = spark.read
.option("header", true)
.option("dateFormat", "dd/MM/yy")
.schema(schema)
.csv("data/football-data.co.uk/1516/E[0,C].csv")
baseData
.select("Date", "Div", "Season", "HomeTeam", "AwayTeam")
.withColumn("round", row_number() over (partitionBy("Div", "Season", "HomeTeam") orderBy("Date")))
.show(100)
をここで私は今、私のマッチ・オブジェクトを作成することができますから。
スキーマ定義は次のようになります。
val schema =
StructType(
Array(
StructField("Season", StringType, false),
StructField("Div", StringType, false),
StructField("Date", DateType, false),
StructField("HomeTeam", StringType, false),
StructField("AwayTeam", StringType, false),
StructField("FTHG", IntegerType, false),
StructField("FTAG", IntegerType, false),
StructField("FTR", StringType, false),
StructField("HTHG", IntegerType, false),
StructField("HTAG", IntegerType, false),
StructField("HTR", StringType, false),
StructField("Referee", StringType, false),
StructField("HS", IntegerType, false),
StructField("AS", IntegerType, false),
StructField("HST", IntegerType, false),
StructField("AST", IntegerType, false),
StructField("HF", IntegerType, false),
StructField("AF", IntegerType, false),
StructField("HC", IntegerType, false),
StructField("AC", IntegerType, false),
StructField("HY", IntegerType, false),
StructField("AY", IntegerType, false),
StructField("HR", IntegerType, false),
StructField("AR", IntegerType, false),
StructField("B365H", DoubleType, false),
StructField("B365D", DoubleType, false),
StructField("B365A", DoubleType, false),
StructField("BWH", DoubleType, false),
StructField("BWD", DoubleType, false),
StructField("BWA", DoubleType, false),
StructField("IWH", DoubleType, false),
StructField("IWD", DoubleType, false),
StructField("IWA", DoubleType, false),
StructField("LBH", DoubleType, false),
StructField("LBD", DoubleType, false),
StructField("LBA", DoubleType, false),
StructField("PSH", DoubleType, false),
StructField("PSD", DoubleType, false),
StructField("PSA", DoubleType, false),
StructField("WHH", DoubleType, false),
StructField("WHD", DoubleType, false),
StructField("WHA", DoubleType, false),
StructField("VCH", DoubleType, false),
StructField("VCD", DoubleType, false),
StructField("VCA", DoubleType, false),
StructField("Bb1X2", DoubleType, false),
StructField("BbMxH", DoubleType, false),
StructField("BbAvH", DoubleType, false),
StructField("BbMxD", DoubleType, false),
StructField("BbAvD", DoubleType, false),
StructField("BbMxA", DoubleType, false),
StructField("BbAvA", DoubleType, false),
StructField("BbOU", DoubleType, false),
StructField("BbMx>2.5", DoubleType, false),
StructField("BbAv>2.5", DoubleType, false),
StructField("BbMx<2.5", DoubleType, false),
StructField("BbAv<2.5", DoubleType, false),
StructField("BbAH", DoubleType, false),
StructField("BbAHh", DoubleType, false),
StructField("BbMxAHH", DoubleType, false),
StructField("BbAvAHH", DoubleType, false),
StructField("BbMxAHA", DoubleType, false),
StructField("BbAvAHA", DoubleType, false),
StructField("PSCH", DoubleType, false),
StructField("PSCD", DoubleType, false),
StructField("PSCA", DoubleType, false)
)
)
(。。私のユースケースのためにこれはないでは非常に有用な、私はこのデータセットを使用して正確にラウンド計算しかし、まだ偉大な練習ができないため)
感謝あなた、これは動作します! – mackan2