2017-03-29 10 views

Convert CSV to RDD and read it with Spark/Scala

I tried to read a (CSV) file and print its schema. My problem is that the file has no header to use in SQL-like queries. I tried this code:

val logFile = "../resouces/cells.csv" 

val dfCells = spark.read 
.format("csv") 
.option("header", "false") 
.option("mode", "DROPMALFORMED") 
.option("delimiter", "|") 
.csv(logFile) 

dfCell.printSchema; 

The input file is as follows:

ES|15032017|25100|54600||3G|FIBRE|OUTDOOR|COMPANY|MAST|MACRO||47001|DU|41.651834|-4.728534|||||||||||||||| 
ES|15032017|25101|54601||3G|FIBRE|OUTDOOR|COMPANY|ROOFTOP|MACRO||47001|DU|41.651994|-4.724693|||||||||||||||| 
ES|15032017|25102|54602||4G|FIBRE|OUTDOOR|COMPANY|ROOFTOP|MICRO||47001|U|41.650912|-4.720648|||||||||||||||| 
ES|15032017|25103|54603||3G|MICROWAVES|OUTDOOR|COMPANY|ROOFTOP|MACRO||47001|U|41.647312|-4.717118|||||||||||||||| 

The output is as follows:

| 
| 
| 

Answers

1

It looks like you have a typo: the variable is dfCells, not dfCell. Use dfCells.printSchema.
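With the variable name corrected (dfCells rather than dfCell), the question's snippet could be rounded out roughly as below. This is only a sketch under assumptions: it targets the Spark 2.x API (the `spark` session used in the question), the placeholder column names `field_0` ... `field_31`, the helper names `readCells` and `runExample`, and the view name `cells` are all hypothetical, and the real meaning of each field is not stated in the question. It shows one possible way to give the headerless columns names so that SQL-like queries can refer to them.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical names for the 32 columns; the question does not say what
// each field means, so generic placeholders are used.
val cellColumns: Seq[String] = (0 until 32).map(i => s"field_$i")

// Reads the pipe-delimited, headerless file and assigns the names above.
// toDF fails if the file does not have exactly cellColumns.length columns.
def readCells(spark: SparkSession, path: String): DataFrame =
  spark.read
    .option("header", "false")
    .option("mode", "DROPMALFORMED")
    .option("delimiter", "|")
    .csv(path)
    .toDF(cellColumns: _*)

// Example driver: prints the schema and runs one SQL query on a temp view.
def runExample(path: String): Unit = {
  val spark = SparkSession.builder()
    .appName("cells-schema")
    .master("local[*]") // assumption: local run; drop when submitting to a cluster
    .getOrCreate()

  val dfCells = readCells(spark, path)
  dfCells.printSchema() // note: dfCells, not dfCell

  // Register a temporary view so the data can be queried with SQL.
  dfCells.createOrReplaceTempView("cells")
  spark.sql("SELECT field_5, COUNT(*) AS n FROM cells GROUP BY field_5").show()
}
```

Calling `runExample("../resouces/cells.csv")` with the path from the question would print a schema with the 32 placeholder names, all typed as string unless a schema is supplied or inferred.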

0

I am using Spark 1.5.0, with the load function instead of csv.

val logFile = "../input.csv" 

val dfCells = sqlContext.read 
         .format("csv") 
         .option("header", "false") 
         .option("mode", "DROPMALFORMED") 
         .option("delimiter", "|") 
         .load(logFile) 

dfCells.show() 
+---+--------+-----+-----+---+---+----------+-------+-------+-------+-----+---+-----+---+---------+---------+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| C0|      C1|   C2|   C3| C4| C5|        C6|     C7|     C8|     C9|  C10|C11|  C12|C13|      C14|      C15|C16|C17|C18|C19|C20|C21|C22|C23|C24|C25|C26|C27|C28|C29|C30|C31|
+---+--------+-----+-----+---+---+----------+-------+-------+-------+-----+---+-----+---+---------+---------+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| ES|15032017|25100|54600|   | 3G|     FIBRE|OUTDOOR|COMPANY|   MAST|MACRO|   |47001| DU|41.651834|-4.728534|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
| ES|15032017|25101|54601|   | 3G|     FIBRE|OUTDOOR|COMPANY|ROOFTOP|MACRO|   |47001| DU|41.651994|-4.724693|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
| ES|15032017|25102|54602|   | 4G|     FIBRE|OUTDOOR|COMPANY|ROOFTOP|MICRO|   |47001|  U|41.650912|-4.720648|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
| ES|15032017|25103|54603|   | 3G|MICROWAVES|OUTDOOR|COMPANY|ROOFTOP|MACRO|   |47001|  U|41.647312|-4.717118|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
+---+--------+-----+-----+---+---+----------+-------+-------+-------+-----+---+-----+---+---------+---------+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

And the schema is as follows:

dfCells.printSchema() 
root 
|-- C0: string (nullable = true) 
|-- C1: string (nullable = true) 
|-- C2: string (nullable = true) 
|-- C3: string (nullable = true) 
|-- C4: string (nullable = true) 
|-- C5: string (nullable = true) 
|-- C6: string (nullable = true) 
|-- C7: string (nullable = true) 
|-- C8: string (nullable = true) 
|-- C9: string (nullable = true) 
|-- C10: string (nullable = true) 
|-- C11: string (nullable = true) 
|-- C12: string (nullable = true) 
|-- C13: string (nullable = true) 
|-- C14: string (nullable = true) 
|-- C15: string (nullable = true) 
|-- C16: string (nullable = true) 
|-- C17: string (nullable = true) 
|-- C18: string (nullable = true) 
|-- C19: string (nullable = true) 
|-- C20: string (nullable = true) 
|-- C21: string (nullable = true) 
|-- C22: string (nullable = true) 
|-- C23: string (nullable = true) 
|-- C24: string (nullable = true) 
|-- C25: string (nullable = true) 
|-- C26: string (nullable = true) 
|-- C27: string (nullable = true) 
|-- C28: string (nullable = true) 
|-- C29: string (nullable = true) 
|-- C30: string (nullable = true) 
|-- C31: string (nullable = true) 
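If the file is instead parsed line by line, as the RDD route in the title suggests, the same 32 columns only appear when the split keeps trailing empty fields. The following is a minimal plain-Scala sketch (no Spark needed) using one sample line from the question; the helper name `splitCells` is hypothetical.

```scala
// One sample line from the question: 16 leading fields followed by
// 16 pipes, which stand for 16 trailing empty fields.
val line: String =
  "ES|15032017|25100|54600||3G|FIBRE|OUTDOOR|COMPANY|MAST|MACRO||47001|DU|41.651834|-4.728534" +
    ("|" * 16)

// "|" is a regex metacharacter, so it is escaped.
// The limit -1 keeps trailing empty strings; the default limit drops them.
def splitCells(row: String): Array[String] = row.split("\\|", -1)

val fields: Array[String] = splitCells(line)
val truncated: Array[String] = line.split("\\|")

println(fields.length)    // 32
println(truncated.length) // 16
```

Under the same assumptions, a function like `splitCells` could be passed to `map` on an RDD obtained from `sc.textFile(...)`, so that every row keeps all 32 positions.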