Reading a CSV file - column value starts with digits and ends with D/F

I am using Spark to read a CSV file. One of the field values in the CSV is 91520122094491671D.
After reading, the value becomes 9.152012209449166...
This happens whenever a string starts with digits and ends with D or F.
However, I need to read the data as a string.
How can I do that?

This is the Scala code:

sparkSession.read.format("com.databricks.spark.csv") 
    .option("header", "true") 
    .option("inferSchema", true.toString) 
    .load(getHadoopUri(uri)) 
    .createOrReplaceTempView("t_datacent_cus_temp_guizhou_ds_tmp") 

sparkSession.sql(
    s""" 
    | select cast(tax_file_code as String) as tax_file_code, 
    |   cus_name, 
    |   cast(tax_identification_number as String) as tax_identification_number 
    | from t_datacent_cus_temp_guizhou_ds_tmp 
    """.stripMargin).createOrReplaceTempView("t_datacent_cus_temp_guizhou_ds") 

sparkSession.sql("select * from t_datacent_cus_temp_guizhou_ds").show

This is the data in the CSV file:

tax_file_code| cus_name| tax_identification_number 

T19915201| 息烽家吉装饰材料店| 91520122094491671D

The result is shown below:

+-------------+------------------+-------------------------+
|tax_file_code|cus_name          |tax_identification_number|
+-------------+------------------+-------------------------+
|T19915201    |息烽家吉装饰材料店|9.152012209449166...     |
+-------------+------------------+-------------------------+

Source

2017-12-01 风逝花落

Doesn't it look like "9.15...E20"? In other words, it is in exponential notation.

Please include a sample row of the CSV and your Spark code in your question.

OK, I have edited my question.

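It sounds like the D/F at the end makes schema inference treat the column as a double or float, and the exponential value you see is cut off because the column is truncated.

Make sure all columns are read as strings, which means not setting: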

option("inferSchema", true.toString)
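
To see why inference lands on a double here, a minimal sketch in plain Scala (no Spark needed) may help. It assumes, as far as I know for the Spark 2.x CSV reader, that inference checks whether a field can be parsed by the JVM's number parsers; `looksLikeDouble` is a hypothetical helper written for illustration, not part of Spark's API. Java's `Double.parseDouble`, which Scala's `toDouble` calls, accepts a trailing `d`/`D`/`f`/`F` type suffix, so the ID parses as a number.

```scala
import scala.util.Try

object SuffixParsingDemo {
  // Hypothetical helper (not Spark's API): true if the JVM can parse the text as a Double.
  def looksLikeDouble(field: String): Boolean = Try(field.toDouble).isSuccess

  def main(args: Array[String]): Unit = {
    val id = "91520122094491671D"

    // The trailing 'D' is read as a floating-point type suffix, so this prints true.
    println(looksLikeDouble(id))

    // Exact integer value of the parsed double; it differs from the original
    // digits because a double cannot represent all 17 of them.
    println(new java.math.BigDecimal(id.toDouble).toBigInteger)

    // A value that starts with a letter is not numeric, so this prints false.
    println(looksLikeDouble("T19915201"))
  }
}
```

Once a column has been parsed this way, casting it back to a string afterwards cannot restore the original text, which is why the cast in the question's SQL does not help.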

Source

2017-12-01 04:28:02

You can try:

sparkSession.sql("select * from t_datacent_cus_temp_guizhou_ds").show(20, false)

Setting truncate to false turns truncation off. If it is true, strings longer than 20 characters are truncated and all cells are right-aligned.

Edit:

val x = sparkSession.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("....src/main/resources/data.csv")

x.printSchema()

x.createOrReplaceTempView("t_datacent_cus_temp_guizhou_ds_tmp")

sparkSession.sql(
    s"""
    | select cast(tax_file_code as String) as tax_file_code,
    |   cus_name,
    |   cast(tax_identification_number as String) as tax_identification_number
    | from t_datacent_cus_temp_guizhou_ds_tmp
    """.stripMargin).createOrReplaceTempView("t_datacent_cus_temp_guizhou_ds")

sparkSession.sql("select * from t_datacent_cus_temp_guizhou_ds").show(truncate = false)

This produces the following output:

+-------------+----------+-------------------------+ 
|tax_file_code|cus_name |tax_identification_number| 
+-------------+----------+-------------------------+ 
|T19915201 | 息烽家吉装饰材料店|9.1520122094491664E16 | 
+-------------+----------+-------------------------+

Source

2017-12-01 03:23:23

No, this does not work. I found that the cause is a line of code I had added: 'option("inferSchema", true.toString)'. If I remove it, everything is fine.

.option("inferSchema", true)

No, do not set it.
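
Following the resolution in the comments, one way to keep the IDs as strings is to leave schema inference off (without it, Spark reads every CSV column as a string) or to state the schema explicitly. The sketch below takes the explicit route. The object name, the temp-file sample (comma-separated, which is an assumption about the real file's delimiter), and the local master setting are made up for illustration; only the column names and the sample row come from the question. It needs Spark 2.x or later on the classpath.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object ReadCsvAsStrings {
  // Column names taken from the question; every column is declared as a string.
  val schema: StructType = StructType(Seq(
    StructField("tax_file_code", StringType, nullable = true),
    StructField("cus_name", StringType, nullable = true),
    StructField("tax_identification_number", StringType, nullable = true)
  ))

  // Writes a one-row sample file. The comma delimiter is an assumption.
  def writeSample(): String = {
    val path = Files.createTempFile("sample", ".csv")
    val content =
      "tax_file_code,cus_name,tax_identification_number\n" +
      "T19915201,息烽家吉装饰材料店,91520122094491671D\n"
    Files.write(path, content.getBytes(StandardCharsets.UTF_8))
    path.toString
  }

  // Reads the CSV with the explicit schema and without inferSchema.
  def readAsStrings(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")
      .schema(schema)
      .csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("read-csv-as-strings")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()

    val df = readAsStrings(spark, writeSample())
    df.printSchema()
    df.show(truncate = false) // print full cell contents
    spark.stop()
  }
}
```

With the columns declared as strings, the casts in the SQL from the question become unnecessary, since nothing is parsed as a number in the first place.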
