Reading a CSV file that uses a space as the thousands separator with pandas.read_csv

I have a (French) dataset that looks like this:

time;col1;col2;col3 
06.09.2017 05:30;329,02;5,7;259 
06.09.2017 05:40;500,5;6,6;261 
06.09.2017 05:50;521,73;6,7;266 
06.09.2017 06:00;1 091,33;9,1;273 
06.09.2017 06:10;1 262,43;10;285

I tried to read it with the following command:

import pandas as pd 
df=pd.read_csv("Example_dataset.csv", 
      index_col=0, 
      encoding='latin', 
      parse_dates=True, 
      dayfirst=True, 
      sep=';', 
      decimal=',', 
      thousands=' ')

col2 and col3 are recognized as float and integer, but col1 is not recognized as a number because it contains thousands separators. Is there an easy way to read this dataset? Setting thousands=' ' does not seem to work:

<class 'pandas.core.frame.DataFrame'> 
DatetimeIndex: 5 entries, 2017-09-06 05:30:00 to 2017-09-06 06:10:00 
Data columns (total 3 columns): 
col1 5 non-null object 
col2 5 non-null float64 
col3 5 non-null int64 
dtypes: float64(1), int64(1), object(1) 
memory usage: 160.0+ bytes

Any suggestions?

Source

2017-09-28 Nickj

Try this: 'df.col1 = df.col1. –

I tested with pandas '0.20.1' and your code works. Which version are you using? – zipa

That did not work. I think this space is a "non-breaking space". I changed the code to the following: 'df.col1 = df.col1.str.replace('\s+', '').str.replace(',', '.').astype(float)' – Nickj
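One way to confirm the suspicion in the comment above is to look at the code point of the whitespace character inside a cell. The sketch below uses only the standard library; the helper name `describe_whitespace` and the sample value are hypothetical, and in practice the string would come from the column itself (for example one of the unparsed values in `df.col1`).

```python
import unicodedata


def describe_whitespace(text):
    """Return (code point, Unicode name) for each whitespace character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ch.isspace()
    ]


# Hypothetical cell value containing a non-breaking space (assumption).
value = "1\u00a0091,33"

# ascii() escapes every non-ASCII character, so the separator becomes visible.
print(ascii(value))
print(describe_whitespace(value))
```

If the result names NO-BREAK SPACE rather than SPACE, that would explain why thousands=' ' has no effect on the column.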

If you have non-breaking spaces, I would suggest a more aggressive regular expression with str.replace:

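# regex=True is needed on pandas >= 2.0, where str.replace matches literally by default
df.col1 = (df.col1.str.replace(r'[^\d.,e+-]', '', regex=True)
                  .str.replace(',', '.', regex=False)
                  .astype(float))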

where the regular expression is:

[   # start of character group
^   # negation - match (and so remove) anything that is NOT listed in this group
\d  # digit
.   # dot
,   # comma
e   # 'e' - exponent
+-  # signs
]   # end of character group
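As a rough end-to-end check of the approach above, the following sketch rebuilds a few rows of the question's data in memory and applies the cleanup. It assumes the separator in col1 is a non-breaking space (U+00A0), which the question does not confirm, and it uses an in-memory string instead of Example_dataset.csv. Date parsing is left out to keep the example short, and `dtype={"col1": str}` is an added choice so that col1 is always read as text before cleaning.

```python
import io

import pandas as pd

# Hypothetical reproduction of the file; U+00A0 as thousands separator is an assumption.
raw = (
    "time;col1;col2;col3\n"
    "06.09.2017 05:50;521,73;6,7;266\n"
    "06.09.2017 06:00;1\u00a0091,33;9,1;273\n"
    "06.09.2017 06:10;1\u00a0262,43;10;285\n"
)

# Read col1 as plain text; col2 and col3 are still parsed with the comma decimal.
df = pd.read_csv(
    io.StringIO(raw),
    sep=";",
    decimal=",",
    index_col=0,
    dtype={"col1": str},
)

# Drop everything except digits, '.', ',', 'e' and signs, then switch the
# decimal comma to a dot and convert to float.
df["col1"] = (
    df["col1"]
    .str.replace(r"[^\d.,e+-]", "", regex=True)
    .str.replace(",", ".", regex=False)
    .astype(float)
)

print(df.dtypes)
print(df)
```

Because the pattern removes any character outside the allowed set, it does not matter which kind of space the file actually contains.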

Source

2017-09-28 09:07:37
