2016-12-27 21 views

I'm having trouble reading a CSV file. How can I remove the extra quotes from a CSV?

I tried the replace method, but numpy does not support it.

The CSV file looks like this:

"num","phone","sensorID","press","temp","accel","gps_lat","gps_lng","time" 
"1","null","A0:E6:F8:7B:16:EA","0","17","1.25","0","0","2016-12-14 13:34:59" 
"2","null","A0:E6:F8:7B:16:A9","0","18","1.19","0","0","2016-12-14 13:34:59" 
"3","null","A0:E6:F8:7B:15:A5","0","18","1.19","0","0","2016-12-14 13:34:59" 
"4","null","A0:E6:F8:7B:16:EA","0","17","1.25","0","0","2016-12-14 13:35:00" 
"5","null","A0:E6:F8:7B:16:A9","0","18","1.19","0","0","2016-12-14 13:35:00" 
"6","null","A0:E6:F8:7B:15:A5","0","19","1.38","0","0","2016-12-14 13:35:00" 
"7","null","A0:E6:F8:7B:16:D6","0","18","1.12","0","0","2016-12-14 13:35:01" 
"8","null","A0:E6:F8:7B:16:EA","0","17","1.31","0","0","2016-12-14 13:35:01" 
"9","null","A0:E6:F8:7B:15:A5","0","19","1.38","0","0","2016-12-14 13:35:01" 

However, when I load this file with numpy.loadtxt, every value in the result still carries its double quotes ("), and I want to remove them.

Source code

import numpy as np
a = np.loadtxt('db_file.csv', delimiter=',', dtype='str', unpack=True)
print(a)

Result

[['"num"' '"1"' '"2"' ..., '"6979"' '"6980"' '"6981"'] 
['"phone"' '"null"' '"null"' ..., '" 821099631345"' '" 821099631345"' 
    '" 821099631345"'] 
['"sensorID"' '"A0:E6:F8:7B:16:EA"' '"A0:E6:F8:7B:16:A9"' ..., 
    '"A0:E6:F8:7B:16:EA"' '"A0:E6:F8:7B:16:A9"' '"A0:E6:F8:7B:16:D6"'] 
..., 
['"gps_lat"' '"0"' '"0"' ..., '37.596332"' '"37.596332"' '"37.596332"'] 
['"gps_lng"' '"0"' '"0"' ..., '"127.031773"' '"127.031773"' '"127.031773"'] 
['"time"' '"2016-12-14 13:34:59"' '"2016-12-14 13:34:59"' ..., 
    '"2016-12-15 00:03:11"' '"2016-12-15 00:03:11"' '"2016-12-15 00:03:12"']] 

That is what the output currently looks like.

What I really want is this list:

[['num', '1', '2' ..., '6979', '6980', '6981'] 
['phone', 'null', 'null' ..., '821099631345', ' 821099631345' 
    ' 821099631345'] 
['sensorID', 'A0:E6:F8:7B:16:EA', 'A0:E6:F8:7B:16:A9' ..., 
    'A0:E6:F8:7B:16:EA', 'A0:E6:F8:7B:16:A9', 'A0:E6:F8:7B:16:D6'] 
..., 
['gps_lat', '0', '0' ..., '37.596332' '37.596332' '37.596332'] 
['gps_lng' '0' '0' ..., '127.031773' '127.031773' '127.031773'] 
['time' '2016-12-14 13:34:59' '2016-12-14 13:34:59' ..., 
    '2016-12-15 00:03:11' '2016-12-15 00:03:11' '2016-12-15 00:03:12']] 

What code should I use?

The subject needs fixing. – hpaulj

'pd.read_csv' seems to handle this file without trouble. 'genfromtxt' can also be made to work, but it is simpler if you have 'pandas'. – hpaulj

Is there anything useful here? http://stackoverflow.com/questions/2664790/reading-csv-files-in-numpy-where-delimiter-is –

Answers

With pandas I get:

In [1278]: pd.read_csv('stack41338622.txt') 
Out[1278]: 
   num phone           sensorID  press  temp  accel  gps_lat  gps_lng  \
0    1  null  A0:E6:F8:7B:16:EA      0    17   1.25        0        0
1    2  null  A0:E6:F8:7B:16:A9      0    18   1.19        0        0
2    3  null  A0:E6:F8:7B:15:A5      0    18   1.19        0        0
3    4  null  A0:E6:F8:7B:16:EA      0    17   1.25        0        0
4    5  null  A0:E6:F8:7B:16:A9      0    18   1.19        0        0
5    6  null  A0:E6:F8:7B:15:A5      0    19   1.38        0        0
6    7  null  A0:E6:F8:7B:16:D6      0    18   1.12        0        0
7    8  null  A0:E6:F8:7B:16:EA      0    17   1.31        0        0
8    9  null  A0:E6:F8:7B:15:A5      0    19   1.38        0        0

                  time
0  2016-12-14 13:34:59
1  2016-12-14 13:34:59
2  2016-12-14 13:34:59
3  2016-12-14 13:35:00
4  2016-12-14 13:35:00
5  2016-12-14 13:35:00
6  2016-12-14 13:35:01
7  2016-12-14 13:35:01
8  2016-12-14 13:35:01
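To get from the pandas result to the array of plain strings the question asks for, something like the following might work. It is only a sketch: the in-memory sample stands in for the real file, and header=None, dtype=str and keep_default_na=False are my choices for keeping the header row and the literal 'null' values as text, not something taken from the thread.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for db_file.csv: a few quoted rows in the question's format.
sample = io.StringIO(
    '"num","phone","sensorID"\n'
    '"1","null","A0:E6:F8:7B:16:EA"\n'
    '"2","null","A0:E6:F8:7B:16:A9"\n'
)

# header=None keeps the header row as data; dtype=str and
# keep_default_na=False keep every field (including 'null') as text.
frame = pd.read_csv(sample, header=None, dtype=str, keep_default_na=False)

# Transpose so that each row of the array is one CSV column,
# which is the layout unpack=True produces in np.loadtxt.
columns = frame.to_numpy().astype(str).T
print(columns)
```

pandas removes the surrounding quotes while parsing, so no separate clean-up step is needed here.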

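As discussed in "Reading CSV files in numpy where delimiter is ","", converters can be used to strip out the extra quotes. Unfortunately, automatic dtype detection (dtype=None) no longer seems to work together with converters, so the dtype has to be spelled out. Here is a start: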

In [1327]: def foo(astr): 
     ...:  return astr[1:-1] 
In [1328]: convs = dict((col, foo) for col in range(9)) 
In [1329]: dt = ['i','S10','S20','i', 'i','f','i','i','S20'] 
In [1330]: data = np.genfromtxt('stack41338622.txt', dtype=dt, delimiter=',', names=True, converters=convs) 
In [1331]: data 
Out[1331]: 
array([ (1, b'null', b'A0:E6:F8:7B:16:EA', 0, 17, 1.25, 0, 0, b'2016-12-14 13:34:59'), 
     (2, b'null', b'A0:E6:F8:7B:16:A9', 0, 18, 1.190000057220459, 0, 0, b'2016-12-14 13:34:59'), 
     (3, b'null', b'A0:E6:F8:7B:15:A5', 0, 18, 1.190000057220459, 0, 0, b'2016-12-14 13:34:59'), 
     (4, b'null', b'A0:E6:F8:7B:16:EA', 0, 17, 1.25, 0, 0, b'2016-12-14 13:35:00'), 
     (5, b'null', b'A0:E6:F8:7B:16:A9', 0, 18, 1.190000057220459, 0, 0, b'2016-12-14 13:35:00'), 
     (6, b'null', b'A0:E6:F8:7B:15:A5', 0, 19, 1.3799999952316284, 0, 0, b'2016-12-14 13:35:00'), 
     (7, b'null', b'A0:E6:F8:7B:16:D6', 0, 18, 1.1200000047683716, 0, 0, b'2016-12-14 13:35:01'), 
     (8, b'null', b'A0:E6:F8:7B:16:EA', 0, 17, 1.309999942779541, 0, 0, b'2016-12-14 13:35:01'), 
     (9, b'null', b'A0:E6:F8:7B:15:A5', 0, 19, 1.3799999952316284, 0, 0, b'2016-12-14 13:35:01')], 
     dtype=[('num', '<i4'), ('phone', 'S10'), ('sensorID', 'S20'), ('press', '<i4'), ('temp', '<i4'), ('accel', '<f4'), ('gps_lat', '<i4'), ('gps_lng', '<i4'), ('time', 'S20')]) 
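If only an array of strings is needed, as in the question, the stripping that foo does per field can also be applied after loading. The sketch below assumes np.char.strip is acceptable for this; the in-memory sample is a stand-in for db_file.csv.

```python
import io

import numpy as np

# Stand-in for db_file.csv: a few quoted rows copied from the question.
sample = io.StringIO(
    '"num","phone","sensorID","time"\n'
    '"1","null","A0:E6:F8:7B:16:EA","2016-12-14 13:34:59"\n'
    '"2","null","A0:E6:F8:7B:16:A9","2016-12-14 13:34:59"\n'
)

# Same call as in the question; every field still carries its quotes.
raw = np.loadtxt(sample, delimiter=',', dtype=str, unpack=True)

# np.char.strip removes the given characters from both ends of each element.
clean = np.char.strip(raw, '"')
print(clean)
```

As far as I know, NumPy 1.23 and later also accept quotechar='"' in np.loadtxt, which would make the extra step unnecessary on those versions.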

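Given the amount of time I spent on this, I'm inclined to go with the other suggestion: remove the extra quotes with a text editor. Those quotes are not needed in a comma-delimited file, and they are more of a nuisance than a help. In an editor I simply removed the " characters: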

num,phone,sensorID,press,temp,accel,gps_lat,gps_lng,time 
1,null,A0:E6:F8:7B:16:EA,0,17,1.25,0,0,2016-12-14 13:34:59 
2,null,A0:E6:F8:7B:16:A9,0,18,1.19,0,0,2016-12-14 13:34:59 
3,null,A0:E6:F8:7B:15:A5,0,18,1.19,0,0,2016-12-14 13:34:59 
4,null,A0:E6:F8:7B:16:EA,0,17,1.25,0,0,2016-12-14 13:35:00 
5,null,A0:E6:F8:7B:16:A9,0,18,1.19,0,0,2016-12-14 13:35:00 
... 

In [1336]: data = np.genfromtxt('stack41338622_1.txt', dtype=None, delimiter=',', names=True) 
In [1337]: data 
Out[1337]: 
array([ (1, b'null', b'A0:E6:F8:7B:16:EA', 0, 17, 1.25, 0, 0, b'2016-12-14 13:34:59'), 
     (2, b'null', b'A0:E6:F8:7B:16:A9', 0, 18, 1.19, 0, 0, b'2016-12-14 13:34:59'), 
     (3, b'null', b'A0:E6:F8:7B:15:A5', 0, 18, 1.19, 0, 0, b'2016-12-14 13:34:59'), 
     ..., 
     dtype=[('num', '<i4'), ('phone', 'S4'), ('sensorID', 'S17'), ('press', '<i4'), ('temp', '<i4'), ('accel', '<f8'), ('gps_lat', '<i4'), ('gps_lng', '<i4'), ('time', 'S19')]) 
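The editor step can probably be done in code as well. This is a rough sketch rather than a tested recipe for the full file: it removes every double quote from each line and hands the cleaned lines to genfromtxt. The sample lines are a stand-in for the real file, and encoding='utf-8' is my addition so that text fields come back as str.

```python
import numpy as np

# Stand-in for the quoted file; in practice these lines would be read
# from the CSV file itself.
quoted_lines = [
    '"num","phone","sensorID","accel","time"',
    '"1","null","A0:E6:F8:7B:16:EA","1.25","2016-12-14 13:34:59"',
    '"2","null","A0:E6:F8:7B:16:A9","1.19","2016-12-14 13:34:59"',
]

# Drop every double quote, which is what the manual editor step did.
plain_lines = [line.replace('"', '') for line in quoted_lines]

# genfromtxt accepts a list of strings; dtype=None lets it pick a type
# for each column, and names=True takes field names from the first line.
data = np.genfromtxt(plain_lines, dtype=None, delimiter=',',
                     names=True, encoding='utf-8')
print(data.dtype)
print(data['accel'])
```

Note that a blanket replace is only safe when no field contains a comma inside its quotes, which appears to be the case for this file.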

The b'' prefix is Python 3's way of indicating byte strings. You won't see it in Py2.
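If the byte strings get in the way, they can be converted to ordinary strings. A minimal sketch, assuming the fields are plain ASCII as in this file:

```python
import numpy as np

# Byte-string fields like the ones genfromtxt returned above.
raw = np.array([b'null', b'A0:E6:F8:7B:16:EA'])

# np.char.decode converts each bytes element to a str element.
text = np.char.decode(raw, 'ascii')
print(text)
```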
