The best way to feed pandas is to read the data from CSV files. So why not convert the files to CSV files (or even to CSV-like file objects living in memory) and let pandas do the dirty work? A script that does this:
try:
    import io  # python3
except ImportError:
    import cStringIO as io  # python2

import pandas as pd

DELIMITER = ','


def pd_read_chunk(file):
    """
    Reads file contents, converts it to a csv file in memory
    and imports a dataframe from it.
    """
    with open(file) as f:
        content = [line.strip() for line in f.readlines()]
    # lines without a space are column names, lines with spaces are value rows
    cols = [line for line in content if ' ' not in line]
    vals = [line for line in content if ' ' in line]
    csv_header = DELIMITER.join(cols)
    csv_body = '\n'.join(DELIMITER.join(line.split()) for line in vals)
    stream = io.StringIO(csv_header + '\n' + csv_body)
    return pd.read_csv(stream, sep=DELIMITER)


if __name__ == '__main__':
    files = ('file1', 'file2',)
    # read dataframe from each file and concat all resulting dataframes
    df_chunks = [pd_read_chunk(file) for file in files]
    df = pd.concat(df_chunks)
    print(df)
If you try it with the sample files from Thom Ives' answer, the script returns:
     A    B    C    D    E
0  1.0  2.0  3.0  NaN  NaN
1  1.1  2.1  3.1  NaN  NaN
0  NaN  2.2  NaN  4.2  5.2
1  NaN  2.3  NaN  4.3  5.3
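To make the expected input layout concrete, here is a small self-contained sketch. The file contents are an assumption reconstructed from the output above (one column name per line, followed by space-separated value rows), not the actual sample files, and `read_chunk_from_text` is a hypothetical helper that applies the same conversion to a string instead of a file path.

```python
import io

import pandas as pd

# Hypothetical file contents, reconstructed from the printed output:
# one column name per line, then space-separated rows of values.
FILE1 = "A\nB\nC\n1.0 2.0 3.0\n1.1 2.1 3.1\n"
FILE2 = "B\nD\nE\n2.2 4.2 5.2\n2.3 4.3 5.3\n"


def read_chunk_from_text(text):
    """Same conversion as pd_read_chunk, applied to a string."""
    content = [line.strip() for line in text.splitlines() if line.strip()]
    # lines without a space are column names, lines with spaces are value rows
    cols = [line for line in content if ' ' not in line]
    vals = [line for line in content if ' ' in line]
    csv_header = ','.join(cols)
    csv_body = '\n'.join(','.join(line.split()) for line in vals)
    return pd.read_csv(io.StringIO(csv_header + '\n' + csv_body))


df = pd.concat([read_chunk_from_text(FILE1), read_chunk_from_text(FILE2)])
print(df)
```

Columns that are missing from one file are filled with NaN by `pd.concat`, which is why the two chunks line up in a single frame.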
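Edit: Actually, we don't need the comma delimiter at all. We can reuse the space as the delimiter, which makes the conversion both more compact and faster. Here is an updated version of the above, with less code and faster execution: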
try:
    import io  # python3
except ImportError:
    import cStringIO as io  # python2

import pandas as pd


def pd_read_chunk(file):
    """
    Reads file contents, converts it to a csv file in memory
    and imports a dataframe from it.
    """
    with open(file) as f:
        content = [line.strip() for line in f.readlines()]
    # lines without a space are column names, lines with spaces are value rows
    cols = [line for line in content if ' ' not in line]
    vals = [line for line in content if ' ' in line]
    # value rows are already space-separated, so only the header is rebuilt
    csv_header = ' '.join(cols)
    csv_lines = [csv_header] + vals
    stream = io.StringIO('\n'.join(csv_lines))
    return pd.read_csv(stream, sep=' ')


if __name__ == '__main__':
    files = ('file1', 'file2',)
    # read dataframe from each file and concat all resulting dataframes
    df_chunks = [pd_read_chunk(file) for file in files]
    df = pd.concat(df_chunks)
    print(df)
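One caveat with `sep=' '` is that it assumes exactly one space between values. If the files might use runs of spaces or tabs, a regex separator is more tolerant. The following is a sketch under that assumption, not part of the original script: `read_whitespace_chunk` and the sample text are hypothetical, and `ignore_index=True` is added only to get a continuous index instead of the repeated 0, 1.

```python
import io

import pandas as pd


def read_whitespace_chunk(text):
    """Variant of pd_read_chunk that accepts any whitespace between values."""
    content = [line.strip() for line in text.splitlines() if line.strip()]
    # a single token is a column name, several tokens form a value row
    cols = [line for line in content if len(line.split()) == 1]
    vals = [line for line in content if len(line.split()) > 1]
    stream = io.StringIO('\n'.join([' '.join(cols)] + vals))
    # r'\s+' matches one or more spaces or tabs
    return pd.read_csv(stream, sep=r'\s+')


# hypothetical input with irregular spacing and a tab
sample = "A\nB\nC\n1.0   2.0\t3.0\n1.1  2.1  3.1\n"
chunks = [read_whitespace_chunk(sample), read_whitespace_chunk(sample)]
df_ws = pd.concat(chunks, ignore_index=True)
print(df_ws)
```

As in the original, a value row that holds only a single number would be mistaken for a column name, so this layout only works for files with at least two columns.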