How to quickly create an array from a large file?

Here is an example of what I have:

for line in IN.readlines():
    line = line.rstrip('\n')
    mas = line.split('\t')
    row = (int(mas[0]), int(mas[1]), mas[2], mas[3], mas[4])
    self.inetnums.append(row)
IN.close()  # close once, after the loop has finished

If filesize == 120 MB, the script takes 10 seconds. Can this time be reduced?


2012-04-05 Bdfy

Are you reading a 120 GB file into memory? How much memory does your machine have? – interjay

What hard disk reads at 12GB/sec? –

Here you go:

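# Builds one tuple per line; converts every field to int (the question keeps fields 2-4 as strings)
inetnums=[tuple(int(x) for x in line.rstrip('\n').split('\t')) for line in fin]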
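The one-liner converts every field with int(). The question's rows keep the last three fields as strings, so a comprehension that converts only the first two fields may be closer to the original. This is a minimal Python 3 sketch. It assumes each line has exactly five tab-separated fields. The function name `parse_rows` and the sample data are hypothetical.

```python
import io

def parse_rows(fin):
    # One tuple per line: two int columns followed by three string columns,
    # matching the row layout of the question's loop.
    return [
        (int(a), int(b), c, d, e)
        for a, b, c, d, e in (line.rstrip('\n').split('\t') for line in fin)
    ]

# Usage with an in-memory file standing in for the real one.
sample = io.StringIO("1\t2\tx\ty\tz\n3\t4\tp\tq\tr\n")
rows = parse_rows(sample)
print(rows)  # [(1, 2, 'x', 'y', 'z'), (3, 4, 'p', 'q', 'r')]
```

A line with a different number of fields raises ValueError during unpacking, which surfaces malformed input instead of silently skipping it.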

Using a list comprehension can gain you some speed. Here is the profiling information for two different versions:

>>> def foo2(): 
    fin.seek(0) 
    inetnums=[] 
    for line in fin: 
     line = line.rstrip('\n') 
     mas = line.split('\t') 
     row = (int(mas[0]), int(mas[1]), mas[2], mas[3]) 
     inetnums.append(row) 


>>> def foo1(): 
    fin.seek(0) 
    inetnums=[[int(x) for x in line.rstrip('\n').split('\t')] for line in fin] 

>>> cProfile.run("foo1()")
         444 function calls in 0.004 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.003    0.003    0.004    0.004 <pyshell#362>:1(foo1)
        1    0.000    0.000    0.004    0.004 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
      220    0.000    0.000    0.000    0.000 {method 'rstrip' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'seek' of 'file' objects}
      220    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}


>>> cProfile.run("foo2()")
         664 function calls in 0.006 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.005    0.005    0.006    0.006 <pyshell#360>:1(foo2)
        1    0.000    0.000    0.006    0.006 <string>:1(<module>)
      220    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
      220    0.001    0.000    0.001    0.000 {method 'rstrip' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'seek' of 'file' objects}
      220    0.001    0.000    0.001    0.000 {method 'split' of 'str' objects}


>>>
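Profiling a few hundred lines mostly measures per-call overhead. A `timeit` comparison on more data may give steadier numbers. This is a Python 3 sketch that runs on synthetic in-memory data produced by a hypothetical helper `make_data`, not the real file. Both variants build the same five-field rows as the question. Absolute timings depend on the machine and Python version, so no particular result is claimed here.

```python
import io
import timeit

def make_data(n):
    # Hypothetical synthetic input: n lines of five tab-separated fields.
    return "".join(f"{i}\t{i + 1}\ta\tb\tc\n" for i in range(n))

def with_append(text):
    # Explicit loop that calls list.append once per line, like foo2.
    rows = []
    for line in io.StringIO(text):
        mas = line.rstrip('\n').split('\t')
        rows.append((int(mas[0]), int(mas[1]), mas[2], mas[3], mas[4]))
    return rows

def with_comprehension(text):
    # The same rows built by a list comprehension, like foo1.
    return [
        (int(m[0]), int(m[1]), m[2], m[3], m[4])
        for m in (line.rstrip('\n').split('\t') for line in io.StringIO(text))
    ]

if __name__ == "__main__":
    data = make_data(50_000)
    # Sanity check: both variants must produce identical rows.
    assert with_append(data) == with_comprehension(data)
    for fn in (with_append, with_comprehension):
        seconds = timeit.timeit(lambda: fn(data), number=5)
        print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```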


2012-04-05 10:56:33 Abhijit

Apart from the speedup you get from removing 'readlines', do you actually gain any speed from using a list comp? It seems like just a different way of writing the same code. – jamylak

@jamylak: Consider that you are not calling append repeatedly inside a loop. I have updated my answer with the cProfile information. – Abhijit

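Just remove readlines() and do

for line in IN:

readlines() first builds a list of every line in the file and then you access each one, which you don't need to do. Without it, the for loop simply iterates over the file object, which returns one line from the file at a time.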
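The loop from the question can be rewritten along those lines. This is a Python 3 sketch that iterates over the file object directly and uses a `with` block so the file is closed automatically. The function name `load_inetnums` and the temporary sample file are hypothetical. The five-column row layout is taken from the question.

```python
import os
import tempfile

def load_inetnums(path):
    rows = []
    # Iterating over the file object reads one line at a time instead of
    # first building a list of every line, as readlines() does.
    with open(path) as fin:
        for line in fin:
            mas = line.rstrip('\n').split('\t')
            rows.append((int(mas[0]), int(mas[1]), mas[2], mas[3], mas[4]))
    return rows

if __name__ == "__main__":
    # Write a small sample file, load it, then clean up.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("10\t20\ta\tb\tc\n30\t40\td\te\tf\n")
    print(load_inetnums(tmp.name))  # [(10, 20, 'a', 'b', 'c'), (30, 40, 'd', 'e', 'f')]
    os.remove(tmp.name)
```

The file is closed when the `with` block exits, even if a malformed line raises an exception, which the original explicit IN.close() would not guarantee.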


2012-04-05 10:32:01 jamylak
