Continuously saving 2D arrays in Python

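I am writing a program to acquire data from an sCMOS (scientific CMOS) camera. The plan is to acquire at high frame rates, so I would like to save to disk while acquiring. That would increase the total time I can record before running out of memory.

Is there a way to save continuously to the same file in binary format? Ideally, I would like to rule out the option of creating one file per frame.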


2017-02-09 Aquiles Carattino

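After fiddling around for a while, I solved this problem with the multiprocessing module. The idea is to have two processes running: the main process acquires the data, and a worker continuously saves it to disk. To achieve this, you need to define a queue that shares the data between the processes in a safe way. Once a frame has been saved, its memory is released.

It is important to use multiprocessing and not threading. Multiprocessing really separates the processes into different Python interpreters, whereas threading uses the same interpreter. So if one of your processes takes 100% of the core on which your script is running, things are going to stall. In my application this is critical, because it would significantly alter the frame rate.

A word of caution: I am using h5py to save the files in HDF5 format, but you can easily adapt the code to save to plain text files, for example with numpy.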

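First, I define the worker function, which will later be sent to a separate process. Its inputs are the file in which the data will be saved and a queue containing the data. The loop is infinite because I do not want the function to exit before I decide it should, even if the queue is empty. The exit flag is just a string passed through the queue.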

import h5py
from multiprocessing import Process, Queue


def workerSaver(fileData, q):
    """Continuously save images to disk; meant to run in a separate process.

    fileData -- STRING with the path to the file to use.
    q -- Queue that will hold all the images to be saved to disk.
    """
    f = h5py.File(fileData, "w")  # This will overwrite the file. Be sure to supply a new file path.

    allocate = 100  # Number of frames to allocate along the z-axis at a time.
    dset = None  # Created when the first image arrives.
    i = 0  # Number of images saved so far.
    while True:
        img = q.get()  # Blocks until an item is available instead of spinning on an empty queue.
        if isinstance(img, str):  # Exit flag: any string submitted via the queue.
            break
        if dset is None:  # First image: create the dataset.
            x, y = img.shape
            # The images are stacked along the z-axis, which grows as more images arrive.
            dset = f.create_dataset('image', (x, y, allocate), maxshape=(x, y, None))
        if i == dset.shape[2]:  # Allocated space is full: extend it.
            dset.resize(i + allocate, axis=2)
        dset[:, :, i] = img
        i += 1
    if dset is not None:
        dset.resize(i, axis=2)  # Drop the allocated frames that were never filled.
    f.close()

And here is the important part of the code, where the worker is started and fed with data.

import numpy as np
import time

if __name__ == '__main__':  # Required where child processes are spawned (e.g. on Windows).
    fileData = 'path-to-file.dat'
    # Queue of images. multiprocessing takes care of handling the data in and out
    # and the sharing between parent and child processes.
    q = Queue(0)
    # Child process to save the data. It runs continuously until an exit flag
    # is passed through the Queue. (q.put('exit'))
    p = Process(target=workerSaver, args=(fileData, q))
    p.start()
    example_image = np.ones((50, 50))
    for i in range(10000):
        q.put(example_image)
        print(q.qsize())  # Backlog of unsaved frames; qsize() is not implemented on macOS.
        time.sleep(0.01)  # Sleep 10 ms

    q.put('Exit')  # Any string would work
    p.join()  # Wait until the worker has written everything and closed the file.
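If the worker cannot keep up with the camera, the effectively unbounded queue created with Queue(0) keeps growing, and memory runs out anyway. One possible guard, which is my assumption and not part of the solution above, is to give the queue a maximum size and to notice when a frame cannot be enqueued in time. The helper name offer_frame and the bound of 3 are hypothetical. The sketch runs in a single process with queue.Queue to keep it short; multiprocessing.Queue raises the same queue.Full exception.

```python
import queue


def offer_frame(q, frame, timeout=0.05):
    """Try to enqueue a frame; return False instead of blocking forever.

    Works with queue.Queue and multiprocessing.Queue alike, since both
    raise queue.Full when a bounded queue stays full past the timeout.
    """
    try:
        q.put(frame, timeout=timeout)
        return True
    except queue.Full:
        return False


# In-process demonstration with a small, hypothetical bound.
# Nothing consumes the queue here, so it fills up after three items.
q = queue.Queue(maxsize=3)
accepted = [offer_frame(q, n, timeout=0.01) for n in range(5)]
print(accepted)  # [True, True, True, False, False]
```

What to do when offer_frame returns False (drop the frame, pause acquisition, or stop recording) depends on the experiment.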

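We make sure that the process p has been started and is running before we begin filling the queue q. There are certainly more sophisticated ways to store the data (for example, in chunks rather than every single image). However, I checked, and the disk is already writing at full speed, so I am not sure there is any improvement to be had on that side. Knowing exactly what type of data we are going to save also speeds things up, especially with HDF5 (storing 8-bit integers is not the same as storing 32-bit ones).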
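For readers who do not need HDF5, here is a minimal sketch of the same worker idea writing to one raw binary file with numpy. It is an adaptation under my own assumptions, not the code I use: the names raw_saver and load_frames, the uint16 dtype, and the file name frames.raw are all hypothetical. A raw file stores no metadata, so the frame shape and dtype must be known when reading it back. The demonstration runs in a single process with queue.Queue; with multiprocessing, raw_saver would be passed as the Process target.

```python
import os
import queue
import tempfile

import numpy as np


def raw_saver(path, q, dtype=np.uint16):
    """Append every frame from q to one raw binary file until a string arrives."""
    count = 0
    with open(path, "wb") as f:
        while True:
            item = q.get()  # Blocks until something is available.
            if isinstance(item, str):  # Any string is the exit flag.
                break
            # Cast to the chosen dtype and write the bytes in C order.
            f.write(np.ascontiguousarray(item, dtype=dtype).tobytes())
            count += 1
    return count


def load_frames(path, shape, dtype=np.uint16):
    """Read the file back as an array of shape (n_frames, rows, cols)."""
    return np.fromfile(path, dtype=dtype).reshape((-1,) + tuple(shape))


# In-process demonstration with five small frames.
q = queue.Queue()
for n in range(5):
    q.put(np.full((50, 50), n))
q.put("exit")

path = os.path.join(tempfile.mkdtemp(), "frames.raw")
n_saved = raw_saver(path, q)
frames = load_frames(path, (50, 50))
print(n_saved, frames.shape, os.path.getsize(path))  # 5 (5, 50, 50) 25000
```

The file size shows the effect of the data type: five 50x50 frames take 25000 bytes as 16-bit integers, and would take half of that as 8-bit integers or twice as much as 32-bit ones.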


2017-02-16 14:46:32
