Using ThreadPool correctly with a generator

I'm running into a problem using a ThreadPool together with a generator while processing a CSV file in Python 2.7. Here is some sample code that illustrates my point:

from multiprocessing.dummy import Pool as ThreadPool
import time

def getNextBatch():
    # Reads lines from a huge CSV and yields them as required.
    for i in range(5):
        yield i

def processBatch(batch):
    # This simulates a slow network request.
    time.sleep(1)
    print "Processed Batch " + str(batch)

# We use 4 threads to attempt to alleviate the bottleneck caused by network I/O.
threadPool = ThreadPool(processes=4)

batchGenerator = getNextBatch()

for batch in batchGenerator:
    threadPool.map(processBatch, (batch,))

threadPool.close()
threadPool.join()

When I run this, I get the expected output:

Processed Batch 0
Processed Batch 1
Processed Batch 2
Processed Batch 3
Processed Batch 4

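The problem is that the lines appear with a 1-second delay between each print. Effectively, my script is running sequentially (it is not using multiple threads as I would like).

The goal is for the printed statements to all appear after roughly 1 second, rather than one per second over 5 seconds.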

Source

2017-11-17 Loic Verrall

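The problem is here:

for batch in batchGenerator:
    threadPool.map(processBatch, (batch,))

Try

threadPool.map(processBatch, batchGenerator)

instead, which worked as I expected (though not in order).

The for loop hands the threadPool one batch at a time, and each map call blocks until that batch is finished. So it finishes one batch, then moves on to the next, and so on, and the extra threads never get any work.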
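To make the difference concrete, here is a minimal sketch that times both approaches. It is an illustration rather than the asker's real workload: the function names `get_next_batch`, `process_batch`, `run_one_at_a_time` and `run_all_at_once` are made up for this example, the sleep is shortened to 0.2 seconds, and the worker returns its input instead of printing so the results can be compared. The code is written to run on both Python 2.7 and Python 3.

```python
from multiprocessing.dummy import Pool as ThreadPool
import time

def get_next_batch():
    # Hypothetical stand-in for a generator that reads rows from a large CSV.
    for i in range(5):
        yield i

def process_batch(batch):
    # Simulates a slow network request (shortened to 0.2 s for the demo).
    time.sleep(0.2)
    return batch

def run_one_at_a_time(pool):
    # One map() call per batch: each call blocks until its single item is done,
    # so only one worker thread is ever busy.
    start = time.time()
    results = []
    for batch in get_next_batch():
        results.extend(pool.map(process_batch, (batch,)))
    return results, time.time() - start

def run_all_at_once(pool):
    # A single map() call over the generator: the pool can spread the
    # batches across all of its worker threads.
    start = time.time()
    results = pool.map(process_batch, get_next_batch())
    return results, time.time() - start

pool = ThreadPool(processes=4)
slow_results, slow_elapsed = run_one_at_a_time(pool)
fast_results, fast_elapsed = run_all_at_once(pool)
pool.close()
pool.join()

print("one at a time: %.2fs, all at once: %.2fs" % (slow_elapsed, fast_elapsed))
```

Two details are worth keeping in mind. `map` returns its results in input order even though the workers may finish (and print) in a different order, which is likely what "not in order" refers to above. Also, with 4 threads and 5 batches the fifth batch has to wait for a free thread, so the whole run takes about two sleep intervals rather than one; a pool of at least 5 threads would be needed for all batches to finish together.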

Source

2017-11-17 18:11:53
