TensorFlow gets slower when multiple threads are used for preprocessing on the CPU

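I have a dataset that is generated on the fly on the CPU. The samples are computed in Python, and the function make_sample is fairly complex and cannot be converted to TensorFlow ops. Because generating a sample takes time, the function has to be called from multiple threads to fill the input queue. Starting from the example given in the documentation, I arrived at the following toy example: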

import numpy as np 
import tensorflow as tf 
import time 

def make_sample():
    # something that takes time and needs to be on CPU w/o tf ops
    p = 1
    for n in range(1000000):
        p = (p + np.random.random()) * np.random.random()
    return np.float32(p)

read_threads = 1 

with tf.device('/cpu:0'):
    example_list = [tf.py_func(make_sample, [], [tf.float32]) for _ in range(read_threads)]
    for ex in example_list:
        ex[0].set_shape(())
    batch_size = 3
    capacity = 30
    batch = tf.train.batch_join(example_list, batch_size=batch_size, capacity=capacity)

with tf.Session().as_default() as sess:
    tf.global_variables_initializer().run()
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        # dry run, left out of timing
        sess.run(batch)
        start_time = time.time()
        for it in range(5):
            print(sess.run(batch))
    finally:
        duration = time.time() - start_time
        print('duration: {0:4.2f}s'.format(duration))
        coord.request_stop()
        coord.join(threads)

What surprises me is that when read_threads is increased, CPU usage never goes above 50%. Worse, the computation time degrades sharply. On my computer:

read_threads=1→duration: 12s
read_threads=2→duration: 46s
read_threads=4→duration: 68s
read_threads=8→duration: 112s

Is there an explanation and, above all, a solution for efficient multithreaded data generation with a custom Python function in TensorFlow?


2017-08-01 user1735003

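tf.py_func reuses the existing Python interpreter. Unfortunately, Python supports concurrency but not parallelism. In other words, you can have multiple Python threads, but only one of them can execute Python code at any given time. The standard solutions are to move the generation pipeline into TensorFlow/C++, or to use multiple Python processes plus an additional layer that aggregates their results (for example, using ZMQ to collect the results from the Python processes).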
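As a rough illustration of the multi-process route, here is a minimal sketch that uses the standard library's multiprocessing.Pool as the aggregation layer instead of ZMQ. It is not the setup from the question: make_sample here is a hypothetical stand-in that takes a seed and uses the stdlib random module rather than NumPy, and generate_samples, num_workers, and n_iter are names invented for this example. Each worker is a separate Python process with its own interpreter, so the samples can be computed in parallel.

```python
import multiprocessing as mp
import random


def make_sample(seed, n_iter=100000):
    """Hypothetical stand-in for the slow, pure-Python sample generator."""
    rng = random.Random(seed)  # per-sample RNG so workers do not share state
    p = 1.0
    for _ in range(n_iter):
        p = (p + rng.random()) * rng.random()
    return p


def generate_samples(num_samples, num_workers=2, n_iter=100000):
    """Compute samples in separate processes and collect them in order."""
    tasks = [(seed, n_iter) for seed in range(num_samples)]
    with mp.Pool(processes=num_workers) as pool:
        # starmap blocks until every task is done and preserves task order
        return pool.starmap(make_sample, tasks)


if __name__ == "__main__":
    samples = generate_samples(6, num_workers=2, n_iter=10000)
    print(len(samples), "samples generated")
```

The collected values could then be fed to the graph from the main process, for instance through a placeholder; that part is left out here because it depends on the rest of the input pipeline.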


2017-08-01 20:12:05
