tensorflow.__version__: '0.12.head'

How does TensorFlow manage GPU memory across multiple sessions?

I have five networks with different structures and parameters, and I want to deploy them on a server with a GPU. My understanding is that processing data in batches is more efficient, but I don't know how to choose the batch size.

So I tried the following:
import numpy as np
import tensorflow as tf

net_output = create_graph()  # builds the network; net_input_img is its input placeholder
sess1 = tf.Session()
sess1.run(tf.global_variables_initializer())
batch_size = 64
sess1.run(net_output, {net_input_img: np.random.rand(batch_size, 256, 256, 3)})
sess1.run was able to run with batch_size = 64, even though that batch size seemed too large; I only got a warning like:

W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 1.51GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
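For a sense of scale, a single conv activation at this batch size is already over a gigabyte. A back-of-the-envelope calculation (assuming float32 activations and the 87-channel slim.conv2d layers from the updated example later in this post) reproduces the 1459617792-byte chunks that show up in the allocator log:

```python
# Rough size of one conv layer's output: batch * height * width * channels * 4 bytes (float32)
batch, h, w, channels = 64, 256, 256, 87
bytes_per_float32 = 4

activation_bytes = batch * h * w * channels * bytes_per_float32
print(activation_bytes)           # 1459617792 -- the exact chunk size in the log below
print(activation_bytes / 2**30)   # ~1.36 GiB for a single layer's activations
```

With three such layers plus weights and cuDNN workspace, a batch of 64 plausibly strains a 6 GiB card on its own.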
Then the server received requests to use the other four networks, so I loaded them in four more sessions:
# assume 5 networks have the same structure
sess2 = tf.Session()
sess2.run(tf.global_variables_initializer())
sess3 = tf.Session()
sess3.run(tf.global_variables_initializer())
sess4 = tf.Session()
sess4.run(tf.global_variables_initializer())
sess5 = tf.Session()
sess5.run(tf.global_variables_initializer())
After that, I ran the first network again:
batch_size = 64
sess1.run(net_output, {net_input_img: np.random.rand(batch_size, 256, 256, 3)})
This time I got an exception:

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape...

So I decided to unload the other networks:
sess2.close()
sess3.close()
sess4.close()
sess5.close()
sess1.run(net_output, {net_input_img: np.random.rand(batch_size, 256, 256, 3)})
But even after closing the other four sessions, calling sess1.run still raised ResourceExhaustedError.
My questions are:

1. Does Session.close release GPU memory?
2. Why does sess1.run still fail after sess[2-5] are closed?
3. Is there a way to deploy multiple networks on a server with a single GPU?
I thought I could perhaps launch five processes, one per network, and in each process create the session with:

config = tf.ConfigProto(gpu_options=tf.GPUOptions(
    per_process_gpu_memory_fraction=0.2))
sess = tf.Session(config=config)

But I worried that the feasible batch size would then be smaller than with per_process_gpu_memory_fraction=1.0.
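That worry can be quantified with a rough calculation. Assuming the 5.93 GiB total reported in the log below, a fraction of 0.2, and counting only one conv activation per sample (the real footprint is larger, since weights, other layers, and cuDNN workspace also need memory), the largest batch whose single activation fits in the budget would be:

```python
# Per-process GPU budget under per_process_gpu_memory_fraction=0.2
total_gib = 5.93                      # total GPU memory from the log (GTX 980 Ti)
fraction = 0.2
budget_bytes = int(total_gib * fraction * 2**30)

# One conv activation per sample: 256 * 256 spatial * 87 channels * 4 bytes (float32)
per_sample_bytes = 256 * 256 * 87 * 4
max_batch = budget_bytes // per_sample_bytes
print(max_batch)  # 55 -- an optimistic upper bound; the real feasible batch is smaller
```

So the effective batch size would indeed shrink, though possibly not catastrophically for inference workloads.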
[Update] I ran the code below. The first time I called sess1.run(output), I only got the warning:

W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 124.62MiB.
import tensorflow as tf
import tensorflow.contrib.slim as slim

def create_graph(x):
    dummy = tf.zeros([100, 100, 100, 5])
    return slim.repeat(x, 3, slim.conv2d, 87, [5, 5])

batch_size = 64
x = tf.zeros([batch_size, 256, 256, 3])
output = create_graph(x)

sess1 = tf.Session()
sess1.run(tf.global_variables_initializer())
sess1.run(output)

num_other_sessions = 50
other_sessions = []
for _ in range(num_other_sessions):
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    other_sessions.append(sess)

try:
    sess1.run(output)
except Exception as e:
    print(e)

for sess in other_sessions:
    sess.close()
# If I run the following two lines, the bottom sess1.run(output) runs without error.
# del sess
# del other_sessions

try:
    sess1.run(output)
except Exception as e:
    print(e)
After the other 50 sessions were created, calling sess1.run(output) raised ResourceExhaustedError. I tried closing those sessions, but it didn't help.

Part of the TensorFlow log messages:
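A plausible explanation for the del-vs-close difference (my assumption, not confirmed TensorFlow behavior) is that close() does not drop the Python references to the session objects: the `sess` loop variable and the `other_sessions` list keep them alive, and the per-session GPU state may only be returned once the objects are garbage collected. The reference-lifetime part can be sketched in pure Python with a hypothetical FakeSession stand-in:

```python
import weakref

class FakeSession:
    """Stand-in for tf.Session (hypothetical), used only to observe object lifetime."""
    def close(self):
        pass  # closing does not destroy the Python object

sessions = [FakeSession() for _ in range(3)]
refs = [weakref.ref(s) for s in sessions]  # weak refs do not keep the objects alive

for s in sessions:
    s.close()
# close() ran, but the list (and the loop variable) still reference every object
assert all(r() is not None for r in refs)

del s
del sessions
# With all strong references gone, CPython's refcounting collects the objects
assert all(r() is None for r in refs)
```

This matches the observation that only `del sess` plus `del other_sessions` lets the final sess1.run(output) succeed.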
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so.8.0 locally
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 980 Ti
major: 5 minor: 2 memoryClockRate (GHz) 1.228
pciBusID 0000:03:00.0
Total memory: 5.93GiB
Free memory: 5.84GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:03:00.0)
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 124.62MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:03:00.0)
...
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (256): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (512): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1024): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2048): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4096): Total Chunks: 1, Chunks in use: 0 7.0KiB allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8192): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16384): Total Chunks: 1, Chunks in use: 0 25.5KiB allocated for chunks. 25.5KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (32768): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (65536): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (131072): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (262144): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (524288): Total Chunks: 1, Chunks in use: 0 586.2KiB allocated for chunks. 384.0KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1048576): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2097152): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4194304): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8388608): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16777216): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (33554432): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (67108864): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (134217728): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (268435456): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 739.2KiB was 512.0KiB, Chunk State:
I tensorflow/core/common_runtime/bfc_allocator.cc:666] Size: 586.2KiB | Requested Size: 384.0KiB | in_use: 0, prev: Size: 25.5KiB | Requested Size: 25.5KiB | in_use: 1, next: Size: 739.2KiB | Requested Size: 739.2KiB | in_use: 1
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780000 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780600 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780800 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780900 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780b00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780c00 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780e00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780f00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309781000 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309781500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309781600 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309781800 of size 256
...
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x130eb8c100 of size 7168
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1310cf5a00 of size 26112
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1310d0f200 of size 600320
I tensorflow/core/common_runtime/bfc_allocator.cc:693] Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 255 Chunks of size 256 totalling 63.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 303 Chunks of size 512 totalling 151.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 3 Chunks of size 768 totalling 2.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 48 Chunks of size 1280 totalling 60.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 3 Chunks of size 1536 totalling 4.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 49 Chunks of size 26112 totalling 1.22MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 49920 totalling 48.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 50944 totalling 49.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 98 Chunks of size 756992 totalling 70.75MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 4 Chunks of size 783104 totalling 2.99MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 50331648 totalling 48.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 1459617792 totalling 2.72GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 2904102656 totalling 2.70GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 5.54GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 5953290240
InUse: 5952656640
MaxInUse: 5953264128
NumAllocs: 1259
MaxAllocSize: 2904102656
W tensorflow/core/common_runtime/bfc_allocator.cc:274] ****************************************************************************xxxxxxxxxxxxxxxxxxxxxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 739.2KiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:975] Resource exhausted: OOM when allocating tensor with shape[87,87,5,5]
What happens if you run `export TF_CUDNN_WORKSPACE_LIMIT_IN_MB=0`? I played around with some large variables, and deleting the session does seem to release the memory - https://github.com/yaroslavvb/notebooks/blob/master/gpu-var-memory.ipynb –

Setting `TF_CUDNN_WORKSPACE_LIMIT_IN_MB=0` as an environment variable looked the same. It will take me a while to study the notebook. Thank you. – Meuu

I think you only need to look at the section "test that session deletion releases memory". Calling `allocate(n)` creates a variable using 2*n GB of memory and then closes the session, so you can verify that the memory is being reclaimed by copy/pasting that function and calling it in a loop. –