tensorflow.__version__: '0.12.head'

How does TensorFlow manage GPU memory across multiple sessions?

I have five networks with different structures and parameters, and I want to deploy them on a server with a GPU. My understanding is that processing data in batches is more efficient, but I don't know how to choose the batch size.

So I tried the following:
import numpy as np
import tensorflow as tf

net_output = create_graph()  # builds the network; net_input_img is its input placeholder
sess1 = tf.Session()
sess1.run(tf.global_variables_initializer())
batch_size = 64
sess1.run(net_output, {net_input_img: np.random.rand(batch_size, 256, 256, 3)})
sess1.run was able to run with batch_size = 64, even though that batch size seemed too large; I only got a warning like:

W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 1.51GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
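For a sense of scale, a single conv activation at this batch size is already over a gigabyte. A back-of-the-envelope calculation (assuming float32 activations and the 87-channel slim.conv2d layers from the updated example later in this post) reproduces the 1459617792-byte chunks that show up in the allocator log:

```python
# Rough size of one conv layer's output: batch * height * width * channels * 4 bytes (float32)
batch, h, w, channels = 64, 256, 256, 87
bytes_per_float32 = 4

activation_bytes = batch * h * w * channels * bytes_per_float32
print(activation_bytes)           # 1459617792 -- the exact chunk size in the log below
print(activation_bytes / 2**30)   # ~1.36 GiB for a single layer's activations
```

With three such layers plus weights and cuDNN workspace, a batch of 64 plausibly strains a 6 GiB card on its own.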
Then the server received requests to use the other four networks, so I loaded them in four more sessions:
# assume 5 networks have the same structure
sess2 = tf.Session()
sess2.run(tf.global_variables_initializer())
sess3 = tf.Session()
sess3.run(tf.global_variables_initializer())
sess4 = tf.Session()
sess4.run(tf.global_variables_initializer())
sess5 = tf.Session()
sess5.run(tf.global_variables_initializer())
After that, I ran the first network again:
batch_size = 64
sess1.run(net_output, {net_input_img: np.random.rand(batch_size, 256, 256, 3)})
This time I got an exception:

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape...

So I decided to unload the other networks:
sess2.close()
sess3.close()
sess4.close()
sess5.close()
sess1.run(net_output, {net_input_img: np.random.rand(batch_size, 256, 256, 3)})
But even after closing the other four sessions, calling sess1.run still raised ResourceExhaustedError.
My questions are:

1. Does Session.close release GPU memory?
2. Why does sess1.run still fail after sess[2-5] are closed?
3. Is there a way to deploy multiple networks on a server with a single GPU?
I thought I could perhaps launch five processes, one per network, and in each process create the session with:

config = tf.ConfigProto(gpu_options=tf.GPUOptions(
    per_process_gpu_memory_fraction=0.2))
sess = tf.Session(config=config)

But I worried that the feasible batch size would then be smaller than with per_process_gpu_memory_fraction=1.0.
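That worry can be quantified with a rough calculation. Assuming the 5.93 GiB total reported in the log below, a fraction of 0.2, and counting only one conv activation per sample (the real footprint is larger, since weights, other layers, and cuDNN workspace also need memory), the largest batch whose single activation fits in the budget would be:

```python
# Per-process GPU budget under per_process_gpu_memory_fraction=0.2
total_gib = 5.93                      # total GPU memory from the log (GTX 980 Ti)
fraction = 0.2
budget_bytes = int(total_gib * fraction * 2**30)

# One conv activation per sample: 256 * 256 spatial * 87 channels * 4 bytes (float32)
per_sample_bytes = 256 * 256 * 87 * 4
max_batch = budget_bytes // per_sample_bytes
print(max_batch)  # 55 -- an optimistic upper bound; the real feasible batch is smaller
```

So the effective batch size would indeed shrink, though possibly not catastrophically for inference workloads.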
[Update] I ran the code below. The first time I called sess1.run(output), I only got the warning:

W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 124.62MiB.
import tensorflow as tf
import tensorflow.contrib.slim as slim

def create_graph(x):
    dummy = tf.zeros([100, 100, 100, 5])
    return slim.repeat(x, 3, slim.conv2d, 87, [5, 5])

batch_size = 64
x = tf.zeros([batch_size, 256, 256, 3])
output = create_graph(x)

sess1 = tf.Session()
sess1.run(tf.global_variables_initializer())
sess1.run(output)

num_other_sessions = 50
other_sessions = []
for _ in range(num_other_sessions):
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    other_sessions.append(sess)

try:
    sess1.run(output)
except Exception as e:
    print(e)

for sess in other_sessions:
    sess.close()
# If I run the following two lines, the bottom sess1.run(output) runs without error.
# del sess
# del other_sessions

try:
    sess1.run(output)
except Exception as e:
    print(e)
After the other 50 sessions were created, calling sess1.run(output) raised ResourceExhaustedError. I tried closing those sessions, but it didn't help.

Part of the TensorFlow log messages:
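A plausible explanation for the del-vs-close difference (my assumption, not confirmed TensorFlow behavior) is that close() does not drop the Python references to the session objects: the `sess` loop variable and the `other_sessions` list keep them alive, and the per-session GPU state may only be returned once the objects are garbage collected. The reference-lifetime part can be sketched in pure Python with a hypothetical FakeSession stand-in:

```python
import weakref

class FakeSession:
    """Stand-in for tf.Session (hypothetical), used only to observe object lifetime."""
    def close(self):
        pass  # closing does not destroy the Python object

sessions = [FakeSession() for _ in range(3)]
refs = [weakref.ref(s) for s in sessions]  # weak refs do not keep the objects alive

for s in sessions:
    s.close()
# close() ran, but the list (and the loop variable) still reference every object
assert all(r() is not None for r in refs)

del s
del sessions
# With all strong references gone, CPython's refcounting collects the objects
assert all(r() is None for r in refs)
```

This matches the observation that only `del sess` plus `del other_sessions` lets the final sess1.run(output) succeed.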
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so.8.0 locally
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 980 Ti
major: 5 minor: 2 memoryClockRate (GHz) 1.228
pciBusID 0000:03:00.0
Total memory: 5.93GiB
Free memory: 5.84GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:03:00.0)
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 124.62MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:03:00.0)
...
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (256): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (512): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1024): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2048): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4096): Total Chunks: 1, Chunks in use: 0 7.0KiB allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8192): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16384): Total Chunks: 1, Chunks in use: 0 25.5KiB allocated for chunks. 25.5KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (32768): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (65536): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (131072): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (262144): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (524288): Total Chunks: 1, Chunks in use: 0 586.2KiB allocated for chunks. 384.0KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1048576): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2097152): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4194304): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8388608): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16777216): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (33554432): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (67108864): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (134217728): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (268435456): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 739.2KiB was 512.0KiB, Chunk State:
I tensorflow/core/common_runtime/bfc_allocator.cc:666] Size: 586.2KiB | Requested Size: 384.0KiB | in_use: 0, prev: Size: 25.5KiB | Requested Size: 25.5KiB | in_use: 1, next: Size: 739.2KiB | Requested Size: 739.2KiB | in_use: 1
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780000 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780600 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780800 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780900 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780b00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780c00 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780e00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780f00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309781000 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309781500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309781600 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309781800 of size 256
...
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x130eb8c100 of size 7168
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1310cf5a00 of size 26112
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1310d0f200 of size 600320
I tensorflow/core/common_runtime/bfc_allocator.cc:693] Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 255 Chunks of size 256 totalling 63.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 303 Chunks of size 512 totalling 151.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 3 Chunks of size 768 totalling 2.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 48 Chunks of size 1280 totalling 60.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 3 Chunks of size 1536 totalling 4.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 49 Chunks of size 26112 totalling 1.22MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 49920 totalling 48.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 50944 totalling 49.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 98 Chunks of size 756992 totalling 70.75MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 4 Chunks of size 783104 totalling 2.99MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 50331648 totalling 48.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 1459617792 totalling 2.72GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 2904102656 totalling 2.70GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 5.54GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 5953290240
InUse: 5952656640
MaxInUse: 5953264128
NumAllocs: 1259
MaxAllocSize: 2904102656
W tensorflow/core/common_runtime/bfc_allocator.cc:274] ****************************************************************************xxxxxxxxxxxxxxxxxxxxxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 739.2KiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:975] Resource exhausted: OOM when allocating tensor with shape[87,87,5,5]
What happens if you run `export TF_CUDNN_WORKSPACE_LIMIT_IN_MB=0`? I played around with some large variables, and deleting the session does seem to release the memory - https://github.com/yaroslavvb/notebooks/blob/master/gpu-var-memory.ipynb –

Setting `TF_CUDNN_WORKSPACE_LIMIT_IN_MB=0` as an environment variable looked the same. It will take me a while to study the notebook. Thank you. – Meuu

I think you only need to look at the section "test that session deletion releases memory". Calling `allocate(n)` creates a variable using 2*n GB of memory and then closes the session, so you can verify that the memory is being reclaimed by copy/pasting that function and calling it in a loop. –