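Tensorflow OOM on GPU

I'm training an LSTM-RNN in Tensorflow on some music data and have run into a GPU memory allocation problem that I don't understand: I hit an OOM even though there still seems to be just about enough VRAM available. Some background: I'm working on Ubuntu Gnome 16.04 with a GTX1060 6GB, an Intel Xeon E3-1231V3 and 8GB of RAM. First, here is the part of the error message that I can understand; I'll add the whole error message again at the end for anyone who needs it in order to help.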
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 8 Chunks of size 256 totalling 2.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 5 Chunks of size 44288 totalling 216.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 5 Chunks of size 56064 totalling 273.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 4 Chunks of size 154350080 totalling 588.80MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 3 Chunks of size 813400064 totalling 2.27GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1612612352 totalling 1.50GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 4.35GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 5484118016
InUse: 4670717952
MaxInUse: 5484118016
NumAllocs: 29
MaxAllocSize: 1612612352
W tensorflow/core/common_runtime/bfc_allocator.cc:274] *********************___________*__***************************************************xxxxxxxxxxxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 775.72MiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:993] Resource exhausted: OOM when allocating tensor with shape[14525,14000]
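So I can read that there is a maximum of 5484118016 bytes that can be allocated, that 4670717952 bytes are already in use, and that another 775.72MB = 775720000 bytes are to be allocated. According to my calculator, 5484118016 bytes - 4670717952 bytes - 775720000 bytes = 37680064 bytes. So there should still be about 37MB of free VRAM left after allocating the space for the new tensor it is trying to push in there. That also seems plausible to me, since Tensorflow would probably (I guess?) not try to allocate more VRAM than is still available, and would just keep the rest of the data on hold in RAM or something.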
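I assume there is a big mistake somewhere in my thinking, and I'd be very grateful if someone could explain it to me. The obvious fix for my problem is to make the batches a bit smaller; they are about 1.5GB each and are probably just too big. Still, I'd love to know what the actual problem is.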
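For anyone who wants to redo the arithmetic, here is a minimal sketch that only uses numbers copied from the log. It rests on two assumptions: that the "MiB" in the allocator message means 2^20 bytes (not 10^6, as in my calculation above), and that each element of the [14525, 14000] tensor takes 4 bytes (the node is reported as DT_FLOAT). The variable names are made up for illustration. The two free chunk sizes are taken from the "Free at ..." lines of the full log.

```python
# Numbers copied from the allocator log; names are illustrative only.
MIB = 1024 ** 2                        # assumption: MiB = 2**20 bytes

limit = 5484118016                     # "Limit" from the Stats block
in_use = 4670717952                    # "InUse" from the Stats block
free_chunks = [659049984, 154350080]   # sizes from the two "Free at ..." lines

# Requested tensor: shape [14525, 14000], assumed 4 bytes per element (DT_FLOAT).
requested = 14525 * 14000 * 4

free_total = limit - in_use
largest_free = max(free_chunks)

print("requested bytes:      ", requested)
print("requested MiB:        ", round(requested / MIB, 2))
print("free bytes in total:  ", free_total)
print("sum of free chunks:   ", sum(free_chunks))
print("largest free chunk:   ", largest_free)
print("fits in total free:   ", requested <= free_total)
print("fits in largest chunk:", requested <= largest_free)
```

Under these assumptions the request is noticeably larger than the 775720000 bytes I used above, and the free memory is split across two chunks, neither of which is big enough on its own. Whether that is what actually triggers the OOM depends on how the allocator handles non-contiguous free space, which I have not verified.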
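Edit: I found something telling me to try:
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allocator_type = 'BFC'
with tf.Session(config=config) as s:
    s.run(tf.global_variables_initializer())  # the call that raises the OOM in the traceback below
This still does not work, but since the Tensorflow documentation lacks any explanation of what gpu_options.allocator_type = 'BFC' would do, I'd love to ask you all.
Adding the rest of the error message for anyone who is interested:
Sorry for the long copy/paste, but maybe someone will want or need to see it.
Thank you very much in advance,
Leon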
(gputensorflow) [email protected]:~/Tensorflow$ python Netzwerk_v0.5.1_gamma.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1060 6GB
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:01:00.0
Total memory: 5.93GiB
Free memory: 5.40GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (256): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (512): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1024): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2048): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4096): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8192): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16384): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (32768): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (65536): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (131072): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (262144): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (524288): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1048576): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2097152): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4194304): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8388608): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16777216): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (33554432): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (67108864): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (134217728): Total Chunks: 1, Chunks in use: 0 147.20MiB allocated for chunks. 147.20MiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (268435456): Total Chunks: 1, Chunks in use: 0 628.52MiB allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 775.72MiB was 256.00MiB, Chunk State:
I tensorflow/core/common_runtime/bfc_allocator.cc:666] Size: 628.52MiB | Requested Size: 0B | in_use: 0, prev: Size: 147.20MiB | Requested Size: 147.20MiB | in_use: 1, next: Size: 54.8KiB | Requested Size: 54.7KiB | in_use: 1
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208000000 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208000500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208000600 of size 56064
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020800e100 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020800e200 of size 44288
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208018f00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208019000 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208019100 of size 813400064
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102387d1100 of size 56064
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102387dec00 of size 154350080
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10241b11e00 of size 44288
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10241b1cb00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10241b1cc00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10241b1cd00 of size 154350080
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102722d4d00 of size 56064
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1027b615a00 of size 44288
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1027b620700 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1027b620800 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1027b620900 of size 813400064
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102abdd8900 of size 813400064
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102dc590900 of size 56064
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102dc59e400 of size 56064
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102dc5abf00 of size 154350080
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102e58df100 of size 154350080
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102eec12300 of size 44288
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102eec1d000 of size 44288
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102eec27d00 of size 1612612352
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1024ae4ff00 of size 659049984
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x102722e2800 of size 154350080
I tensorflow/core/common_runtime/bfc_allocator.cc:693] Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 8 Chunks of size 256 totalling 2.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 5 Chunks of size 44288 totalling 216.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 5 Chunks of size 56064 totalling 273.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 4 Chunks of size 154350080 totalling 588.80MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 3 Chunks of size 813400064 totalling 2.27GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1612612352 totalling 1.50GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 4.35GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 5484118016
InUse: 4670717952
MaxInUse: 5484118016
NumAllocs: 29
MaxAllocSize: 1612612352
W tensorflow/core/common_runtime/bfc_allocator.cc:274] *********************___________*__***************************************************xxxxxxxxxxxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 775.72MiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:993] Resource exhausted: OOM when allocating tensor with shape[14525,14000]
Traceback (most recent call last):
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1022, in _do_call
return fn(*args)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1004, in _run_fn
status, run_metadata)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/contextlib.py", line 66, in __exit__
next(self.gen)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[14525,14000]
[[Node: rnn/basic_lstm_cell/weights/Initializer/random_uniform = Add[T=DT_FLOAT, _class=["loc:@rnn/basic_lstm_cell/weights"], _device="/job:localhost/replica:0/task:0/gpu:0"](rnn/basic_lstm_cell/weights/Initializer/random_uniform/mul, rnn/basic_lstm_cell/weights/Initializer/random_uniform/min)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "Netzwerk_v0.5.1_gamma.py", line 171, in <module>
session.run(tf.global_variables_initializer())
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 965, in _run
feed_dict_string, options, run_metadata)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
target_list, options, run_metadata)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[14525,14000]
[[Node: rnn/basic_lstm_cell/weights/Initializer/random_uniform = Add[T=DT_FLOAT, _class=["loc:@rnn/basic_lstm_cell/weights"], _device="/job:localhost/replica:0/task:0/gpu:0"](rnn/basic_lstm_cell/weights/Initializer/random_uniform/mul, rnn/basic_lstm_cell/weights/Initializer/random_uniform/min)]]
Caused by op 'rnn/basic_lstm_cell/weights/Initializer/random_uniform', defined at:
File "Netzwerk_v0.5.1_gamma.py", line 94, in <module>
initial_state=initial_state, time_major=False) # time_major = FALSE currently
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 545, in dynamic_rnn
dtype=dtype)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 712, in _dynamic_rnn_loop
swap_memory=swap_memory)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2626, in while_loop
result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2459, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2409, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 697, in _time_step
(output, new_state) = call_cell()
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 683, in <lambda>
call_cell = lambda: cell(input_t, state)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 179, in __call__
concat = _linear([inputs, h], 4 * self._num_units, True, scope=scope)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 747, in _linear
"weights", [total_arg_size, output_size], dtype=dtype)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 988, in get_variable
custom_getter=custom_getter)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 890, in get_variable
custom_getter=custom_getter)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 348, in get_variable
validate_shape=validate_shape)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 333, in _true_getter
caching_device=caching_device, validate_shape=validate_shape)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 684, in _get_single_variable
validate_shape=validate_shape)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 226, in __init__
expected_shape=expected_shape)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 303, in _init_from_args
initial_value(), name="initial_value", dtype=dtype)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 673, in <lambda>
shape.as_list(), dtype=dtype, partition_info=partition_info)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/init_ops.py", line 360, in __call__
dtype, seed=self.seed)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/random_ops.py", line 246, in random_uniform
return math_ops.add(rnd * (maxval - minval), minval, name=name)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 73, in add
result = _op_def_lib.apply_op("Add", x=x, y=y, name=name)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/leon/anaconda3/envs/gputensorflow/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
self._traceback = _extract_stack()
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[14525,14000]
[[Node: rnn/basic_lstm_cell/weights/Initializer/random_uniform = Add[T=DT_FLOAT, _class=["loc:@rnn/basic_lstm_cell/weights"], _device="/job:localhost/replica:0/task:0/gpu:0"](rnn/basic_lstm_cell/weights/Initializer/random_uniform/mul, rnn/basic_lstm_cell/weights/Initializer/random_uniform/min)]]
I ran into this problem recently and hit resource exhaustion in the middle of training. I followed https://github.com/tensorflow/tensorflow/issues/4735 and worked around it by reducing the validation batch size. – RyanLiu