
Tensorflow object detection resource exhausted OOM exception

I am trying out the TensorFlow Object Detection API. I successfully fine-tuned faster_rcnn_resnet101 and was able to export the inference graph.

When I try to run detection on an image with my model, I get the following error:

--------------------------------------------------------------------------- 
ResourceExhaustedError     Traceback (most recent call last) 
<ipython-input-19-ec0c1510b78c> in <module>() 
    30  (boxes, scores, classes, num_detections) = sess.run(
    31   [boxes, scores, classes, num_detections], 
---> 32   feed_dict={image_tensor: image_np_expanded}) 
    33  print(" classes = "+str(classes)+" scores = "+str(scores)+" num# = "+str(num_detections)) 
    34  print ("vizualizing boxes") 

/home/ubuntu/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata) 
    787  try: 
    788  result = self._run(None, fetches, feed_dict, options_ptr, 
--> 789       run_metadata_ptr) 
    790  if run_metadata: 
    791   proto_data = tf_session.TF_GetBuffer(run_metadata_ptr) 

/home/ubuntu/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata) 
    995  if final_fetches or final_targets: 
    996  results = self._do_run(handle, final_targets, final_fetches, 
--> 997        feed_dict_string, options, run_metadata) 
    998  else: 
    999  results = [] 

/home/ubuntu/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata) 
    1130  if handle is None: 
    1131  return self._do_call(_run_fn, self._session, feed_dict, fetch_list, 
-> 1132       target_list, options, run_metadata) 
    1133  else: 
    1134  return self._do_call(_prun_fn, self._session, handle, feed_dict, 

/home/ubuntu/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _do_call(self, fn, *args) 
    1150   except KeyError: 
    1151   pass 
-> 1152  raise type(e)(node_def, op, message) 
    1153 
    1154 def _extend_graph(self): 

ResourceExhaustedError: OOM when allocating tensor with shape[300,14,14,1024] 
    [[Node: CropAndResize = CropAndResize[T=DT_FLOAT, extrapolation_value=0, method="bilinear", _device="/job:localhost/replica:0/task:0/gpu:0"](FirstStageFeatureExtractor/resnet_v1_101/resnet_v1_101/block3/unit_23/bottleneck_v1/Relu, Reshape_7, Reshape_8/_95, CropAndResize/crop_size)]] 
    [[Node: SecondStagePostprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/Concatenate/concat_3/_151 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_2971_SecondStagePostprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/Concatenate/concat_3", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](^_cloopSecondStagePostprocessor/BatchMultiClassNonMaxSuppression/map/while/strided_slice/stack_2/_6)]] 

Caused by op u'CropAndResize', defined at: 
    File "/home/ubuntu/anaconda2/lib/python2.7/runpy.py", line 174, in _run_module_as_main 
    "__main__", fname, loader, pkg_name) 
    File "/home/ubuntu/anaconda2/lib/python2.7/runpy.py", line 72, in _run_code 
    exec code in run_globals 
    File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/ipykernel/__main__.py", line 3, in <module> 
    app.launch_new_instance() 
    File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/traitlets/config/application.py", line 653, in launch_instance 
    app.start() 
    File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/ipykernel/kernelapp.py", line 474, in start 
    ioloop.IOLoop.instance().start() 
    File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/zmq/eventloop/ioloop.py", line 162, in start 
    super(ZMQIOLoop, self).start() 
    File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/tornado/ioloop.py", line 887, in start 
    handler_func(fd_obj, events) 
    File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/tornado/stack_context.py", line 275, in null_wrapper 
    return fn(*args, **kwargs) 
    File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events 
    self._handle_recv() 
    File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv 
    self._run_callback(callback, msg) 
    File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback 
    callback(*args, **kwargs) 
    File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/tornado/stack_context.py", line 275, in null_wrapper 
    return fn(*args, **kwargs) 
    File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 276, in dispatcher 
    return self.dispatch_shell(stream, msg) 
    File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 228, in dispatch_shell 
    handler(stream, idents, msg) 
    File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 390, in execute_request 
    user_expressions, allow_stdin) 
    File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/ipykernel/ipkernel.py", line 196, in do_execute 
    res = shell.run_cell(code, store_history=store_history, silent=silent) 
    File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/ipykernel/zmqshell.py", line 501, in run_cell 
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs) 
    File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2717, in run_cell 
    interactivity=interactivity, compiler=compiler, result=result) 
    File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2821, in run_ast_nodes 
    if self.run_code(code, result): 
    File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code 
    exec(code_obj, self.user_global_ns, self.user_ns) 
    File "<ipython-input-5-57652895f483>", line 7, in <module> 
    tf.import_graph_def(od_graph_def, name='') 
    File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/importer.py", line 311, in import_graph_def 
    op_def=op_def) 
    File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op 
    original_op=self._default_original_op, op_def=op_def) 
    File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__ 
    self._traceback = _extract_stack() 

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[300,14,14,1024] 
    [[Node: CropAndResize = CropAndResize[T=DT_FLOAT, extrapolation_value=0, method="bilinear", _device="/job:localhost/replica:0/task:0/gpu:0"](FirstStageFeatureExtractor/resnet_v1_101/resnet_v1_101/block3/unit_23/bottleneck_v1/Relu, Reshape_7, Reshape_8/_95, CropAndResize/crop_size)]] 
    [[Node: SecondStagePostprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/Concatenate/concat_3/_151 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_2971_SecondStagePostprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/Concatenate/concat_3", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](^_cloopSecondStagePostprocessor/BatchMultiClassNonMaxSuppression/map/while/strided_slice/stack_2/_6)]] 

Below is my TF code, which collects the images from test_images and tries to run detection on them. I am using a high-end AWS GPU server, so I don't think it is a memory problem. Also, my test images are only 400x400.

tfconfig = tf.ConfigProto()
tfconfig.gpu_options.allocator_type = 'BFC'
tfconfig.gpu_options.per_process_gpu_memory_fraction = 0.40
tfconfig.gpu_options.allow_growth = True
with detection_graph.as_default():
    with tf.Session(graph=detection_graph, config=tfconfig) as sess:
        for image_path in TEST_IMAGE_PATHS:
            image = Image.open(image_path)
            # The array-based representation of the image will be used later in order to
            # prepare the result image with boxes and labels on it.
            print("about to convert image =" + image_path + " into np")
            image_np = load_image_into_numpy_array(image)
            # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
            print("expanding np ...")
            image_np_expanded = np.expand_dims(image_np, axis=0)
            print(image_np.shape)
            print("getting image tensor ...")
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
            # Each box represents a part of the image where a particular object was detected.
            print("detecting boxes ...")
            boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
            # Each score represents the level of confidence for each of the objects.
            # The score is shown on the result image, together with the class label.
            print("getting scores, classes and num of detections ...")
            scores = detection_graph.get_tensor_by_name('detection_scores:0')
            classes = detection_graph.get_tensor_by_name('detection_classes:0')
            num_detections = detection_graph.get_tensor_by_name('num_detections:0')
            # Actual detection.
            print("building boxes")
            (boxes, scores, classes, num_detections) = sess.run(
                [boxes, scores, classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
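
For context, the failing cell in the traceback goes on to print the results and visualize the boxes ("vizualizing boxes"). A minimal sketch of that step, following the standard object detection tutorial, is given below; category_index (built from the label map), plt, and the exact helper call do not appear in the original post and are assumed here:

from object_detection.utils import visualization_utils as vis_util
import matplotlib.pyplot as plt

# Inside the loop, after sess.run: draw the detected boxes and labels onto image_np.
# Assumes category_index was created from the label map earlier in the notebook.
vis_util.visualize_boxes_and_labels_on_image_array(
    image_np,
    np.squeeze(boxes),
    np.squeeze(classes).astype(np.int32),
    np.squeeze(scores),
    category_index,
    use_normalized_coordinates=True,
    line_thickness=8)
plt.imshow(image_np)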

Answer


Found the problem. Even though I had enough CPU memory, I had to clear my GPU memory by killing the process that was still running on it:

(tfpy27) [email protected]:~/Object-Detector-App/object_detection$ nvidia-smi 
Sun Dec 10 12:29:04 2017              
+-----------------------------------------------------------------------------+ 
| NVIDIA-SMI 375.66     Driver Version: 375.66     | 
|-------------------------------+----------------------+----------------------+ 
| GPU Name  Persistence-M| Bus-Id  Disp.A | Volatile Uncorr. ECC | 
| Fan Temp Perf Pwr:Usage/Cap|   Memory-Usage | GPU-Util Compute M. | 
|===============================+======================+======================| 
| 0 Tesla K80   Off | 0000:00:1E.0  Off |     0 | 
| N/A 67C P0 60W/149W | 4483MiB/11439MiB |  0%  Default | 
+-------------------------------+----------------------+----------------------+ 
+-----------------------------------------------------------------------------+ 
| Processes:              GPU Memory | 
| GPU  PID Type Process name        Usage  | 
|=============================================================================| 
| 0  8200 C /home/ubuntu/anaconda2/bin/python    4481MiB | 
+-----------------------------------------------------------------------------+ 

sudo kill -9 <PID> 

(Thanks to Quora for the tip: https://www.quora.com/How-do-I-kill-all-the-computer-processes-shown-in-nvidia-smi.) Note that I am still not sure why running the session on a single medium-sized image was taking up so much memory.
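
For reference, here is a minimal Python sketch of the same cleanup; the helper names are hypothetical, it only shells out to nvidia-smi and sends SIGKILL, and on a shared machine you may still need the sudo kill -9 shown above:

import os
import signal
import subprocess

def show_gpu_processes():
    # Print the same table nvidia-smi shows, so the stale PID holding GPU memory is visible.
    print(subprocess.check_output(['nvidia-smi']).decode())

def kill_gpu_process(pid):
    # Send SIGKILL to the process holding the GPU memory; equivalent to kill -9 <pid>.
    os.kill(pid, signal.SIGKILL)

show_gpu_processes()
# kill_gpu_process(8200)  # PID taken from the nvidia-smi output above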
