非常に低いGPU使用

私は、分類タスクのための非常に長いシーケンスを処理するために単一のdynamic_rnnを使用しようとします。 rnn_size = 500、seq_max_length = 2500、batch_size = 50、embedding_size = 64、softmax_size = 1600のようなパラメータがあります。非常に低いGPU使用

コードは以下の通りである：

x_vec = tf.nn.embedding_lookup(embedding_matrix_variable, self.x) 
lstm_fw_cell = rnn_cell.LSTMCell(num_units = hidden_unit, input_size = word_dim) 
lstm_fw_cell = rnn_cell.DropoutWrapper(lstm_fw_cell, output_keep_prob=self.dropout_keep_prob, input_keep_prob=self.dropout_keep_prob) 
outputs, _ = rnn.dynamic_rnn(lstm_fw_cell, x, dtype=tf.float32, sequence_length=real_length, swap_memory=False) 

outputs = tf.transpose(outputs, [1, 0, 2]) 
outputs = tf.unpack(outputs) 

output = outputs[0] 
one = tf.ones([1, hidden_unit], tf.float32) 
with tf.variable_scope("output"): 
    tf.get_variable_scope().reuse_variables() 
     for i in range(1, len(outputs_6)): 
      ind = self.real_length < (i+1) 
      ind = tf.to_float(ind) 
      ind = tf.expand_dims(ind, -1) 
      mat = tf.matmul(ind, one) 
      output=tf.add(tf.mul(output, mat), tf.mul(outputs[i], 1.0-mat)) 


y_prediction = tf.matmul(output, w_h2y) + b_h2y 
y_prediction = tf.nn.softmax(y_prediction) 

weight_decay = L2 * (tf.nn.l2_loss(w_h2y) + tf.nn.l2_loss(b_h2y)) 
self.cost = tf.reduce_mean(-tf.reduce_sum(self.y*tf.log(y_prediction + 1e-10))) + weight_decay 
self.optimizer = tf.train.AdamOptimizer(alpha).minimize(self.cost)

TITAN上のGPUの使用はわずか5％です。 CPUの使用率は約150％です。何が問題なのかよく分かりません。

出典

2016-12-16 Nils Cao

おそらくボトルネックがどこかにあります。ボトルネックを見つける方法の1つがタイムラインプロファイリングです - https://github.com/tensorflow/tensorflow/issues/1824#issuecomment-225754659 –

Yaroslav氏によると、これはあなたのコード（または問題を認識するのに十分な幸運な人）のプロファイリングが必要なので、これは難しい質問です。新しいTensorFlow Performanceページのように、This comment on the github issuesはプロファイリングの出発点です。

出典

2017-11-18 15:27:27 dga

答えて

関連する問題