Is the RNN initial state reset for subsequent mini-batches?

Could someone please clarify whether the initial state of the RNN in TF is reset for subsequent mini-batches, or whether the last state of the previous mini-batch is used, as described in Ilya Sutskever et al., ICLR 2015?

2016-07-18 VM_AI

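The tf.nn.dynamic_rnn() and tf.nn.rnn() operations let you specify the initial state of the RNN through the initial_state parameter. If you don't specify this parameter, the hidden state is initialized to zero vectors at the beginning of each training batch.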
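
To make the difference concrete, here is a minimal NumPy sketch (not TensorFlow code). The tiny tanh RNN, the helper name rnn_forward, and all shapes are assumptions made up for illustration. It compares starting the second mini-batch from zeros with starting it from the last state of the first mini-batch.

```python
import numpy as np


def rnn_forward(x, h0, W_x, W_h):
    """Run a plain tanh RNN over x of shape (batch, time, features).

    Returns the hidden state at every step and the final hidden state.
    """
    h = h0
    outputs = []
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ W_x + h @ W_h)
        outputs.append(h)
    return np.stack(outputs, axis=1), h


rng = np.random.default_rng(0)
batch_size, max_length, frame_size, hidden = 2, 3, 4, 5
W_x = rng.normal(size=(frame_size, hidden))
W_h = rng.normal(size=(hidden, hidden))

# Two consecutive mini-batches that continue the same sequences.
batch_1 = rng.normal(size=(batch_size, max_length, frame_size))
batch_2 = rng.normal(size=(batch_size, max_length, frame_size))
zeros = np.zeros((batch_size, hidden))

# Reset behaviour: every mini-batch starts from a zero state.
_, last_1 = rnn_forward(batch_1, zeros, W_x, W_h)
out_reset, _ = rnn_forward(batch_2, zeros, W_x, W_h)

# Carried state: the second mini-batch starts where the first one ended.
out_carried, _ = rnn_forward(batch_2, last_1, W_x, W_h)

# Reference: both mini-batches processed as one long sequence.
both = np.concatenate([batch_1, batch_2], axis=1)
out_full, _ = rnn_forward(both, zeros, W_x, W_h)
```

In this sketch, carrying the state makes the second mini-batch behave like a continuation of one long sequence, while resetting treats it as a fresh sequence.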

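In TensorFlow, you can wrap tensors in tf.Variable() to keep their values in the graph between multiple session runs. Just make sure to mark them as non-trainable, because the optimizers tune all trainable variables by default.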

import tensorflow as tf  # TensorFlow 1.x graph-mode API

data = tf.placeholder(tf.float32, (batch_size, max_length, frame_size))

cell = tf.nn.rnn_cell.GRUCell(256)
# Non-trainable variable that holds the state between session runs.
state = tf.Variable(cell.zero_state(batch_size, tf.float32), trainable=False)
output, new_state = tf.nn.dynamic_rnn(cell, data, initial_state=state)

# Store the final state whenever the output is evaluated.
with tf.control_dependencies([state.assign(new_state)]):
    output = tf.identity(output)

sess = tf.Session()
sess.run(tf.initialize_all_variables())
sess.run(output, {data: ...})

I haven't tested this code, but it should give you a hint in the right direction. There is also tf.nn.state_saving_rnn(), to which you can provide a state saver object, but I haven't used it yet.

2016-07-19 18:11:12 danijar

In addition to danijar's answer, here is the code for an LSTM, whose state is a tuple (state_is_tuple=True). It also supports multiple layers.

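We define two functions: one that creates the state variables with an initial zero state, and one that returns an operation which we can pass to session.run in order to update the state variables with the LSTM's last hidden state.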

def get_state_variables(batch_size, cell):
    # For each layer, get the initial state and make a variable out of it
    # to enable updating its value.
    state_variables = []
    for state_c, state_h in cell.zero_state(batch_size, tf.float32):
        state_variables.append(tf.contrib.rnn.LSTMStateTuple(
            tf.Variable(state_c, trainable=False),
            tf.Variable(state_h, trainable=False)))
    # Return as a tuple, so that it can be fed to dynamic_rnn as an initial state
    return tuple(state_variables)


def get_state_update_op(state_variables, new_states):
    # Add an operation to update the train states with the last state tensors
    update_ops = []
    for state_variable, new_state in zip(state_variables, new_states):
        # Assign the new state to the state variables on this layer
        update_ops.extend([state_variable[0].assign(new_state[0]),
                           state_variable[1].assign(new_state[1])])
    # Return a tuple in order to combine all update_ops into a single operation.
    # The tuple's actual value should not be used.
    return tf.tuple(update_ops)

Similar to danijar's answer, we can use this to update the LSTM's state after each batch:

data = tf.placeholder(tf.float32, (batch_size, max_length, frame_size))
# LSTM cell, so that each layer's state is an LSTMStateTuple (c, h).
cell_layer = tf.contrib.rnn.LSTMCell(256, state_is_tuple=True)
cell = tf.contrib.rnn.MultiRNNCell([cell_layer] * num_layers)

# For each layer, get the initial state. states will be a tuple of LSTMStateTuples.
states = get_state_variables(batch_size, cell)

# Unroll the LSTM
outputs, new_states = tf.nn.dynamic_rnn(cell, data, initial_state=states)

# Add an operation to update the train states with the last state tensors.
update_op = get_state_update_op(states, new_states)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run([outputs, update_op], {data: ...})

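The main difference is that state_is_tuple=True makes the LSTM's state an LSTMStateTuple containing two variables (the cell state and the hidden state). With multiple layers, the LSTM's state becomes a tuple of LSTMStateTuples, one per layer.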
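
The nested state layout can be sketched without TensorFlow. In the snippet below, StateTuple is a hypothetical stand-in for LSTMStateTuple, and make_zero_state and update_state are illustrative helper names rather than library functions. NumPy arrays play the role of the non-trainable variables.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for LSTMStateTuple: one (c, h) pair per layer.
StateTuple = namedtuple("StateTuple", ["c", "h"])


def make_zero_state(num_layers, batch_size, num_units):
    """Build a tuple with one zero-filled (c, h) pair per layer."""
    return tuple(
        StateTuple(c=np.zeros((batch_size, num_units)),
                   h=np.zeros((batch_size, num_units)))
        for _ in range(num_layers)
    )


def update_state(stored, new):
    """Copy the new per-layer states into the stored buffers in place."""
    for stored_layer, new_layer in zip(stored, new):
        stored_layer.c[...] = new_layer.c
        stored_layer.h[...] = new_layer.h


stored = make_zero_state(num_layers=2, batch_size=3, num_units=4)

# Pretend these are the final states returned after one mini-batch.
new = tuple(
    StateTuple(c=np.full((3, 4), i + 1.0), h=np.full((3, 4), -(i + 1.0)))
    for i in range(2)
)
update_state(stored, new)
```

The loop over zip(stored, new) mirrors the per-layer assignment in get_state_update_op: the outer tuple is indexed by layer, and each entry holds the cell state and the hidden state.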

2016-12-20 10:20:44

Note that the way you do it, you create num_layers _identical_ cells, which is probably not what you want to do.
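
The point about identical cells comes from how Python list multiplication works, and it can be checked without TensorFlow. The Cell class below is a hypothetical stand-in for an RNN cell object. How a given TensorFlow version reacts to a reused cell object is not covered by this sketch.

```python
class Cell:
    """Hypothetical stand-in for an RNN cell object that owns weights."""

    def __init__(self, num_units):
        self.num_units = num_units


num_layers = 3

# List multiplication repeats a reference to ONE object.
cell_layer = Cell(256)
shared = [cell_layer] * num_layers

# A list comprehension calls the constructor once per layer.
distinct = [Cell(256) for _ in range(num_layers)]

same_object = all(c is shared[0] for c in shared)
distinct_count = len({id(c) for c in distinct})
```

Here same_object is True and distinct_count equals num_layers, so only the comprehension gives each layer its own cell object.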
