TensorFlow LSTM: confusion about batching and state

Looking at the official TensorFlow RNN tutorial and its full code, I am a bit confused about how the data is split into epochs. First, I do not understand how the state variable is used in the run_epoch() function. In main(), inside the loop over epochs, we call:

def run_epoch(session, model, eval_op=None, verbose=False):
    """Runs the model on the given data."""
    start_time = time.time()
    costs = 0.0
    iters = 0
    state = session.run(model.initial_state)

    fetches = {
        "cost": model.cost,
        "final_state": model.final_state,
    }
    if eval_op is not None:
        fetches["eval_op"] = eval_op

    for step in range(model.input.epoch_size):
        feed_dict = {}
        for i, (c, h) in enumerate(model.initial_state):
            feed_dict[c] = state[i].c
            feed_dict[h] = state[i].h

        vals = session.run(fetches, feed_dict)
        cost = vals["cost"]
        state = vals["final_state"]

        costs += cost
        iters += model.input.num_steps

        if verbose and step % (model.input.epoch_size // 10) == 10:
            print("%.3f perplexity: %.3f speed: %.0f wps" %
                  (step * 1.0 / model.input.epoch_size, np.exp(costs / iters),
                   iters * model.input.batch_size / (time.time() - start_time)))

    return np.exp(costs / iters)

What is the state variable, and why do we enumerate model.initial_state and overwrite it at every step?

I also looked at the reader.py file, which splits the data as follows:

def ptb_producer(raw_data, batch_size, num_steps, name=None):
    """Iterate on the raw PTB data.

    This chunks up raw_data into batches of examples and returns Tensors that
    are drawn from these batches.

    Args:
        raw_data: one of the raw data outputs from ptb_raw_data.
        batch_size: int, the batch size.
        num_steps: int, the number of unrolls.
        name: the name of this operation (optional).

    Returns:
        A pair of Tensors, each shaped [batch_size, num_steps]. The second element
        of the tuple is the same data time-shifted to the right by one.

    Raises:
        tf.errors.InvalidArgumentError: if batch_size or num_steps are too high.
    """
    with tf.name_scope(name, "PTBProducer", [raw_data, batch_size, num_steps]):
        raw_data = tf.convert_to_tensor(raw_data, name="raw_data", dtype=tf.int32)

        data_len = tf.size(raw_data)
        batch_len = data_len // batch_size
        data = tf.reshape(raw_data[0 : batch_size * batch_len],
                          [batch_size, batch_len])

        epoch_size = (batch_len - 1) // num_steps
        assertion = tf.assert_positive(
            epoch_size,
            message="epoch_size == 0, decrease batch_size or num_steps")
        with tf.control_dependencies([assertion]):
            epoch_size = tf.identity(epoch_size, name="epoch_size")

        i = tf.train.range_input_producer(epoch_size, shuffle=False).dequeue()
        x = tf.slice(data, [0, i * num_steps], [batch_size, num_steps])
        y = tf.slice(data, [0, i * num_steps + 1], [batch_size, num_steps])
        return x, y

Why do we split the data by both batches and steps and mix the two? That is a bit confusing. Why not iterate over the whole dataset in batches only, or in steps only?

Source

2016-11-11 vega

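The two are completely different things. The num_steps parameter controls how many inputs are consumed before backpropagation happens, that is, how many time steps the RNN is unrolled. Splitting the data into batches allows an efficient implementation, because more data is processed at once. Training RNNs takes a long time, so this matters. Looking directly at the batched input is a bit confusing, so set batch_size to 1 and look at what the input looks like: with that setting it is as if only num_steps were in play.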
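To make the layout concrete, here is a minimal NumPy sketch of the slicing that ptb_producer performs. It is not the tutorial's code: ptb_windows and rnn_final_state are hypothetical helpers written for this illustration, the input queue is replaced by a plain Python loop, and a toy tanh RNN stands in for the LSTM. It only runs the forward pass, so it does not show where backpropagation is cut off. What it does show is that each row of the batch is a contiguous stretch of the corpus, and that window i + 1 continues each row exactly where window i ended. That would explain why run_epoch feeds the final state of one step back in as the initial state of the next.

```python
import numpy as np


def ptb_windows(raw_data, batch_size, num_steps):
    """Hypothetical NumPy stand-in for ptb_producer (no queues, no TensorFlow)."""
    raw_data = np.asarray(raw_data, dtype=np.int32)
    batch_len = len(raw_data) // batch_size
    # Each of the batch_size rows is one contiguous stretch of the corpus.
    data = raw_data[:batch_size * batch_len].reshape(batch_size, batch_len)
    epoch_size = (batch_len - 1) // num_steps
    if epoch_size <= 0:
        raise ValueError("epoch_size == 0, decrease batch_size or num_steps")
    for i in range(epoch_size):
        x = data[:, i * num_steps:(i + 1) * num_steps]
        # Targets are the same rows shifted by one position.
        y = data[:, i * num_steps + 1:(i + 1) * num_steps + 1]
        yield x, y


def rnn_final_state(tokens, state, w_in, w_rec):
    """Toy tanh RNN (assumption: a stand-in for the LSTM); tokens is [batch, steps]."""
    for t in range(tokens.shape[1]):
        state = np.tanh(w_in[tokens[:, t]] + state @ w_rec)
    return state


raw = np.arange(20)  # a fake "corpus" of 20 token ids
windows = list(ptb_windows(raw, batch_size=2, num_steps=3))
for x, y in windows:
    print(x.tolist(), y.tolist())

rng = np.random.default_rng(0)
hidden = 4
w_in = rng.normal(size=(20, hidden))       # one input vector per token id
w_rec = rng.normal(size=(hidden, hidden))  # recurrent weights

# Carry the state from window to window, as run_epoch does with final_state.
state = np.zeros((2, hidden))  # plays the role of model.initial_state
for x, _ in windows:
    state = rnn_final_state(x, state, w_in, w_rec)

# Reading each row in a single pass ends in the same state.
full = np.concatenate([x for x, _ in windows], axis=1)
one_pass = rnn_final_state(full, np.zeros((2, hidden)), w_in, w_rec)
assert np.allclose(state, one_pass)
```

Calling ptb_windows(raw, batch_size=1, num_steps=3) instead yields a single row cut into consecutive windows of three tokens, which is the batch_size = 1 view suggested above.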

Source

2017-02-10 23:06:42
