tf.layers.batch_normalization: large test error

I am trying to use batch normalization. I tried using tf.layers.batch_normalization in a simple conv net for MNIST.

I get high accuracy on the training step (> 98%), but the test accuracy is very low (< 50%). I tried changing the momentum value (0.8, 0.9, 0.99, 0.999) and playing with the batch size, but it always behaves the same way. I train it for 20k iterations.

My code:

# Input placeholders 
x = tf.placeholder(tf.float32, [None, 784], name='x-input') 
y_ = tf.placeholder(tf.float32, [None, 10], name='y-input') 
is_training = tf.placeholder(tf.bool) 

# input layer
input_layer = tf.reshape(x, [-1, 28, 28, 1]) 
with tf.name_scope('conv1'): 
    #Convolution #1 ([5,5] : [28x28x1]->[28x28x6])
    conv1 = tf.layers.conv2d(
     inputs=input_layer, 
     filters=6, 
     kernel_size=[5, 5], 
     padding="same", 
     activation=None 
    ) 

    #Batch Norm #1 
    conv1_bn = tf.layers.batch_normalization(
     inputs=conv1, 
     axis=-1, 
     momentum=0.9, 
     epsilon=0.001, 
     center=True, 
     scale=True, 
     training = is_training, 
     name='conv1_bn' 
    ) 

    #apply relu 
    conv1_bn_relu = tf.nn.relu(conv1_bn) 
    #apply pool ([2,2] : [28x28x6]->[14X14X6]) 
    maxpool1=tf.layers.max_pooling2d(
     inputs=conv1_bn_relu, 
     pool_size=[2,2], 
     strides=2, 
     padding="valid" 
    ) 

with tf.name_scope('conv2'): 
    #convolution #2 ([5,5] : [14x14x6]->[14x14x16])
    conv2 = tf.layers.conv2d(
     inputs=maxpool1, 
     filters=16, 
     kernel_size=[5, 5], 
     padding="same", 
     activation=None 
    ) 

    #Batch Norm #2 
    conv2_bn = tf.layers.batch_normalization(
     inputs=conv2, 
     axis=-1, 
     momentum=0.999, 
     epsilon=0.001, 
     center=True, 
     scale=True, 
     training = is_training 
    ) 

    #apply relu 
    conv2_bn_relu = tf.nn.relu(conv2_bn) 
    #maxpool2 ([2,2] : [14x14x16]->[7x7x16])
    maxpool2=tf.layers.max_pooling2d(
     inputs=conv2_bn_relu, 
     pool_size=[2,2], 
     strides=2, 
     padding="valid" 
    ) 

#fully connected 1 [7*7*16 = 784 -> 120] 
maxpool2_flat=tf.reshape(maxpool2,[-1,7*7*16]) 
fc1 = tf.layers.dense(
    inputs=maxpool2_flat, 
    units=120, 
    activation=None 
) 

#Batch Norm #3
fc1_bn = tf.layers.batch_normalization(
    inputs=fc1, 
    axis=-1, 
    momentum=0.999, 
    epsilon=0.001, 
    center=True, 
    scale=True, 
    training = is_training 
) 
#apply relu

fc1_bn_relu = tf.nn.relu(fc1_bn) 

#fully connected 2 [120-> 84] 
fc2 = tf.layers.dense(
    inputs=fc1_bn_relu, 
    units=84, 
    activation=None 
) 

#apply relu 
fc2_bn_relu = tf.nn.relu(fc2) 

#fully connected 3 [84->10]. Output layer (logits; softmax is applied inside the loss)
y = tf.layers.dense(
    inputs=fc2_bn_relu, 
    units=10, 
    activation=None 
) 

#loss 
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y)) 
tf.summary.scalar('cross entropy', cross_entropy) 

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1)) 
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)) 
tf.summary.scalar('accuracy',accuracy) 

#merge summaries and init train writer 
sess = tf.Session() 
merged = tf.summary.merge_all() 
train_writer = tf.summary.FileWriter(log_dir + '/train' ,sess.graph) 
test_writer = tf.summary.FileWriter(log_dir + '/test') 
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy) 
init = tf.global_variables_initializer() 
sess.run(init) 

with sess.as_default(): 
    def get_variables_values(): 
     variables = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES) 
     values = {} 
     for variable in variables: 
      values[variable.name[:-2]] = sess.run(variable, feed_dict={ 
       x:batch[0], y_:batch[1], is_training:True 
       }) 
     return values 


    for i in range(t_iter): 
     batch = mnist.train.next_batch(batch_size) 
     if i%100 == 0: #test-set summary 
      print('####################################') 
      values = get_variables_values() 
      print('moving variance is:') 
      print(values["conv1_bn/moving_variance"]) 
      print('moving mean is:') 
      print(values["conv1_bn/moving_mean"]) 
      print('gamma is:') 
      print(values["conv1_bn/gamma/Adam"]) 
      print('beta is:') 
      print(values["conv1_bn/beta/Adam"]) 
      summary, acc = sess.run([merged,accuracy], feed_dict={ 
       x:mnist.test.images, y_:mnist.test.labels, is_training:False 

      }) 

     else: 
      summary, _ = sess.run([merged,train_step], feed_dict={ 
       x:batch[0], y_:batch[1], is_training:True 
      }) 
      if i%10 == 0: 
       train_writer.add_summary(summary,i)

I think the problem is that moving_mean/moving_variance are not being updated. I print them during the run:

moving variance is: [1. 1. 1. 1. 1. 1.]
moving mean is: [0. 0. 0. 0. 0. 0.]
gamma is: [-0.00055969 0.00164391 0.00163301 -0.00206227 -0.00011434 -0.00070161]
beta is: [-0.00232835 -0.00040769 0.00114277 -0.0025414 -0.00049697 0.00221556]

Does anyone have any idea what I am doing wrong?

Source: MrG, 2017-04-05

Hi MrG, could you show me your test code? I have the same problem as you: with tf.layers.batch_normalization, my model always predicts a constant. – Yang

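The operations that tf.layers.batch_normalization adds to update the moving mean and variance are not automatically added as dependencies of the train operation, so unless you do something extra they never run. (Unfortunately, the documentation does not currently mention this; I am opening an issue about it.)

Fortunately, the update operations are easy to get, since they are added to the tf.GraphKeys.UPDATE_OPS collection. You can then either run the extra operations manually:

extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) 
sess.run([train_op, extra_update_ops], ...)

Or add them as dependencies of your training operation, and then just run the training operation as normal: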

extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) 
with tf.control_dependencies(extra_update_ops): 
    train_op = optimizer.minimize(loss) 
... 
sess.run([train_op], ...)
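
To see why skipped update ops produce exactly the symptoms in the question, here is a minimal pure-Python sketch of one batch-norm channel. It is a toy model under stated assumptions, not TensorFlow's implementation: the MovingStats class, the simulate helper, and the synthetic Gaussian activations (mean 5, standard deviation 2) are all hypothetical. Training-mode normalization uses the statistics of the current batch, so training works either way; inference-mode normalization uses the moving statistics, which stay at their initial values (0 and 1) if the update step never runs.

```python
import random


class MovingStats:
    """Toy stand-in for the moving_mean / moving_variance of one channel."""

    def __init__(self, momentum=0.9, epsilon=0.001):
        self.momentum = momentum
        self.epsilon = epsilon
        self.moving_mean = 0.0  # same initial values the question printed
        self.moving_var = 1.0

    @staticmethod
    def batch_stats(batch):
        mean = sum(batch) / len(batch)
        var = sum((v - mean) ** 2 for v in batch) / len(batch)
        return mean, var

    def update(self, batch):
        """Plays the role of the update op: an exponential moving average."""
        mean, var = self.batch_stats(batch)
        m = self.momentum
        self.moving_mean = m * self.moving_mean + (1 - m) * mean
        self.moving_var = m * self.moving_var + (1 - m) * var

    def normalize(self, batch, training):
        """Training uses batch statistics; inference uses moving statistics."""
        if training:
            mean, var = self.batch_stats(batch)
        else:
            mean, var = self.moving_mean, self.moving_var
        return [(v - mean) / (var + self.epsilon) ** 0.5 for v in batch]


def simulate(run_update_ops, steps=200, seed=0):
    """Run `steps` training passes, optionally executing the update step."""
    rng = random.Random(seed)
    stats = MovingStats()
    for _ in range(steps):
        batch = [rng.gauss(5.0, 2.0) for _ in range(64)]
        stats.normalize(batch, training=True)  # forward pass, training mode
        if run_update_ops:
            stats.update(batch)
    return stats


skipped = simulate(run_update_ops=False)
updated = simulate(run_update_ops=True)
print("update ops skipped:", skipped.moving_mean, skipped.moving_var)
print("update ops run:    ", updated.moving_mean, updated.moving_var)
```

With the update step skipped, the moving statistics never leave 0 and 1, so at test time the layer passes activations through almost unnormalized, while the following layers were trained on normalized inputs. With the update step run, the moving statistics approach the statistics of the data.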

Source: 2017-04-07 10:56:45

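Thanks! It works now. – MrG

Thank you for your help. I had seen articles describing a similar approach back when batch norm was still in contrib, and I mistakenly assumed it had been fixed in the move to tf.layers. Isn't not updating the mean and variance an unusual default behavior? – Prophecies

I agree, it is a little inconvenient. I believe it may be a similar situation to summary ops: the data flow through the graph to the loss function simply does not depend on these ops, so they have to be invoked separately. –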
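
The data-flow point in the last comment can be sketched with a tiny graph evaluator. This is an illustration only and not how TensorFlow's executor is written: the Op class, the run function, and every op name below are hypothetical. Running a fetch executes only the ops it depends on, so a side-effecting update op that the loss does not consume is skipped unless it is attached as a control dependency.

```python
class Op:
    """A graph node with data inputs and optional control dependencies."""

    def __init__(self, name, fn, inputs=(), control_deps=()):
        self.name = name
        self.fn = fn
        self.inputs = list(inputs)
        self.control_deps = list(control_deps)


def run(fetch):
    """Evaluate `fetch`; return (value, names of the ops that executed)."""
    cache, executed = {}, []

    def visit(op):
        if op.name not in cache:
            for dep in op.control_deps:
                visit(dep)  # result ignored, only the side effect matters
            args = [visit(i) for i in op.inputs]
            cache[op.name] = op.fn(*args)
            executed.append(op.name)
        return cache[op.name]

    return visit(fetch), executed


state = {"moving_mean": 0.0}


def update_moving_mean(value, momentum=0.9):
    """Side-effecting op: nothing on the path to the loss reads its output."""
    state["moving_mean"] = momentum * state["moving_mean"] + (1 - momentum) * value
    return state["moving_mean"]


x = Op("x", lambda: 4.0)
loss = Op("loss", lambda v: (v - 1.0) ** 2, [x])
update_op = Op("update_moving_mean", update_moving_mean, [x])

# The loss does not depend on update_op, so fetching train_op skips it.
train_op = Op("train_op", lambda l: l, [loss])
_, ran_without = run(train_op)
mean_without = state["moving_mean"]

# Attaching update_op as a control dependency forces it to run first.
train_op_with_deps = Op("train_op_with_deps", lambda l: l, [loss],
                        control_deps=[update_op])
_, ran_with = run(train_op_with_deps)
mean_with = state["moving_mean"]

print(ran_without, mean_without)
print(ran_with, mean_with)
```

The first fetch leaves the moving mean at its initial value because nothing requested the update op; the second runs it as a side effect of the training op, which is the role tf.control_dependencies plays in the answer above.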
