How does TensorFlow know which derivative function to use in minimize()?

model9 = tf.nn.relu(tf.matmul(x1,w9)+b) 
model10 = tf.nn.sigmoid(tf.matmul(model9,w10)+b) 

error = tf.reduce_mean(tf.square(model10-y)) 
train = tf.train.AdamOptimizer(learning_rate=0.001).minimize(error)

Is TensorFlow really smart enough to "iterate" through all the layers, check the activation functions, and apply gradient descent based on the derivative of each activation function?
Also, suppose I want model9 to have a learning rate of 0.01. How do I set that on the network?

2017-09-25 Testing man

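Yes. Your code builds a TensorFlow computation graph made up of nodes that represent operations and variables. TensorFlow knows the gradient of each operation (that is, the gradient of the operation's output with respect to each of its inputs), so it can use the backpropagation algorithm to update the variables during gradient descent, applying the correct derivative of each activation function along the way. For background, see http://cs224d.stanford.edu/lecture_notes/notes3.pdf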
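To make the idea concrete, here is a minimal pure-Python sketch of reverse-mode differentiation. It is not TensorFlow's actual implementation: the `Node` class and the helpers `add`, `sub`, `mul`, `relu`, `sigmoid`, `square`, `backward`, and `tiny_network` are hypothetical names invented for illustration, and scalars stand in for the tensors in the question. Each operation records its own local derivative when it is created, and `backward` walks the graph in reverse applying the chain rule, which is roughly the mechanism described above.

```python
import math


class Node:
    """A scalar in a tiny computation graph (hypothetical, not a TensorFlow class)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent_node, local_derivative) pairs
        self.grad = 0.0         # d(output)/d(this node), filled in by backward()


# Each operation knows its own derivative with respect to each input.
def add(a, b):
    return Node(a.value + b.value, ((a, 1.0), (b, 1.0)))


def sub(a, b):
    return Node(a.value - b.value, ((a, 1.0), (b, -1.0)))


def mul(a, b):
    return Node(a.value * b.value, ((a, b.value), (b, a.value)))


def relu(a):
    # The derivative at exactly 0 is taken to be 0 here (a convention).
    return Node(max(0.0, a.value), ((a, 1.0 if a.value > 0 else 0.0),))


def sigmoid(a):
    s = 1.0 / (1.0 + math.exp(-a.value))
    return Node(s, ((a, s * (1.0 - s)),))


def square(a):
    return Node(a.value ** 2, ((a, 2.0 * a.value),))


def backward(output):
    """Propagate gradients from output back to every node it depends on."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)  # post-order: inputs come before the nodes that use them

    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # consumers are processed before their inputs
        for parent, local in node.parents:
            parent.grad += node.grad * local  # chain rule


def tiny_network(x_val, w9_val, w10_val, b_val, y_val):
    """Scalar analogue of the two-layer model in the question."""
    x, w9, w10, b, y = (Node(v) for v in (x_val, w9_val, w10_val, b_val, y_val))
    model9 = relu(add(mul(x, w9), b))
    model10 = sigmoid(add(mul(model9, w10), b))
    error = square(sub(model10, y))
    return error, w9, w10


error, w9, w10 = tiny_network(2.0, 0.5, -1.5, 0.1, 1.0)
backward(error)
print("d(error)/d(w9)  =", w9.grad)
print("d(error)/d(w10) =", w10.grad)
```

Nothing in `backward` mentions ReLU or sigmoid by name. The activation-specific derivatives live with the operations themselves, so adding a new activation only requires defining its local derivative.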

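Using a different learning rate for each layer is not quite as straightforward, but you can split the minimize call into its two parts, compute_gradients and apply_gradients, and then modify the gradients to effectively change the learning rate. Something like this: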

model9 = tf.nn.relu(tf.matmul(x1,w9)+b) 
model10 = tf.nn.sigmoid(tf.matmul(model9,w10)+b) 
error = tf.reduce_mean(tf.square(model10-y)) 

optimiser = tf.train.AdamOptimizer(learning_rate=0.001)  
gradients = optimiser.compute_gradients(error, [w9, w10]) # Compute the gradients of error with respect to w9 and w10 

# gradients is a list of (gradient, variable) tuples; tuples are immutable, so rebuild the entry
grad_w9, var_w9 = gradients[0]
gradients[0] = (grad_w9 * 10, var_w9) # Multiply the gradient of w9 by 10 to increase the learning rate
train = optimiser.apply_gradients(gradients) # New train op

2017-09-25 17:08:30

Thank you, my friend.

Note: Adam tracks statistics for each variable in order to adjust the learning rate dynamically, so this technique may not be very effective except early in training.
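The effect of scaling a gradient can be checked numerically without TensorFlow. The sketch below implements the textbook Adam update from the original paper for a single scalar parameter, next to plain gradient descent. It is an illustration only: `adam_steps` and `sgd_steps` are hypothetical helper names, the gradient values are made up, and TensorFlow's own Adam implementation differs in small details such as where epsilon is applied. With a constant factor of 10, the plain gradient descent step grows tenfold, while the Adam step barely moves because the factor appears in both the numerator and the denominator of the update.

```python
import math


def sgd_steps(grads, lr=0.001):
    """Updates that plain gradient descent would apply for a sequence of scalar gradients."""
    return [-lr * g for g in grads]


def adam_steps(grads, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Updates that textbook Adam would apply for a sequence of scalar gradients."""
    m = 0.0  # running mean of gradients
    v = 0.0  # running mean of squared gradients
    updates = []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction
        v_hat = v / (1 - beta2 ** t)
        updates.append(-lr * m_hat / (math.sqrt(v_hat) + eps))
    return updates


grads = [0.3, -0.1, 0.25, 0.4, -0.05]  # made-up gradient values
scaled = [10 * g for g in grads]       # the same gradients multiplied by 10

for name, fn in (("sgd", sgd_steps), ("adam", adam_steps)):
    for plain_step, scaled_step in zip(fn(grads), fn(scaled)):
        print(name, plain_step, scaled_step)
```

If a genuinely different step size per layer is the goal, this suggests that the gradient-scaling trick fits optimizers without per-variable normalization better than it fits Adam.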

Is TensorFlow really smart enough to "iterate" through all the layers, check the activation functions, and apply gradient descent based on the derivative of each activation function?

Yes. That is the whole point of using TensorFlow.

2017-09-25 16:52:22 Aaron
