TensorFlow Word2Vec model running on GPU

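This TensorFlow example describes the training of a skip-gram Word2Vec model. It contains the following code fragment, which explicitly requires the CPU device for computation, i.e. tf.device('/cpu:0'):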

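import math
import random

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API

# Note: vocabulary_size is assumed to be defined earlier, where the dataset is built.
batch_size = 128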
embedding_size = 128 # Dimension of the embedding vector. 
skip_window = 1 # How many words to consider left and right. 
num_skips = 2 # How many times to reuse an input to generate a label. 

# We pick a random validation set to sample nearest neighbors. Here we limit the 
# validation samples to the words that have a low numeric ID, which by 
# construction are also the most frequent. 
valid_size = 16 # Random set of words to evaluate similarity on. 
valid_window = 100 # Only pick dev samples in the head of the distribution. 
valid_examples = np.array(random.sample(range(valid_window), valid_size)) 
num_sampled = 64 # Number of negative examples to sample. 

graph = tf.Graph() 

with graph.as_default(), tf.device('/cpu:0'): 
    # Input data. 
    train_dataset = tf.placeholder(tf.int32, shape=[batch_size]) 
    train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1]) 
    valid_dataset = tf.constant(valid_examples, dtype=tf.int32) 

    # Variables. 
    embeddings = tf.Variable(
        tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
    softmax_weights = tf.Variable(
        tf.truncated_normal([vocabulary_size, embedding_size],
                            stddev=1.0 / math.sqrt(embedding_size)))
    softmax_biases = tf.Variable(tf.zeros([vocabulary_size])) 

    # Model. 
    # Look up embeddings for inputs. 
    embed = tf.nn.embedding_lookup(embeddings, train_dataset) 

    # Compute the softmax loss, using a sample of the negative labels each time. 
    loss = tf.reduce_mean(
        tf.nn.sampled_softmax_loss(weights=softmax_weights,
                                   biases=softmax_biases, inputs=embed,
                                   labels=train_labels, num_sampled=num_sampled,
                                   num_classes=vocabulary_size))

    # Optimizer. 
    # Note: The optimizer will optimize the softmax_weights AND the embeddings. 
    # This is because the embeddings are defined as a variable quantity and the 
    # optimizer's `minimize` method will by default modify all variable quantities 
    # that contribute to the tensor it is passed. 
    # See docs on `tf.train.Optimizer.minimize()` for more details. 
    optimizer = tf.train.AdagradOptimizer(1.0).minimize(loss) 

    # Compute the similarity between minibatch examples and all embeddings. 
    # We use the cosine distance: 
    norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True)) 
    normalized_embeddings = embeddings/norm 
    valid_embeddings = tf.nn.embedding_lookup(normalized_embeddings, valid_dataset) 
    similarity = tf.matmul(valid_embeddings, tf.transpose(normalized_embeddings))

When I try to switch to the GPU, the following exception is raised:

InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'Variable_2/Adagrad': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.

I wonder what the reason is. Why can't the provided graph be computed on the GPU? Does it happen because of the tf.int32 type? Or should I switch to another optimizer? In other words, is there any way to process the Word2Vec model on the GPU (without type casting)?

UPDATE

Following Akshay Agrawal's recommendation, here is an updated fragment of the original code that achieves the required result:

with graph.as_default(), tf.device('/gpu:0'): 
    # Input data. 
    train_dataset = tf.placeholder(tf.int32, shape=[batch_size]) 
    train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1]) 
    valid_dataset = tf.constant(valid_examples, dtype=tf.int32) 

    embeddings = tf.Variable(
        tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
    softmax_weights = tf.Variable(
        tf.truncated_normal([vocabulary_size, embedding_size],
                            stddev=1.0 / math.sqrt(embedding_size)))
    softmax_biases = tf.Variable(tf.zeros([vocabulary_size]))
    embed = tf.nn.embedding_lookup(embeddings, train_dataset)

    with tf.device('/cpu:0'):
        loss = tf.reduce_mean(
            tf.nn.sampled_softmax_loss(weights=softmax_weights,
                                       biases=softmax_biases,
                                       inputs=embed,
                                       labels=train_labels,
                                       num_sampled=num_sampled,
                                       num_classes=vocabulary_size))

    optimizer = tf.train.AdamOptimizer(0.001).minimize(loss) 

    norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True)) 
    normalized_embeddings = embeddings/norm 
    valid_embeddings = tf.nn.embedding_lookup(normalized_embeddings, valid_dataset) 
    similarity = tf.matmul(valid_embeddings, tf.transpose(normalized_embeddings))

Source

2017-11-03 devforfu

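The error is raised because AdagradOptimizer does not have a GPU kernel for its sparse apply operation; a sparse apply is triggered because differentiating through the embedding lookup results in a sparse gradient.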
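To make the sparse-gradient point concrete, here is a minimal, framework-free sketch in plain Python. It is not TensorFlow code, and the helper names `embedding_lookup_grad` and `sparse_sgd_apply` are hypothetical. It shows that the gradient of a lookup only touches the rows that were looked up, so it is naturally stored as (row index, row gradient) pairs, which is roughly what TensorFlow's `tf.IndexedSlices` represents, and that the optimizer then needs a dedicated "sparse apply" update that modifies only those rows.

```python
def embedding_lookup(table, ids):
    """Forward pass: pick one row of the table per input id."""
    return [table[i] for i in ids]


def embedding_lookup_grad(ids, upstream_grads):
    """Backward pass: the gradient w.r.t. the table is nonzero only for the
    rows that were looked up, so return (row index, row gradient) pairs
    instead of a dense, table-sized matrix."""
    return list(zip(ids, upstream_grads))


def sparse_sgd_apply(table, sparse_grad, learning_rate):
    """Sparse apply for plain SGD: update only the touched rows, in place.
    Repeated ids accumulate, as they would in a dense gradient."""
    for row, grad in sparse_grad:
        table[row] = [w - learning_rate * g for w, g in zip(table[row], grad)]


# Toy vocabulary of 5 words with 2-dimensional embeddings.
table = [[0.0, 0.0] for _ in range(5)]
ids = [1, 3, 1]  # word 1 appears twice in the batch
embed = embedding_lookup(table, ids)

# Stand-in for dL/d(embed): one gradient row per looked-up id.
upstream = [[1.0, 1.0], [2.0, 2.0], [1.0, 1.0]]

sparse_grad = embedding_lookup_grad(ids, upstream)
sparse_sgd_apply(table, sparse_grad, learning_rate=0.5)

# Only rows 1 and 3 were modified; rows 0, 2 and 4 were never touched.
touched = sorted({row for row, _ in sparse_grad})
print("touched rows:", touched)
```

Adagrad additionally keeps a per-row accumulator, so its sparse update is a different operation from the SGD one sketched here; the error message in the question indicates that this particular operation had no GPU kernel in the asker's setup.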

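GradientDescentOptimizer and AdamOptimizer do support sparse apply operations. If you switch to one of these optimizers, you will unfortunately see another error: tf.nn.sampled_softmax_loss appears to create an op that does not have a GPU kernel. To get around that, you can wrap the loss = tf.reduce_mean(... line in a with tf.device('/cpu:0'): context, though doing so introduces CPU-GPU communication overhead.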
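For a rough feel of that overhead, the following back-of-the-envelope calculation counts only the `embed` activations that would have to cross the device boundary when the loss is pinned to the CPU. It assumes float32 values, that `embed` is produced on the GPU, and that its gradient travels back the same way. It ignores the softmax weight and bias rows, transfer latency, and any copies TensorFlow may avoid, so treat it as an illustration rather than a measurement.

```python
# Hyperparameters taken from the question.
batch_size = 128
embedding_size = 128

# Assumption: embeddings are float32, i.e. 4 bytes per value.
BYTES_PER_FLOAT32 = 4

# Forward pass: embed has shape [batch_size, embedding_size] and is copied GPU -> CPU.
forward_bytes = batch_size * embedding_size * BYTES_PER_FLOAT32

# Backward pass: the gradient w.r.t. embed has the same shape and is copied CPU -> GPU.
backward_bytes = forward_bytes

per_step_bytes = forward_bytes + backward_bytes
print("approx. bytes moved per training step:", per_step_bytes)
```

Under these assumptions the volume is small (128 KiB per step), so whether the overhead matters in practice is best checked by timing both device placements.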

Source

2017-11-22 18:55:44

Thank you for the response! So basically, I just need to switch to an optimizer that supports sparse operations? – devforfu

Yes, that should do the trick. –
