Adding a GPU operation in TensorFlow

I'm trying to add a new op to TensorFlow, loosely following this document. The difference is that I'm trying to implement a GPU-based op. The op I'm trying to add is the cuda op from here (cuda_op.py, cuda_op_kernel.cc, cuda_op_kernel.cu.cc). I'm attempting to compile these outside of TensorFlow and pull them in with tf.load_op_library. I have made some changes, so here are my files:

cuda_op_kernel.cc

#include "tensorflow/core/framework/op.h" 
#include "tensorflow/core/framework/shape_inference.h" 
#include "tensorflow/core/framework/op_kernel.h" 

using namespace tensorflow; // NOLINT(build/namespaces) 

REGISTER_OP("AddOne") 
    .Input("input: int32") 
    .Output("output: int32") 
    .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) { 
     c->set_output(0, c->input(0)); 
     return Status::OK(); 
    }); 

void AddOneKernelLauncher(const int* in, const int N, int* out); 

class AddOneOp : public OpKernel { 
public: 
    explicit AddOneOp(OpKernelConstruction* context) : OpKernel(context) {} 

    void Compute(OpKernelContext* context) override { 
        // Grab the input tensor
        const Tensor& input_tensor = context->input(0);
        auto input = input_tensor.flat<int32>();

        // Create an output tensor with the same shape as the input
        Tensor* output_tensor = nullptr;
        OP_REQUIRES_OK(context, context->allocate_output(0, input_tensor.shape(),
                                                         &output_tensor));
        auto output = output_tensor->flat<int32>();

        // Number of elements to process
        const int N = input.size();
        // Call the cuda kernel launcher
        AddOneKernelLauncher(input.data(), N, output.data());
    } 
}; 

REGISTER_KERNEL_BUILDER(Name("AddOne").Device(DEVICE_GPU), AddOneOp);

cuda_op_kernel.cu

#define EIGEN_USE_GPU 
#include <cuda.h> 
#include <stdio.h> 

__global__ void AddOneKernel(const int* in, const int N, int* out) { 
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < N;
         i += blockDim.x * gridDim.x) {
        out[i] = in[i] + 1;
    }
} 

void AddOneKernelLauncher(const int* in, const int N, int* out) { 
    AddOneKernel<<<32, 256>>>(in, N, out); 

    cudaError_t cudaerr = cudaDeviceSynchronize();
    if (cudaerr != cudaSuccess) {
        printf("kernel launch failed with error \"%s\".\n", cudaGetErrorString(cudaerr));
    }
}

CMakeLists.txt

cmake_minimum_required(VERSION 3.5) 

#found from running python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())' 
include_directories(/usr/local/lib/python3.5/dist-packages/tensorflow/include) 

find_package(CUDA) 

#set flags based on tutorial 
set (CMAKE_CXX_FLAGS "--std=c++11 -fPIC -O2 -D_GLIBCXX_USE_CXX11_ABI=0") 

#pass flags to c++ compiler 
SET(CUDA_PROPAGATE_HOST_FLAGS ON) 

#create library 
cuda_add_library(
    cuda_op SHARED 
    src/cuda_op_kernel.cu 
    src/cuda_op_kernel.cc 
    OPTIONS -gencode=arch=compute_20,code=sm_20) 

#copy test file to build folder 
configure_file(src/test.py test.py COPYONLY)

test.py

import tensorflow as tf 
mod = tf.load_op_library('./libcuda_op.so') 
with tf.Session() as sess: 
    start = [5,4,3,2,1] 
    print(start) 
    print(mod.add_one(start).eval())

I can compile and run test.py successfully, but the output is always [0 0 0 0 0]. If I replace AddOneKernel<<<32, 256>>>(in, N, out); with for (int i = 0; i < N; i++) out[i] = in[i] + 1; and DEVICE_GPU with DEVICE_CPU, the op outputs the right values [6 5 4 3 2] (with exactly the same CMakeLists.txt).

Is there any way to get the correct values to be returned?
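As a sanity check on the kernel logic itself, the grid-stride loop in AddOneKernel can be emulated on the CPU. The sketch below is plain Python and not part of the original files; `add_one_grid_stride` is a hypothetical name, and the zero-filled list only stands in for an output buffer that the kernel never wrote to. If the emulation gives the expected values, the loop is probably not at fault and the build or the launch is the more likely suspect.

```python
def add_one_grid_stride(values, grid_dim=32, block_dim=256):
    """Emulate AddOneKernel<<<grid_dim, block_dim>>> sequentially on the CPU."""
    n = len(values)
    out = [0] * n  # stand-in for an output buffer no thread has written yet
    stride = block_dim * grid_dim  # total number of threads in the grid
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            # Same index arithmetic as the CUDA kernel
            i = block_idx * block_dim + thread_idx
            while i < n:
                out[i] = values[i] + 1
                i += stride
    return out


print(add_one_grid_stride([5, 4, 3, 2, 1]))  # [6, 5, 4, 3, 2]
```

With 32 blocks of 256 threads there are 8192 threads, so inputs longer than that are covered by the stride step rather than by extra blocks.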

Source

2017-06-07 McAngus

I don't fully remember where I found the cmake setup for CUDA, but the options were somehow interfering with the compilation. Replacing the cuda_add_library call in CMakeLists.txt with the following fixed the problem.

#no options needed 
cuda_add_library(
    cuda_op SHARED 
    src/cuda_op_kernel.cu 
    src/cuda_op_kernel.cc)
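The thread does not say why the option broke the op. One plausible reading, offered here as an assumption, is that `-gencode=arch=compute_20,code=sm_20` builds device code only for compute capability 2.0, so a newer GPU has nothing it can run. If an explicit architecture is still wanted, the flag would need to name the capability of the GPU actually in use. A minimal Python sketch of how that flag string is put together follows; `gencode_flag` is a hypothetical helper, and since the thread never states which GPU was used, no value is filled in for it.

```python
def gencode_flag(major, minor):
    """Build an nvcc -gencode flag for one compute capability (hypothetical helper)."""
    arch = "{}{}".format(major, minor)
    return "-gencode=arch=compute_{0},code=sm_{0}".format(arch)


# The flag from the question's CMakeLists.txt corresponds to capability 2.0.
print(gencode_flag(2, 0))  # -gencode=arch=compute_20,code=sm_20
```

Dropping OPTIONS altogether, as in the answer above, leaves the choice of architecture to nvcc's defaults.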

Source

2017-06-07 16:56:15 McAngus

ubuntu@cubuntu:~/Desktop/src/src/build$ cmake ..
-- Configuring done
-- Generating done
-- Build files have been written to: /home/ubuntu/Desktop/src/src/build
ubuntu@cubuntu:~/Desktop/src/src/build$ make
[ 33%] Building NVCC (Device) object CMakeFiles/cuda_op.d/cuda_op_generated_cuda_op_kernel.cu.o
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
Scanning dependencies of target cuda_op
[ 66%] Building CXX object CMakeFiles/cuda_op.dir/cuda_op_kernel.cc.o
/home/ubuntu/Desktop/src/src/cuda_op_kernel.cc:1:17: error: 'tensorflow' is not a namespace-name
 using namespace tensorflow; // NOLINT(build/namespaces)

Source

2017-09-26 16:05:30 Essa

I was trying to follow your example, but when I run make it complains about the tensorflow namespace. What's wrong? – Essa

A bit of a late reply, but you should create a new question for this. – McAngus
