Strange behaviour using thrust experimental::pinned_allocator in CUDA

I am currently trying to remove some of the cumbersome cudaMallocHost/cudaFreeHost calls from my code. To do so, I would like to use only std::vector, but it is absolutely required that the underlying memory be pinned CUDA memory.

However, I am facing strange behaviour with thrust::system::cuda::experimental::pinned_allocator<> from the Thrust library. On my setup, the following program produces the output shown after it:

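//STL
#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>

//CUDA
#include <cuda_runtime.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/functional.h>                  // thrust::multiplies
#include <thrust/iterator/constant_iterator.h>  // thrust::make_constant_iterator
#include <thrust/transform.h>
#include <thrust/system/cuda/experimental/pinned_allocator.h>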

#define SIZE 4 
#define INITVAL 2 
#define ENDVAL 4 

//Compile using nvcc ./main.cu -o test -std=c++11 
int main(int argc, char* argv[]) 
{ 
    // init host 
    std::vector<float,thrust::system::cuda::experimental::pinned_allocator<float> > hostVec(SIZE); 
    std::fill(hostVec.begin(),hostVec.end(),INITVAL); 

    //Init device 
    thrust::device_vector<float> thrustVec(hostVec.size()); 

    //Copy 
    thrust::copy(hostVec.begin(), hostVec.end(), thrustVec.begin()); 

    //std::cout << "Dereferencing values of the device, values should be "<< INITVAL << std::endl; 
    std::for_each(thrustVec.begin(),thrustVec.end(),[](float in){ std::cout <<"val is "<<in<<std::endl;}); 
    std::cout << "------------------------" << std::endl; 

    //Do Stuff 
    thrust::transform(thrustVec.begin(), thrustVec.end(), thrust::make_constant_iterator(2), thrustVec.begin(), thrust::multiplies<float>()); 

    //std::cout << "Dereferencing values of the device, values should now be "<< ENDVAL << std::endl; 
    std::for_each(thrustVec.begin(),thrustVec.end(),[](float in){ std::cout <<"val is "<<in<<std::endl;}); 
    std::cout << "------------------------" << std::endl; 

    //Copy back 
    thrust::copy(thrustVec.begin(), thrustVec.end(), hostVec.begin()); 

    //Synchronize
    //cudaDeviceSynchronize(); //uncommenting this makes the weird behaviour go away

    //Check result
    //std::cout << "Dereferencing values on the host, values should now be "<< ENDVAL << std::endl; //uncommenting this also makes the weird behaviour go away

    std::for_each(hostVec.begin(),hostVec.end(),[](float in){ std::cout <<"val is "<<in<<std::endl;}); 

    return EXIT_SUCCESS; 
}

Output:

val is 2 
val is 2 
val is 2 
val is 2 
------------------------ 
val is 4 
val is 4 
val is 4 
val is 4 
------------------------ 
val is 2 
val is 4 
val is 4 
val is 4

Why does the copy from device to host seem to fail? Nvvp shows a perfectly fine timeline, though.

By the way, I am using NVCC/CUDA/Thrust from the 7.5 package and gcc (GCC) 4.8.5, with a Titan X card.

Thank you in advance for your help.

Source

2016-04-12 Tobbey

I can't reproduce this on any platform I have access to. What happens if you add a synchronization call after the device-to-host copy and before printing the vector? – talonmies

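I can also reproduce the error on a gtx680 (compute capability 3.0). Indeed, adding cudaDeviceSynchronize makes the code run as intended. I believed thrust::copy was synchronous, but there is actually no information about synchronous/asynchronous behaviour in the Thrust documentation: http://thrust.github.io/doc/group__copying.html#ga24ccfaaa706a9163ec5117758fdb71b9 – Tobbey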
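The workaround discussed in these comments can be sketched as follows. This is a minimal example, not a fix for the underlying bug: it assumes the same Thrust headers as in the question (the experimental pinned_allocator header in particular), and the alias name PinnedVec is hypothetical, introduced only for readability. It simply waits for the device explicitly before the host reads the pinned buffer.

```cuda
// Compile (as in the question): nvcc ./main.cu -o test -std=c++11
#include <cstdlib>
#include <iostream>
#include <vector>

#include <cuda_runtime.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/system/cuda/experimental/pinned_allocator.h>

// Hypothetical alias (not part of Thrust): a std::vector backed by pinned host memory.
using PinnedVec =
    std::vector<float, thrust::system::cuda::experimental::pinned_allocator<float>>;

int main()
{
    PinnedVec hostVec(4, 0.0f);                                 // pinned host buffer
    thrust::device_vector<float> devVec(hostVec.size(), 4.0f);  // device data to fetch

    // Device-to-host copy into pinned memory (the step that misbehaved in the question).
    thrust::copy(devVec.begin(), devVec.end(), hostVec.begin());

    // Workaround from the comments: block until all outstanding device work,
    // including the copy above, has completed before touching hostVec.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::cerr << "cudaDeviceSynchronize failed: " << cudaGetErrorString(err) << std::endl;
        return EXIT_FAILURE;
    }

    for (float v : hostVec) {
        std::cout << "val is " << v << std::endl;
    }
    return EXIT_SUCCESS;
}
```

The explicit synchronization costs a full device stall, so it is best treated as a stopgap while staying on an affected Thrust version.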

This was a real bug, and the Thrust developers were already aware of it; see https://github.com/thrust/thrust/issues/775.

Using the latest 1.8.3 version of Thrust from the GitHub repository solved the problem for me.
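To confirm which Thrust headers a build actually picks up (the copy bundled with the CUDA toolkit, or a newer checkout placed earlier on the include path), a small check like the one below may help. It is only a sketch: it relies on the version macros from thrust/version.h, and the 1.8.3 threshold is taken from this answer rather than verified independently.

```cuda
// Compile with the include path under test, for example:
//   nvcc -I/path/to/thrust ./version.cu -o version
// (the -I path is a placeholder for wherever the newer Thrust was cloned)
#include <cstdlib>
#include <iostream>

#include <thrust/version.h>

int main()
{
    std::cout << "Thrust v" << THRUST_MAJOR_VERSION << "."
              << THRUST_MINOR_VERSION << "."
              << THRUST_SUBMINOR_VERSION << std::endl;

    // THRUST_VERSION encodes major * 100000 + minor * 100 + subminor,
    // so 1.8.3 corresponds to 100803.
#if THRUST_VERSION < 100803
    std::cout << "Older than 1.8.3: the pinned_allocator copy issue may still apply." << std::endl;
#else
    std::cout << "1.8.3 or newer." << std::endl;
#endif
    return EXIT_SUCCESS;
}
```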

Source

2016-04-12 19:24:12 Tobbey
