OpenGL Compute Shader - Correct use of memory barriers

I'd like to be able to read and write the same element of an SSBO from a compute shader as part of a fluid simulation, but I'm having trouble with synchronization. I have a test shader that runs 16 times. The three options below show what I'm trying to do.

layout (std430, binding=8) coherent buffer Debug 
{ 
    int debug[ ]; 
}; 

shared int sharedInt; 

layout (local_size_x = 16, local_size_y = 1, local_size_z = 1) in; 

void main() 
{ 
    ///////  1.  /////// 
    sharedInt = debug[0]; 
    memoryBarrierShared(); 
    barrier(); 
    debug[0] = sharedInt + 1; 
    memoryBarrierShared(); 
    barrier(); 

    // Print debug[0]: 1 


    ///////  2.  /////// 
    atomicAdd(debug[0], 1); 

    // Print debug[0]: 16 


    ///////  3.  /////// 
    sharedInt = debug[0]; 
    memoryBarrierShared(); 
    barrier(); 
    atomicExchange(debug[0], debug[0]+1); 
    memoryBarrierShared(); 
    barrier(); 

    // Print debug[0]: 1 
}

*I only run one option at a time.

What I'm trying to get is debug[0] equal to 16, but in my simulation I need to use something like the first or third option, because I need to read from and write to the SSBO within the same thread.

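I don't think I understand the role of shared variables. My understanding is that memoryBarrierShared() should make reads and writes of sharedInt visible to all threads in the work group. Only a single work group is dispatched, so the result is the same either way.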

Thanks.


2016-08-12 Ewan

It's really not clear what you're trying to do, why you have a shared variable, or what your barriers are supposed to achieve. You say you want debug[0] to be 16, but it's unclear why #2 isn't an acceptable solution to that problem. –

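Sorry. I can't use #2 because I have to read from debug[0] and then add 1 to it afterwards. In the simulation I have an SSBO representing a 3D grid that stores which particles are currently in each cell. Each cell stores the number of particles and the indices of the particles in that cell. To fill the grid, a compute shader runs once per particle; it finds the cell the particle is in and adds the particle to the appropriate memory slot based on the number of particles already in the cell. So I need to read the particle count and then increment it. – Ewan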
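The "read the count, then increment it" step described above is what an atomic add already does in one operation: GLSL's atomicAdd returns the value the memory held before the addition, so that returned value can be used directly as the particle's slot index in the cell. The sketch below is a CPU-side C++ analogy, not shader code: std::atomic::fetch_add stands in for atomicAdd and std::thread stands in for shader invocations. The names Cell, kMaxParticlesPerCell, insert_particle and fill_cell are hypothetical.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical cell layout: a particle counter plus a fixed-size index list,
// mirroring the "count + indices" cell described in the comment above.
constexpr int kMaxParticlesPerCell = 16;

struct Cell {
    std::atomic<int> count{0};
    std::array<int, kMaxParticlesPerCell> indices{};
};

// One "invocation": reserve a slot and store the particle index in it.
// fetch_add returns the counter value from *before* the increment (like GLSL
// atomicAdd), so every caller receives a distinct slot number.
bool insert_particle(Cell& cell, int particle_index) {
    int slot = cell.count.fetch_add(1);
    if (slot >= kMaxParticlesPerCell) {
        return false;  // cell full; in this sketch the counter still over-counts
    }
    cell.indices[slot] = particle_index;
    return true;
}

// Launch n threads that all insert into the same cell, like 16 invocations
// of a local_size_x = 16 work group whose particles land in one grid cell.
void fill_cell(Cell& cell, int n) {
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&cell, i] { insert_particle(cell, i); });
    }
    for (auto& t : threads) t.join();
}
```

Because each thread writes only to the slot it was handed, the index writes never collide; the only shared read-modify-write is the counter, and that is the part made atomic.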

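The gist is that in variants 1 and 3 the addition is not part of the atomic operation. You first read from the shared variable / SSBO, then perform the addition, and then write. If all invocations read the same value, they all get the same value as the result of the addition and all write that same value.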
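The lost update described above can be reproduced on the CPU. This is a C++ analogy, not GLSL: the first function models variant 1 deterministically as two phases separated by barrier() (every invocation reads, then every invocation writes its own read value plus 1), and the second uses std::atomic::fetch_add as a stand-in for atomicAdd in variant 2. The function names and kInvocations are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

constexpr int kInvocations = 16;  // matches local_size_x = 16 in the question

// Variant 1 modeled as phases separated by barrier(): every invocation first
// reads the counter, and only afterwards do they all write read + 1.
int non_atomic_increment_with_barrier(int initial) {
    int counter = initial;
    std::vector<int> registers(kInvocations);
    // Phase 1: all invocations read the same value into their "registers".
    for (int i = 0; i < kInvocations; ++i) registers[i] = counter;
    // barrier(): nobody writes until everybody has read.
    // Phase 2: each invocation writes its own read value + 1.
    for (int i = 0; i < kInvocations; ++i) counter = registers[i] + 1;
    return counter;  // initial + 1, not initial + 16
}

// Variant 2: read, add and write happen as one indivisible operation.
int atomic_increment(int initial) {
    std::atomic<int> counter{initial};
    std::vector<std::thread> threads;
    for (int i = 0; i < kInvocations; ++i) {
        threads.emplace_back([&counter] { counter.fetch_add(1); });
    }
    for (auto& t : threads) t.join();
    return counter.load();
}
```

Starting from 0, the first function returns 1 and the second returns 16, matching the two results printed in the question.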

To make the addition part of the atomic operation, use atomicAdd, as you did in variant 2. Here is your code, annotated with some explanations:

///////  1.  /////// 

// all invocations read debug[0] into the shared variable (presumably 0) 
sharedInt = debug[0]; 

// syncing. but since all invocations wrote the same value into sharedInt, each one would read that same value back even without the barrier, 
// because an invocation always sees its own writes. 
memoryBarrierShared(); 
barrier(); 

// all invocations read the shared variable and add 1 to it (but do not write the sum back to the shared variable) 
// then they all write the result of the addition (1) to the SSBO 
debug[0] = sharedInt + 1; 

// another sync, which does not help if the shader ends here. 
memoryBarrierShared(); 
barrier(); 

// since they all write 1, there is no other output possible than a 1 in the SSBO. 
// Print debug[0]: 1 


///////  2.  /////// 
// all invocations tell the "atomic memory unit" (whatever that is exactly) 
// to atomically add 1 to the SSBO. 
// that unit will now, sixteen times, read the value that is in the SSBO, 
// add 1, and write it back. and because it does so atomically, 
// these additions "just work" and never operate on stale values, 
// so you end up with 16 in your SSBO. 
atomicAdd(debug[0], 1); 

// Print debug[0]: 16 


///////  3.  /////// 

// as above, but this has even less effect since you don't read from sharedInt :) 
sharedInt = debug[0]; 
memoryBarrierShared(); 
barrier(); 

// all invocations read from debug[0], reading 0. 
// they all add 1 to the read value, so they now have 1 in their registers. 
// now they tell the "atomic memory unit" to exchange whatever is in 
// debug[0] with a 1. so you write a 1 sixteen times into debug[0] and end up with a 1. 
atomicExchange(debug[0], debug[0]+1); 
memoryBarrierShared(); 
barrier(); 

// Print debug[0]: 1
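Variant 3 fails because debug[0]+1 is computed from a separate, non-atomic read before atomicExchange runs. When the update is something other than a plain addition, the usual fix is a compare-and-swap retry loop; GLSL offers atomicCompSwap for this. The sketch below is a CPU-side C++ analogy using std::atomic::compare_exchange_weak; atomic_update and cas_increment are hypothetical names, and the increment lambda is only an example update.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Apply an arbitrary update f(old) atomically with a compare-and-swap loop.
// Returns the value that was replaced, like the GLSL atomic functions do.
template <typename F>
int atomic_update(std::atomic<int>& target, F f) {
    int expected = target.load();
    // If another thread changed target since we read it, compare_exchange_weak
    // fails, reloads `expected` with the current value, and we retry with
    // f applied to that fresh value.
    while (!target.compare_exchange_weak(expected, f(expected))) {
    }
    return expected;
}

// Variant 3 rewritten: each thread replaces v with v + 1, but the read and the
// exchange are now tied together by the CAS, so no update is lost.
int cas_increment(int initial, int invocations) {
    std::atomic<int> counter{initial};
    std::vector<std::thread> threads;
    for (int i = 0; i < invocations; ++i) {
        threads.emplace_back([&counter] {
            atomic_update(counter, [](int v) { return v + 1; });
        });
    }
    for (auto& t : threads) t.join();
    return counter.load();
}
```

For a plain increment atomicAdd is simpler; the CAS loop is only worth it when the new value depends on the old one in a way no built-in atomic covers.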


2016-08-13 06:46:17 karyon
