Converting OpenMP to TBB

I'm having some trouble converting OpenMP code to TBB. Can anybody help me?

I have the following code in OpenMP:

# pragma omp parallel \
  shared (b, count, count_max, g, r, x_max, x_min, y_max, y_min) \
  private (i, j, k, x, x1, x2, y, y1, y2)
{
# pragma omp for
  for (i = 0; i < m; i++)
  {
    for (j = 0; j < n; j++)
    {
      //cout << omp_get_thread_num() << " thread\n";
      x = ((double) (j - 1) * x_max
         + (double) (m - j) * x_min)
         / (double) (m - 1);

      y = ((double) (i - 1) * y_max
         + (double) (n - i) * y_min)
         / (double) (n - 1);

      count[i][j] = 0;

      x1 = x;
      y1 = y;

      for (k = 1; k <= count_max; k++)
      {
        x2 = x1 * x1 - y1 * y1 + x;
        y2 = 2 * x1 * y1 + y;

        if (x2 < -2.0 || 2.0 < x2 || y2 < -2.0 || 2.0 < y2)
        {
          count[i][j] = k;
          break;
        }
        x1 = x2;
        y1 = y2;
      }

      if ((count[i][j] % 2) == 1)
      {
        r[i][j] = 255;
        g[i][j] = 255;
        b[i][j] = 255;
      }
      else
      {
        // c is not in the private list, so declare it here to give each
        // thread its own copy instead of racing on one shared variable.
        int c = (int) (255.0 * sqrt (sqrt (sqrt (
          ((double) (count[i][j]) / (double) (count_max))))));
        r[i][j] = 3 * c / 5;
        g[i][j] = 3 * c / 5;
        b[i][j] = c;
      }
    }
  }
}

The OpenMP version performs quite well.

Then there is the TBB code, which is 10 times slower:

tbb::parallel_for (int(0), m, [&](int i)
{
  for (j = 0; j < n; j++)
  {
    x = ((double) (j - 1) * x_max
       + (double) (m - j) * x_min)
       / (double) (m - 1);

    y = ((double) (i - 1) * y_max
       + (double) (n - i) * y_min)
       / (double) (n - 1);

    count[i][j] = 0;

    x1 = x;
    y1 = y;

    for (k = 1; k <= count_max; k++)
    {
      x2 = x1 * x1 - y1 * y1 + x;
      y2 = 2 * x1 * y1 + y;

      if (x2 < -2.0 || 2.0 < x2 || y2 < -2.0 || 2.0 < y2)
      {
        count[i][j] = k;
        break;
      }
      x1 = x2;
      y1 = y2;
    }

    if ((count[i][j] % 2) == 1)
    {
      r[i][j] = 255;
      g[i][j] = 255;
      b[i][j] = 255;
    }
    else
    {
      c = (int) (255.0 * sqrt (sqrt (sqrt (
        ((double) (count[i][j]) / (double) (count_max))))));
      r[i][j] = 3 * c / 5;
      g[i][j] = 3 * c / 5;
      b[i][j] = c;
    }
  }
});


2016-12-23 Gabriel Stanica

TBB's default partitioner is 'auto_partitioner', which performs recursive work subdivision, potentially down to the level of individual loop iterations, and that can incur huge overhead. With many compilers the default schedule of the 'for' worksharing construct is 'static', so try passing a singleton instance of 'static_partitioner' to the 'parallel_for' algorithm so that TBB divides the work the same way OpenMP does.

I changed the tbb::parallel_for to tbb::parallel_for(tbb::blocked_range2d<int>(0, m, 0, n), [&](tbb::blocked_range2d<int> s) { for (int j = s.cols().begin(); j … (with static_partitioner())

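Note private (i, j, k, x, x1, x2, y, y1, y2) in the OpenMP version of the code. This variable list specifies the variables that are private (local) to the parallel loop body. In the TBB version, however, many of these variables are captured by the lambda by reference ([&]), so the code is incorrect: it has data races, and I suspect the slowdown comes from multiple threads accessing these variables (cache-coherence overhead and clobbered loop indices). So if you want to fix the code, make these variables local: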

tbb::parallel_for (int(0), m, [&](int i)
{
  double x, y, x1, x2, y1, y2; // !!!! now local to each lambda invocation
  int j, k, c;                 // !!!! c is written per pixel too, so keep it local
  for (j = 0; j < n; j++)
  {
    x = ((double) (j - 1) * x_max
       + (double) (m - j) * x_min)
       / (double) (m - 1);
...


2016-12-23 15:49:24 Alex
