Selectively enabling an OpenMP for loop inside a parallel region

Is it possible to selectively enable OpenMP directives using a template parameter or a runtime variable?

That is, switch between this (the threads share the iterations of a single for loop):
#pragma omp parallel 
{ 
    #pragma omp for 
    for (int i = 0; i < 10; ++i) { /*...*/ } 
} 
versus this (each thread runs the whole for loop on its own):
#pragma omp parallel 
{ 
    for (int i = 0; i < 10; ++i) { /*...*/ } 
}

Update (testing the if clause)

test.cpp (compiled with g++ 6: g++ test.cpp -fopenmp):

#include <iostream>
#include <omp.h>

int main() {
    bool var = true;
    #pragma omp parallel
    {
        #pragma omp for if (var)
        for (int i = 0; i < 4; ++i) {
            std::cout << omp_get_thread_num() << "\n";
        }
    }
}

Error message:

test.cpp: In function ‘int main()’:
test.cpp:8:25: error: ‘if’ is not valid for ‘#pragma omp for’
         #pragma omp for if (var)
                         ^~

Source

2017-02-15 hamster on wheels

'#pragma omp parallel if (variable)' –

Both versions are parallel, though. In most cases I want to selectively enable the '#pragma omp for' line. I'll check whether the if clause works with the for construct. Thanks. –

… Hopefully this holds for all compilers. –
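A minimal sketch of why 'parallel if' on its own does not answer the question: with if (false) the region still executes, but on a team of one thread, so the loop runs once in total rather than once per thread. The function team_size_with_if is hypothetical, and the #else branch is an assumed serial fallback so the sketch also builds without -fopenmp.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#else
// Assumed serial fallback so the sketch also builds without -fopenmp.
inline int omp_get_num_threads() { return 1; }
#endif

// Hypothetical helper: reports the size of the team that executes a
// parallel region guarded by if (enable).
int team_size_with_if(bool enable) {
    int size = 0;
    #pragma omp parallel if (enable)
    {
        // One thread records the team size; the others skip this statement.
        #pragma omp single
        size = omp_get_num_threads();
    }
    // With enable == false the region ran, but on a team of one thread.
    return size;
}
```

So 'parallel if (var)' chooses between "parallel" and "serial", not between "shared loop" and "one loop per thread".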

A workaround of sorts. I'm not sure whether the condition used to pick the thread ID can be removed.

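#include <iostream>
#include <omp.h>
#include <sstream>
#include <vector>

int main() {
    constexpr bool var = true;
    int n_threads = omp_get_num_procs();
    std::cout << "n_threads: " << n_threads << "\n";
    std::vector<std::stringstream> s(omp_get_num_procs());

    // var == true : this region gets a full team of threads.
    // var == false: the if clause makes it run on a single thread.
    #pragma omp parallel if (var)
    {
        const int thread_id0 = omp_get_thread_num();
        // With nested parallelism disabled, this region gets a team of one
        // thread when the outer region is active, and a full team when the
        // outer region was serialized by if (false).
        #pragma omp parallel
        {
            int thread_id1;
            if (var) {
                thread_id1 = thread_id0;
            } else {
                thread_id1 = omp_get_thread_num();
            }

            // Binds to the inner team: one thread runs every iteration
            // (var == true), or the iterations are split (var == false).
            #pragma omp for
            for (int i = 0; i < 8; ++i) {
                s[thread_id1] << i << ", ";
            }
        }
    }

    for (int i = 0; i < s.size(); ++i) {
        std::cout << "thread " << i << ": "
                  << s[i].str() << "\n";
    }
}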

Output (when var == true):

n_threads: 8 
thread 0: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 1: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 2: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 3: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 4: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 5: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 6: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 7: 0, 1, 2, 3, 4, 5, 6, 7,

Output (when var == false):

n_threads: 8 
thread 0: 0, 
thread 1: 1, 
thread 2: 2, 
thread 3: 3, 
thread 4: 4, 
thread 5: 5, 
thread 6: 6, 
thread 7: 7,

Source

2017-02-15 17:14:50

This works with both clang and g++. I'm not sure about the Intel compiler. –

It does not work as expected when nested parallelism is enabled. –
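One possible way to make the workaround independent of the nesting setting, sketched under the assumption that a runtime bool is acceptable: put if (!replicate) on the inner region as well, so the two if clauses are complementary. An if (false) region always runs on a team of one thread, whether or not nested parallelism is enabled. The function run_loop_nested and its per-thread counters are hypothetical stand-ins for the stringstream output above; num_threads(nthreads) only caps the team size so the counter vector is large enough.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Assumed serial fallback so the sketch also builds without -fopenmp.
inline int omp_get_thread_num() { return 0; }
inline int omp_get_max_threads() { return 1; }
#endif

// Hypothetical helper: runs n iterations and returns how many iterations
// were executed on behalf of each thread id.
//   replicate == true : every thread runs all n iterations
//   replicate == false: the n iterations are divided among the threads
std::vector<int> run_loop_nested(bool replicate, int n) {
    const int nthreads = omp_get_max_threads();
    std::vector<int> count(nthreads, 0);  // one counter per thread id

    #pragma omp parallel if (replicate) num_threads(nthreads)
    {
        const int outer_id = omp_get_thread_num();
        // The if clauses are complementary, so at most one of the two
        // regions is active; an if (false) region always has a team of
        // one thread, whatever the nested-parallelism setting.
        #pragma omp parallel if (!replicate) num_threads(nthreads)
        {
            const int id = replicate ? outer_id : omp_get_thread_num();
            // Binds to the inner team: a team of one runs every iteration,
            // a full team splits them.
            #pragma omp for
            for (int i = 0; i < n; ++i) {
                ++count[id];
            }
        }
    }
    return count;
}
```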

#include <omp.h>
#include <sstream>
#include <vector>
#include <iostream>

int main() {
    constexpr bool var = false;
    int n_threads = omp_get_num_procs();
    std::cout << "n_threads: " << n_threads << "\n";
    std::vector<std::stringstream> s(omp_get_num_procs());

    #pragma omp parallel
    {
        const int thread_id = omp_get_thread_num();
        if (var) {
            #pragma omp for
            for (int i = 0; i < 8; ++i) {
                s[thread_id] << i << ", ";
            }
        } else {
            for (int i = 0; i < 8; ++i) {
                s[thread_id] << i << ", ";
            } // code duplication
        }
    }
    for (int i = 0; i < s.size(); ++i) {
        std::cout << "thread " << i << ": "
                  << s[i].str() << "\n";
    }
}

Source

2017-02-15 18:23:13

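The fact that the code in the 'else' block actually creates a nested parallel region can lead to surprising results. The only reason it appears to work the way the OP wants is that nested parallelism is disabled by default, so the nested region is executed serially by each thread. –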

Thank you. I fixed it by removing the '#pragma omp parallel for' from the 'else' block. –

Sorry, I didn't realize you were the OP. You should really merge both of your answers into one. –
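The duplicated loop body in the answer above ("// code duplication") can be avoided by writing the body once, for example as a lambda, and branching only on the loop header. This is a sketch, not the answer's code: run_loop_branch and its counters are hypothetical. It assumes the flag has the same value in every thread, because a worksharing construct such as 'omp for' must be encountered by every thread of the team or by none.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Assumed serial fallback so the sketch also builds without -fopenmp.
inline int omp_get_thread_num() { return 0; }
inline int omp_get_max_threads() { return 1; }
#endif

// Hypothetical helper: share == true divides the n iterations among the
// threads, share == false lets every thread run all n iterations.
std::vector<int> run_loop_branch(bool share, int n) {
    const int nthreads = omp_get_max_threads();
    std::vector<int> count(nthreads, 0);  // one counter per thread id

    #pragma omp parallel num_threads(nthreads)
    {
        const int id = omp_get_thread_num();
        // The loop body is written once; only the loop header differs.
        auto body = [&](int /*i*/) { ++count[id]; };
        if (share) {  // must have the same value in every thread
            #pragma omp for
            for (int i = 0; i < n; ++i) body(i);
        } else {
            for (int i = 0; i < n; ++i) body(i);
        }
    }
    return count;
}
```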

I think the idiomatic C++ solution is to hide the different OpenMP pragmas behind algorithmic overloads.

#include <iostream>
#include <sstream>
#include <vector>
#include <omp.h>

template <bool ALL_PARALLEL>
struct impl;

// ALL_PARALLEL == true: every thread runs the whole range.
template<>
struct impl<true>
{
    template<typename ITER, typename CALLABLE>
    void operator()(ITER begin, ITER end, const CALLABLE& func) {
        #pragma omp parallel
        {
            for (ITER i = begin; i != end; ++i) {
                func(i);
            }
        }
    }
};

// ALL_PARALLEL == false: the iterations are divided among the threads.
template<>
struct impl<false>
{
    template<typename ITER, typename CALLABLE>
    void operator()(ITER begin, ITER end, const CALLABLE& func) {
        #pragma omp parallel for
        for (ITER i = begin; i < end; ++i) {
            func(i);
        }
    }
};

// This is just so we don't have to write impl<ALL_PARALLEL>()(...)
template <bool ALL_PARALLEL, typename ITER, typename CALLABLE>
void parallel_foreach(ITER begin, ITER end, const CALLABLE& func)
{
    impl<ALL_PARALLEL>()(begin, end, func);
}

int main()
{
    constexpr bool var = false;
    int n_threads = omp_get_num_procs();
    std::cout << "n_threads: " << n_threads << "\n";
    std::vector<std::stringstream> s(omp_get_num_procs());

    parallel_foreach<var>(0, 8, [&s](auto i) {
        s[omp_get_thread_num()] << i << ", ";
    });

    for (int i = 0; i < s.size(); ++i) {
        std::cout << "thread " << i << ": "
                  << s[i].str() << "\n";
    }
}

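If you use some specific types, you could overload by type instead of using a bool template parameter, and iterate over elements rather than with a numerically indexed loop. Note that you can use C++ random access iterators in OpenMP worksharing loops. Depending on your type, you may well be able to implement an iterator that hides all details of the internal data access from the caller.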
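A minimal sketch of the type-based overload described above, assuming two hypothetical empty tag types in place of the bool template parameter. The overload is picked at compile time from the type of the first argument, so there is no runtime branch and no wrapper struct.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Assumed serial fallback so the sketch also builds without -fopenmp.
inline int omp_get_thread_num() { return 0; }
inline int omp_get_max_threads() { return 1; }
#endif

// Hypothetical tag types that replace the bool template parameter.
struct shared_loop {};      // the iterations are divided among the threads
struct replicated_loop {};  // every thread runs all iterations

// Selected at compile time by passing shared_loop{}.
template <typename Func>
void parallel_foreach(shared_loop, int begin, int end, const Func& func) {
    #pragma omp parallel for
    for (int i = begin; i < end; ++i) {
        func(i);
    }
}

// Selected at compile time by passing replicated_loop{}.
template <typename Func>
void parallel_foreach(replicated_loop, int begin, int end, const Func& func) {
    #pragma omp parallel
    {
        for (int i = begin; i < end; ++i) {
            func(i);
        }
    }
}
```

For example, parallel_foreach(replicated_loop{}, 0, 8, [&](int i) { /* ... */ }) runs the whole range in every thread, while passing shared_loop{} splits it.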

Source

2017-02-15 19:19:57 Zulan

I thought the overhead was quite large for iterators: http://stackoverflow.com/questions/2513988/iteration-through-std-containers-in-openmp I'm not sure whether that is still true. After reading it, I avoid writing iterators for my classes when they are used in OpenMP for loops. –

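You misread the linked answer. The example he gives is for 'std::set', which does not have random access iterators. Therefore he does not use a loop worksharing construct ('#pragma omp (parallel) for') but a hand-written loop. There is no inherent overhead in using a regular '#pragma omp for' with random access iterators. As always with optimization, your mileage may vary, so measure and compare. Thanks. – Zulan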
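A minimal sketch of a worksharing loop over random access iterators, as described in the comment above. 'std::vector' iterators qualify; 'std::set' iterators are only bidirectional and do not fit the canonical loop form OpenMP requires, which is why the linked answer uses a hand-written loop instead. double_all is a hypothetical example function; without -fopenmp the pragma is ignored and the loop runs serially.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example: doubles every element, with the iterations of the
// iterator-based loop divided among the threads by '#pragma omp parallel for'.
void double_all(std::vector<int>& v) {
    #pragma omp parallel for
    for (std::vector<int>::iterator it = v.begin(); it < v.end(); ++it) {
        *it *= 2;  // each element is written by exactly one thread
    }
}
```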

Thanks. I think I'll add random access iterators in my next project... –
