MPI deadlock in collective functions

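I am writing a program in C++ using the MPI library. A deadlock occurs in which only one node keeps working! I am not using any send or receive operations, only two collective functions (MPI_Allreduce and MPI_Bcast). Since no node should be waiting for another node to send or receive anything, I do not understand what causes this deadlock.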

void ParaStochSimulator::first_reacsimulator() { 
    SimulateSingleRun(); 
} 

double ParaStochSimulator::deterMinTau() { 
    //calculate minimum tau for this process 
    l_nLocalMinTau = calc_tau(); //min tau for each node 
    MPI_Allreduce(&l_nLocalMinTau, &l_nGlobalMinTau, 1, MPI_DOUBLE, MPI_MIN, MPI_COMM_WORLD);  
    //min tau for all nodes 
    //check if I have the min value 
    if (l_nLocalMinTau <= l_nGlobalMinTau && m_nCurrentTime < m_nOutputEndPoint) { 
     FireTransition(m_nMinTransPos); 
     CalculateAllHazardValues(); 
    } 
    return l_nGlobalMinTau; 
} 

void ParaStochSimulator::SimulateSingleRun() { 
    //prepare a run 
    PrepareRun(); 
    while ((m_nCurrentTime < m_nOutputEndPoint) && IsSimulationRunning()) { 
     deterMinTau(); 
     if (mnprocess_id == 0) { //master 
      SimulateSingleStep(); 
      std::cout << "current time:*****" << m_nCurrentTime << std::endl; 
      broad_casting(m_nMinTransPos); 
      MPI_Bcast(&l_anMarking, l_nMinplacesPos.size(), MPI_DOUBLE, 0, MPI_COMM_WORLD); 
      //std::cout << "size of mani place :" << l_nMinplacesPos.size() << std::endl; 
     } 
    } 
    MPI_Bcast(&l_anMarking, l_nMinplacesPos.size(), MPI_DOUBLE, 0, MPI_COMM_WORLD); 
    PostProcessRun(); 
}

Source

2017-05-05 Ramy Al-Anwar

Your "master" process executes an MPI_Bcast while all the other processes keep running the loop, enter deterMinTau again, and execute MPI_Allreduce.

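This is a deadlock because the master node is waiting for all the other nodes to perform the broadcast, while all the other nodes are waiting for the master node to perform the reduce. I believe what you are looking for is: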

void ParaStochSimulator::SimulateSingleRun() { 
    //prepare a run 
    PrepareRun(); 
    while ((m_nCurrentTime < m_nOutputEndPoint) && IsSimulationRunning()) { 
     //All the nodes reduce tau at the same time 
     deterMinTau(); 
     if (mnprocess_id == 0) { //master 
      SimulateSingleStep(); 
      std::cout << "current time:*****" << m_nCurrentTime << std::endl; 
      broad_casting(m_nMinTransPos); 
      //Removed the master-only broadcast here 
     } 
     //All the nodes broadcast at every loop iteration 
     MPI_Bcast(&l_anMarking, l_nMinplacesPos.size(), MPI_DOUBLE, 0, MPI_COMM_WORLD); 
    } 
    PostProcessRun(); 
}
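
To make the mismatch easier to see, here is a minimal sketch that does not call MPI at all. It only records the order in which each rank would invoke the collectives in the original loop and in the revised one. The names CallTrace, originalTrace, revisedTrace and firstMismatch are hypothetical helpers made up for this illustration, and the model assumes every rank runs the same number of loop iterations, which the real program would still have to guarantee.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model (no MPI involved): a rank is represented only by the
// ordered list of collective calls it would make.
using CallTrace = std::vector<std::string>;

// Call order of the original loop: every rank calls Allreduce in each
// iteration, but only rank 0 calls Bcast inside the loop.
CallTrace originalTrace(int rank, int iterations) {
    CallTrace trace;
    for (int i = 0; i < iterations; ++i) {
        trace.push_back("Allreduce");
        if (rank == 0) {
            trace.push_back("Bcast");
        }
    }
    trace.push_back("Bcast");  // the broadcast placed after the loop
    return trace;
}

// Call order of the revised loop: every rank calls Allreduce and then Bcast
// in each iteration, so the rank number no longer affects the sequence.
CallTrace revisedTrace(int /*rank*/, int iterations) {
    CallTrace trace;
    for (int i = 0; i < iterations; ++i) {
        trace.push_back("Allreduce");
        trace.push_back("Bcast");
    }
    return trace;
}

// Returns the index of the first call at which two ranks disagree,
// or -1 when both ranks make exactly the same sequence of calls.
int firstMismatch(const CallTrace& a, const CallTrace& b) {
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) {
            return static_cast<int>(i);
        }
    }
    return a.size() == b.size() ? -1 : static_cast<int>(n);
}
```

In this model, comparing originalTrace(0, 2) with originalTrace(1, 2) reports a mismatch at the second call (index 1), where rank 0 is in Bcast and rank 1 is in Allreduce. Comparing revisedTrace(0, 2) with revisedTrace(1, 2) reports no mismatch. Matching call order is only a necessary condition: if the ranks disagree on when the while loop ends, the traces diverge again.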

Source

2017-05-05 14:56:12 Tezirg

Thank you for your help, but unfortunately I removed the broadcast from the master and the deadlock is still there -_-
