Action selection with softmax?

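I know this is probably a pretty silly question to ask, but how do you use softmax action selection with the Boltzmann distribution? I'm trying to implement it at the moment, and I'm a bit unsure what this formula is telling me:

[Formula image]

How do you know when you want to use a particular action? The function gives me probabilities, but how do I use them to choose the action to perform?

This is for some machine learning applications.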

2016-05-23 Vato

Are you asking how to go about generating a random choice of action based on the probability distribution that the softmax function assigns to the actions? –

I'm not sure how to use this formula. Do I just use the action with the highest probability, or how does this work? – Vato

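Always choosing the action with the highest weight corresponds to a purely "greedy" selection policy, and for that you wouldn't need the softmax activation at all: softmax preserves order, so the action with the largest weight before softmax also has the largest softmax probability. Softmax maps its inputs to a set of probabilities that sum to 1, and its temperature parameter interpolates between a purely greedy selection policy and a policy in which all actions are equally likely. After that, I'd expect you to make a random selection using that probability distribution. –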

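In reinforcement learning there are points where a set of raw outputs (for example, from a neural network) has to be mapped to a set of probabilities normalized to sum to 1. The weights of the available actions are mapped to a set of associated probabilities, which are then used to randomly select the next action to take.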

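The softmax function is commonly used to map output weights to a corresponding set of probabilities. A "temperature" parameter tunes the selection policy, interpolating between pure exploitation (a "greedy" policy, in which the highest-weighted action is always chosen) and pure exploration (in which every action is equally likely to be chosen).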
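Written out in the notation of the example code (w_i for the weights, T for `temperature`, n actions; these symbols are chosen here for illustration, not taken from the answer), the mapping and its two limiting cases are:

$$
p_i = \frac{\exp(w_i / T)}{\sum_{j=1}^{n} \exp(w_j / T)}, \qquad T > 0
$$

$$
T \to \infty:\; \frac{w_i}{T} \to 0 \;\Rightarrow\; p_i \to \frac{1}{n}
\qquad\qquad
T \to 0^{+}:\; p_i = \frac{1}{\sum_{j} \exp\!\big((w_j - w_i)/T\big)} \to
\begin{cases} 1 & \text{if } w_i \text{ is the unique maximum} \\ 0 & \text{otherwise} \end{cases}
$$

Subtracting the same constant from every w_i leaves each p_i unchanged, because the common factor exp(-c/T) cancels between numerator and denominator. As a worked check against the example output, at T = 2 the weight 6 gets e^3 / (e^0.5 + e^1 + e^3 + e^-1.25 + e^0) ≈ 20.09 / 25.74 ≈ 0.78, matching the "[Temp 2]" row.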

Here is a simple example that uses the softmax function. Each "action" corresponds to one indexed entry in the vector<double> objects passed around in this code.

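#include <algorithm>
#include <iostream>
#include <iomanip>
#include <vector>
#include <random>
#include <cmath>


using std::vector;

// "temperature" divides each weight before exponentiation, as in the
// Boltzmann form exp(w/T); some texts instead use an inverse
// temperature beta = 1/T that multiplies the weights.
// Lower temperatures push the highest-weighted output
// toward a probability of 1.0, and higher temperatures even out
// all the probabilities toward 1/<entry count>.
// temperature must lie strictly between 0 and +Infinity,
// and weights must be non-empty.
vector<double> Softmax(const vector<double>& weights, double temperature) {
    // Subtracting the largest weight does not change the result
    // (it cancels in the normalization) but keeps std::exp from
    // overflowing when weight/temperature is large.
    const double max_weight = *std::max_element(weights.begin(), weights.end());
    vector<double> probs;
    double sum = 0;
    for(auto weight : weights) {
        double pr = std::exp((weight - max_weight)/temperature);
        sum += pr;
        probs.push_back(pr);
    }
    for(auto& pr : probs) {
        pr /= sum;
    }
    return probs;
}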

// Rng class encapsulates random number generation 
// of double values uniformly distributed between 0 and 1, 
// in case you need to replace std's <random> with something else. 
struct Rng {
    std::mt19937 engine;
    std::uniform_real_distribution<double> distribution;
    Rng() : distribution(0,1) {
        std::random_device rd;
        engine.seed(rd());
    }
    double operator()() {
        return distribution(engine);
    }
};

// Selects one index out of a vector of probabilities, "probs".
// "probs" must be non-empty and its elements should sum to 1;
// any rounding shortfall is absorbed by the last entry.
vector<double>::size_type StochasticSelection(const vector<double>& probs) {

    // The unit interval is divided into sub-intervals, one for each
    // entry in "probs". Each sub-interval's size is proportional
    // to its corresponding probability.

    // You can imagine a roulette wheel divided into differently sized
    // slots, one per entry. An entry's slot size is proportional to
    // its probability, and all the entries' slots together fill
    // the entire roulette wheel.

    // The roulette "ball"'s final location on the wheel is determined
    // by generating a (pseudo)random value in [0, 1).
    // Then a linear search finds the entry whose sub-interval contains
    // this value. Finally, the selected entry's index is returned.

    static Rng rng;
    const double point = rng();
    double cur_cutoff = 0;

    for(vector<double>::size_type i=0; i+1<probs.size(); ++i) {
        cur_cutoff += probs[i];
        if(point < cur_cutoff) return i;
    }
    return probs.size()-1;
}

void DumpSelections(const vector<double>& probs, int sample_count) {
    for(int i=0; i<sample_count; ++i) {
        auto selection = StochasticSelection(probs);
        std::cout << " " << selection;
    }
    std::cout << '\n';
}

void DumpDist(const vector<double>& probs) {
    // flags() does not include the precision setting,
    // so the old precision is saved and restored separately.
    auto flags = std::cout.flags();
    auto old_precision = std::cout.precision(2);
    for(vector<double>::size_type i=0; i<probs.size(); ++i) {
        if(i) std::cout << " ";
        std::cout << std::setw(2) << i << ':' << std::setw(8) << probs[i];
    }
    std::cout.flags(flags);
    std::cout.precision(old_precision);
    std::cout << '\n';
}

int main() { 
    vector<double> weights = {1.0, 2, 6, -2.5, 0}; 

    std::cout << "Original weights:\n"; 
    for(vector<double>::size_type i=0; i<weights.size(); ++i) {
        std::cout << " " << i << ':' << weights[i];
    }
    std::cout << "\n\nSoftmax mappings for different temperatures:\n"; 
    auto softmax_thalf = Softmax(weights, 0.5); 
    auto softmax_t1  = Softmax(weights, 1); 
    auto softmax_t2  = Softmax(weights, 2); 
    auto softmax_t10 = Softmax(weights, 10); 

    std::cout << "[Temp 1/2] "; 
    DumpDist(softmax_thalf); 
    std::cout << "[Temp 1] "; 
    DumpDist(softmax_t1); 
    std::cout << "[Temp 2] "; 
    DumpDist(softmax_t2); 
    std::cout << "[Temp 10] "; 
    DumpDist(softmax_t10); 

    std::cout << "\nSelections from softmax_t1:\n"; 
    DumpSelections(softmax_t1, 20); 
    std::cout << "\nSelections from softmax_t2:\n"; 
    DumpSelections(softmax_t2, 20); 
    std::cout << "\nSelections from softmax_t10:\n"; 
    DumpSelections(softmax_t10, 20); 
}

Here is some example output:

Original weights: 
    0:1 1:2 2:6 3:-2.5 4:0 

Softmax mappings for different temperatures: 
[Temp 1/2] 0: 4.5e-05 1: 0.00034 2:  1 3: 4.1e-08 4: 6.1e-06 
[Temp 1] 0: 0.0066 1: 0.018 2: 0.97 3: 0.0002 4: 0.0024 
[Temp 2] 0: 0.064 1: 0.11 2: 0.78 3: 0.011 4: 0.039 
[Temp 10] 0: 0.19 1: 0.21 2: 0.31 3: 0.13 4: 0.17 

Selections from softmax_t1: 
2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 1 

Selections from softmax_t2: 
2 2 2 2 2 2 1 2 2 1 2 2 2 1 2 2 2 2 2 1 

Selections from softmax_t10: 
0 0 4 1 2 2 2 0 0 1 3 4 2 2 4 3 2 1 0 1

2016-05-24 00:50:10
