Matlab: Can SOM and kmeans be applied to binarize time series data?

A similar question has been asked: Determining cluster membership in SOM (Self Organizing Map) for time series data

I want to learn how to apply a Self Organizing Map to binarize data, or to assign more than two kinds of symbols to it.

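For example, let data = rand(100,1). In general, I would run data_quantized = 2*(data>=0.5)-1 to get a binary-valued transformed series, where the threshold 0.5 is assumed and fixed. It should also be possible to quantize the data using more than 2 symbols. Can kmeans or SOM be applied to do this task? What should the input and output be if I use SOM to quantize the data?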

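Let X = {x_i(t)} for i = 1:N and t = 1:T be the time series, where N is the number of components/variables. The quantized value of any vector x_i is the value of the nearest BMU. The quantization error is the Euclidean norm of the difference between the input vector and the best-matching model. A new time series is then compared/matched using the symbolic representation of the time series. Would the BMU be a scalar or a vector of floating-point numbers? It is very hard to picture what SOM is doing.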
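A minimal sketch of the BMU lookup described above, assuming N = 1 and a codebook that has already been trained. The values in `codebook` and `x` are made up for illustration only. Each neuron's weight has the same dimension as the input, so for scalar data the BMU weight is a scalar, and the BMU's index can serve as the symbol.

```matlab
% Hypothetical codebook of K = 3 scalar neuron weights (illustrative values)
codebook = [0.2; 0.5; 0.8];       % K-by-1, because each input is a scalar (N = 1)
x = [0.05; 0.48; 0.91; 0.33];     % a few scalar samples of the series

% Distance from every sample to every neuron (T-by-K matrix).
% Uses implicit expansion (MATLAB R2016b or later).
d = abs(x - codebook.');

% The BMU of each sample is the neuron with the smallest distance;
% that smallest distance is the quantization error of the sample.
[qerr, bmu] = min(d, [], 2);

symbols   = bmu;             % integer symbol per sample (1, 2 or 3)
quantized = codebook(bmu);   % quantized value = weight of the BMU
```

For N > 1 the same lookup applies, with the absolute difference replaced by the Euclidean norm over the N components.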

Matlab implementation: https://www.mathworks.com/matlabcentral/fileexchange/39930-self-organizing-map-simple-demonstration

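I cannot understand how this works for quantizing a time series. Assuming N = 1, i.e. a 1-dimensional array/vector of elements obtained from a white noise process, how can I quantize/partition this data using a self organizing map?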

http://www.mathworks.com/help/nnet/ug/cluster-with-self-organizing-map-neural-network.html

The example above is provided by Matlab, but it works on N-dimensional data, whereas I have 1-dimensional data containing 1000 data points (t = 1, ..., 1000).

It would be very helpful if a toy example were provided that explains how a time series can be quantized into multiple levels. Let trainingData = x_i;

T = 1000; 
N = 1; 
x_i = rand(T,N) ;

How can the SOM code below be applied so that the numeric data is represented by symbols such as 1, 2, 3, i.e. clustered using 3 symbols? Each (scalar-valued) data point should be represented by symbol 1, 2 or 3.

function som = SOMSimple(trainingData, nfeatures, ndim, nepochs, ntrainingvectors, eta0, etadecay, sgm0, sgmdecay, showMode)
%SOMSimple Simple demonstration of a Self-Organizing Map that was proposed by Kohonen.
% som = SOMSimple(trainingData, nfeatures, ndim, nepochs, ntrainingvectors, eta0, etadecay, sgm0, sgmdecay, showMode)
% trains a self-organizing map with the following parameters
%  trainingData - training vectors, one per row (at least ntrainingvectors rows)
%  nfeatures  - dimension size of the training feature vectors
%  ndim    - width of a square SOM map
%  nepochs   - number of epochs used for training
%  ntrainingvectors - number of rows of trainingData used for training
%  eta0    - initial learning rate
%  etadecay   - exponential decay rate of the learning rate
%  sgm0    - initial variance of a Gaussian function that
%       is used to determine the neighbours of the best
%       matching unit (BMU)
%  sgmdecay   - exponential decay rate of the Gaussian variance
%  showMode   - 0: do not show output,
%       1: show the initially randomly generated SOM map
%        and the trained SOM map,
%       2: show the trained SOM map after each update
%
% For example: A demonstration of an SOM map that is trained on scalar values
%
%  som = SOMSimple(x_i,1,60,10,100,0.1,0.05,20,0.05,2);
%  % It uses:
%  % x_i : training data, one scalar sample per row
%  % 1 : dimensions for training vectors
%  % 60x60: neurons
%  % 10 : epochs
%  % 100 : training vectors
%  % 0.1 : initial learning rate
%  % 0.05 : exponential decay rate of the learning rate
%  % 20 : initial Gaussian variance
%  % 0.05 : exponential decay rate of the Gaussian variance
%  % 2 : Display the som map after every update

nrows = ndim; 
ncols = ndim; 
nfeatures = 1; 
som = rand(nrows,ncols,nfeatures); 


% Use the training data supplied by the caller (no random data is generated here)
    x_i = trainingData; 

% Generate coordinate system 
[x y] = meshgrid(1:ncols,1:nrows); 

for t = 1:nepochs  
    % Compute the learning rate for the current epoch 
    eta = eta0 * exp(-t*etadecay);   

    % Compute the variance of the Gaussian (Neighbourhood) function for the current epoch 
    sgm = sgm0 * exp(-t*sgmdecay); 

    % Consider the width of the Gaussian function as 3 sigma 
    width = ceil(sgm*3);   

    for ntraining = 1:ntrainingvectors 
     % Get current training vector 
     trainingVector = trainingData(ntraining,:); 

     % Compute the Euclidean distance between the training vector and 
     % each neuron in the SOM map 
     dist = getEuclideanDistance(trainingVector, som, nrows, ncols, nfeatures); 

     % Find the best matching unit (bmu) 
     [~, bmuindex] = min(dist); 

     % transform the bmu index into 2D 
     [bmurow bmucol] = ind2sub([nrows ncols],bmuindex);   

     % Generate a Gaussian function centered on the location of the bmu 
     g = exp(-(((x - bmucol).^2) + ((y - bmurow).^2))/(2*sgm*sgm)); 

     % Determine the boundary of the local neighbourhood 
     fromrow = max(1,bmurow - width); 
     torow = min(bmurow + width,nrows); 
     fromcol = max(1,bmucol - width); 
     tocol = min(bmucol + width,ncols); 

     % Get the neighbouring neurons and determine the size of the neighbourhood 
     neighbourNeurons = som(fromrow:torow,fromcol:tocol,:); 
     sz = size(neighbourNeurons); 

     % Transform the training vector and the Gaussian function into 
     % multi-dimensional to facilitate the computation of the neuron weights update 
     T = reshape(repmat(trainingVector,sz(1)*sz(2),1),sz(1),sz(2),nfeatures);     
     G = repmat(g(fromrow:torow,fromcol:tocol),[1 1 nfeatures]); 

     % Update the weights of the neurons that are in the neighbourhood of the bmu 
     neighbourNeurons = neighbourNeurons + eta .* G .* (T - neighbourNeurons); 

     % Put the new weights of the BMU neighbouring neurons back to the 
     % entire SOM map 
     som(fromrow:torow,fromcol:tocol,:) = neighbourNeurons; 


    end 
end 


function ed = getEuclideanDistance(trainingVector, sommap, nrows, ncols, nfeatures) 

% Transform the 3D representation of neurons into 2D 
neuronList = reshape(sommap,nrows*ncols,nfeatures);    

% Initialize Euclidean Distance 
ed = 0; 
for n = 1:size(neuronList,2) 
    ed = ed + (trainingVector(n)-neuronList(:,n)).^2; 
end 
ed = sqrt(ed);

2016-12-04 SKM

This paper (PDF) looks promising: [Pattern Discovery from Stock Time Series Using Self-Organizing Maps](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.124.3646&rep=rep1&type=pdf) – Oleg

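Thank you for the link, but my problem definition is different. That paper uses SOM for dimensionality reduction, whereas I want to apply SOM to quantize a time series into multiple symbols/levels. For example, I have the time series `data = rand(100,1)`. In general, I would run `data_quantized = 2*(data>=0.5)-1`, where the threshold is assumed and fixed. It should be possible to quantize `data` using more than 2 symbols. Can kmeans or SOM be applied for this? – SKM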

I may be oversimplifying your question, but have you tried `kmeans(data, k)`, where `k` is the number of symbols? Because that sounds like exactly what you are trying to do here. –

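I might be misunderstanding your question, but from what I understand it is actually quite simple, both with kmeans and with Matlab's own selforgmap. I cannot really comment on the implementation you posted for SOMSimple.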

Let's take a look at your first example:

rng(1337); 
T = 1000; 
x_i = rand(1,T); %rowvector for convenience

Assuming you want to quantize into three symbols, your manual version would look like this:

nsyms = 3; 
symsthresh = [1:-1/nsyms:1/nsyms]; 
x_i_q = zeros(size(x_i)); 

for i=1:nsyms 
    x_i_q(x_i<=symsthresh(i)) = i; 
end

Using Matlab's own selforgmap, you can get a similar result:

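net = selforgmap(nsyms);              % 1-D map with nsyms neurons
net.trainParam.showWindow = false;    % suppress the training window
net = train(net,x_i);                 % x_i is 1-by-T: one scalar sample per column
y = net(x_i);                         % one-hot output marking the winning neuron per sample
classes = vec2ind(y);                 % index of the winning neuron = symbol (1..nsyms)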
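To relate the built-in call to the theory, here is a hand-rolled sketch of a 1-D SOM with three neurons on scalar data, using only base MATLAB. It is not what selforgmap does internally; the parameter names (`eta0`, `etadecay`, `sgm0`, `sgmdecay`) are borrowed from SOMSimple, and their values, the initialization and the stand-in data are assumptions chosen for illustration.

```matlab
% Stand-in scalar series (assumption): 300 values spread over [0,1], shuffled
x = linspace(0, 1, 300);
x = x(randperm(numel(x)));

K   = 3;                               % number of neurons = number of symbols
w   = linspace(min(x), max(x), K);     % initial scalar weights, one per neuron
pos = 1:K;                             % neuron positions on the 1-D map

nepochs = 50;  eta0 = 0.5;  etadecay = 0.05;  sgm0 = 1;  sgmdecay = 0.1;

for t = 1:nepochs
    eta = eta0 * exp(-t*etadecay);     % learning rate for this epoch
    sgm = sgm0 * exp(-t*sgmdecay);     % neighbourhood width for this epoch
    for n = 1:numel(x)
        % Best matching unit: neuron whose weight is closest to the sample
        [~, bmu] = min(abs(x(n) - w));
        % Gaussian neighbourhood centred on the BMU, measured on the map
        g = exp(-(pos - bmu).^2 / (2*sgm^2));
        % Pull the BMU and its neighbours towards the sample
        w = w + eta .* g .* (x(n) - w);
    end
end

% After training, the symbol of each sample is the index of its BMU
[~, symbols] = min(abs(x(:) - w), [], 2);
```

Here `w` holds one scalar weight per neuron, so the association between data points and symbols can be inspected directly, for example by plotting `x` against `symbols`.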

Finally, the same can be done straightforwardly with kmeans:

clusters = kmeans(x_i',nsyms)';

2016-12-08 12:02:15

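Thank you. I have a few doubts, could you please clarify? (1) In the statement `symsthresh = [1:-1/k:1/k];`, is `k = 3`? (2) When using SOM, are you training the SOM first? It is very hard to understand what the SOM is doing with Matlab's built-in function, because I do not know the neighborhood size, the number of neurons, the best matching unit, and so on. That makes it difficult to relate the implementation to the theory. Is there a way to see SOM output that shows the association between data points and symbols? – SKM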

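(3) Converting the floating-point values of the data `x_i` into symbols, held in the variables `x_i_q`, `classes` and `clusters`, with the three methods (manual approach, SOM and kmeans) gives results that are all different. How can I check which method gives the correct symbolic representation? – SKM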

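The three methods give only very slightly different results. To compare them, map the classes onto the same labels and look for differences in the clustering/classification. What counts as "correct" is for you to judge; I cannot answer that. Regarding your first question: I had changed the variable name, sorry. I have fixed it. –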
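One possible way to map the classes onto the same labels is to renumber each method's clusters by their mean data value, so that label 1 always denotes the lowest-valued cluster. The sketch below uses made-up toy vectors (`x`, `labelsA`, `labelsB` are hypothetical); the same steps could be applied to `x_i` with `x_i_q`, `classes` and `clusters`.

```matlab
% Toy data and two labelings of the same partition with different label names
x       = [0.10 0.90 0.50 0.15 0.85 0.55];
labelsA = [1 3 2 1 3 2];
labelsB = [2 1 3 2 1 3];

% Mean data value of each cluster, indexed by its original label
muA = accumarray(labelsA(:), x(:), [], @mean);
muB = accumarray(labelsB(:), x(:), [], @mean);

% Rank the clusters by mean: rank(oldLabel) = new label
[~, orderA] = sort(muA);  rankA(orderA) = 1:numel(orderA);
[~, orderB] = sort(muB);  rankB(orderB) = 1:numel(orderB);

canonA = rankA(labelsA);  % labeling A in canonical (sorted-by-mean) labels
canonB = rankB(labelsB);  % labeling B in canonical labels

% Fraction of samples on which the two methods agree
agreement = mean(canonA == canonB);
```

Samples where `canonA` and `canonB` differ are the ones that fall near a cluster boundary under one of the methods.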
