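This code uses the built-in MATLAB function `kmeans`; you will need to replace that call with your own k-means algorithm. It shows how to compute the cluster centroids and the sum of squared errors (also called the distortion).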
clc; close all; clear all;
data = readtable('data.txt'); % Importing the data-set
d1 = table2array(data(:, 2)); % Data in first dimension
d2 = table2array(data(:, 3)); % Data in second dimension
d3 = table2array(data(:, 4)); % Data in third dimension
d4 = table2array(data(:, 5)); % Data in fourth dimension
X = [d1, d2, d3, d4]; % Combining the data into a matrix
k = 3; % Number of clusters
idx = kmeans(X, k); % Applying k-means using the built-in function
%% Separating each dimension of the data by cluster
d1_1 = d1(idx == 1); % d1 for the data in cluster 1
d2_1 = d2(idx == 1); % d2 for the data in cluster 1
d3_1 = d3(idx == 1); % d3 for the data in cluster 1
d4_1 = d4(idx == 1); % d4 for the data in cluster 1
%==============================
d1_2 = d1(idx == 2); % d1 for the data in cluster 2
d2_2 = d2(idx == 2); % d2 for the data in cluster 2
d3_2 = d3(idx == 2); % d3 for the data in cluster 2
d4_2 = d4(idx == 2); % d4 for the data in cluster 2
%==============================
d1_3 = d1(idx == 3); % d1 for the data in cluster 3
d2_3 = d2(idx == 3); % d2 for the data in cluster 3
d3_3 = d3(idx == 3); % d3 for the data in cluster 3
d4_3 = d4(idx == 3); % d4 for the data in cluster 3
%% Finding the co-ordinates of the cluster centroids
c1_d1 = mean(d1_1); % d1 value of the centroid for cluster 1
c1_d2 = mean(d2_1); % d2 value of the centroid for cluster 1
c1_d3 = mean(d3_1); % d3 value of the centroid for cluster 1
c1_d4 = mean(d4_1); % d4 value of the centroid for cluster 1
%====================================
c2_d1 = mean(d1_2); % d1 value of the centroid for cluster 2
c2_d2 = mean(d2_2); % d2 value of the centroid for cluster 2
c2_d3 = mean(d3_2); % d3 value of the centroid for cluster 2
c2_d4 = mean(d4_2); % d4 value of the centroid for cluster 2
%====================================
c3_d1 = mean(d1_3); % d1 value of the centroid for cluster 3
c3_d2 = mean(d2_3); % d2 value of the centroid for cluster 3
c3_d3 = mean(d3_3); % d3 value of the centroid for cluster 3
c3_d4 = mean(d4_3); % d4 value of the centroid for cluster 3
%% Calculating the distortion
distortion = 0; % Initialization
for n1 = 1 : length(d1_1)
distortion = distortion + (((c1_d1 - d1_1(n1)).^2) + ((c1_d2 - d2_1(n1)).^2) + ...
((c1_d3 - d3_1(n1)).^2) + ((c1_d4 - d4_1(n1)).^2));
end
for n2 = 1 : length(d1_2)
distortion = distortion + (((c2_d1 - d1_2(n2)).^2) + ((c2_d2 - d2_2(n2)).^2) + ...
((c2_d3 - d3_2(n2)).^2) + ((c2_d4 - d4_2(n2)).^2));
end
for n3 = 1 : length(d1_3)
distortion = distortion + (((c3_d1 - d1_3(n3)).^2) + ((c3_d2 - d2_3(n3)).^2) + ...
((c3_d3 - d3_3(n3)).^2) + ((c3_d4 - d4_3(n3)).^2));
end
fprintf('The unnormalized sum of squared errors is %f\n', distortion);
fprintf('The centroid of cluster 1 is \t d1 = %f, d2 = %f, d3 = %f, d4 = %f\n', c1_d1, c1_d2, c1_d3, c1_d4);
fprintf('The centroid of cluster 2 is \t d1 = %f, d2 = %f, d3 = %f, d4 = %f\n', c2_d1, c2_d2, c2_d3, c2_d4);
fprintf('The centroid of cluster 3 is \t d1 = %f, d2 = %f, d3 = %f, d4 = %f\n', c3_d1, c3_d2, c3_d3, c3_d4);
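If you do write your own k-means, one common approach is Lloyd's algorithm: alternate between assigning each point to its nearest centroid and recomputing each centroid as the mean of its points. The following is only a minimal sketch of that idea, not a drop-in replacement for `kmeans`. The synthetic `X`, the iteration cap `maxIter`, the random-row initialization, and the empty-cluster handling are all assumptions made for illustration, and the `X - C(j, :)` subtraction relies on implicit expansion (MATLAB R2016b or later).

```matlab
% Minimal from-scratch k-means sketch (Lloyd's algorithm).
% X is a small synthetic stand-in for the 4-column matrix built from data.txt.
rng(0);                                   % fixed seed so runs are repeatable
X = [randn(20, 4); randn(20, 4) + 5; randn(20, 4) + 10];
k = 3;                                    % number of clusters
maxIter = 100;                            % hypothetical iteration cap

n = size(X, 1);
C = X(randperm(n, k), :);                 % initial centroids: k distinct random rows
idx = zeros(n, 1);                        % cluster index of each point

for iter = 1:maxIter
    % Assignment step: squared Euclidean distance from every point to every centroid
    D = zeros(n, k);
    for j = 1:k
        D(:, j) = sum((X - C(j, :)).^2, 2);
    end
    [~, newIdx] = min(D, [], 2);

    if isequal(newIdx, idx)
        break;                            % assignments stopped changing
    end
    idx = newIdx;

    % Update step: each centroid becomes the mean of its assigned points
    for j = 1:k
        if any(idx == j)                  % an empty cluster keeps its old centroid
            C(j, :) = mean(X(idx == j, :), 1);
        end
    end
end

% Distortion: sum of squared distances from each point to its own centroid
distortion = sum(sum((X - C(idx, :)).^2, 2));
fprintf('The unnormalized sum of squared errors is %f\n', distortion);
disp(C);                                  % one centroid per row, one dimension per column
```

Keeping the centroids in a single k-by-4 matrix `C` removes the need for separate `c1_d1`, `c1_d2`, ... variables, and the same code works for any `k`. The result depends on the random initialization, so in practice you may want to run it several times and keep the run with the lowest distortion.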
Is there a problem with using the built-in `kmeans` function, or are you building it from scratch? –
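@LeanderMoesinger Thanks for your comment. I can actually use the built-in kmeans function, but from the examples in the MATLAB help I could not work out how to compute each cluster's mean, center, and size, or the list of data points assigned to each cluster. – Bilgin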
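For the quantities mentioned in this comment, the built-in function's additional outputs may already be enough. The sketch below assumes the Statistics and Machine Learning Toolbox is available and uses a synthetic `X` in place of the real data; `clusterSizes` and `members` are illustrative names. With the default squared Euclidean distance, each centroid returned in `C` is the mean of the points in that cluster, so "mean" and "center" coincide.

```matlab
% Sketch: per-cluster summaries from the built-in kmeans outputs.
% Requires the Statistics and Machine Learning Toolbox.
rng(0);                                   % fixed seed so runs are repeatable
X = [randn(20, 4); randn(20, 4) + 5; randn(20, 4) + 10];  % synthetic stand-in
k = 3;

% idx:  cluster index of each row of X
% C:    k-by-4 matrix of centroids (one row per cluster)
% sumd: within-cluster sums of squared point-to-centroid distances
[idx, C, sumd] = kmeans(X, k);

clusterSizes = accumarray(idx, 1, [k, 1]);   % number of points in each cluster
distortion = sum(sumd);                      % total sum of squared errors

members = cell(k, 1);
for j = 1:k
    members{j} = find(idx == j);             % row numbers of the points in cluster j
    fprintf('Cluster %d: %d points, centroid = %s\n', ...
        j, clusterSizes(j), mat2str(C(j, :), 4));
end
fprintf('Total sum of squared errors: %f\n', distortion);
```

`X(members{j}, :)` then gives the data points assigned to cluster `j`.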