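k-means: "means" refers to the average of each cluster, and "k" is the number of clusters.
Goal of the algorithm: partition the data into k clusters.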
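1. Randomly pick k data points from the data set to serve as the k initial centers.
2. For each data point, compute its distance to each of the k centers and assign it to the cluster whose center is nearest.
3. Once every point has been assigned, recompute each center as the mean of the points in its cluster.
4. Repeat steps 2 and 3 until the centers no longer change.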
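The four steps above can be traced by hand on a tiny data set. The following is a minimal sketch, assuming a made-up 1-D data set and a fixed (rather than random) choice of initial centers so that the result is reproducible; it is not the implementation used later in this post.

```matlab
% Toy 1-D data set (illustrative values only), k = 2
X = [1; 2; 10; 11];
% step 1: pretend the random pick chose the first two points as centers
C = [1; 2];
while true
    oldC = C;
    % step 2: label each point with the index of its nearest center
    I = zeros(size(X, 1), 1);
    for n = 1:size(X, 1)
        [~, I(n)] = min(abs(C - X(n)));
    end
    % step 3: move each center to the mean of the points assigned to it
    for k = 1:size(C, 1)
        C(k) = mean(X(I == k));
    end
    % step 4: stop once the centers no longer change
    if isequal(C, oldC)
        break;
    end
end
disp(C');   % final centers: 1.5 and 10.5
```

Even though both initial centers start inside the left group, the second center is pulled toward the right group after the first pass, and the labels settle at [1; 1; 2; 2].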
main.m
% clear workspace and close figures
clear; clc;
close all;
% set algorithm parameters
TOL = 0.0004;   % stopping tolerance
ITER = 30;      % maximum number of iterations
kappa = 15;     % number of clusters
% load data; any .mat file that provides a data matrix X can be used instead
data = load('Yale.mat');
X = data.X;
% run k-means 10 times, each run starting from a different random initialization
for i=1:10
    tic;
    [C, I, iter] = myKmeans(X, kappa, ITER, TOL);
    toc
    % show number of iterations taken by k-means
    disp(['k-means instance took ' int2str(iter) ' iterations to complete']);
end
% pause and close all windows
pause;
close all;
myKmeans.m
function [C, I, iter] = myKmeans(X, K, maxIter, TOL)
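% number of rows (data points) and columns (features) of the data
[vectors_num, dim] = size(X);
% R is a random permutation of the row indices; its first K entries pick the initial centers
R = randperm(vectors_num);
% construct indicator vector (each entry holds the cluster index
% of the corresponding point in X)
I = zeros(vectors_num, 1);
% center matrix: one row per center, so K rows and dim columns, initialized to 0
C = zeros(K, dim);
% fill the center matrix with the rows of X indexed by the first K entries of R
for k=1:K
    C(k,:) = X(R(k),:);
end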
% iteration count
iter = 0;
while 1
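    % assign every point to its nearest center, one row at a time
    for n=1:vectors_num
        % start with the first center as the current best candidate
        minIdx = 1;
        % L1 (Manhattan) distance from the point to the first center
        minVal = norm(X(n,:) - C(minIdx,:), 1);
        for j=2:K
            % L1 distance from the point to center j
            dist = norm(C(j,:) - X(n,:), 1);
            if dist < minVal
                % a closer center was found, so remember its index
                minIdx = j;
                minVal = dist;
            end
        end
        % label the point with its nearest center
        I(n) = minIdx;
    end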
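    % recompute the K centers as the mean of their assigned points
    for k=1:K
        members = (I == k);
        % keep the previous center if no point was assigned (avoids 0/0 = NaN)
        if any(members)
            % average along dimension 1 so a single-member cluster still yields a row
            C(k, :) = mean(X(members, :), 1);
        end
    end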
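    % compute the error: mean Euclidean distance from each point to its center
    RSS_error = 0;
    for idx=1:vectors_num
        RSS_error = RSS_error + norm(X(idx, :) - C(I(idx),:), 2);
    end
    RSS_error = RSS_error / vectors_num;
    % increment iteration
    iter = iter + 1;
    % stop once the error changes by less than TOL between iterations
    % (iter > 1 short-circuits on the first pass, before prev_error exists)
    if iter > 1 && abs(prev_error - RSS_error) < TOL
        break;
    end
    prev_error = RSS_error;
    % stop after maxIter iterations at the latest
    if iter >= maxIter
        break;
    end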
end
disp(['k-means took ' int2str(iter) ' steps to converge']);
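The nearest-center search in myKmeans.m visits every point and every center in a double loop. One possible alternative is to compute the distances one center at a time with vectorized operations. The sketch below assumes a small made-up data set and fixed centers, and uses the same L1 distance as the function above; it is meant as an illustration rather than a drop-in replacement.

```matlab
% Toy data: 4 points in 2-D and 2 centers (illustrative values only)
X = [0 0; 1 0; 9 9; 10 10];
C = [0 1; 9 10];
K = size(C, 1);
num_points = size(X, 1);
% D(n, j) holds the L1 distance from point n to center j
D = zeros(num_points, K);
for j = 1:K
    % bsxfun subtracts the row C(j,:) from every row of X
    D(:, j) = sum(abs(bsxfun(@minus, X, C(j, :))), 2);
end
% the column index of the smallest entry in each row is the cluster label
[minVal, I] = min(D, [], 2);
disp(I');   % labels: 1 1 2 2
```

This replaces vectors_num * K calls to norm with K matrix operations, which tends to matter when the data set has many rows.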