For a description of K-means and K-medoids, see pluskid's blog http://blog.pluskid.org/?tag=clustering or http://blog.youkuaiyun.com/abcjennifer/article/details/8197072
The MATLAB code for K-means is given first:
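function [labels,Cnt] = kmeans(k,D,threshold)
%KMEANS Cluster the rows of D into k groups.
%   D is an N-by-d data matrix; labels is N-by-1; Cnt holds the k centres.
%   Note: this name shadows kmeans from the Statistics and Machine Learning Toolbox.
if nargin<3
    threshold=1e-10; % MATLAB has no default-argument syntax, so set it here
end
N=size(D,1); % number of samples (length(D) returns the larger dimension)
R_I = randperm(N,k);
Cnt = D(R_I,:);
% k random data points serve as the initial centres
labels=zeros(N,1);
while(true)
    % assignment step: label each point with its nearest centre
    dist=zeros(k,1);
    for l=1:N
        for i=1:k
            dist(i)=norm(D(l,:)-Cnt(i,:));
        end
        [~,t]=min(dist);
        labels(l)=t;
    end
    % update step: average the points of each cluster
    newCnt=zeros(k,size(D,2)); % not named sum, which would shadow the built-in
    cont=zeros(k,1);
    for l=1:N
        newCnt(labels(l),:)=newCnt(labels(l),:)+D(l,:);
        cont(labels(l))=cont(labels(l))+1;
    end
    for i=1:k
        if cont(i)>0
            newCnt(i,:)=newCnt(i,:)/cont(i);
        else
            newCnt(i,:)=Cnt(i,:); % empty cluster: keep the old centre
        end
    end
    % stop once the centres no longer move
    if norm(Cnt-newCnt)<threshold
        break;
    else
        Cnt=newCnt;
    end
end
end
The experimental data are generated from three Gaussian distributions:
% generate samples from three Gaussian distributions;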
mu=[0,-15];
sigma=[45 ,0;0,45];
r1=mvnrnd(mu,sigma,300);
mu=[5,15];
sigma=[15 ,0;0,15];
r2=mvnrnd(mu,sigma,300);
mu=[-5,7];
sigma=[15,0;0,15];
r3=mvnrnd(mu,sigma,300);
figure;
plot(r1(:,1),r1(:,2),'r*',r2(:,1),r2(:,2),'b*',r3(:,1),r3(:,2),'g*');
title('Generated data');
D=[r1;r2;r3]; % stack the three groups into one 900-by-2 matrix
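mvnrnd belongs to the Statistics and Machine Learning Toolbox. If it is not available, samples of the same kind can be drawn with randn and a Cholesky factor. The following is only a sketch under that assumption; the helper name gauss_samples is hypothetical and does not come from the original code:

```matlab
% Hypothetical helper (not from the original post): draw n rows from N(mu, sigma).
% chol returns an upper-triangular R with R'*R = sigma, so randn(n,d)*R
% has covariance sigma.
gauss_samples = @(mu,sigma,n) repmat(mu,n,1) + randn(n,numel(mu))*chol(sigma);

r1 = gauss_samples([0,-15],[45,0;0,45],300);
r2 = gauss_samples([5,15],[15,0;0,15],300);
r3 = gauss_samples([-5,7],[15,0;0,15],300);
D = [r1;r2;r3];   % 900-by-2 data matrix, same layout as above
```

The means and covariances are the ones used with mvnrnd above, so the resulting D can be passed to the clustering functions in the same way.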
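The K-medoids algorithm requires every centre to be one of the existing data points, which makes it more robust. It therefore has to compute the distances from each point to the other points in its cluster: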
function [labels,Cnt] = kmedoids(k,D,threshold)
%KMEDOIDS Cluster the rows of D into k groups around medoids.
%   D is an N-by-d data matrix; each centre in Cnt is one of the rows of D.
N=size(D,1); % number of samples (length(D) returns the larger dimension)
R_I = randperm(N,k);
Cnt = D(R_I,:);
% k random data points serve as the initial medoids
labels=zeros(N,1);
while(true)
    % assignment step: label each point with its nearest medoid
    dist=zeros(k,1);
    for l=1:N
        for i=1:k
            dist(i)=norm(D(l,:)-Cnt(i,:));
        end
        [~,t]=min(dist);
        labels(l)=t;
    end
    % pairwise distances between points that share a cluster
    dist_mat=cell(k,1);
    for s=1:k
        dist_mat{s}=zeros(N,N);
    end
    for l=1:N
        for p=l+1:N
            if labels(l)==labels(p)
                dist_mat{labels(l)}(l,p)=norm(D(p,:)-D(l,:));
                dist_mat{labels(l)}(p,l)=dist_mat{labels(l)}(l,p);
            end
        end
    end
    % update step: in each cluster, pick the member with the smallest
    % total distance to the other members
    Cnt_=Cnt; % a cluster without members keeps its old medoid
    for s=1:k
        temp=sum(dist_mat{s},1);
        minimal=realmax;
        for l=1:N
            if (minimal > temp(l)) && (labels(l)==s)
                minimal=temp(l);
                Cnt_(s,:)=D(l,:);
            end
        end
    end
    % stop once the medoids no longer change
    if norm(Cnt-Cnt_)<threshold
        break;
    else
        Cnt=Cnt_;
    end
end
end
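One possible way to try the function, assuming it is saved as kmedoids.m on the MATLAB path. The two-blob data set and the plotting choices here are illustrative and are not the original experiment:

```matlab
% Assumes kmedoids.m (the function above) is on the MATLAB path.
rng(0);                                 % fix the seed so the run is repeatable
D = [randn(100,2); randn(100,2)+8];     % two well-separated blobs (illustrative data)
[labels,Cnt] = kmedoids(2,D,1e-10);

figure; hold on;
colours = 'rb';
for s = 1:2
    % draw the members of cluster s in its own colour
    plot(D(labels==s,1),D(labels==s,2),[colours(s) '*']);
end
% mark the medoids, which are rows of D
plot(Cnt(:,1),Cnt(:,2),'ko','MarkerSize',10,'LineWidth',2);
title('K-medoids result');
hold off;
```

Because the function stores N-by-N distance matrices, it is best tried on a few hundred points at first.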
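This post gives MATLAB implementations of the K-means and K-medoids clustering algorithms, together with a test data set generated from three Gaussian distributions. K-medoids is more robust than K-means because it uses actual data points as the cluster centres.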
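The robustness point can be seen on a tiny example. The sketch below uses five made-up points, one of them an outlier, and compares the mean (the K-means centre) with the medoid (the K-medoids centre) of that single group:

```matlab
% Illustrative data (not from the experiment above): four nearby points and one outlier.
X = [0 0; 1 0; 0 1; 1 1; 100 100];

% Centre used by K-means: the mean, which the outlier drags away.
m = mean(X,1);

% Pairwise Euclidean distances between all points.
n = size(X,1);
P = zeros(n,n);
for a = 1:n
    for b = 1:n
        P(a,b) = norm(X(a,:)-X(b,:));
    end
end

% Centre used by K-medoids: the point with the smallest total distance to the others.
[~,idx] = min(sum(P,2));
medoid = X(idx,:);

disp(m);       % [20.4 20.4], far from the four nearby points
disp(medoid);  % [1 1], one of the original points
```

The mean lands where no data lie, while the medoid stays inside the dense group.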