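k-means is a classic clustering algorithm. For the idea behind it, see Andrew Ng's lecture notes "The k-means clustering algorithm"; the details are not repeated here.
The core function needed is Matlab's kmeans. For detailed usage, run the Matlab command: doc kmeans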
IDX = kmeans(X,k)
[IDX,C] = kmeans(X,k)
[IDX,C,sumd] = kmeans(X,k)
[IDX,C,sumd,D] = kmeans(X,k)
kmeans(X,k) partitions the points in the n-by-p data matrix X into k clusters. Rows of X correspond to points, columns correspond to variables.
IDX: n-by-1 vector containing the cluster index of each point.
C: k-by-p matrix containing the k cluster centroid locations.
sumd: 1-by-k vector containing the within-cluster sums of point-to-centroid distances.
D: n-by-k matrix containing the distance from each point to every centroid.
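To make the four outputs concrete, here is a minimal sketch of a call on toy data. The data (two Gaussian blobs), the seed, and the choice k = 2 are assumptions made up for illustration; they do not come from the original post.

```matlab
% Toy data (hypothetical): two well-separated 2-D Gaussian blobs.
rng(1);                               % fix the seed so the run is repeatable
X = [randn(50,2); randn(50,2) + 5];   % n = 100 points, p = 2 variables

k = 2;                                % number of clusters (assumed for this toy data)
[IDX, C, sumd, D] = kmeans(X, k);

size(IDX)     % 100-by-1: cluster index of each point
size(C)       % 2-by-2:   one centroid per row
numel(sumd)   % 2:        one within-cluster sum per cluster
size(D)       % 100-by-2: distance from each point to each centroid
```

Since kmeans picks its starting centroids at random, the cluster labels (which blob is called 1 and which is called 2) can differ between runs unless the seed is fixed.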
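There are also some optional parameters, including 'distance', 'options', 'replicates', and others.
'distance' selects the distance measure. The default is squared Euclidean distance ('sqeuclidean'); other choices include absolute (city-block) distance, cosine distance, Hamming distance, and so on.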
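As a rough illustration of how the optional parameters are passed, the sketch below switches the distance to 'cityblock' and asks for 5 replicates. The toy data and the particular values are assumptions chosen for the example, not recommendations from the original post.

```matlab
% Toy data (hypothetical): two well-separated 2-D Gaussian blobs.
rng(1);
X = [randn(50,2); randn(50,2) + 5];

% Use city-block (L1) distance and run the clustering 5 times from
% different starting centroids; kmeans returns the solution with the
% lowest total within-cluster distance.
[IDX, C, sumd] = kmeans(X, 2, 'distance', 'cityblock', 'replicates', 5);

total = sum(sumd);   % total within-cluster distance of the returned solution
```

Optional parameters are given as name/value pairs after k, so several of them can be combined in one call.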
'options' can be set via sta