Single-Linkage Clustering: The Algorithm

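This article describes how a hierarchical clustering algorithm works and the steps it follows. The algorithm builds a hierarchical clustering structure by repeatedly merging the most similar clusters, updating the proximity matrix with the distances between each merged cluster and the remaining clusters. It suits datasets that call for a hierarchical structure.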

The algorithm is an agglomerative scheme that erases rows and columns in the proximity matrix as old clusters are merged into new ones.

The N × N proximity matrix is D = [d(i,j)], where N is the number of objects. The clusterings are assigned sequence numbers 0, 1, ..., (N-1), and L(k) is the level of the kth clustering. A cluster with sequence number m is denoted (m), and the proximity between clusters (r) and (s) is denoted d[(r),(s)].
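
As a concrete illustration of the notation, a proximity matrix can be built from raw points. The sketch below assumes Euclidean distance as the dissimilarity measure, which the text does not specify; the helper name proximity_matrix and the three sample points are hypothetical.

```python
import math

def proximity_matrix(points):
    """Return the symmetric N x N matrix D with D[i][j] = d(i, j).

    Assumption: d is the Euclidean distance between points i and j.
    """
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)]
            for i in range(n)]

# Three hypothetical 2-D objects, so N = 3.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
D = proximity_matrix(points)
# D[0][1] is 5.0, D[0][2] is 10.0, and the diagonal is all zeros.
```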

The algorithm is composed of the following steps:

1. Begin with the disjoint clustering having level L(0) = 0 and sequence number m = 0.
2. Find the least dissimilar pair of clusters in the current clustering, say pair (r), (s), according to

   d[(r),(s)] = min d[(i),(j)]

   where the minimum is over all pairs of clusters in the current clustering.
3. Increment the sequence number: m = m + 1. Merge clusters (r) and (s) into a single cluster to form the next clustering m. Set the level of this clustering to

   L(m) = d[(r),(s)]
4. Update the proximity matrix, D, by deleting the rows and columns corresponding to clusters (r) and (s) and adding a row and column corresponding to the newly formed cluster. The proximity between the new cluster, denoted (r,s), and an old cluster (k) is defined as:

   d[(k),(r,s)] = min{ d[(k),(r)], d[(k),(s)] }
5. If all objects are in one cluster, stop. Otherwise, go to step 2.
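
The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the function name single_linkage, the dictionary that stores pairwise proximities (standing in for physically deleting and adding matrix rows and columns), the integer cluster ids, and the 4-object example matrix are all introduced here for illustration.

```python
def single_linkage(D):
    """Run the agglomerative steps on a symmetric proximity matrix D.

    Returns a list of (level, merged_cluster) pairs, one per merge,
    where level corresponds to L(m) and merged_cluster is a tuple of
    the original object indices in the newly formed cluster.
    """
    # Step 1: disjoint clustering, one object per cluster (level 0).
    clusters = {i: (i,) for i in range(len(D))}
    # Pairwise proximities keyed by an unordered pair of cluster ids.
    dist = {frozenset((i, j)): D[i][j]
            for i in clusters for j in clusters if i < j}
    merges = []
    next_id = len(D)  # id assigned to the next merged cluster

    while len(clusters) > 1:
        # Step 2: find the least dissimilar pair (r), (s).
        pair = min(dist, key=dist.get)
        r, s = sorted(pair)
        level = dist[pair]

        # Step 3: merge (r) and (s); the level is d[(r),(s)].
        merged = tuple(sorted(clusters[r] + clusters[s]))

        # Step 4: drop the entries for (r) and (s), then add entries
        # for the new cluster using the minimum rule.
        del dist[pair]
        updated = {}
        for k in clusters:
            if k in (r, s):
                continue
            d_kr = dist.pop(frozenset((k, r)))
            d_ks = dist.pop(frozenset((k, s)))
            updated[frozenset((k, next_id))] = min(d_kr, d_ks)
        dist.update(updated)

        del clusters[r], clusters[s]
        clusters[next_id] = merged
        next_id += 1
        merges.append((level, merged))
        # Step 5: the loop ends once a single cluster remains.

    return merges

# Hypothetical proximity matrix for four objects.
D = [
    [0, 2, 6, 10],
    [2, 0, 5, 9],
    [6, 5, 0, 4],
    [10, 9, 4, 0],
]
merges = single_linkage(D)
# merges == [(2, (0, 1)), (4, (2, 3)), (5, (0, 1, 2, 3))]
```

In this example, objects 0 and 1 merge first at level 2. The updated proximity from object 2 to the new cluster is min(6, 5) = 5, so the next merge joins objects 2 and 3 at level 4, and the two remaining clusters join at level 5.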