The tree of a hierarchical clustering can be produced either bottom-up, by starting with the individual objects and grouping the most similar ones, or top-down, by starting with all the objects and dividing them into groups. The bottom-up algorithm is also called agglomerative clustering, while the top-down one is called divisive clustering. The ideas of the two are quite similar, except that the former merges similar clusters while the latter splits dissimilar clusters.
Agglomerative clustering is a greedy algorithm that starts with a separate cluster for each object. In each step, the two most similar clusters are determined and merged into a new cluster. The algorithm stops when a certain stopping criterion is met. Usually, it stops when one large cluster containing all objects is formed. Figure 1 is the pseudo code of the agglomerative clustering.
Figure 1: agglomerative hierarchical clustering
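The pseudo code referenced above is not reproduced here, but the algorithm it describes can be sketched in Python as follows. The similarity function below is a hypothetical stand-in (average pairwise similarity over 1-D values, using negated absolute difference); a real application would substitute its own object-similarity measure.

```python
def similarity(c1, c2):
    """Average-link cluster similarity: mean of pairwise object similarities.
    Pairwise similarity here is the negated absolute difference (a hypothetical
    stand-in for a domain-specific measure)."""
    return sum(-abs(a - b) for a in c1 for b in c2) / (len(c1) * len(c2))

def agglomerative(objects):
    """Greedy bottom-up clustering: repeatedly merge the two most similar
    clusters until one cluster remains. Returns the merge history, which
    encodes the tree as an ordered list of merges."""
    clusters = [[x] for x in objects]   # start with one cluster per object
    history = []
    while len(clusters) > 1:
        # find the indices of the most similar pair of clusters
        i, j = max(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda p: similarity(clusters[p[0]], clusters[p[1]]),
        )
        history.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history
```

Running `agglomerative([1, 2, 10])` first merges the two closest objects, 1 and 2, and then merges the result with 10, yielding the full tree as a two-step history.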
Besides the algorithm itself, there is a subtlety in how to compute group similarity based on individual object similarity. Three similarity functions are commonly used, as shown in Table 1.
Function | Definition
single link | similarity of the closest pair
complete link | similarity of the farthest pair
average link | average similarity between all pairs
Table 1: similarity functions used in clustering
In single link clustering, the similarity between two clusters is the similarity of the two closest objects in the clusters. Clusters based on this function have good local coherence, since the similarity function is locally defined. However, it can result in bad global quality, since it has no way to take the global context into account. As opposed to the locally coherent clusters of single-link clustering, complete-link clustering has a similarity function that focuses on global cluster quality. The result can also be viewed as "tight" clusters, since the similarity comes from the two most dissimilar members. Both of these functions are based on individual objects, and thus are sensitive to outliers. The average link function, however, is immune to such sensitivity, since it is based on a group decision. As we see, these functions have their own advantages and are good for different applications. Nevertheless, in most patient clustering applications, global coherence is preferable to local coherence. In terms of computational complexity, single link clustering is O(n²), but complete link is O(n³). The average link similarity can be computed efficiently in some cases, so that the complexity of the algorithm is only O(n²). Therefore, the average link function can be an efficient alternative to the complete link function while avoiding the bad global coherence of the single link function.
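The three similarity functions from Table 1 can be sketched directly. As above, the pairwise object similarity used here (negated absolute difference on numbers) is a hypothetical stand-in for whatever measure a real application would use; only the max/min/mean aggregation differs between the three functions.

```python
def pair_sim(a, b):
    # hypothetical object similarity: negated absolute difference
    return -abs(a - b)

def single_link(c1, c2):
    # similarity of the closest pair (largest similarity)
    return max(pair_sim(a, b) for a in c1 for b in c2)

def complete_link(c1, c2):
    # similarity of the farthest pair (smallest similarity)
    return min(pair_sim(a, b) for a in c1 for b in c2)

def average_link(c1, c2):
    # average similarity between all pairs
    sims = [pair_sim(a, b) for a in c1 for b in c2]
    return sum(sims) / len(sims)
```

For example, with clusters [0, 1] and [2, 5], single link reports the closest pair (1 and 2), complete link the farthest pair (0 and 5), and average link the mean over all four pairs, which illustrates why single link ignores global context while complete link is dominated by the most dissimilar members.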