Machine Learning Notes (Washington University) - Clustering Specialization - Week Six

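This post introduces the basic concepts of hierarchical clustering, including properties such as not having to choose the number of clusters in advance and using dendrograms to visualize clusterings at different granularities. It explains the two main approaches, divisive (top-down) and agglomerative (bottom-up), and gives the concrete steps of single-linkage agglomerative clustering. It also shows how a dendrogram represents the clustering process.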

1. Hierarchical clustering

  • Avoids choosing the number of clusters beforehand
  • Dendrograms help visualize different clustering granularities (no need to rerun the algorithm)
  • Most algorithms allow the user to choose any distance metric (k-means restricted us to Euclidean distance)
  • Can often find more complex shapes than k-means or Gaussian mixture models

Divisive (top-down):

Start with all data in one big cluster and recursively split it (for example, with recursive k-means). The choices to make are:

  • which algorithm to apply recursively
  • how many clusters to create per split
  • when to split vs. stop: maximum cluster size, maximum cluster radius, or a specified number of clusters
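The three choices above can be made concrete with a small sketch. This is only one possible instantiation, not the course's reference implementation: it assumes 2-means as the recursed algorithm, two clusters per split, and a maximum cluster size as the stopping rule. The names `two_means`, `divisive`, and `max_size` are hypothetical, and the initialization (first point plus the point farthest from it) is an assumption chosen to keep the example deterministic.

```python
import math

def two_means(points, iters=20):
    """Split points into two groups with a basic 2-means loop."""
    # Assumed deterministic init: the first point and the point farthest from it.
    centers = [points[0], max(points, key=lambda p: math.dist(p, points[0]))]
    groups = [list(points), []]
    for _ in range(iters):
        groups = [[], []]
        for p in points:
            nearest = 0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
            groups[nearest].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split, e.g. all points identical
        new_centers = [tuple(sum(xs) / len(g) for xs in zip(*g)) for g in groups]
        if new_centers == centers:
            break  # assignments have converged
        centers = new_centers
    return groups

def divisive(points, max_size=3):
    """Recursively split until every cluster has at most max_size points."""
    if len(points) <= max_size:
        return [points]
    left, right = two_means(points)
    if not left or not right:
        return [points]  # cannot split further
    return divisive(left, max_size) + divisive(right, max_size)

data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for cluster in divisive(data, max_size=3):
    print(cluster)
```

Swapping the stopping rule for a maximum radius or a target number of clusters would only change the test at the top of `divisive`.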

 

Agglomerative (bottom-up):

Start with each data point in its own cluster, then merge clusters until all points are in one big cluster (for example, with single linkage).

single linkage

  • initialize each point to be its own cluster
  • define the distance between two clusters to be the minimum distance between any point in cluster C1 and any point in cluster C2
  • merge the two closest clusters
  • repeat step 3 until all points are in one cluster
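The single linkage steps above can be sketched in a few lines of plain Python. This is a naive illustration that recomputes every pairwise cluster distance on each round, assuming Euclidean distance; the function names and the `(left, right, distance)` merge record are hypothetical choices for this example.

```python
import math
from itertools import combinations

def single_linkage_distance(c1, c2):
    """Minimum distance between any point in c1 and any point in c2."""
    return min(math.dist(p, q) for p in c1 for q in c2)

def single_linkage(points):
    """Return the merge history as (left_cluster, right_cluster, distance)."""
    clusters = [[p] for p in points]  # step 1: each point is its own cluster
    merges = []
    while len(clusters) > 1:
        # steps 2-3: find the two closest clusters under single linkage
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        d = single_linkage_distance(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # step 4: replace the two clusters with their union and repeat
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

for left, right, d in single_linkage([(0.0,), (1.0,), (5.0,), (7.0,)]):
    print(left, right, d)
```

The recorded merge distances are exactly the heights a dendrogram would draw for these points.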

 

Dendrogram

The x-axis shows the data points (carefully ordered).

The y-axis shows the distance between pairs of clusters, that is, the height at which they merge.

The path from a data point to the top shows all clusters to which that point belongs and the order in which the clusters merge.
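To illustrate how a dendrogram yields several granularities from a single run, here is a minimal sketch that cuts a recorded merge history at a chosen height. The merge list is hand-written toy data, and the names `cut`, `merges`, and `labels` are hypothetical. It assumes merge heights never decrease, which holds for single linkage.

```python
def cut(merges, labels, threshold):
    """Clusters obtained by keeping only merges at height <= threshold."""
    parent = {x: x for x in labels}

    def find(x):
        # Follow parent links up to the representative of x's cluster.
        while parent[x] != x:
            x = parent[x]
        return x

    for left, right, height in merges:
        if height <= threshold:
            parent[find(left[0])] = find(right[0])

    groups = {}
    for x in labels:
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())

# Toy merge history: (left members, right members, merge height on the y-axis).
labels = ["a", "b", "c", "d"]
merges = [(["a"], ["b"], 1.0), (["c"], ["d"], 2.0), (["a", "b"], ["c", "d"], 4.0)]

# Leaf order for the x-axis: members of the final merge, left then right,
# so that no branches cross when the tree is drawn.
leaf_order = merges[-1][0] + merges[-1][1]
print(leaf_order)

for threshold in (0.5, 1.5, 3.0, 5.0):
    print(threshold, cut(merges, labels, threshold))
```

A low threshold gives many small clusters and a high threshold gives a few large ones, without rerunning the clustering.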

 

Reposted from: https://www.cnblogs.com/climberclimb/p/6935542.html
