Mahout: An overview of clustering techniques

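This article surveys the different kinds of clustering problems (exclusive, overlapping, hierarchical, and probabilistic) and introduces several common clustering approaches: methods with a fixed number of centers, bottom-up strategies, and top-down strategies.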


Different kinds of clustering problems

  • EXCLUSIVE CLUSTERING In exclusive clustering, an item belongs exclusively to one cluster, not several.
  • OVERLAPPING CLUSTERING In overlapping (non-exclusive) clustering, an item can belong to several clusters at once: Harry Potter can sit not only in fiction but also in a young adult cluster and under fantasy. An overlapping clustering algorithm such as fuzzy k-means supports this directly, and it also reports the degree to which an object is associated with each cluster.
  • HIERARCHICAL CLUSTERING Suppose we have two clusters of books, one for fantasy and the other for space travel. Harry Potter is in the fantasy cluster, but both clusters, space travel and fantasy, can be viewed as subclusters of fiction. Hence, we can construct a fiction cluster by merging these and other similar clusters.
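  • PROBABILISTIC CLUSTERING A probabilistic model is usually a characteristic shape or a type of distribution of a set of points in an n-dimensional space. Probabilistic clustering algorithms fit such a model to the data set and adjust its parameters until the model describes the data well.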
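
To make the difference between exclusive and overlapping clustering concrete, here is a minimal sketch in plain Python. It is not Mahout's API: the function names, the fixed cluster centers, and the fuzziness parameter `m` are assumptions chosen for illustration. The membership formula is the standard fuzzy k-means (fuzzy c-means) one, where the degree for center i is 1 / sum over k of (d_i / d_k)^(2 / (m - 1)).

```python
import math


def distance(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exclusive_assignment(point, centers):
    """Exclusive clustering: return the index of the single nearest center."""
    return min(range(len(centers)), key=lambda i: distance(point, centers[i]))


def fuzzy_memberships(point, centers, m=2.0):
    """Overlapping clustering: return one membership degree per center.

    The degrees are between 0 and 1 and sum to 1. `m` (assumed > 1) controls
    how soft the assignment is; values near 1 approach exclusive clustering.
    """
    dists = [distance(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    if 0.0 in dists:
        hit = dists.index(0.0)
        return [1.0 if i == hit else 0.0 for i in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]


# Hypothetical centers and point, for illustration only.
centers = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)
nearest = exclusive_assignment(point, centers)
degrees = fuzzy_memberships(point, centers)
```

With these inputs the exclusive assignment picks a single center, while the fuzzy version keeps a share of membership in both, which is the "degree of association" described in the overlapping clustering item.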


 

Different clustering approaches

  • FIXED NUMBER OF CENTERS These clustering methods fix the number of clusters ahead of time.
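  • BOTTOM-UP APPROACH: FROM POINTS TO CLUSTERS VIA GROUPING Each point starts as its own cluster, and nearby clusters are merged step by step into larger ones.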
  • TOP-DOWN APPROACH: SPLITTING THE GIANT CLUSTER All points start in one giant cluster, which is split repeatedly into smaller ones.
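
The bottom-up approach can be sketched in a few lines of plain Python. This is an illustrative assumption, not Mahout code: the function names are hypothetical, clusters are compared by the distance between their centroids, and merging stops at a caller-supplied number of clusters (other linkage rules and stopping criteria are equally valid).

```python
import math


def centroid(cluster):
    """Mean position of a non-empty list of equal-length point tuples."""
    n = len(cluster)
    dims = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / n for d in range(dims))


def distance(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bottom_up(points, target_clusters):
    """Merge the two closest clusters until `target_clusters` remain."""
    # Start with every point as its own single-element cluster.
    clusters = [[p] for p in points]
    while len(clusters) > target_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge cluster j into cluster i; j > i, so index i stays valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
result = bottom_up(points, target_clusters=2)
```

A top-down variant would run the other way: begin with `[points]` as the only cluster and repeatedly split the cluster that is most spread out, while a fixed-number-of-centers method would take the cluster count up front and refine center positions instead of merging or splitting.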



