Clustering


License: CC 4.0 BY-SA

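This article covers several clustering methods. It first applies K-means clustering to the iris dataset: it removes the Species attribute, clusters the remaining data, and then compares the clustering result with the original class labels. It then discusses k-medoids clustering, which handles outliers better than K-means, and introduces the classic PAM (Partitioning Around Medoids) algorithm. Finally, it introduces the concept of hierarchical clustering.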

1. K-means clustering

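The following demonstrates K-means clustering on the iris dataset. First remove the Species attribute (the class label) from iris so that only the four numeric measurements are clustered. Then call kmeans() on the resulting dataset iris2 and store the clustering result in the variable kmeans.result. Because kmeans() chooses its initial centers at random, the cluster numbering, and occasionally the partition itself, can differ from run to run.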

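> iris2 <- iris
> iris2$Species <- NULL    # drop the class label; keep the four numeric columns

> (kmeans.result <- kmeans(iris2, 3))    # cluster iris2, not iris (its Species column is non-numeric)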
K-means clustering with 3 clusters of sizes 50, 38, 62

Cluster means:
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     5.006000    3.428000     1.462000    0.246000
2     6.850000    3.073684     5.742105    2.071053
3     5.901613    2.748387     4.393548    1.433871

Clustering vector:
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [41] 1 1 1 1 1 1 1 1 1 1 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3
 [81] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 2 2 2 2 3 2 2 2 2 2 2 3 3 2 2 2 2 3
[121] 2 3 2 3 2 2 3 3 2 2 2 2 2 3 2 2 2 2 3 2 2 2 3 2 2 2 3 2 2 3

Within cluster sum of squares by cluster:
[1] 15.15100 23.87947 39.82097
 (between_SS / total_SS =  88.4 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"
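The summary printed above can be reproduced by hand, which helps clarify what each reported quantity means. The sketch below uses base R only. The `set.seed(42)` call and the helper names `X`, `totss_manual`, `withinss_manual` and `ratio` are hypothetical additions, not part of the original example. It recomputes the total and within-cluster sums of squares from the returned `cluster` and `centers` components, shows that the "between_SS / total_SS" line is `betweenss / totss`, and cross-tabulates the clusters against Species, the comparison with the original class labels described at the start of the article.

```r
# Sketch using base R only; iris and kmeans() ship with R.
# set.seed(42) is an arbitrary, hypothetical choice for reproducibility.
iris2 <- iris
iris2$Species <- NULL
X <- as.matrix(iris2)

set.seed(42)
kmeans.result <- kmeans(iris2, 3)

# Total sum of squares: squared distances of every point to the overall mean.
totss_manual <- sum(scale(X, scale = FALSE)^2)

# Within-cluster sum of squares: squared distances to each cluster's own center.
withinss_manual <- sapply(1:3, function(k) {
  Xk <- X[kmeans.result$cluster == k, , drop = FALSE]
  sum(sweep(Xk, 2, kmeans.result$centers[k, ])^2)
})

# The "(between_SS / total_SS = ...)" line printed by kmeans() is this ratio.
ratio <- kmeans.result$betweenss / kmeans.result$totss
cat(sprintf("between_SS / total_SS = %.1f %%\n", 100 * ratio))

stopifnot(isTRUE(all.equal(totss_manual, kmeans.result$totss)))
stopifnot(isTRUE(all.equal(withinss_manual, kmeans.result$withinss)))
# betweenss is the part of totss that is not within-cluster scatter.
stopifnot(isTRUE(all.equal(kmeans.result$betweenss,
                           totss_manual - sum(withinss_manual))))

# Compare the clusters with the original class labels.
print(table(Species = iris$Species, Cluster = kmeans.result$cluster))
```

Because the seed here is arbitrary, the printed percentage and the cluster numbering may not match the 88.4 % run shown above. The identities checked with stopifnot() hold for any run.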