simple-cluster: common clustering algorithms in Python

Article link: https://blog.youkuaiyun.com/qq_44425179/article/details/130337033

https://zhuanlan.zhihu.com/p/127013012

BIRCH

DBSCAN

K-Means

Mini-Batch K-Means

Mean Shift

OPTICS

Spectral Clustering

Gaussian Mixture

Agglomerative Clustering

Dataset

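The make_classification() function creates a test binary classification dataset. The dataset has 1,000 examples with two input features (x1, x2) and one cluster per class; the class label of each example is returned in y.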

# synthetic classification dataset
from numpy import where
from sklearn.datasets import make_classification
from matplotlib import pyplot
# define the dataset
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=4)
# create a scatter plot for the samples of each class
for class_value in range(2):
    # get the row indices of the examples in this class
    row_ix = where(y == class_value)
    # plot these samples: column 0 (x1) on the horizontal axis, column 1 (x2) on the vertical axis
    pyplot.scatter(X[row_ix, 0], X[row_ix, 1])
# show the plot
pyplot.show()
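The indexing expression X[row_ix, 0] in the plot above is easy to misread, so here is a minimal sketch of what it selects. It reuses the same dataset; the names x1_tuple, mask, and x1_mask are introduced only for this illustration. When called with just a condition, where() returns a tuple of index arrays, and indexing with that tuple keeps a leading axis of length 1. A boolean mask picks the same values as a flat array.

```python
from numpy import where
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=4)

# where() with only a condition returns a tuple with one index array per dimension
row_ix = where(y == 0)
print(type(row_ix), len(row_ix))

# indexing with the tuple selects column 0 of the class-0 rows,
# but the result has shape (1, n) rather than (n,)
x1_tuple = X[row_ix, 0]
print(x1_tuple.shape)

# a boolean mask (illustrative alternative) selects the same values as a 1-D array
mask = y == 0
x1_mask = X[mask, 0]
print(x1_mask.shape)
```

pyplot.scatter accepts either shape, which is why the original loop draws the plot correctly.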


1: BIRCH clustering

# BIRCH clustering
from numpy import unique
from numpy import where
from sklearn.datasets import make_classification
from sklearn.cluster import Birch
from matplotlib import pyplot
# define the dataset
X, _ = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=4)
# define the model; tune threshold (0.05 here) to control how fine-grained the subclusters are
model = Birch(threshold=0.05, n_clusters=2)
# fit the model
model.fit(X)
# assign a cluster to each example
yhat = model.predict(X)
# retrieve the unique clusters
clusters = unique(yhat)
# create a scatter plot for the samples of each cluster
for cluster in clusters:
    # get the row indices of the examples in this cluster
    row_ix = where(yhat == cluster)
    # plot these samples
    pyplot.scatter(X[row_ix, 0], X[row_ix, 1])
# show the plot
pyplot.show()
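To get a feel for what threshold does, the sketch below counts the subclusters BIRCH builds at two settings. It assumes the same dataset as above. The helper count_subclusters and the comparison value 1.0 are made up for this illustration. Passing n_clusters=None skips the final global clustering step, so the subclusters themselves can be inspected through subcluster_centers_.

```python
from sklearn.datasets import make_classification
from sklearn.cluster import Birch

X, _ = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=4)

def count_subclusters(threshold):
    # hypothetical helper: fit BIRCH without the final clustering step
    # and report how many subclusters the tree ended up with
    model = Birch(threshold=threshold, n_clusters=None)
    model.fit(X)
    return len(model.subcluster_centers_)

fine = count_subclusters(0.05)   # small radius limit: many small subclusters
coarse = count_subclusters(1.0)  # large radius limit: few large subclusters
print(fine, coarse)
```

A smaller threshold keeps each subcluster tighter, so the tree holds more of them. With n_clusters=2, as in the example above, those subclusters are then merged into two final clusters.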


2: DBSCAN clustering

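DBSCAN (Density-Based Spatial Clustering of Applications with Noise) finds high-density regions in the domain and expands the areas of feature space around them into clusters.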
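The idea of growing clusters out of dense regions can be sketched with scikit-learn's DBSCAN by splitting the points into core, border, and noise points. This is only an illustration on the same synthetic dataset; the mask names core_mask, border_mask, and noise_mask are introduced here. scikit-learn labels noise points -1 and lists the core points in core_sample_indices_.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.cluster import DBSCAN

X, _ = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=4)

model = DBSCAN(eps=0.30, min_samples=20)
labels = model.fit_predict(X)

# core points: at least min_samples points (the point itself included) within eps
core_mask = np.zeros(len(X), dtype=bool)
core_mask[model.core_sample_indices_] = True

# noise points: not reachable from any core point, labelled -1
noise_mask = labels == -1

# border points: assigned to a cluster but not dense enough to be core points
border_mask = ~core_mask & ~noise_mask

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters:", n_clusters)
print("core:", core_mask.sum(), "border:", border_mask.sum(), "noise:", noise_mask.sum())
```

Raising eps or lowering min_samples turns more points into core points, which tends to merge clusters and reduce noise.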

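from sklearn.datasets import make_classification
from sklearn.cluster import DBSCAN
# define the dataset
X, _ = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=4)
# define the model: a point with at least min_samples=20 points (itself included)
# within its eps=0.30 neighborhood is treated as a core point
model = DBSCAN(eps=0.30, min_samples=20)
# fit the model and predict a cluster for each example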
yhat = model.fit_predict