Python DBSCAN notes (and related questions) 11-02-2017

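These notes cover applying the DBSCAN clustering algorithm in Python. Topics include the clustering evaluation metrics (Homogeneity, Completeness, V-measure and others), data preprocessing with a before/after comparison of the transformed data, the np.zeros_like() function, how to count the number of distinct labels, and how a colormap is used in the plot to tell the clusters and the outliers apart.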

1 Look up: clustering metrics (sklearn.metrics)
Example:

from sklearn import metrics

# labels_true and X come from make_blobs (section 2); labels is the label array of the fitted DBSCAN model
print("Homogeneity: %0.3f" % metrics.homogeneity_score(labels_true, labels))
print("Completeness: %0.3f" % metrics.completeness_score(labels_true, labels))
print("V-measure: %0.3f" % metrics.v_measure_score(labels_true, labels))
print("Adjusted Rand Index: %0.3f"
      % metrics.adjusted_rand_score(labels_true, labels))
print("Adjusted Mutual Information: %0.3f"
      % metrics.adjusted_mutual_info_score(labels_true, labels))
print("Silhouette Coefficient: %0.3f"
      % metrics.silhouette_score(X, labels))

Output:
Homogeneity: 0.975
Completeness: 0.935
V-measure: 0.955
Adjusted Rand Index: 0.976
Adjusted Mutual Information: 0.935
Silhouette Coefficient: 0.661
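To see what homogeneity and completeness actually measure, here is a small sketch on hand-made labelings (the toy labels are illustrative, not from these notes). Splitting one true class into two clusters keeps every cluster pure, so homogeneity stays at 1, but that class is now spread over two clusters, so completeness drops below 1. V-measure is their harmonic mean, and the adjusted Rand index does not care how the cluster IDs are numbered.

```python
from sklearn import metrics

# Hypothetical toy labelings: 4 points, 2 true classes
labels_true = [0, 0, 1, 1]
labels_split = [0, 0, 1, 2]   # true class 1 is split into two clusters

h = metrics.homogeneity_score(labels_true, labels_split)
c = metrics.completeness_score(labels_true, labels_split)
v = metrics.v_measure_score(labels_true, labels_split)
print("Homogeneity: %0.3f" % h)   # every cluster holds points of one class only
print("Completeness: %0.3f" % c)  # class 1 is spread over two clusters
print("V-measure: %0.3f" % v)     # harmonic mean 2*h*c/(h+c)

# Swapping the cluster IDs of a perfect clustering does not change the score
ari = metrics.adjusted_rand_score(labels_true, [1, 1, 0, 0])
print("Adjusted Rand Index: %0.3f" % ari)
```

Running it should print 1.000, 0.667, 0.800 and 1.000.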

2 Read the docs: cluster metrics, what the generated dataset contains, preprocessing

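import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn import metrics
# the old path sklearn.datasets.samples_generator is deprecated and gone in recent releases
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler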

Generate 750 random sample points around the three centers listed in centers

centers = [[0, 0], [1, 2], [3, -1]]
X, labels_true = make_blobs(n_samples=750, centers=centers, cluster_std=0.4, random_state=0)
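A quick check of what make_blobs returns with the parameters above: X holds the 750 two-dimensional points and labels_true the index of the center each point was drawn from. With an integer n_samples the points are split evenly over the centers, so each of the 3 centers gets 250 points.

```python
import numpy as np
from sklearn.datasets import make_blobs

centers = [[0, 0], [1, 2], [3, -1]]
X, labels_true = make_blobs(n_samples=750, centers=centers,
                            cluster_std=0.4, random_state=0)

print(X.shape)                    # (750, 2): one row per sample
print(np.bincount(labels_true))   # [250 250 250]: points generated per center

# The mean of each generated group should sit close to its center
for k in range(len(centers)):
    print(k, X[labels_true == k].mean(axis=0).round(2))
```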

Standardize X (each feature rescaled to zero mean and unit variance)

X = StandardScaler().fit_transform(X)

Before the transformation:
[[-0.15977961 0.14802236]
[ 0.84525166 1.7958829 ]
[-0.32136387 -0.27581991]
…,
[ 2.26798858 -1.27833405]
[ 1.11371187 2.69706751]
[ 2.60046048 -1.29605472]]
After the transformation:
[[-1.11638887 -0.13446227]
[-0.361879 1.13162887]
[-1.23769546 -0.46011054]
…,
[ 0.70621616 -1.23036641]
[-0.16033714 1.82403081]
[ 0.9558137 -1.24398163]]
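StandardScaler works column by column: it subtracts each feature's mean and divides by its standard deviation, using the population standard deviation (ddof=0). The sketch below, a check rather than part of the original notes, reproduces the transformation by hand on the same data.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

centers = [[0, 0], [1, 2], [3, -1]]
X, _ = make_blobs(n_samples=750, centers=centers, cluster_std=0.4, random_state=0)

X_scaled = StandardScaler().fit_transform(X)
# The same transformation written out: column-wise mean and population std
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, X_manual))   # True
print(X_scaled.mean(axis=0))             # both columns very close to 0
print(X_scaled.std(axis=0))              # both columns very close to 1
```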

np.zeros_like()
Returns an all-zero array with the same shape (and, unless dtype is given, the same dtype) as its argument.
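The code in these notes appears to follow the scikit-learn DBSCAN demo, where np.zeros_like builds a Boolean mask marking the core samples. A minimal sketch of that step, assuming eps=0.3 and min_samples=10 (values taken from that demo, not stated in these notes): passing dtype=bool gives an all-False array shaped like db.labels_, and the positions listed in db.core_sample_indices_ are then set to True.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# zeros_like copies the shape (and by default the dtype) of its argument
a = np.array([[1, 2], [3, 4]])
print(np.zeros_like(a).tolist())   # [[0, 0], [0, 0]]

centers = [[0, 0], [1, 2], [3, -1]]
X, labels_true = make_blobs(n_samples=750, centers=centers, cluster_std=0.4, random_state=0)
X = StandardScaler().fit_transform(X)

# Assumed parameters (from the scikit-learn demo, not from these notes)
db = DBSCAN(eps=0.3, min_samples=10).fit(X)
labels = db.labels_                # cluster index per point, -1 = noise

# All-False mask with the same shape as labels, then flag the core samples
core_samples_mask = np.zeros_like(labels, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
print(core_samples_mask.sum(), "core samples out of", len(labels))
```

Core samples always belong to some cluster, so labels[core_samples_mask] never contains -1.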

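n_clusters = len(set(labels)) - (1 if -1 in labels else 0)

set(labels) collects the distinct label values, here {-1, 0, 1, 2}; len() of that set is the number of label kinds.
(1 if -1 in labels else 0) evaluates to 1 here because the noise label -1 is present, so noise is not counted as a cluster; without noise it would be 0.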
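A toy walk-through of the counting formula, on a made-up label array: set() keeps each distinct label once, and the conditional subtracts 1 only when the noise label -1 is among them.

```python
import numpy as np

# Hypothetical DBSCAN output: three clusters (0, 1, 2) plus two noise points (-1)
labels = np.array([0, 0, 1, 1, 1, 2, -1, 2, -1])

print(sorted(int(v) for v in set(labels)))   # [-1, 0, 1, 2]
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(n_clusters, n_noise)                   # 3 2

# Without any noise point nothing is subtracted
clean = np.array([0, 1, 1, 2])
print(len(set(clean)) - (1 if -1 in clean else 0))   # 3
```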

unique_labels = set(labels)
colors = plt.cm.Spectral(np.linspace(0, 1, len(unique_labels)))

np.linspace(0, 1, len(unique_labels))
Output: array([ 0. , 0.33333333, 0.66666667, 1. ])
np.linspace(0, 3, len(unique_labels))
Output: array([ 0., 1., 2., 3.])
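The two linspace calls do not give the same colors: a colormap reads float inputs as positions in [0, 1], so with np.linspace(0, 3, len(unique_labels)) the values 1, 2 and 3 all map to the colormap's top color. The colors from np.linspace(0, 1, len(unique_labels)), one RGBA row per label: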
array([[ 0.61960784, 0.00392157, 0.25882353, 1. ],
[ 0.99346405, 0.74771242, 0.43529412, 1. ],
[ 0.74771242, 0.89803922, 0.62745098, 1. ],
[ 0.36862745, 0.30980392, 0.63529412, 1. ]])
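A sketch comparing the two linspace calls. With the default settings, float inputs of 1 or more all come back as the colormap's top color, so np.linspace(0, 3, 4) yields only two distinct colors, while np.linspace(0, 1, 4) yields four. The Agg backend is an assumption so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")              # assumption: headless backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

cmap = plt.cm.Spectral
good = cmap(np.linspace(0, 1, 4))  # positions 0, 1/3, 2/3, 1 spread over [0, 1]
bad = cmap(np.linspace(0, 3, 4))   # positions 0, 1, 2, 3: everything >= 1 hits the top end

print(good.shape)                       # (4, 4): one RGBA row per label
print(len(np.unique(good, axis=0)))     # 4 distinct colors
print(len(np.unique(bad, axis=0)))      # 2 distinct colors
```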

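plt.cm.Spectral is the "Spectral" colormap object (a matplotlib Colormap). Calling it on values in [0, 1] returns the matching RGBA colors (red, green, blue, alpha), which is how each cluster gets its own color. (Setting the default colormap to spectral and applying it to the current image, if any, describes the old pyplot function plt.spectral(), not plt.cm.Spectral.)
See help(colormaps) for more information.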
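Because plt.cm.Spectral is a Colormap object, it can be called directly. A short sketch of the three common call forms: a single float returns one RGBA tuple, an array returns an array of shape (n, 4), and bytes=True returns integers from 0 to 255 instead of floats. The Agg backend is again an assumption for headless runs.

```python
import matplotlib
matplotlib.use("Agg")              # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Colormap

cmap = plt.cm.Spectral
print(isinstance(cmap, Colormap), cmap.name, cmap.N)   # True Spectral 256

one = cmap(0.5)                            # scalar in: one (r, g, b, a) tuple of floats
many = cmap(np.array([0.0, 0.5, 1.0]))     # array in: array of shape (3, 4)
as_bytes = cmap(0.0, bytes=True)           # same lookup, values as uint8

print(one)
print(many.shape)                          # (3, 4)
print(as_bytes)
```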

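Plot the cluster members and the outliers: core samples with large markers, non-core members with small markers, noise (label -1) in black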

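for k, col in zip(unique_labels, colors):
    if k == -1:
        col = 'k'  # black is used for noise (outliers)
    class_member_mask = (labels == k)

    # Core samples of cluster k: large markers
    xy = X[class_member_mask & core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=col, markeredgecolor='k', markersize=12)

    # Non-core members of cluster k (the noise points when k == -1): small markers
    xy = X[class_member_mask & ~core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=col, markeredgecolor='k', markersize=6)

plt.title('Estimated number of clusters: %d' % n_clusters)
plt.show()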
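Putting the pieces together, here is a self-contained sketch of the whole pipeline from these notes: generate and standardize the data, fit DBSCAN, build the core-sample mask, count clusters, pick colors and plot. It assumes eps=0.3 and min_samples=10 (from the scikit-learn demo, not from these notes) and renders with the Agg backend into an in-memory PNG instead of calling plt.show(); with a display, plt.show() works as usual.

```python
import io

import matplotlib
matplotlib.use("Agg")              # assumption: render off-screen, no window
import matplotlib.pyplot as plt
import numpy as np
from sklearn import metrics
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Data as in the notes
centers = [[0, 0], [1, 2], [3, -1]]
X, labels_true = make_blobs(n_samples=750, centers=centers,
                            cluster_std=0.4, random_state=0)
X = StandardScaler().fit_transform(X)

# Assumed DBSCAN parameters (from the scikit-learn demo, not the notes)
db = DBSCAN(eps=0.3, min_samples=10).fit(X)
labels = db.labels_
core_samples_mask = np.zeros_like(labels, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("Estimated number of clusters: %d" % n_clusters)
print("V-measure: %0.3f" % metrics.v_measure_score(labels_true, labels))

# One color per label, sampled from [0, 1]
unique_labels = set(labels)
colors = plt.cm.Spectral(np.linspace(0, 1, len(unique_labels)))

fig, ax = plt.subplots()
for k, col in zip(unique_labels, colors):
    if k == -1:
        col = "k"                  # noise drawn in black
    class_member_mask = labels == k
    # Core samples: large markers
    xy = X[class_member_mask & core_samples_mask]
    ax.plot(xy[:, 0], xy[:, 1], "o", markerfacecolor=col,
            markeredgecolor="k", markersize=12)
    # Non-core members (and noise): small markers
    xy = X[class_member_mask & ~core_samples_mask]
    ax.plot(xy[:, 0], xy[:, 1], "o", markerfacecolor=col,
            markeredgecolor="k", markersize=6)
ax.set_title("Estimated number of clusters: %d" % n_clusters)

buf = io.BytesIO()
fig.savefig(buf, format="png")     # save instead of plt.show() in headless runs
plt.close(fig)
```

Using an explicit fig/ax pair keeps the plot independent of pyplot's "current figure" state, which makes the snippet easier to reuse inside a larger script.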