Kmeans

Algorithm overview

The k-means algorithm is one of the simplest and most popular clustering algorithms in machine learning. It takes the data points and the number of clusters (k) as input.

Next, it randomly chooses k initial points, called centroids. Once the k centroids are chosen, the following two steps are repeated until the set of k centroids no longer changes:

  • Assignment of points to the centroids: Every data point is assigned to the centroid closest to it. The collection of data points assigned to a particular centroid is called a cluster, so assigning the points to k centroids forms k clusters.
  • Update of centroids: The centroid of every cluster is recomputed as the center of the cluster (the average of all the points in the cluster). All the data points are then reassigned to the new centroids.
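The two steps above can be sketched on a tiny dataset. This is a minimal illustration rather than the implementation given in this post; the names `points`, `centroids`, and `new_centroids` and the starting values are made up for the example.

```python
import numpy as np

# Toy data: two obvious groups (values are made up for illustration).
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
# Assumed starting centroids; in practice they are chosen at random.
centroids = np.array([[1.0, 1.0], [9.0, 9.0]])

# Assignment step: squared Euclidean distance from every point to every centroid.
# points[:, None, :] has shape (4, 1, 2) and centroids[None, :, :] has shape (1, 2, 2),
# so their difference broadcasts to shape (4, 2, 2).
sq_dist = np.sum((points[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
labels = sq_dist.argmin(axis=1)

# Update step: each centroid becomes the mean of the points assigned to it.
new_centroids = np.array(
    [points[labels == i].mean(axis=0) for i in range(len(centroids))]
)

print(labels)
print(new_centroids)
```

Here the first two points are assigned to centroid 0 and the last two to centroid 1, and the centroids move to (0, 1) and (10, 11). Repeating both steps until the centroids stop moving gives the full algorithm.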

Kmeans demonstration

Code

import numpy as np
import matplotlib.pyplot as plt

class Kmeans:
    """Kmeans implemented with Python and NumPy"""
    def __init__(self, k_):
        self.k = k_         # k is the specified number of clusters
        self.threhold = 1e-10   # convergence threshold on how far the centers move
        self.last_k_cluster = None

    def fit(self, X):
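        # Convert X to a float ndarray so that centroid means are not truncated to integers
        X = np.array(X, dtype=float)
        # Set the random seed
        np.random.seed(20)
        # Pick k distinct vectors from X at random as the initial cluster centers
        self.k_cluster = X[np.random.choice(len(X), self.k, replace=False)]
        # Initialize the labels of X
        self.labels = np.zeros(len(X), dtype=int)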
        times = 0
        plt.scatter(X[:,0], X[:, 1], c='black')
        plt.pause(1)
        while True:
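            # Assign every point in X to a cluster
            for index, point in enumerate(X):  # for each vector point in X, compute the squared Euclidean distance from point to every cluster center
                distance = np.sum(np.power(point-self.k_cluster, 2), axis=1)    # this works thanks to NumPy broadcasting
                self.labels[index] = distance.argmin()   # label point with the index of the nearest cluster center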

            # Plot
            plt.scatter(X[:, 0], X[:, 1], c=self.labels, s=50)   # color each point by its newly assigned cluster
            plt.scatter(self.k_cluster[:,0], self.k_cluster[:,1], marker='X', c='black', s=100)
            plt.pause(0.5)

            # path = './Images/' + str(times) + '.jpg'
            # plt.savefig(path)
            # times += 1

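            # Update the center of every cluster to "the average of all the points in the cluster"
            self.last_k_cluster = self.k_cluster.copy() # save the cluster centers from the previous iteration
            for i in range(self.k):
                members = X[self.labels == i]
                if len(members) > 0:    # keep the old center if a cluster is empty, to avoid a NaN mean
                    self.k_cluster[i] = np.mean(members, axis=0)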

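            # Compare the updated cluster centers with the ones saved from the previous iteration: if the Euclidean distance between them (taken over all centers together) is within a threshold, break out of the loop and finish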
            dist = np.sqrt(np.sum(np.power(self.last_k_cluster-self.k_cluster, 2)))
            if dist <= self.threhold:
                break

    def predict(self, X):
        # Convert X to an ndarray
        X = np.array(X)
        result = np.zeros(len(X), dtype=int)    # integer cluster indices
        for index, point in enumerate(X):
            distance = np.sum(np.power(point - self.k_cluster, 2), axis=1)  # this works thanks to NumPy broadcasting
            result[index] = distance.argmin()  # label point with the index of the nearest cluster center
        return result

"""Test code"""
# from KMeans_Shayue import *

if __name__ == '__main__':
    obj = Kmeans(3)
    np.random.seed(10)
    X = np.random.randint(1, 300, (100, 2))
    obj.fit(X)
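The per-point loops in fit and predict can also be written as a single broadcasted computation. The helper below, `assign_clusters`, is a hypothetical name and not part of the class above; it is only a sketch, assuming `X` has shape (n, d) and `centers` has shape (k, d).

```python
import numpy as np

def assign_clusters(X, centers):
    """Return, for each row of X, the index of the nearest row of centers."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # (n, 1, d) - (1, k, d) broadcasts to (n, k, d); summing over d gives (n, k).
    sq_dist = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    return sq_dist.argmin(axis=1)

# Compare against the explicit per-point loop on random data.
rng = np.random.default_rng(0)
X = rng.integers(1, 300, size=(100, 2))
centers = X[rng.choice(len(X), size=3, replace=False)]

loop_labels = np.array([np.sum((p - centers) ** 2, axis=1).argmin() for p in X])
vec_labels = assign_clusters(X, centers)
print(np.array_equal(loop_labels, vec_labels))
```

The broadcasted version builds an intermediate array of shape (n, k, d), so it trades memory for speed; for very large inputs the loop, or processing the points in chunks, may be the better fit.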

Reposted from: https://www.cnblogs.com/shayue/p/Kmeans.html
