Machine Learning Algorithms: The K-Nearest Neighbors Algorithm

License: CC 4.0 BY-SA

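This article introduces a simple and intuitive classification algorithm, the K-nearest neighbors (KNN) algorithm, and implements it in Python. The numpy library handles the numerical operations and the operator module sorts the vote counts. A concrete example shows how to use KNN to classify an unknown data point: the point gets the label that occurs most often among its k closest training samples.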
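For reference, the two steps that `classify` performs can be written as formulas. The notation is ours, not the original author's: $x$ is the query point `inx`, $x^{(i)}$ is the $i$-th row of `dataSet` with label $y^{(i)}$, $n$ is the number of features, and $N_k(x)$ is the set of indices of the $k$ training points with the smallest distance to $x$. The first formula is the Euclidean distance the code computes; the second is the majority vote over the $k$ nearest labels.

```latex
d\bigl(x, x^{(i)}\bigr) = \sqrt{\sum_{j=1}^{n} \bigl(x_j - x^{(i)}_j\bigr)^2},
\qquad
\hat{y} = \arg\max_{c} \sum_{i \in N_k(x)} \mathbf{1}\bigl[\, y^{(i)} = c \,\bigr]
```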

# -*- coding: utf-8 -*-
__author__ = 'whf'
from numpy import *   # provides array and tile
import operator
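def classify(inx, dataSet, labels, k):
    # Number of rows (training samples) in the data set; shape gives the dimensions of an array
    dataSetSize = dataSet.shape[0]
    # tile (a numpy function) repeats inx dataSetSize times along the rows and once along the columns,
    # producing an array with the same shape as dataSet; subtracting dataSet gives the difference
    # between the target point and every training sample
    diffMat = tile(inx, (dataSetSize, 1)) - dataSet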
    # Square the differences
    sqDiffMat = diffMat**2
    # Sum the squared differences along each row
    sqDistances = sqDiffMat.sum(axis=1)
    # Take the square root to get the Euclidean distances
    distances = sqDistances**0.5
    # argsort returns the indices that would sort the distances in ascending order (nearest first)
    sortedDistIndicies = distances.argsort()
    classcount = {}
    for i in range(k):
        # Label of the i-th nearest training sample
        voteIlabel = labels[sortedDistIndicies[i]]
        # Store label -> number of votes in the dictionary:
        # get() returns the current count if voteIlabel is already a key, otherwise 0; then add 1
        classcount[voteIlabel] = classcount.get(voteIlabel, 0) + 1
    # Sort the labels by vote count. classcount stores key-value pairs where the key is the label
    # and the value is its number of votes, so key=operator.itemgetter(1) sorts by the count;
    # reverse=True sorts in descending order
    sortedClassCount = sorted(classcount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
group = array([[1.0,1.1],[1.0,1.0],[0,0],[0,0.1]])
labels = ['A','A','B','B']
print(classify([0.1, 0.1], group, labels, 3))   # → B
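The same classifier can also be written without `tile` or `operator`. The sketch below is one possible variant, not the post's original code: the function name `knn_classify` is hypothetical, and it assumes Python 3.7+ with NumPy installed. NumPy broadcasting subtracts the query point from every row directly, and `collections.Counter` counts the votes and picks the most frequent label.

```python
import numpy as np
from collections import Counter


def knn_classify(inx, data_set, labels, k):
    """Hypothetical vectorized variant of classify(): majority vote among the k nearest rows."""
    inx = np.asarray(inx, dtype=float)
    data_set = np.asarray(data_set, dtype=float)
    # Broadcasting subtracts inx from every row of data_set, so no tile() call is needed
    distances = np.sqrt(((data_set - inx) ** 2).sum(axis=1))
    # Indices of the k smallest distances, nearest first
    nearest = distances.argsort()[:k]
    # Count how often each label appears among the k nearest neighbors
    votes = Counter(labels[i] for i in nearest)
    # most_common(1) returns [(label, count)] for the most frequent label
    return votes.most_common(1)[0][0]


group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(knn_classify([0.1, 0.1], group, labels, 3))
```

With the sample data from the post this prints `B`: the three nearest points to [0.1, 0.1] are [0, 0.1] and [0, 0] (both labeled B) and [1.0, 1.0] (labeled A), so B wins the vote 2 to 1. When two labels tie, `Counter.most_common` keeps the one counted first, which here is the label of the nearer neighbor.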