Machine Learning: the KNN Algorithm (Part 2)

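Having written up the KNN algorithm once, here is part two.
Part one is at http://blog.youkuaiyun.com/liuqiao18434391822/article/details/78079018
This post reports results on irisdata.txt (the Iris data set), an openly available machine learning data set that you can download directly.
The algorithm roughly proceeds as follows:
1. Load the data and split it into a training set and a test set.
2. Compute and return the distance between a test instance and a training instance. Euclidean distance is used here, i.e. sqrt((a1-b1)^2+(a2-b2)^2) in two dimensions, which generalizes to vectors of any dimension.
3. Using the function from step 2, find the k training instances closest to the test instance and return them.
4. Among the k instances returned by step 3, find the label that occurs most often and return it; that label is the predicted class.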
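
Steps 2 and 4 can be checked by hand on a few numbers. The following is a minimal sketch, assuming Python 3 and only the standard library; the helper names `euclidean` and `majority_label` and the sample values are hypothetical and are not part of the listing in this post.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def majority_label(labels):
    """Return the label that occurs most often in `labels`."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical values, chosen only so the arithmetic is easy to verify.
d2 = euclidean([0.0, 0.0], [3.0, 4.0])                       # sqrt(9 + 16)
d4 = euclidean([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 8.0])   # sqrt(1 + 4 + 4 + 16)
vote = majority_label(["Iris-setosa", "Iris-versicolor", "Iris-versicolor"])
print(d2, d4, vote)
```

The same formula covers two and four dimensions because the sum simply runs over however many feature pairs there are.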

#-*- coding:utf-8 -*-
#
import math
import random
import operator
import csv
import numpy as np

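def loadDataset(filename,split,trainingSet = None,testSet = None):
    # Create fresh lists per call; a mutable default ([]) would be shared between calls
    if trainingSet is None:
        trainingSet = []
    if testSet is None:
        testSet = []
    with open(filename) as csvfile:
        lines = csv.reader(csvfile)  # read every row of the CSV file
        dataset = list(lines)   # convert to a list of rows
        for row in dataset:
            if not row:   # a trailing blank line in the file is read as an empty [], so skip it
                continue
            for i in range(4):   # the first four columns are the numeric features
                row[i] = float(row[i])
            if random.random() < split:    # randomly assign the row to the training or the test set
                trainingSet.append(row)
            else:
                testSet.append(row)
        return trainingSet,testSet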


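# loadDataset('irisdata.txt',0.5,trainingSet = [],testSet = [])
# Return the Euclidean distance between two instances
def euclideanDistance(instance1,instance2):
    # instance2 is a training row whose last element is the class label;
    # taking the feature count from it also works when instance1 is an unlabeled query
    n = len(instance2)-1
    instance1 = np.array(instance1[:n])
    instance2 = np.array(instance2[:n])
    distance = (instance2-instance1)**2   # squared difference per feature
    distances = distance.sum(axis = 0)
    return math.sqrt(distances)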


# Return the k nearest training instances
def getNeaborhod(testInstance,trainset,k):
    distances = []
    for i in range(len(trainset)):
        distance = euclideanDistance(testInstance,trainset[i])
        distances.append(distance)
    distances = np.array(distances)    # convert to np.array so that argsort can return the indices in ascending order of distance
    distancesSort = distances.argsort()
    neighbors = []
    for x in range(k):
        neighbors.append(trainset[distancesSort[x]])
    return neighbors    # the k nearest instances

# Return the most common label among the neighbors
def getResponse(neighbors):
    classVotes = {}
    for x in range(len(neighbors)):
        label = neighbors[x][-1]   # the class label is the last element of each row
        classVotes[label] = classVotes.get(label,0)+1
    result = sorted(classVotes.items(),key = operator.itemgetter(1),reverse=True)
    return result[0][0]


def main():
    trainSet,testSet = loadDataset('irisdata.txt',0.8,trainingSet = [],testSet = [])   # roughly 80% of the rows go to the training set
    print('trainSet1 %d' % len(trainSet))
    print('testSet1 %d' % len(testSet))
    n = 0.0   # number of correct predictions
    for i in range(len(testSet)):
        test = testSet[i]
        neighbor = getNeaborhod(test,trainSet,3)
        predict = getResponse(neighbor)
        if predict == test[-1]:
            n = n+1
    allSet = float(len(testSet))
    result = n/allSet   # accuracy on the test set
    print(result)


trainSet,testSet = loadDataset('irisdata.txt',0.8,trainingSet = [],testSet = [])
print('trainSet2 %d' % len(trainSet))
print('testSet2 %d' % len(testSet))
a = [6.1,2.8,4.7,1.2]   # an unlabeled sample to classify

def predict(new,train):
    neighbor = getNeaborhod(new,train,3)
    label = getResponse(neighbor)
    print(label)



if __name__ == '__main__':
    main()
    predict(a,trainSet)   # search for neighbors in the training split created above

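The result is reasonably satisfying: the recognition rate reached about 95% on this run.
The output is as follows (the trainSet2/testSet2 lines come first because the module-level split runs before main()):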

trainSet2 112
testSet2 38
trainSet1 109
testSet1 41
0.951219512195
Iris-versicolor
[Finished in 1.5s]
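
The two calls to loadDataset in the listing give different splits (112/38 and 109/41 in the output above) because every row is assigned by its own random.random() draw, so the reported accuracy belongs to one particular split and will vary between runs. One way to make a run repeatable is to seed the generator. The following is a minimal sketch, assuming Python 3; the helper `split_dataset`, its `seed` parameter, and the placeholder rows are hypothetical and are not part of the original code.

```python
import random


def split_dataset(rows, split, seed=None):
    """Assign each row to the training set with probability `split`, else to the test set."""
    rng = random.Random(seed)   # private generator, so the global random state is untouched
    train, test = [], []
    for row in rows:
        if rng.random() < split:
            train.append(row)
        else:
            test.append(row)
    return train, test


# Hypothetical stand-in for the 150 iris rows; only the row count matters here.
rows = [[float(i), "label"] for i in range(150)]

first = split_dataset(rows, 0.8, seed=42)
second = split_dataset(rows, 0.8, seed=42)
print(len(first[0]), len(first[1]))
print(first == second)   # the same seed gives the same split
```

With a fixed seed, the training and test sets are the same on every run, so any change in accuracy comes from the code and not from the split.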