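k-nearest neighbors (kNN): if most of the k samples most similar to a given sample (that is, its k nearest neighbors in feature space) belong to one class, the sample is assigned to that class as well.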
- A sample's nearest neighbors are defined by the Euclidean distance between feature vectors.
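To make the distance definition concrete, here is a minimal sketch of the Euclidean distance between two feature vectors. The helper name `euclidean_distance` is an assumption introduced for illustration and is not part of the original code; the two points are taken from the sample data used in the script.

```python
import numpy as np

def euclidean_distance(a, b):
    """Return the Euclidean (L2) distance between two equal-length vectors.

    Hypothetical helper for illustration; not part of the original script.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Square the per-feature differences, sum them, take the square root.
    return float(np.sqrt(np.sum((a - b) ** 2)))

# Distance between the test sample [18, 90] and the training sample [3, 104]:
# sqrt(15**2 + (-14)**2) = sqrt(421)
d = euclidean_distance([18, 90], [3, 104])
print(round(d, 2))  # prints 20.52
```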
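import numpy as np
import operator

def knn(test, trains, labels, k):
    # Number of training samples
    trainsSize = trains.shape[0]
    # Difference between the test sample and every training sample
    values = np.tile(test, (trainsSize, 1)) - trains
    # Euclidean distance: square, sum over the features, take the square root
    sqValues = values ** 2
    sqDistances = sqValues.sum(axis=1)
    distances = sqDistances ** 0.5
    # Indices of the training samples, ordered by increasing distance
    sortedDists = np.argsort(distances)
    # Count the labels of the k nearest neighbors
    classCount = {}
    for i in range(k):
        voteLabel = labels[sortedDists[i]]
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
    # Sort the classes by vote count, descending, and return the winner
    sortClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortClassCount[0][0]
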
trains = [[3, 104],
          [2, 100],
          [1, 81],
          [101, 10],
          [99, 5],
          [98, 2]]
# 1 = romance film, 2 = action film
labels = [1, 1, 1, 2, 2, 2]
test = [18, 90]
k = 3
trains = np.array(trains)
labels = np.array(labels)
test = np.array(test)
print('Class of the test sample: ' + str(knn(test, trains, labels, k)))
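As a possible alternative, the same procedure can be sketched more compactly with NumPy broadcasting in place of `np.tile` and `collections.Counter` in place of the manual vote dictionary. The function name `knn_broadcast` is hypothetical, and this is one way to write it under those assumptions rather than the original author's method; the sample data repeats the values from the script.

```python
from collections import Counter

import numpy as np

def knn_broadcast(test, trains, labels, k):
    """Classify `test` by majority vote among its k nearest training samples.

    Hypothetical variant for illustration; not part of the original script.
    """
    trains = np.asarray(trains, dtype=float)
    test = np.asarray(test, dtype=float)
    labels = np.asarray(labels)
    # Broadcasting subtracts `test` from every row of `trains`;
    # the row-wise L2 norm is the Euclidean distance.
    distances = np.linalg.norm(trains - test, axis=1)
    # Indices of the k closest training samples
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbors
    votes = Counter(labels[nearest].tolist())
    return votes.most_common(1)[0][0]

trains = [[3, 104], [2, 100], [1, 81], [101, 10], [99, 5], [98, 2]]
labels = [1, 1, 1, 2, 2, 2]  # 1 = romance film, 2 = action film

result = knn_broadcast([18, 90], trains, labels, 3)
print(result)  # prints 1
```

The three nearest neighbors of `[18, 90]` are the first three training samples, all labeled 1, so the vote is unanimous.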