Machine Learning - K-Nearest Neighbors (KNN)

This article examines the fundamentals of the K-Nearest Neighbors (KNN) algorithm, explains why it is considered a lazy learner, and shows how to build a KNN model with the Scikit-Learn library through Python code examples. It covers the key steps of choosing the number of neighbors, selecting a distance metric, and memorizing the training dataset.


Section I: Brief Introduction on K-Nearest Neighbors

K-Nearest Neighbors (KNN) is particularly interesting because it is fundamentally different from the other learning algorithms. KNN is a typical example of a lazy learner. It is called lazy not because of its apparent simplicity, but because it does not learn a discriminative function from the training data; instead, it memorizes the training dataset. The KNN algorithm itself is fairly straightforward and can be summarized by the following steps:

  • Step 1: Choose the number of k and a distance metric
  • Step 2: Find the k-nearest neighbors of the sample
  • Step 3: Assign the class label by majority vote

From: Sebastian Raschka, Vahid Mirjalili. Python Machine Learning, 2nd Edition. Nanjing: Southeast University Press, 2018.
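The three steps above can be sketched as a minimal from-scratch implementation. This is a toy example with hypothetical data (not from the book), using Euclidean distance, i.e. the Minkowski metric with p=2:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=5):
    """Classify a single sample x by majority vote of its k nearest neighbors."""
    # Steps 1/2: with k and the distance metric fixed, compute the Euclidean
    # distance from x to every training sample and take the k closest
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    # Step 3: majority vote over the neighbors' class labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# hypothetical 2-class data: class 0 near the origin, class 1 near (5, 5)
X = np.array([[0.0, 0.0], [0.5, 0.5], [0.0, 1.0],
              [5.0, 5.0], [5.5, 4.5], [4.5, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([0.2, 0.4]), k=3))  # → 0
print(knn_predict(X, y, np.array([5.2, 5.1]), k=3))  # → 1
```

Note that no model parameters are fitted anywhere: all the work happens at prediction time, which is exactly what makes KNN a lazy learner.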

Section II: Construct K-Nearest Neighbors Model
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
import numpy as np
from sklearn.model_selection import train_test_split
from DecisionTrees.visualize_test_idx import plot_decision_regions

plt.rcParams['figure.dpi']=200
plt.rcParams['savefig.dpi']=200
font = {'family': 'Times New Roman',
        'weight': 'light'}
plt.rc("font", **font)

#Section 1: Load data and split it into train/test dataset
iris=datasets.load_iris()
X=iris.data[:,[2,3]]
y=iris.target
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=1,stratify=y)

sc=StandardScaler()
sc.fit(X_train)
X_train_std=sc.transform(X_train)
X_test_std=sc.transform(X_test)
X_combined=np.vstack([X_train_std,X_test_std])
y_combined=np.hstack([y_train,y_test])
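Standardization matters for KNN because the algorithm is distance-based: a feature on a larger scale would otherwise dominate the distance computation. A quick standalone check (with hypothetical data, not the Iris split above) that StandardScaler gives each column zero mean and unit variance:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# hypothetical feature matrix whose two columns live on very different scales
X_train = rng.normal(loc=[10.0, 0.1], scale=[5.0, 0.01], size=(100, 2))

sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)

# after standardization, each column has (approximately) zero mean and unit std
print(np.allclose(X_train_std.mean(axis=0), 0.0, atol=1e-10))  # → True
print(np.allclose(X_train_std.std(axis=0), 1.0, atol=1e-10))   # → True
```

As in the code above, the scaler must be fitted on the training set only and then applied to the test set, so that no information from the test data leaks into preprocessing.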

#Section 2: Train K-Neighbor Model
from sklearn.neighbors import KNeighborsClassifier

knn=KNeighborsClassifier(n_neighbors=5,p=2,metric='minkowski')
knn.fit(X_train_std,y_train)
plot_decision_regions(X_combined,y_combined,classifier=knn,test_idx=range(105,150))
plt.xlabel('petal length [standardized]')
plt.ylabel('petal width [standardized]')
plt.legend(loc='upper left')
plt.savefig('./fig1.png')
plt.show()
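Step 1 of the algorithm requires choosing k, which the code above fixes at 5. One common approach (not shown in the book excerpt) is to compare candidate values by cross-validation on the training set; a sketch using the same Iris split and the same Minkowski/p=2 (Euclidean) metric:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# same data preparation as above
iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)

# compare a few candidate k values by 10-fold cross-validation
for k in (1, 5, 15):
    knn = KNeighborsClassifier(n_neighbors=k, p=2, metric='minkowski')
    score = cross_val_score(knn, X_train_std, y_train, cv=10).mean()
    print(f'k={k}: mean CV accuracy {score:.3f}')

# evaluate the chosen model on the held-out test set
knn = KNeighborsClassifier(n_neighbors=5, p=2, metric='minkowski')
knn.fit(X_train_std, y_train)
print('test accuracy:', round(knn.score(X_test_std, y_test), 3))
```

A small k yields a flexible but noise-sensitive decision boundary, while a large k smooths the boundary at the risk of underfitting; cross-validation picks a compromise between the two.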

[Figure 1: KNN (k=5) decision regions on the standardized Iris petal features, with test samples highlighted (fig1.png)]
Note
Unless otherwise specified, plot_decision_regions refers to the plot_decision_regions function from the post 机器学习-感知机(Perceptron)-Scikit-Learn (Machine Learning - Perceptron - Scikit-Learn), linked there.

References
Sebastian Raschka, Vahid Mirjalili. Python Machine Learning, 2nd Edition. Nanjing: Southeast University Press, 2018.
