Machine Learning - K-Nearest Neighbors (KNN)

This article examines the fundamentals of the K-Nearest Neighbors (KNN) algorithm, explains why it is considered a lazy learner, and shows how to build a KNN model with the Scikit-Learn library through Python code examples. It covers the key steps of choosing the number of neighbors, selecting a distance metric, and memorizing the training dataset.


Section I: Brief Introduction on K-Nearest Neighbors

K-Nearest Neighbors (KNN) is particularly interesting because it is fundamentally different from the other learning algorithms. KNN is a typical example of a lazy learner. It is called lazy not because of its apparent simplicity, but because it does not learn a discriminative function from the training data; instead, it memorizes the training dataset. The KNN algorithm itself is fairly straightforward and can be summarized by the following steps:

  • Step 1: Choose the number of k and a distance metric
  • Step 2: Find the k-nearest neighbors of the sample
  • Step 3: Assign the class label by majority vote

From: Sebastian Raschka, Vahid Mirjalili. Python Machine Learning, 2nd Edition. Nanjing: Southeast University Press, 2018.
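The three steps above can be sketched as a minimal from-scratch implementation. This is a toy example with hypothetical data (not from the book), using Euclidean distance, i.e. the Minkowski metric with p=2:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=5):
    """Classify a single sample x by majority vote of its k nearest neighbors."""
    # Steps 1/2: with k and the distance metric fixed, compute the Euclidean
    # distance from x to every training sample and take the k closest
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    # Step 3: majority vote over the neighbors' class labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# hypothetical 2-class data: class 0 near the origin, class 1 near (5, 5)
X = np.array([[0.0, 0.0], [0.5, 0.5], [0.0, 1.0],
              [5.0, 5.0], [5.5, 4.5], [4.5, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([0.2, 0.4]), k=3))  # → 0
print(knn_predict(X, y, np.array([5.2, 5.1]), k=3))  # → 1
```

Note that no model parameters are fitted anywhere: all the work happens at prediction time, which is exactly what makes KNN a lazy learner.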

Section II: Construct K-Nearest Neighbors Model
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
import numpy as np
from sklearn.model_selection import train_test_split
from DecisionTrees.visualize_test_idx import plot_decision_regions

plt.rcParams['figure.dpi']=200
plt.rcParams['savefig.dpi']=200
font = {'family': 'Times New Roman',
        'weight': 'light'}
plt.rc("font", **font)

#Section 1: Load data and split it into train/test dataset
iris=datasets.load_iris()
X=iris.data[:,[2,3]]
y=iris.target
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=1,stratify=y)

sc=StandardScaler()
sc.fit(X_train)
X_train_std=sc.transform(X_train)
X_test_std=sc.transform(X_test)
X_combined=np.vstack([X_train_std,X_test_std])
y_combined=np.hstack([y_train,y_test])
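Standardization matters for KNN because the algorithm is distance-based: a feature on a larger scale would otherwise dominate the distance computation. A quick standalone check (with hypothetical data, not the Iris split above) that StandardScaler gives each column zero mean and unit variance:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# hypothetical feature matrix whose two columns live on very different scales
X_train = rng.normal(loc=[10.0, 0.1], scale=[5.0, 0.01], size=(100, 2))

sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)

# after standardization, each column has (approximately) zero mean and unit std
print(np.allclose(X_train_std.mean(axis=0), 0.0, atol=1e-10))  # → True
print(np.allclose(X_train_std.std(axis=0), 1.0, atol=1e-10))   # → True
```

As in the code above, the scaler must be fitted on the training set only and then applied to the test set, so that no information from the test data leaks into preprocessing.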

#Section 2: Train K-Neighbor Model
from sklearn.neighbors import KNeighborsClassifier

knn=KNeighborsClassifier(n_neighbors=5,p=2,metric='minkowski')
knn.fit(X_train_std,y_train)
plot_decision_regions(X_combined,y_combined,classifier=knn,test_idx=range(105,150))
plt.xlabel('petal length [standardized]')
plt.ylabel('petal width [standardized]')
plt.legend(loc='upper left')
plt.savefig('./fig1.png')
plt.show()
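Step 1 of the algorithm requires choosing k, which the code above fixes at 5. One common approach (not shown in the book excerpt) is to compare candidate values by cross-validation on the training set; a sketch using the same Iris split and the same Minkowski/p=2 (Euclidean) metric:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# same data preparation as above
iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)

# compare a few candidate k values by 10-fold cross-validation
for k in (1, 5, 15):
    knn = KNeighborsClassifier(n_neighbors=k, p=2, metric='minkowski')
    score = cross_val_score(knn, X_train_std, y_train, cv=10).mean()
    print(f'k={k}: mean CV accuracy {score:.3f}')

# evaluate the chosen model on the held-out test set
knn = KNeighborsClassifier(n_neighbors=5, p=2, metric='minkowski')
knn.fit(X_train_std, y_train)
print('test accuracy:', round(knn.score(X_test_std, y_test), 3))
```

A small k yields a flexible but noise-sensitive decision boundary, while a large k smooths the boundary at the risk of underfitting; cross-validation picks a compromise between the two.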

[Figure 1: KNN (k=5) decision regions on the standardized Iris petal features, with test samples highlighted (fig1.png)]
Note
Unless otherwise specified, plot_decision_regions refers to the plot_decision_regions function from the post 机器学习-感知机(Perceptron)-Scikit-Learn (Machine Learning - Perceptron - Scikit-Learn), linked there.

References
Sebastian Raschka, Vahid Mirjalili. Python Machine Learning, 2nd Edition. Nanjing: Southeast University Press, 2018.
