Common APIs in sklearn.neighbors

This article walks through the scikit-learn APIs for the k-nearest neighbors algorithm: the NearestNeighbors class and the parameters and usage of its fit, kneighbors, and kneighbors_graph methods, with an example of fast queries using a KDTree. It also touches on KNeighborsClassifier and KNeighborsRegressor for classification and regression tasks.


The k-nearest neighbors algorithm:
For a new input sample, find the k training samples closest to it; the class held by the majority of those k samples becomes the predicted class of the input.
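The voting rule described above can be sketched in a few lines of plain Python (the data and names here are illustrative, and the distance is plain Euclidean):

```python
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training points by squared Euclidean distance to the query.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in zip(train_X, train_y)
    )
    # Majority vote among the k closest labels.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Two well-separated clusters: points near the origin are class 0,
# points near (5, 5) are class 1.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (0.5, 0.5)))
print(knn_predict(train_X, train_y, (5.5, 5.5)))
```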
The nearest-neighbors class for unsupervised learning:
sklearn.neighbors.NearestNeighbors(n_neighbors=5,radius=1.0,algorithm='auto',leaf_size=30,
metric='minkowski',p=2,metric_params=None,n_jobs=1,**kwargs)
n_neighbors: int, default 5
Number of training samples that vote on a query point, i.e. the k in k-nearest neighbors.
radius: float, default 1.0
Default range of the parameter space for radius_neighbors queries, i.e. the radius. Given a query point and a radius r, the points inside the ball of radius r centered at the query point count as its neighbors.
algorithm: {'auto', 'ball_tree', 'kd_tree', 'brute'}
Algorithm used to compute the nearest neighbors. Sparse input forces 'brute'.
leaf_size: int, default 30
Leaf size passed to BallTree or KDTree, i.e. the granularity of the tree's sub-regions. It affects the speed of tree construction and queries, as well as the memory needed to store the tree.
metric: string or callable, default 'minkowski'
Distance metric. Any metric defined in scikit-learn or scipy.spatial.distance can be used.
p: int, default 2
Parameter of the Minkowski metric. p=1 gives the Manhattan distance, p=2 the Euclidean distance.
metric_params: dict, default None
Additional keyword arguments for the metric function.
n_jobs: int, default 1
Number of parallel jobs to run for the neighbor search, i.e. the number of CPU cores used.
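A minimal sketch of how these constructor parameters fit together (the data here is made up for illustration; with p=2 the Minkowski metric is the Euclidean distance, and the radius parameter is what radius_neighbors queries use by default):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0]])

nn = NearestNeighbors(n_neighbors=3, radius=1.5,
                      algorithm='auto', leaf_size=30,
                      metric='minkowski', p=2, n_jobs=1)
nn.fit(X)

# All training points within radius 1.5 of the origin:
# points 0, 1, and 2 qualify; point 3 is far away.
dist, ind = nn.radius_neighbors([[0.0, 0.0]])
print(sorted(ind[0]))
```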
Methods:
fit(X[, y])
X: {array-like, sparse matrix, BallTree, KDTree}
Training data. X is an array or matrix of shape (n_samples, n_features); alternatively, an already-built tree structure can be passed in directly.
get_params(deep=True): returns the estimator's parameters as a mapping.
kneighbors(X=None, n_neighbors=None, return_distance=True)
Finds the k nearest neighbors of each point, returning the index of and distance to each neighbor.
X: array-like, shape (n_query, n_features).
n_neighbors: int. Number of neighbors to return; defaults to the n_neighbors passed to the constructor.
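The methods above can be sketched as follows (toy data; note that when querying the training set itself, each point's nearest neighbor is itself, at distance 0):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
nn = NearestNeighbors(n_neighbors=2).fit(X)

# For each sample: distances to and indices of its 2 nearest neighbors,
# sorted from closest to farthest.
distances, indices = nn.kneighbors(X)
print(indices)

# Sparse connectivity graph: entry (i, j) = 1 if j is among i's k neighbors.
graph = nn.kneighbors_graph(X).toarray()
print(graph)
```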
Add normalization to the following code:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

iris = pd.read_csv('/iris/iris.csv')
iris.head()
iris.info()
iris.drop('Id', axis=1, inplace=True)
iris.info()

fig = iris[iris.Species=='Iris-setosa'].plot(kind='scatter', x='SepalLengthCm', y='SepalWidthCm', color='orange', label='setosa')
iris[iris.Species=='Iris-versicolor'].plot(kind='scatter', x='SepalLengthCm', y='SepalWidthCm', color='blue', label='versicolor', ax=fig)
iris[iris.Species=='Iris-virginica'].plot(kind='scatter', x='SepalLengthCm', y='SepalWidthCm', color='green', label='virginica', ax=fig)
fig.set_xlabel("Sepal Length")
fig.set_ylabel("Sepal Width")
fig.set_title("Sepal Length VS Width")
fig = plt.gcf()
fig.set_size_inches(10, 6)
plt.show()

fig = iris[iris.Species=='Iris-setosa'].plot.scatter(x='PetalLengthCm', y='PetalWidthCm', color='orange', label='setosa')
iris[iris.Species=='Iris-versicolor'].plot.scatter(x='PetalLengthCm', y='PetalWidthCm', color='blue', label='versicolor', ax=fig)
iris[iris.Species=='Iris-virginica'].plot.scatter(x='PetalLengthCm', y='PetalWidthCm', color='green', label='virginica', ax=fig)
fig.set_xlabel("Petal Length")
fig.set_ylabel("Petal Width")
fig.set_title("Petal Length VS Width")
fig = plt.gcf()
fig.set_size_inches(10, 6)
plt.show()

iris.hist(edgecolor='black', linewidth=1.2)
fig = plt.gcf()
fig.set_size_inches(12, 6)
plt.show()

plt.figure(figsize=(15, 10))
plt.subplot(2, 2, 1)
sns.violinplot(x='Species', y='PetalLengthCm', data=iris)
plt.subplot(2, 2, 2)
sns.violinplot(x='Species', y='PetalWidthCm', data=iris)
plt.subplot(2, 2, 3)
sns.violinplot(x='Species', y='SepalLengthCm', data=iris)
plt.subplot(2, 2, 4)
sns.violinplot(x='Species', y='SepalWidthCm', data=iris)
plt.show()

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import svm
from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier

iris.shape
plt.figure(figsize=(7, 4))
numeric_columns = iris.select_dtypes(include=['float64', 'int64'])
sns.heatmap(numeric_columns.corr(), annot=True, cmap='cubehelix_r')
plt.show()

train, test = train_test_split(iris, test_size=0.2)
print(train.shape)
print(test.shape)
train_X = train[['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']]
train_y = train.Species
test_X = test[['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']]
test_y = test.Species
train_X.head()
test_X.head()
train_y.head()

model = svm.SVC()
model.fit(train_X, train_y)
prediction = model.predict(test_X)
print('The accuracy of the SVM is:', metrics.accuracy_score(prediction, test_y))

model = LogisticRegression()
model.fit(train_X, train_y)
prediction = model.predict(test_X)
print('The accuracy of the Logistic Regression is', metrics.accuracy_score(prediction, test_y))

model = DecisionTreeClassifier()
model.fit(train_X, train_y)
prediction = model.predict(test_X)
print('The accuracy of the Decision Tree is', metrics.accuracy_score(prediction, test_y))

model = KNeighborsClassifier(n_neighbors=3)
model.fit(train_X, train_y)
prediction = model.predict(test_X)
print('The accuracy of the KNN is', metrics.accuracy_score(prediction, test_y))

a_index = list(range(1, 11))
a = pd.Series(dtype='float64')
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for i in list(range(1, 11)):
    model = KNeighborsClassifier(n_neighbors=i)
    model.fit(train_X, train_y)
    prediction = model.predict(test_X)
    a = pd.concat([a, pd.Series(metrics.accuracy_score(prediction, test_y))])
plt.plot(a_index, a)
plt.xticks(x)

petal = iris[['PetalLengthCm','PetalWidthCm','Species']]
sepal = iris[['SepalLengthCm','SepalWidthCm','Species']]
train_p, test_p = train_test_split(petal, test_size=0.3, random_state=0)
train_x_p = train_p[['PetalWidthCm','PetalLengthCm']]
train_y_p = train_p.Species
test_x_p = test_p[['PetalWidthCm','PetalLengthCm']]
test_y_p = test_p.Species
train_s, test_s = train_test_split(sepal, test_size=0.3, random_state=0)
train_x_s = train_s[['SepalWidthCm','SepalLengthCm']]
train_y_s = train_s.Species
test_x_s = test_s[['SepalWidthCm','SepalLengthCm']]
test_y_s = test_s.Species

model = svm.SVC()
model.fit(train_x_p, train_y_p)
prediction = model.predict(test_x_p)
print('The accuracy of the SVM using Petals is:', metrics.accuracy_score(prediction, test_y_p))
model = svm.SVC()
model.fit(train_x_s, train_y_s)
prediction = model.predict(test_x_s)
print('The accuracy of the SVM using Sepal is:', metrics.accuracy_score(prediction, test_y_s))

model = LogisticRegression()
model.fit(train_x_p, train_y_p)
prediction = model.predict(test_x_p)
print('The accuracy of the Logistic Regression using Petals is:', metrics.accuracy_score(prediction, test_y_p))
model.fit(train_x_s, train_y_s)
prediction = model.predict(test_x_s)
print('The accuracy of the Logistic Regression using Sepals is:', metrics.accuracy_score(prediction, test_y_s))

model = DecisionTreeClassifier()
model.fit(train_x_p, train_y_p)
prediction = model.predict(test_x_p)
print('The accuracy of the Decision Tree using Petals is:', metrics.accuracy_score(prediction, test_y_p))
model.fit(train_x_s, train_y_s)
prediction = model.predict(test_x_s)
print('The accuracy of the Decision Tree using Sepals is:', metrics.accuracy_score(prediction, test_y_s))

model = KNeighborsClassifier(n_neighbors=3)
model.fit(train_x_p, train_y_p)
prediction = model.predict(test_x_p)
print('The accuracy of the KNN using Petals is:', metrics.accuracy_score(prediction, test_y_p))
model.fit(train_x_s, train_y_s)
prediction = model.predict(test_x_s)
print('The accuracy of the KNN using Sepals is:', metrics.accuracy_score(prediction, test_y_s))
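One way to answer the question above, sketched under the assumption that scikit-learn's bundled iris dataset stands in for the CSV used in the snippet: fit a StandardScaler on the training split only, then apply it to both splits before training the KNN classifier, so no statistics from the test split leak into the transformation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics

# load_iris stands in for the CSV so the sketch is self-contained.
X, y = load_iris(return_X_y=True)
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Fit the scaler on the training split only, then transform both splits.
scaler = StandardScaler()
train_X = scaler.fit_transform(train_X)
test_X = scaler.transform(test_X)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(train_X, train_y)
prediction = model.predict(test_X)
print('The accuracy of the KNN with scaling is',
      metrics.accuracy_score(prediction, test_y))
```

Scaling matters for KNN in particular because the distance computation otherwise lets features with larger numeric ranges dominate the vote.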