K-Nearest Neighbors

The algorithm stores all training samples (with known labels). Given a new sample (with an unknown label), it measures the similarity between the new sample and the labeled training samples, selects the K most similar ones, and derives the prediction from them, e.g. by majority vote for classification or by a weighted sum of the neighbor responses for regression. For example, with K=3, if the three nearest training samples carry labels {A, A, B}, the new sample is assigned label A. The method is sometimes referred to as "learning by example", because it always judges the class of a new sample from the similarity between its feature vector and the feature vectors of samples with known labels.

CvKNearest 
class CvKNearest : public CvStatModel 
The class implements the K-Nearest Neighbors model.

CvKNearest::CvKNearest

Default and training constructors.
C++: CvKNearest::CvKNearest()
C++: CvKNearest::CvKNearest(const Mat& trainData, const Mat& responses, const Mat& sampleIdx=Mat(), bool isRegression=false, int max_k=32 )
C++: CvKNearest::CvKNearest(const CvMat* trainData, const CvMat* responses, const CvMat* sampleIdx=0, bool isRegression=false, int max_k=32 )
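
As a hedged sketch of the training constructor (trainData and responses are hypothetical matrices prepared by the caller, laid out as described for train() below):

// Sketch: construct and train in one step (legacy OpenCV 2.x API).
// trainData stores one sample per row (CV_32F);
// responses holds one label per training row.
cv::Mat trainData(100, 4, CV_32F);   // filled by the caller
cv::Mat responses(100, 1, CV_32F);   // class labels stored as floats
CvKNearest knn(trainData, responses, cv::Mat(),
               /*isRegression=*/false, /*max_k=*/32);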

CvKNearest::train

Trains the model.

C++: bool CvKNearest::train(const Mat& trainData, const Mat& responses, const Mat& sampleIdx=Mat(), bool isRegression=false, int maxK=32, bool updateBase=false )
C++: bool CvKNearest::train(const CvMat* trainData, const CvMat* responses, const CvMat* sampleIdx=0, bool is_regression=false, int maxK=32, bool updateBase=false)
Python: cv2.KNearest.train(trainData, responses[, sampleIdx[, isRegression[, maxK[, updateBase]]]]) → retval

Parameters:

    isRegression – Type of the problem: true for regression and false for classification.
    maxK – Maximum number of neighbors that may be passed to the method CvKNearest::find_nearest().
    updateBase – Specifies whether the model is trained from scratch (updateBase=false) or updated using the new training data (updateBase=true). In the latter case, the parameter maxK must not be larger than the original value.

The method trains the K-Nearest model. It follows the conventions of the generic CvStatModel::train() approach with the following limitations:

    • Only the CV_ROW_SAMPLE data layout is supported.
    • Input variables are all ordered.
    • Output variables can be either categorical ( is_regression=false ) or ordered ( is_regression=true ).
    • Variable subsets (var_idx) and missing measurements are not supported.
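
As a hedged sketch of the updateBase behavior described above (the matrix names are placeholders for data prepared by the caller):

// Sketch: train from scratch, then extend the model with new data.
CvKNearest knn;
cv::Mat trainData(100, 4, CV_32F), responses(100, 1, CV_32F);
knn.train(trainData, responses, cv::Mat(),
          /*isRegression=*/false, /*maxK=*/32, /*updateBase=*/false);

// Later: add 10 more samples without retraining from scratch.
// maxK must not exceed the value used originally (32 here).
cv::Mat newData(10, 4, CV_32F), newResponses(10, 1, CV_32F);
knn.train(newData, newResponses, cv::Mat(), false, 32, /*updateBase=*/true);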



CvKNearest::find_nearest

Finds the neighbors and predicts responses for input vectors.
C++: float CvKNearest::find_nearest(const Mat& samples, int k, Mat* results=0, const float** neighbors=0, Mat* neighborResponses=0, Mat* dist=0 ) const
C++: float CvKNearest::find_nearest(const Mat& samples, int k, Mat& results, Mat& neighborResponses, Mat& dists) const
C++: float CvKNearest::find_nearest(const CvMat* samples, int k, CvMat* results=0, const float** neighbors=0, CvMat* neighborResponses=0, CvMat* dist=0) const
Python: cv2.KNearest.find_nearest(samples, k[, results[, neighborResponses[, dists]]]) → retval, results, neighborResponses, dists

Parameters:

samples – Input samples stored by rows. It is a single-precision floating-point matrix of number_of_samples × number_of_features size.
k – Number of used nearest neighbors.
results – Vector with results of prediction (regression or classification) for each input sample. It is a single-precision floating-point vector with number_of_samples elements.
neighbors – Optional output pointers to the neighbor vectors themselves. It is an array of k*samples->rows pointers.
neighborResponses – Optional output values for corresponding neighbors. It is a single-precision floating-point matrix of number_of_samples × k size.
dist – Optional output distances from the input vectors to the corresponding neighbors. It is a single-precision floating-point matrix of number_of_samples × k size.
For each input vector (a row of the matrix samples), the method finds the k nearest neighbors. In case of regression, the predicted result is a mean value of the particular vector's neighbor responses. In case of classification, the class is determined by voting.

For each input vector, the neighbors are sorted by their distances to the vector.

In case of the C++ interface, you can pass output pointers to empty matrices and the function will allocate memory itself.

If only a single input vector is passed, all output matrices are optional and the predicted value is returned by the method.

The function is parallelized with the TBB library.
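
As a hedged sketch of the second C++ overload above, given a CvKNearest model knn trained as in the earlier sketches (the input matrix is a placeholder; the output matrices are allocated by the call):

// Sketch: predict a batch of samples with k = 3 (legacy 2.x API).
cv::Mat samples(5, 4, CV_32F);              // one query vector per row
cv::Mat results, neighborResponses, dists;  // allocated by the method
knn.find_nearest(samples, 3, results, neighborResponses, dists);
// results.at<float>(i) is the prediction for row i;
// dists.at<float>(i, j) is the distance from sample i to its
// j-th nearest neighbor (neighbors are sorted by distance).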


Example (note that this snippet uses the newer cv::ml::KNearest interface from OpenCV 3, rather than the legacy CvKNearest class documented above):

#include <opencv2/core.hpp>
#include <opencv2/ml.hpp>
#include <iostream>

using namespace cv;

int main()
{
    // Create an empty KNearest model.
    Ptr<ml::KNearest> knn(ml::KNearest::create());

    // Six training samples, four features each.
    Mat_<float> trainFeatures(6, 4);
    trainFeatures << 2,2,2,2,
                     3,3,3,3,
                     4,4,4,4,
                     5,5,5,5,
                     6,6,6,6,
                     7,7,7,7;

    // One integer label per training sample.
    Mat_<int> trainLabels(1, 6);
    trainLabels << 2,3,4,5,6,7;

    // Samples are stored by rows.
    knn->train(trainFeatures, ml::ROW_SAMPLE, trainLabels);

    // Classify a single test sample using its K = 1 nearest neighbor.
    Mat_<float> testFeature(1, 4);
    testFeature << 3,3,3,3;

    int K = 1;
    Mat response, dist;
    knn->findNearest(testFeature, K, noArray(), response, dist);
    std::cerr << response << std::endl;  // predicted label(s)
    std::cerr << dist << std::endl;      // distance(s) to the neighbor(s)

    return 0;
}
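
Since the test vector (3,3,3,3) coincides with the training sample labeled 3 and K=1, the program prints [3] as the response and [0] as the distance.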

### K-Nearest Neighbor Algorithm (KNN)

The K-Nearest Neighbor algorithm (KNN) is an instance-based method and a form of lazy learning: it approximates the target function only locally and defers all computation until classification time [^1]. As one of the most basic machine learning algorithms, KNN is known for its simplicity and effectiveness.

#### Basic Principle

KNN works by measuring the similarity between data points. For a new input sample, the algorithm finds the \(k\) closest neighbors in the training set according to a chosen distance metric, and then decides the class label of the new sample from the classes of those neighbors, usually by majority vote [^3].

#### Characteristics

##### Strengths

- **High accuracy**: Because predictions are inferred directly from the known data points, the method can reach high accuracy.
- **Robustness**: It tolerates noise and outliers well and is not easily swayed by individual extreme values.
- **No explicit training**: Unlike more complex modeling techniques, KNN involves no parameter tuning or model-fitting step [^2].

##### Weaknesses

- **High computational cost**: On large datasets, the time needed to find the nearest neighbors rises sharply, because every query re-evaluates the relation between the target point and the entire training set.
- **High storage requirements**: Supporting real-time queries means keeping all historical samples, which increases memory pressure.
- **Curse of dimensionality**: As the number of features grows, conventional measures such as Euclidean distance can lose discriminative power, hurting overall performance [^4].

#### Implementation Details

The following is a basic KNN classifier implemented in Python:

```python
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, X_new, k=3):
    distances = []
    # Compute the distance from every training sample to the new sample.
    for i in range(len(X_train)):
        distance = euclidean_distance(X_train[i], X_new)
        distances.append((distance, y_train[i]))
    # Sort ascending by distance and keep the k closest entries.
    sorted_distances = sorted(distances)[:k]
    labels = [label for _, label in sorted_distances]
    # Majority vote among the k nearest labels.
    prediction = Counter(labels).most_common(1)[0][0]
    return prediction

def euclidean_distance(x1, x2):
    return np.sqrt(np.sum((x1 - x2) ** 2))
```

This snippet defines a simple `knn_predict` function for classification, together with a helper Euclidean distance calculator, `euclidean_distance`.
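
Note that this reference implementation re-scans the entire training set for every query, which is exactly the computational-cost drawback listed above; for larger datasets, neighbor search is usually accelerated with index structures such as KD-trees.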