K-nearest neighbors and Euclidean Distance

These notes cover the K-nearest neighbors (KNN) algorithm for classification in machine learning, explain why the choice of K matters for prediction accuracy and confidence, and introduce the Euclidean distance calculation at the core of the algorithm.

These are my study notes on machine learning. I am writing them in English because I want to improve my writing skills. Thanks for reading, and if I have made any mistakes, please let me know.

What is the K-nearest neighbors algorithm?

It is a supervised learning algorithm for classification: we have pre-labeled training data that tells the machine which data point belongs to which group. Clustering also groups data, but it is an unsupervised method that works without any labels.
The algorithm is based on the distances between the point to be predicted and the training points, which are known in advance. Intuitively, distance can also be understood as proximity.

What do K and nearest mean?

K is a number we choose; it specifies how many of the data points nearest to the new point we take into account. Usually we want K to be an odd number, because the algorithm essentially takes a majority vote among those neighbors, and with an even K we can run into a 50/50 split. There are also many ways to apply weights to the votes so that points at greater distances are penalized; with such weighting, an even K can work too.
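As a minimal sketch of the voting step (the helper `majority_vote` and the toy labels are my own, purely for illustration), assuming we already have the labels of the K nearest neighbors:

```python
from collections import Counter

def majority_vote(neighbor_labels):
    """Return the most common label among the K nearest neighbors."""
    votes = Counter(neighbor_labels)
    return votes.most_common(1)[0][0]

# Odd K: a two-class vote always has a clear winner.
print(majority_vote(['red', 'red', 'blue']))          # -> 'red'
# Even K: a 50/50 tie is possible; Counter then just keeps
# the label it counted first, which is why odd K is preferred.
print(majority_vote(['red', 'blue', 'red', 'blue']))  # -> 'red'
```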

Accuracy vs. Confidence

During prediction, the algorithm selects the K points closest to the new data point and then finds the largest category (class) among them, which means the new point probably belongs to that group. The ratio 'votes for the winning class / K' is the confidence: it tells us how much we can trust that this point belongs to that group. Accuracy, by contrast, is measured when testing the model after training, by comparing predictions against known labels. They are completely different things.
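To make the distinction concrete, here is a small sketch (the helper name is mine, not from any library) that returns both the predicted class and its vote confidence:

```python
from collections import Counter

def predict_with_confidence(neighbor_labels):
    """Predict a class from K neighbor labels and report the vote confidence."""
    k = len(neighbor_labels)
    winner, count = Counter(neighbor_labels).most_common(1)[0]
    return winner, count / k  # confidence = votes for winning class / K

label, confidence = predict_with_confidence(['spam', 'spam', 'ham'])
print(label, round(confidence, 2))  # -> spam 0.67 (2 of 3 neighbors agree)
```

Accuracy would instead be computed over a held-out test set, as the fraction of test points whose predicted label matches their true label.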

Euclidean Distance

coming soon
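Until that section is written, here is a minimal sketch of the standard formula these notes rely on: the Euclidean distance between two n-dimensional points p and q is the square root of the sum of squared coordinate differences.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(euclidean_distance([1, 2], [4, 6]))  # -> 5.0 (a 3-4-5 right triangle)
```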
