KNN with R

DataCamp: Machine Learning with R

Rather than explaining the mechanics of kNN, these notes focus on how to apply the knn() function from the class package in R to perform classification.
They are my study notes from DataCamp. Link: https://campus.datacamp.com

Chapter 1: k-Nearest Neighbors (kNN)


Recognizing a road sign with kNN

After several trips with a human behind the wheel, it is time for the self-driving car to attempt the test course alone.
As it begins to drive away, its camera captures the following image:
[image: the road sign captured by the camera]
Apply a kNN classifier to help the car recognize this sign.
kNN classification can be performed with the knn() function in the class package.
Code:

# Load the 'class' package
library(class)
# Create a vector of labels
sign_types <- signs$sign_type
# Classify the next sign observed
knn(train = signs[-1], test = next_sign, cl = sign_types)

How did the knn() function correctly classify the stop sign?
-- The sign was in some way similar to another stop sign.
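
In other words, knn() with k = 1 simply returns the label of the training example closest to the new observation in the 48-dimensional color space. As a sanity check, the nearest neighbor can be found by hand; a minimal sketch, assuming signs and next_sign as loaded above (next_sign holding one row with the same 48 color features):

# Euclidean distance from next_sign to every training sign
dists <- apply(signs[-1], 1, function(row) sqrt(sum((row - unlist(next_sign))^2)))

# The label of the closest training sign is what knn() returns when k = 1
signs$sign_type[which.min(dists)]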


Exploring the traffic sign dataset

To better understand how the knn() function was able to classify the stop sign, it may help to examine the training dataset it used.

Each previously observed street sign was divided into a 4x4 grid, and the red, green, and blue levels for each of the 16 center pixels are recorded, as illustrated here.
[image: 4x4 grid of sampled pixel locations on a sign]
The result is a dataset that records the sign_type as well as 16 x 3 = 48 color properties of each sign.

Code:

# Examine the structure of the signs dataset
str(signs)

## 'data.frame': 146 obs. of 49 variables:
 $ sign_type: chr  "pedestrian" "pedestrian" "pedestrian" "pedestrian" ...
 $ r1       : int  155 142 57 22 169 75 136 149 13 123 ...
 $ g1       : int  228 217 54 35 179 67 149 225 34 124 ...
......
 $ r16      : int  22 164 58 19 160 180 188 237 83 43 ...
 $ g16      : int  52 227 60 27 183 107 211 254 125 29 ...
 $ b16      : int  53 237 60 29 187 26 227 53 19 11 ...


# Count the number of signs of each type
table(signs$sign_type)

## pedestrian      speed       stop 
        46         49         51

# Check r10's average red level by sign type
aggregate(r10 ~ sign_type, data = signs, mean)

## sign_type       r10
1 pedestrian 113.71739
2      speed  80.63265
3       stop 132.39216
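
The same comparison can be made across every color feature at once, using the "." shorthand in the formula to mean all remaining columns; a one-line extension of the call above:

# Average level of all 48 color features, grouped by sign type
aggregate(. ~ sign_type, data = signs, mean)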

Classifying a collection of road signs

Now that the autonomous vehicle has successfully stopped on its own, your team feels confident allowing the car to continue the test course.

The test course includes 59 additional road signs divided into three types:
[images: the three sign types (pedestrian, speed, and stop)]
At the conclusion of the trial, you are asked to measure the car’s overall performance at recognizing these signs.

# Use kNN to identify the test road signs
sign_types <- signs$sign_type
signs_pred <- knn(train = signs[-1], test = signs_test[-1], cl = sign_types)

# Create a confusion matrix of the predicted versus actual values
signs_actual <- signs_test$sign_type
table(signs_pred, signs_actual)

## signs_actual
signs_pred   pedestrian speed stop
  pedestrian         19     2    0
  speed               0    17    0
  stop                0     2   19

# Compute the accuracy
mean(signs_pred == signs_actual)
## [1] 0.9322034 -- accuracy rate
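
The single accuracy number can hide differences between sign types; the confusion matrix above shows that every error involves a speed sign. A quick sketch to compute per-class accuracy (recall) from the confusion matrix built above:

# Per-class accuracy: correct predictions divided by the actual count of each type
conf <- table(signs_pred, signs_actual)
diag(conf) / colSums(conf)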

How to choose ‘k’? — Try it

Testing other ‘k’ values

By default, the knn() function in the class package uses only the single nearest neighbor.

Setting the k parameter allows the algorithm to consider additional nearby neighbors, enlarging the collection of neighbors that vote on the predicted class.

Compare k values of 1, 7, and 15 to examine the impact on traffic sign classification accuracy.

# Compute the accuracy of the baseline model (default k = 1)
k_1 <- knn(train = signs[-1], test = signs_test[-1], cl = sign_types)
mean(k_1 == signs_actual)
## [1] 0.9322034

# Modify the above to set k = 7
k_7 <- knn(train = signs[-1], test = signs_test[-1], cl = sign_types, k = 7)
mean(k_7 == signs_actual)
## [1] 0.9491525

# Set k = 15 and compare to the above
k_15 <- knn(train = signs[-1], test = signs_test[-1], cl = sign_types, k = 15)
mean(k_15 == signs_actual)
## [1] 0.8813559

Thus, k = 7 gives the best accuracy of the three values tested.
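
Rather than repeating the call for each candidate value, the same comparison can be written as a single loop; a minimal sketch using the objects defined above:

# Compute accuracy for several values of k in one pass
k_values <- c(1, 7, 15)
accuracies <- sapply(k_values, function(k) {
  pred <- knn(train = signs[-1], test = signs_test[-1], cl = sign_types, k = k)
  mean(pred == signs_actual)
})
setNames(accuracies, k_values)

Extending k_values to a finer grid (e.g., 1 through 15) would give a fuller picture of how accuracy changes with k.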


Seeing how the neighbors voted

When multiple nearest neighbors hold a vote, it can sometimes be useful to examine whether the voters were unanimous or widely separated.

For example, knowing more about the voters’ confidence in the classification could allow an autonomous vehicle to use caution in the case there is any chance at all that a stop sign is ahead.

In this exercise, you will learn how to obtain the voting results from the knn() function.

# Use the prob parameter to get the proportion of votes for the winning class
sign_pred <- knn(train = signs[-1], test = signs_test[-1], cl = sign_types, prob = TRUE, k = 7)
sign_pred

# Get the "prob" attribute from the predicted classes
sign_prob <- attr(sign_pred, "prob")
sign_prob

# Examine the first several predictions
head(sign_pred)
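
To scan the results more easily, the winning class and its share of the 7 votes can be placed side by side; a small sketch assuming sign_pred and sign_prob from above:

# Pair each predicted class with the proportion of neighbors that voted for it
head(data.frame(pred = sign_pred, prob = sign_prob))

A prob value of 1.0 means the 7 neighbors were unanimous; smaller values indicate a split vote and a less confident prediction.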

Why normalize data?

Before applying kNN to a classification task, it is common practice to rescale the data using a technique like min-max normalization.
What is the purpose of this step?
-- To ensure all features contribute equal shares to the distance calculation; otherwise, features measured on larger scales would dominate the nearest-neighbor search.

kNN benefits from normalized data.
Code:

# Min-max normalization: rescale a numeric vector to the [0, 1] range
normalization <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}
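
To normalize the signs data, the function can be applied to every color column (the labels stay separate); a sketch assuming the normalization function above:

# Rescale all 48 color features to the [0, 1] range
signs_normalized <- as.data.frame(lapply(signs[-1], normalization))
summary(signs_normalized$r1)  # min should now be 0 and max 1

Note that in a proper train/test workflow, the test set should be rescaled using the training set's minima and maxima rather than its own.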

That's all, thank you very much.
