•Given training data
(X(1),D(1)), (X(2),D(2)), …, (X(N),D(N))
•Define a distance metric between points
in input space. A common choice is
Euclidean distance
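For reference, the Euclidean distance between a test point X and a training input X(n), each with p components, can be written as follows. The component notation x_j and x_j^{(n)} (the j-th feature of X and X(n)) is introduced here for illustration and is not from the slides.

```latex
d\big(X, X^{(n)}\big) = \sqrt{\sum_{j=1}^{p} \big(x_j - x_j^{(n)}\big)^2}
```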
Given a test point X:
•Find the K nearest training inputs to X
•Denote these K neighbours (re-indexed) as
(X(1),D(1)), (X(2),D(2)), …, (X(K),D(K))
•Classify X by majority vote:
Y = most common class in the set {D(1), D(2), …, D(K)}
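The classification steps above can be sketched in Python as below. This is a minimal illustration, not the lecture's own code: the function name `knn_classify` and the representation of training data as a list of `(features, label)` pairs are assumptions, and it relies on `math.dist` (Python 3.8+) for Euclidean distance.

```python
import math
from collections import Counter


def knn_classify(train, x, k):
    """Classify test point x by majority vote among its k nearest training examples.

    train: list of (features, label) pairs, where features is a sequence of numbers.
    (Hypothetical helper written for illustration.)
    """
    # Sort training examples by Euclidean distance to x and keep the k closest.
    neighbours = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    # Count the labels D(1), ..., D(k) of those neighbours.
    votes = Counter(label for _, label in neighbours)
    # most_common orders equal counts by first occurrence, so ties go to the
    # class whose nearest representative is closest to x.
    return votes.most_common(1)[0][0]


# Usage: two well-separated clusters (toy data made up for this example).
train = [
    ((0.0, 0.0), "A"), ((0.0, 1.0), "A"), ((1.0, 0.0), "A"),
    ((5.0, 5.0), "B"), ((5.0, 6.0), "B"), ((6.0, 5.0), "B"),
]
print(knn_classify(train, (0.5, 0.5), k=3))  # → A
print(knn_classify(train, (5.5, 5.5), k=3))  # → B
```

Sorting the whole training set costs O(N log N) per query; for large N a partial selection (e.g. `heapq.nsmallest`) or a spatial index would be the usual refinement.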
–Use N-fold (leave-one-out) cross-validation, where
N is the number of training examples: pick the K that
minimizes the cross-validation error
–For each of the N training examples, perform the
following steps:
•Find its K nearest neighbours among the other N-1 examples
•Make a classification based on these K
neighbours
•Calculate classification error
•Output average error over all examples
–Use the K that gives lowest average error
over the N training examples
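The K-selection procedure above can be sketched as leave-one-out cross-validation in Python. This is an illustrative sketch under assumptions: the helper names `knn_classify`, `loocv_error`, and `choose_k` are hypothetical, the data is a list of `(features, label)` pairs, the set of candidate K values is supplied by the caller, and `math.dist` requires Python 3.8+.

```python
import math
from collections import Counter


def knn_classify(train, x, k):
    """Majority vote among the k training examples nearest to x (Euclidean)."""
    neighbours = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def loocv_error(data, k):
    """Average misclassification rate when each example is held out in turn."""
    errors = 0
    for i, (x, d) in enumerate(data):
        # Hold out example i: its neighbours come only from the other N-1 examples.
        rest = data[:i] + data[i + 1:]
        if knn_classify(rest, x, k) != d:
            errors += 1
    return errors / len(data)


def choose_k(data, candidate_ks):
    """Return the candidate K with the lowest leave-one-out error.

    min() keeps the first candidate on ties, so list candidates in ascending
    order to prefer the smallest K.
    """
    return min(candidate_ks, key=lambda k: loocv_error(data, k))


# Usage: two clusters plus one mislabelled point at (0.5, 0.5) (toy data).
data = [
    ((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"), ((1, 1), "A"),
    ((0.5, 0.5), "B"),
    ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B"), ((6, 6), "B"),
]
print(choose_k(data, [1, 3, 5]))  # → 3
```

With K = 1 the mislabelled point corrupts its four neighbours' predictions (error 5/9), while K = 3 and K = 5 each misclassify only the mislabelled point itself (error 1/9); the tie is resolved in favour of the smaller K. Excluding the held-out example matters: if it were left in its own neighbour set, K = 1 would always report zero error.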