Data-driven approach
- Collect a dataset of images and labels
- Use the dataset to train a classifier
- Evaluate the classifier on new images
def train(images, labels):
    # Use the images and labels to train a model
    model = []
    return model

def predict(model, test_images):
    # Use the model to predict labels for new images
    test_labels = []
    return test_labels
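The stubs above leave the model empty. As a minimal sketch of the full collect / train / evaluate loop, the block below swaps in a deliberately trivial majority-label baseline, which is not a method from these notes. The `accuracy` helper, the dictionary used as the model, and the toy arrays are all assumptions made up for illustration.

```python
import numpy as np

def train(images, labels):
    # Hypothetical baseline: "training" only records the most frequent label.
    values, counts = np.unique(labels, return_counts=True)
    model = {"majority_label": values[np.argmax(counts)]}
    return model

def predict(model, test_images):
    # Predict the stored majority label for every test image.
    return np.full(len(test_images), model["majority_label"])

def accuracy(predicted, actual):
    # Fraction of test images whose predicted label matches the true label.
    return float(np.mean(predicted == actual))

# Toy data (made up): 6 training "images" of 4 pixels each, 3 held-out test images.
train_images = np.zeros((6, 4))
train_labels = np.array([0, 1, 1, 1, 2, 1])
test_images = np.zeros((3, 4))
test_labels = np.array([1, 0, 1])

model = train(train_images, train_labels)
test_pred = predict(model, test_images)
print(accuracy(test_pred, test_labels))
```

The point of the sketch is the shape of the API: evaluation happens on images the classifier never saw during training, which is what the third step in the list asks for.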
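Nearest Neighbor algorithm (used for image classification):
During training:
The model memorizes all the images and labels.
During testing:
For each new image, find the most similar training image (the one at the smallest distance) and predict that image's label for the new image.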
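Image similarity is measured with the L1 distance (also called the Manhattan distance):
the sum of the absolute differences between corresponding pixels.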
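Written out, the L1 distance between two images I1 and I2 is d1(I1, I2) = sum over pixels p of |I1[p] - I2[p]|. A small worked sketch follows; the two 2 x 2 images and the helper name `l1_distance` are made up for illustration. The cast to a signed integer type is an added precaution, since pixels are often stored as `uint8`.

```python
import numpy as np

def l1_distance(image_a, image_b):
    # Cast to a signed integer type first: image pixels are often stored as
    # uint8, and subtracting uint8 arrays wraps around instead of going negative.
    a = image_a.astype(np.int64)
    b = image_b.astype(np.int64)
    return int(np.sum(np.abs(a - b)))

# Two made-up 2 x 2 grayscale images.
test_image = np.array([[56, 32], [90, 23]], dtype=np.uint8)
train_image = np.array([[10, 20], [8, 10]], dtype=np.uint8)

# |56-10| + |32-20| + |90-8| + |23-10| = 46 + 12 + 82 + 13 = 153
print(l1_distance(test_image, train_image))  # 153
```

A distance of 0 means the two images are identical pixel for pixel; larger values mean less similar images.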

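Drawback: training is fast but testing is slow. Training only stores the data, while every prediction compares the test image against all N training images. What we want is the opposite: slow training is acceptable, but testing should be fast.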
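The imbalance can be observed with a rough timing sketch. The function names `nn_train` and `nn_predict`, the random data, and the sizes below are assumptions chosen for illustration, and the measured times depend on the machine, so no expected output is shown.

```python
import time
import numpy as np

def nn_train(X, y):
    # "Training" only keeps references to the data, so its cost does not grow with N.
    return X, y

def nn_predict(model, X_test):
    Xtr, ytr = model
    y_pred = np.zeros(X_test.shape[0], dtype=ytr.dtype)
    for i in range(X_test.shape[0]):
        # Every test row is compared against all N training rows.
        distances = np.sum(np.abs(Xtr - X_test[i, :]), axis=1)
        y_pred[i] = ytr[np.argmin(distances)]
    return y_pred

rng = np.random.default_rng(0)
X_train = rng.random((2000, 256))          # N = 2000 random "images", D = 256
y_train = rng.integers(0, 10, size=2000)   # 10 made-up classes
X_test = X_train[:50]                      # reuse 50 training rows as test inputs

start = time.perf_counter()
model = nn_train(X_train, y_train)
train_seconds = time.perf_counter() - start

start = time.perf_counter()
y_pred = nn_predict(model, X_test)
predict_seconds = time.perf_counter() - start

print(f"train: {train_seconds:.6f} s, predict: {predict_seconds:.6f} s")
```

Because the test rows here are copies of training rows, each one finds itself at distance 0, which gives a quick sanity check that the prediction loop works.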
import numpy as np

class NearestNeighbor:
    def __init__(self):
        pass

    def train(self, X, y):
        """ X is N x D where each row is an example. y is 1-d of size N """
        # the nearest neighbor classifier simply remembers all the training data
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        """ X is N x D where each row is an example we wish to predict a label for """
        num_test = X.shape[0]
        # let's make sure that the output type matches the input type
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)

        # loop over all test rows
        for i in range(num_test):
            # find the nearest training image to the i'th test image
            # using the L1 distance (sum of absolute value differences)
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)  # using broadcasting
            min_index = np.argmin(distances)  # get the index with the smallest distance
            Ypred[i] = self.ytr[min_index]  # predict the label of the nearest example

        return Ypred
Code reproduced from: https://github.com/zhuole1025/cs231n/blob/master/CS231n-notes.md
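This post introduces how to train and evaluate an image classifier with the data-driven approach. It focuses on how the Nearest Neighbor algorithm works, including the characteristics of its training and testing phases and how the L1 distance is used to judge image similarity. It also discusses the test-time efficiency problem of Nearest Neighbor and provides a Python implementation example.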