CS231n Notes (The Data-Driven Approach and the Nearest Neighbor Algorithm)

Latest recommended article published on 2024-01-08 01:29:44

License: CC 4.0 BY-SA

Tags:

#deep-learning #machine-learning #python #artificial-intelligence #algorithms


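This article describes how to train and evaluate an image classifier with the data-driven approach. It focuses on how the nearest neighbor algorithm works: what happens during training and testing, and how the L1 distance is used to judge image similarity. It also discusses the algorithm's test-time efficiency problem and gives a Python implementation.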

Data-driven approach

Collect a dataset of images and labels
Use the dataset to train a classifier
Evaluate the classifier on new images

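def train(images, labels):
    # Use the images and labels to train the model
    model = []  # placeholder for whatever the training step produces
    return model

def predict(model, test_images):
    # Use the model to predict labels for new images
    test_labels = []  # placeholder for the predicted labels
    return test_labels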
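
The evaluation step usually comes down to comparing the predicted labels with the true labels of the new images. The following is a minimal sketch, assuming labels are stored as NumPy integer arrays; the helper name `accuracy` and the label values are hypothetical and not part of the original notes.

```python
import numpy as np

def accuracy(predicted_labels, true_labels):
    """Fraction of test images whose predicted label matches the true label."""
    predicted_labels = np.asarray(predicted_labels)
    true_labels = np.asarray(true_labels)
    return float(np.mean(predicted_labels == true_labels))

# Illustrative labels only (made up for this example)
predicted = np.array([0, 1, 1, 2])
actual = np.array([0, 1, 2, 2])
print(accuracy(predicted, actual))  # 3 of 4 labels match -> 0.75
```

The important point is that `actual` must come from images the classifier did not see during training, otherwise the score says little about new images.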

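Nearest Neighbor algorithm (used for image classification)
During training:
The model memorizes all images and their labels.
During testing:
For each new image, find the most similar training image (the one at the smallest distance) and use its label as the prediction (predict the label of the most similar training image).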

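Image similarity is measured with the L1 distance (also called the Manhattan distance):
the sum of the absolute differences between corresponding pixels.
L1 distance: $d_1(I_1, I_2) = \sum_p |I_1^p - I_2^p|$, where $p$ runs over all pixel positions.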
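
As a small worked example of the L1 distance, the sketch below compares two made-up 2x2 grayscale images. The function name `l1_distance` and the pixel values are assumptions chosen for illustration.

```python
import numpy as np

def l1_distance(image_a, image_b):
    """Sum of absolute pixel-wise differences between two same-shaped images."""
    # Cast to a signed type first: subtracting uint8 pixels would wrap around
    a = np.asarray(image_a, dtype=np.int64)
    b = np.asarray(image_b, dtype=np.int64)
    return int(np.sum(np.abs(a - b)))

# Two made-up 2x2 grayscale "images"
img1 = np.array([[10, 20], [30, 40]], dtype=np.uint8)
img2 = np.array([[12, 15], [30, 50]], dtype=np.uint8)
print(l1_distance(img1, img2))  # |10-12| + |20-15| + |30-30| + |40-50| = 17
```

A distance of 0 means the two images are identical pixel for pixel; larger values mean less similar images.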
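Drawback: training is fast but testing is slow, because training only stores the data while every prediction compares the new image with all training images. What we want is the opposite: slow training is acceptable as long as testing is fast.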
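
To get a feel for why testing is the expensive part, the sketch below counts how many pixel differences a brute-force search evaluates. The function name `pixel_comparisons` and the dataset sizes (50,000 training images of 32x32 RGB pixels) are assumptions for illustration, not figures from these notes.

```python
def pixel_comparisons(num_train, num_test, num_pixels):
    """Count pixel differences a brute-force nearest neighbor search evaluates.

    Training contributes nothing here because it only stores the data; every
    test image is compared against every training image, pixel by pixel.
    """
    return num_train * num_test * num_pixels

# Illustrative sizes (assumed): 32x32 RGB images have 32 * 32 * 3 values each
pixels = 32 * 32 * 3
print(pixel_comparisons(50_000, 1, pixels))       # classifying a single image
print(pixel_comparisons(50_000, 10_000, pixels))  # classifying 10,000 images
```

The count grows linearly with the size of the training set, so adding more training data makes every single prediction slower.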

import numpy as np

class NearestNeighbor:
  def __init__(self):
    pass

  def train(self, X, y):
    """ X is N x D where each row is an example. y is 1-d of size N """
    # the nearest neighbor classifier simply remembers all the training data
    self.Xtr = X
    self.ytr = y

  def predict(self, X):
    """ X is N x D where each row is an example we wish to predict a label for """
    num_test = X.shape[0]
    # let's make sure that the output type matches the input type
    Ypred = np.zeros(num_test, dtype=self.ytr.dtype)

    # loop over all test rows
    for i in range(num_test):  # range replaces Python 2's xrange
      # find the nearest training image to the i-th test image
      # using the L1 distance (sum of absolute value differences)
      distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)  # broadcasts X[i] against every training row
      min_index = np.argmin(distances)  # get the index with smallest distance
      Ypred[i] = self.ytr[min_index]  # predict the label of the nearest example

    return Ypred
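
A possible way to exercise the classifier is sketched below on tiny made-up data. The class is restated in compact Python 3 form so the example runs on its own; the arrays `Xtr`, `ytr`, and `Xte` are hypothetical, with each "image" flattened to 3 values.

```python
import numpy as np

class NearestNeighbor:
    """Compact restatement of the classifier above so this example runs alone."""

    def train(self, X, y):
        # Training only stores the data
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        Ypred = np.zeros(X.shape[0], dtype=self.ytr.dtype)
        for i in range(X.shape[0]):
            # L1 distance from test row i to every training row
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            Ypred[i] = self.ytr[np.argmin(distances)]
        return Ypred

# Made-up data: four training "images" with 3 pixels each, two classes
Xtr = np.array([[0, 0, 0], [1, 1, 1], [10, 10, 10], [11, 11, 11]], dtype=np.float64)
ytr = np.array([0, 0, 1, 1])
Xte = np.array([[2, 1, 0], [9, 9, 12]], dtype=np.float64)

nn = NearestNeighbor()
nn.train(Xtr, ytr)
predictions = nn.predict(Xte)
print(predictions)  # first row is closest to [1, 1, 1], second to [10, 10, 10]
```

The arrays are floats on purpose: if raw `uint8` pixel arrays were passed in, the subtraction inside `predict` would wrap around instead of going negative, so casting to a float or signed integer type first is the safer choice.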