Stanford CS231n learning notes, Lecture 2: Image Classification Pipeline

Latest recommended article published on 2022-08-13 19:53:58

License: CC 4.0 BY-SA

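This post covers the basics of image classification and the k-Nearest Neighbor (kNN) algorithm, discusses the choice between the L1 and L2 distance metrics, and introduces image classification with a linear classifier.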

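Introduction

Companion Python NumPy tutorial for the course:
http://cs231n.github.io/python-numpy-tutorial/
Image classification is the foundation for problems such as object detection and scene classification; once it is understood, the later topics come much more easily.
Challenges in image classification: viewpoint variation, illumination, deformation, occlusion, background clutter, and intra-class variation.
Course goal: recognize thousands of categories in real time, within a few milliseconds per image. This lecture starts from a much simpler baseline, kNN.
CIFAR-10 dataset:
http://www.cs.toronto.edu/~kriz/cifar.html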

The first classifier: Nearest Neighbor Classifier

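How do we compare two images? How do we define a distance between them?
L1 distance: $d_1(I_1,I_2)=\sum_p|I_1^p-I_2^p|$, where $p$ ranges over all pixels.
Take the element-wise absolute difference and sum it: |[test image] - [training image]| = [image diff] $\xrightarrow{sum}$ [the final value]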
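The L1 computation can be sketched in NumPy as follows. The function name `l1_distance` and the 4x4 pixel values are illustrative assumptions, not part of the notes; the cast to a signed integer type is there because real image data is often `uint8`, where subtraction would wrap around.

```python
import numpy as np

def l1_distance(img_a, img_b):
    """Sum of absolute pixel-wise differences (Manhattan distance)."""
    # Cast to a signed type first: uint8 pixel data would wrap around on subtraction.
    a = np.asarray(img_a, dtype=np.int64)
    b = np.asarray(img_b, dtype=np.int64)
    return int(np.abs(a - b).sum())

# Illustrative 4x4 single-channel "images".
test_image = np.array([[56, 32, 10, 18],
                       [90, 23, 128, 133],
                       [24, 26, 178, 200],
                       [2, 0, 255, 220]])
train_image = np.array([[10, 20, 24, 17],
                        [8, 10, 89, 100],
                        [12, 16, 178, 170],
                        [4, 32, 233, 112]])

print(l1_distance(test_image, train_image))  # 456
```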
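Drawback: the image to be classified must be differenced and summed against every image in the training set, so prediction time grows linearly with the size of the training set. That is a lot of computation at test time, which conflicts with the low latency needed in practice.
A CNN is the opposite: it spends a lot of time on training, but classifies a new image very quickly at test time.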
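To see where the test-time cost comes from, here is a minimal sketch of a 1-nearest-neighbor classifier using L1 distance. The class name `NearestNeighbor`, its method names, and the toy data are assumptions made for illustration: `train` only stores the data, while `predict` scans the entire training set for every query.

```python
import numpy as np

class NearestNeighbor:
    """Hypothetical minimal 1-nearest-neighbor classifier using L1 distance."""

    def train(self, X, y):
        # "Training" only memorizes the data: X is (N, D), y is (N,).
        self.X_train = np.asarray(X, dtype=np.float64)
        self.y_train = np.asarray(y)

    def predict(self, X):
        X = np.asarray(X, dtype=np.float64)
        y_pred = np.empty(X.shape[0], dtype=self.y_train.dtype)
        for i in range(X.shape[0]):
            # One pass over the whole training set per test image.
            distances = np.abs(self.X_train - X[i]).sum(axis=1)
            y_pred[i] = self.y_train[np.argmin(distances)]
        return y_pred

# Toy data: each row is a flattened 4-pixel "image".
X_train = np.array([[0, 0, 0, 0], [10, 10, 10, 10], [200, 200, 200, 200]])
y_train = np.array([0, 0, 1])

nn = NearestNeighbor()
nn.train(X_train, y_train)
predictions = nn.predict(np.array([[9, 12, 8, 11], [180, 190, 210, 205]]))
print(predictions)  # [0 1]
```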
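Improving the algorithm: approximate nearest neighbor (ANN) methods (2010) speed up the search; the distance can also be changed from L1 (Manhattan distance) to L2 (Euclidean distance).
L2 distance: $d_2(I_1,I_2)=\sqrt{\sum_p(I_1^p-I_2^p)^2}$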
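A small sketch comparing the two distances on the same pair of vectors; the function name `l2_distance` and the sample vectors are illustrative assumptions.

```python
import numpy as np

def l2_distance(img_a, img_b):
    """Square root of the sum of squared pixel-wise differences (Euclidean distance)."""
    a = np.asarray(img_a, dtype=np.float64).ravel()
    b = np.asarray(img_b, dtype=np.float64).ravel()
    return float(np.sqrt(np.sum((a - b) ** 2)))

a = np.array([0, 0])
b = np.array([3, 4])

l1 = float(np.abs(a - b).sum())  # |0-3| + |0-4|
l2 = l2_distance(a, b)           # sqrt(3^2 + 4^2)
print(l1, l2)  # 7.0 5.0
```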
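kNN: find the k training images closest to the query image and let their labels vote by majority. As k increases, the decision boundaries become smoother.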
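The voting step might look like the sketch below. The name `knn_predict` is hypothetical, labels are assumed to be non-negative integers (required by `np.bincount`), and the toy data contains one deliberately mislabeled point to show how a larger k smooths over noise.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Label of x by majority vote among its k nearest training points (L2)."""
    X_train = np.asarray(X_train, dtype=np.float64)
    x = np.asarray(x, dtype=np.float64)
    distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]                 # indices of the k closest points
    votes = np.bincount(np.asarray(y_train)[nearest])   # count votes per label
    return int(np.argmax(votes))                        # ties go to the smallest label

# Three class-0 points, one noisy point labeled 1 in their midst, two class-1 points far away.
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.4, 0.4], [5.0, 5.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
query = np.array([0.5, 0.5])

label_k1 = knn_predict(X_train, y_train, query, k=1)
label_k3 = knn_predict(X_train, y_train, query, k=3)
print(label_k1, label_k3)  # 1 0
```

With k=1 the noisy neighbor decides the label; with k=3 it is outvoted by the two class-0 points next to it.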
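Question-1: What is the accuracy of the nearest neighbor classifier on the training data when using L2 distance? (100%: every training image is its own nearest neighbor, at distance 0.)
Question-2: What is the accuracy of kNN on the training data? (Not necessarily 100% for k > 1, since the other neighbors can outvote the image itself.)
Question-3: Which distance is best?
Question-4: How do we determine the best value of k?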

Answers (Questions 3 and 4): Very problem-dependent. Must try them all out and see what works best.
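The questions about training-set accuracy can be checked empirically. The sketch below is an assumption-laden toy experiment (random data, a hypothetical helper `knn_predict_all`, integer labels): with k=1 each training point finds itself at distance 0, so training accuracy is 1.0; with k=3 the result depends on the data, so no value is claimed for it.

```python
import numpy as np

def knn_predict_all(X_train, y_train, X_query, k):
    """Predict a label for every row of X_query by majority vote (squared L2)."""
    # Pairwise squared distances, shape (num_query, num_train).
    diffs = X_query[:, None, :] - X_train[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Majority vote for each query row.
    return np.array([np.bincount(y_train[row]).argmax() for row in nearest])

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))      # 20 random training points
y = rng.integers(0, 3, size=20)   # random labels in {0, 1, 2}

acc_k1 = float(np.mean(knn_predict_all(X, y, X, k=1) == y))
acc_k3 = float(np.mean(knn_predict_all(X, y, X, k=3) == y))
print(acc_k1)  # 1.0
print(acc_k3)  # data-dependent
```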

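The test set stands in for how well the algorithm generalizes. If a model does well on the training set but poorly on the test set, it generalizes badly, i.e. it has overfit the training data.
Split all the data into three parts: training set + validation set + test set.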
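One way to produce such a three-way split is to shuffle the example indices once and slice them. The function name `split_indices` and the split sizes are assumptions for illustration; the notes do not prescribe particular proportions.

```python
import numpy as np

def split_indices(num_examples, num_val, num_test, seed=0):
    """Shuffle indices once, then carve out disjoint test, validation, and training parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_examples)
    test_idx = order[:num_test]
    val_idx = order[num_test:num_test + num_val]
    train_idx = order[num_test + num_val:]
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = split_indices(1000, num_val=100, num_test=100)
print(len(train_idx), len(val_idx), len(test_idx))  # 800 100 100
```

Hyperparameters are then tuned on the validation indices only, and the test indices are touched once at the very end.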
To choose hyperparameters: k-fold cross-validation.
Plot accuracy against k (the number of neighbors) to pick the best value.
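A minimal sketch of cross-validation over candidate values of k follows. Everything named here (`knn_predict_all`, `cross_validate_k`, the two synthetic Gaussian blobs, the candidate list) is an assumption for illustration. The returned dictionary holds the points one would plot as the k-versus-accuracy curve; plotting itself is left out.

```python
import numpy as np

def knn_predict_all(X_train, y_train, X_query, k):
    """Majority-vote kNN over squared L2 distances."""
    diffs = X_query[:, None, :] - X_train[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return np.array([np.bincount(y_train[row]).argmax() for row in nearest])

def cross_validate_k(X, y, k_choices, num_folds=5):
    """Mean validation accuracy for each candidate k, averaged over the folds."""
    X_folds = np.array_split(X, num_folds)
    y_folds = np.array_split(y, num_folds)
    mean_accuracy = {}
    for k in k_choices:
        fold_accuracy = []
        for i in range(num_folds):
            # Fold i is held out for validation; the rest is used for training.
            X_tr = np.concatenate(X_folds[:i] + X_folds[i + 1:])
            y_tr = np.concatenate(y_folds[:i] + y_folds[i + 1:])
            pred = knn_predict_all(X_tr, y_tr, X_folds[i], k)
            fold_accuracy.append(np.mean(pred == y_folds[i]))
        mean_accuracy[k] = float(np.mean(fold_accuracy))
    return mean_accuracy

# Synthetic, well-separated two-class data, shuffled so every fold mixes both classes.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.0, 1.0, size=(30, 2)),
                    rng.normal(10.0, 1.0, size=(30, 2))])
y = np.array([0] * 30 + [1] * 30)
order = rng.permutation(60)
X, y = X[order], y[order]

results = cross_validate_k(X, y, k_choices=[1, 3, 5])
best_k = max(results, key=results.get)
print(results, best_k)
```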
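However, kNN is essentially never used for image classification in practice.
There are two reasons: the time cost at test time, and distance metrics on raw pixels are not perceptually meaningful (in plain terms: the predictions are poor).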

Summary

Image Classification: We are given a Training Set of labeled images, asked
to predict labels on Test Set. Common to report the Accuracy of predictions
(fraction of correctly predicted images)
We introduced the k-Nearest Neighbor Classifier, which predicts the labels
based on nearest images in the training set
We saw that the choice of distance and the value of k are hyperparameters
that are tuned using a validation set, or through cross-validation if the size
of the data is small.
Once the best set of hyperparameters is chosen, the classifier is evaluated
once on the test set, and reported as the performance of kNN on that data.
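The accuracy mentioned in the summary is simply the fraction of predictions that match the true labels. A small sketch, with a hypothetical function name and made-up labels:

```python
import numpy as np

def accuracy(y_pred, y_true):
    """Fraction of predictions that equal the true label."""
    y_pred = np.asarray(y_pred)
    y_true = np.asarray(y_true)
    return float(np.mean(y_pred == y_true))

print(accuracy([3, 8, 8, 0], [3, 8, 1, 0]))  # 0.75
```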

The second classifier: Linear Classification

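Building a neural network is like playing with building blocks.
The assistant instructor's thesis work: image captioning.
In brief: a CNN for visual recognition + an RNN for sequence modeling.
kNN is a non-parametric method $\rightarrow$ the linear classifier (LC) is a parametric method.
Linear function $\rightarrow$ neural network $\rightarrow$ convolutional neural network
[32x32x3] image = 3072 numbers $\rightarrow$ f(x,W) $\rightarrow$ 10 numbers indicating class scores
$f(x,W)_{10\times1} = W_{10\times3072}\,x_{3072\times1} + b_{10\times1}$
b is not a function of the image; it is an independent parameter (the bias).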
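The shapes in the formula can be checked with a short NumPy sketch. The image and the weights here are random stand-ins (assumptions, not trained values), so the scores themselves carry no meaning; only the dimensions matter.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, num_pixels = 10, 32 * 32 * 3

image = rng.integers(0, 256, size=(32, 32, 3))          # stand-in for a CIFAR-10 image
x = image.reshape(num_pixels).astype(np.float64)        # flatten to 3072 numbers
W = 0.001 * rng.normal(size=(num_classes, num_pixels))  # random, untrained weights
b = np.zeros(num_classes)                               # bias: independent of the image

scores = W @ x + b
print(x.shape, W.shape, scores.shape)  # (3072,) (10, 3072) (10,)
```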
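Question-1: what does the linear classifier do, in English?
Every score is a weighted sum over all pixels; in effect, the classifier computes weighted sums of the colors at different spatial positions.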
$f(x_i,W,b) = Wx_i+b$: each row of the trained W can be reshaped back into an image, which shows the template learned for that class.
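Imagine each image as a point in a 3072-dimensional space. For each class, the points where its score is 0 form a line (a hyperplane in the full space), and the score increases as you move in the direction of the arrow (the gradient).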
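To make the geometric picture concrete, write $w_j$ for row $j$ of $W$ and $b_j$ for entry $j$ of $b$ (notation introduced here, not in the notes). Then, for class $j$:

$$s_j(x) = w_j^\top x + b_j, \qquad \{x : s_j(x) = 0\}\ \text{is a hyperplane}, \qquad \nabla_x s_j(x) = w_j$$

So $w_j$ is normal to the zero-score hyperplane of class $j$, and moving from $x$ to $x + t\,w_j$ with $t > 0$ changes the score by $s_j(x + t\,w_j) - s_j(x) = t\,\lVert w_j \rVert^2 > 0$.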
At this point we have built a score function: $f(x_i,W,b) = Wx_i+b$. The class with the highest score is taken as the predicted class for the image.
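Prediction from the score function is a single `argmax`. The sketch below uses a tiny 3-class, 4-pixel example; the class names, weights, bias, and pixel values are all made-up illustrative numbers, not trained parameters, so the predicted label is not meaningful in itself.

```python
import numpy as np

class_names = ["cat", "dog", "ship"]    # hypothetical labels
W = np.array([[0.2, -0.5, 0.1, 2.0],    # one row of weights per class
              [1.5, 1.3, 2.1, 0.0],
              [0.0, 0.25, 0.2, -0.3]])
b = np.array([1.1, 3.2, -1.2])
x = np.array([56.0, 231.0, 24.0, 2.0])  # flattened 4-pixel "image"

scores = W @ x + b                      # one score per class
predicted = class_names[int(np.argmax(scores))]
print(predicted)  # dog
```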