2 Image Classification Methods
Assignment: https://cs231n.github.io/assignments2018/assignment1/
Tools: Python + NumPy
NumPy reference: the tutorial 《Python之Numpy详细教程》
2.1 The Data-Driven Approach
Image classification algorithms require large amounts of data; they are data-driven methods.
Dataset: CIFAR-10, with 50,000 training images and 10,000 test images, each of size 32×32×3.
First classifier: Nearest Neighbor (NN)
Training just memorizes all the data and labels. At prediction time, find the training image most similar to the query image; its label is the prediction.
**Similarity criterion:** the L1 distance or the L2 distance.
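Concretely, comparing two images $I_1$ and $I_2$ pixel by pixel (with $p$ indexing pixels), the two metrics are:

$$
d_1(I_1, I_2) = \sum_p \left| I_1^p - I_2^p \right|
\qquad
d_2(I_1, I_2) = \sqrt{\sum_p \left( I_1^p - I_2^p \right)^2}
$$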
The code is as follows:
import numpy as np

class NearestNeighbor:
    def __init__(self):
        pass

    def train(self, X, y):
        # Memorize the training data and labels
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        num_test = X.shape[0]
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)
        for i in range(num_test):
            # L1 distance from the i-th test image to every training image
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            min_index = np.argmin(distances)  # index of the nearest training image
            Ypred[i] = self.ytr[min_index]    # copy its label
        return Ypred
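A minimal usage sketch, assuming arrays shaped like flattened CIFAR-10 images; the random data here is only a stand-in for a real loader:

# Illustrative only: random data shaped like flattened CIFAR-10 images
Xtr = np.random.rand(500, 3072)           # 3072 = 32 * 32 * 3
ytr = np.random.randint(0, 10, size=500)  # 10 class labels
Xte = np.random.rand(10, 3072)

nn = NearestNeighbor()
nn.train(Xtr, ytr)
print(nn.predict(Xte))                    # 10 labels copied from ytr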
2.2 KNN
The NN algorithm is not robust to noise, which motivates KNN: take the K nearest training points and let them vote on the label of the query point. The larger K is, the smoother the decision boundary.
Hyperparameters: K and the choice of distance metric, typically tuned on held-out data (see the cross-validation sketch below).
KNN summary: KNN is not well suited to image classification, because pixel-wise distances carry little information about perceptual similarity, and the method suffers from the curse of dimensionality.
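A minimal cross-validation sketch for tuning K, assuming the KNearestNeighbor class implemented in section 2.4 below and the Xtr/ytr arrays from the sketch above; the fold count and candidate K values are arbitrary choices:

num_folds = 5
k_choices = [1, 3, 5, 8, 10, 12, 15, 20]

X_folds = np.array_split(Xtr, num_folds)
y_folds = np.array_split(ytr, num_folds)

accuracies = {}
for k in k_choices:
    accs = []
    for f in range(num_folds):
        # Hold out fold f for validation, train on the remaining folds
        X_val, y_val = X_folds[f], y_folds[f]
        X_tr = np.concatenate(X_folds[:f] + X_folds[f + 1:])
        y_tr = np.concatenate(y_folds[:f] + y_folds[f + 1:])
        knn = KNearestNeighbor()
        knn.train(X_tr, y_tr)
        accs.append(np.mean(knn.predict(X_val, k=k) == y_val))
    accuracies[k] = np.mean(accs)

best_k = max(accuracies, key=accuracies.get)  # k with the highest mean accuracy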
2.3 Linear Classifiers
Building neural networks is like playing with Lego: components of different kinds can be combined into large convolutional networks, and the linear classifier is the most basic building block.
The linear classifier computes class scores as f(x, W) = Wx + b, where x is a long (flattened) image vector and b is the bias term. The bias does not interact with the training data; it only encodes data-independent preferences for certain classes. For example, if the dataset contains more cats than dogs, the bias element for the cat class will be larger than the others.
One view of the linear classifier is template matching: each class is allowed to learn exactly one template (one row of W). If a class contains several visual variants, the classifier is forced to average over all of them and recognize every variant with that single template. Neural networks and other more complex models do not have this limitation and can therefore reach higher accuracy.
Another view goes back to images as points in a high-dimensional space: the linear classifier tries to draw a linear decision boundary that separates each class from the rest.
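A minimal sketch of the score computation f(x, W) = Wx + b on a CIFAR-10-shaped input; the random W and b are placeholders, not trained parameters:

num_classes, dim = 10, 3072                    # CIFAR-10: 10 classes, 32*32*3 pixels
W = np.random.randn(num_classes, dim) * 0.01   # one template per row
b = np.zeros(num_classes)                      # data-independent class preferences

x = np.random.rand(dim)       # one flattened image
scores = W.dot(x) + b         # 10 class scores
print(scores.argmax())        # predicted class index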
Cases a linear classifier cannot handle (a small demonstration follows this list):
- Parity problems, e.g., separating odd from even
- Multi-class problems where the classes are not linearly separable
- Multimodal data, e.g., a single class that occupies several separate regions of the space
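A small demonstration (my own sketch, not from the lecture) that no linear classifier can solve XOR, the simplest parity problem; even brute-force search over the weights cannot beat 3/4 accuracy:

import numpy as np
import itertools

# XOR: the label is 1 when exactly one input is 1 -- the simplest parity problem
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

best = 0.0
# Brute-force a grid of linear classifiers of the form sign(w.x + b)
for w1, w2, b in itertools.product(np.linspace(-2, 2, 21), repeat=3):
    pred = (X.dot([w1, w2]) + b > 0).astype(int)
    best = max(best, np.mean(pred == y))
print(best)  # 0.75: no line separates XOR perfectly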
2.4 Assignment
KNN
import numpy as np

class KNearestNeighbor(object):
    """ a kNN classifier with L2 distance """

    def __init__(self):
        pass

    def train(self, X, y):
        """
        Train the classifier. For k-nearest neighbors this is just
        memorizing the training data.

        Inputs:
        - X: A numpy array of shape (num_train, D) containing the training data
          consisting of num_train samples each of dimension D.
        - y: A numpy array of shape (num_train,) containing the training labels,
          where y[i] is the label for X[i].
        """
        self.X_train = X
        self.y_train = y

    def predict(self, X, k=1, num_loops=0):
        """
        Predict labels for test data using this classifier.

        Inputs:
        - X: A numpy array of shape (num_test, D) containing test data consisting
          of num_test samples each of dimension D.
        - k: The number of nearest neighbors that vote for the predicted labels.
        - num_loops: Determines which implementation to use to compute distances
          between training points and testing points.

        Returns:
        - y: A numpy array of shape (num_test,) containing predicted labels for the
          test data, where y[i] is the predicted label for the test point X[i].
        """
        if num_loops == 0:
            dists = self.compute_distances_no_loops(X)
        elif num_loops == 1:
            dists = self.compute_distances_one_loop(X)
        elif num_loops == 2:
            dists = self.compute_distances_two_loops(X)
        else:
            raise ValueError('Invalid value %d for num_loops' % num_loops)
        return self.predict_labels(dists, k=k)
    def compute_distances_two_loops(self, X):
        """
        Compute the distance between each test point in X and each training point
        in self.X_train using a nested loop over both the training data and the
        test data.

        Inputs:
        - X: A numpy array of shape (num_test, D) containing test data.

        Returns:
        - dists: A numpy array of shape (num_test, num_train) where dists[i, j]
          is the Euclidean distance between the ith test point and the jth training
          point.
        """
        num_test = X.shape[0]
        num_train = self.X_train.shape[0]
        dists = np.zeros((num_test, num_train))
        for i in range(num_test):
            for j in range(num_train):
                #################################################################
                # TODO:                                                         #
                # Compute the l2 distance between the ith test point and the    #
                # jth training point, and store the result in dists[i, j]. You  #
                # should not use a loop over dimension.                         #
                #################################################################
                # Subtract, square, sum, then take the square root
                dis = X[i, :] - self.X_train[j, :]
                dists[i, j] = np.sqrt(np.sum(dis ** 2))
                #################################################################
                #                        END OF YOUR CODE                       #
                #################################################################
        return dists
    def compute_distances_one_loop(self, X):
        """
        Compute the distance between each test point in X and each training point
        in self.X_train using a single loop over the test data.

        Input / Output: Same as compute_distances_two_loops
        """
        num_test = X.shape[0]
        num_train = self.X_train.shape[0]
        dists = np.zeros((num_test, num_train))
        for i in range(num_test):
            #####################################################################
            # TODO:                                                             #
            # Compute the l2 distance between the ith test point and all        #
            # training points, and store the result in dists[i, :].             #
            #####################################################################
            # Broadcasting subtracts the ith test point from every training row
            dists[i, :] = np.sqrt(np.sum(np.square(self.X_train - X[i, :]), axis=1))
            #####################################################################
            #                         END OF YOUR CODE                          #
            #####################################################################
        return dists
    def compute_distances_no_loops(self, X):
        """
        Compute the distance between each test point in X and each training point
        in self.X_train using no explicit loops.

        Input / Output: Same as compute_distances_two_loops
        """
        num_test = X.shape[0]
        num_train = self.X_train.shape[0]
        dists = np.zeros((num_test, num_train))
        #########################################################################
        # TODO:                                                                 #
        # Compute the l2 distance between all test points and all training      #
        # points without using any explicit loops, and store the result in      #
        # dists.                                                                #
        #                                                                       #
        # You should implement this function using only basic array operations; #
        # in particular you should not use functions from scipy.                #
        #                                                                       #
        # HINT: Try to formulate the l2 distance using matrix multiplication    #
        # and two broadcast sums.                                               #
        #########################################################################
        # Expand (x - t)^2 = x^2 - 2xt + t^2 and broadcast the squared norms
        sq_train = np.sum(np.square(self.X_train), axis=1)
        sq_test = np.sum(np.square(X), axis=1)
        mul = np.multiply(np.matmul(X, self.X_train.T), -2)
        sq_train = np.reshape(sq_train, (1, sq_train.shape[0]))
        sq_test = np.reshape(sq_test, (sq_test.shape[0], 1))
        dists = np.sqrt(mul + sq_train + sq_test)
        #########################################################################
        #                           END OF YOUR CODE                            #
        #########################################################################
        return dists
    def predict_labels(self, dists, k=1):
        """
        Given a matrix of distances between test points and training points,
        predict a label for each test point.

        Inputs:
        - dists: A numpy array of shape (num_test, num_train) where dists[i, j]
          gives the distance between the ith test point and the jth training point.

        Returns:
        - y: A numpy array of shape (num_test,) containing predicted labels for the
          test data, where y[i] is the predicted label for the test point X[i].
        """
        num_test = dists.shape[0]
        y_pred = np.zeros(num_test)
        for i in range(num_test):
            # A list of length k storing the labels of the k nearest neighbors to
            # the ith test point.
            closest_y = []
            #####################################################################
            # TODO:                                                             #
            # Use the distance matrix to find the k nearest neighbors of the    #
            # ith testing point, and use self.y_train to find the labels of     #
            # these neighbors. Store these labels in closest_y.                 #
            # Hint: Look up the function numpy.argsort.                         #
            #####################################################################
            # Sort the distances to all training points in ascending order
            idx = np.argsort(dists[i, :])
            #####################################################################
            # TODO:                                                             #
            # Now that you have found the labels of the k nearest neighbors,    #
            # you need to find the most common label in the list closest_y of   #
            # labels. Store this label in y_pred[i]. Break ties by choosing     #
            # the smaller label.                                                #
            #####################################################################
            # Take the labels of the k closest training points
            closest_y = self.y_train[idx[:k]]
            # Vote: np.bincount counts each label; np.argmax returns the most
            # common one and breaks ties by choosing the smaller label
            y_pred[i] = np.argmax(np.bincount(closest_y))
            #####################################################################
            #                         END OF YOUR CODE                          #
            #####################################################################
        return y_pred
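A quick sanity-check sketch (the shapes and random data are made up) verifying that the three distance implementations agree:

knn = KNearestNeighbor()
X_train = np.random.rand(50, 3072)
y_train = np.random.randint(0, 10, size=50)
X_test = np.random.rand(5, 3072)

knn.train(X_train, y_train)
d2 = knn.compute_distances_two_loops(X_test)
d1 = knn.compute_distances_one_loop(X_test)
d0 = knn.compute_distances_no_loops(X_test)
print(np.allclose(d2, d1), np.allclose(d2, d0))  # True True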
The assignment also covers SVM, softmax, and a neural network; it requires the linear SVM loss, the softmax loss, and backpropagation through the neural network.
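For reference, a minimal vectorized sketch of the two losses on a batch of scores (regularization omitted; the (N, C) score layout is an assumption):

def svm_loss(scores, y, delta=1.0):
    """Multiclass SVM (hinge) loss. scores: (N, C), y: (N,) int labels."""
    N = scores.shape[0]
    correct = scores[np.arange(N), y][:, None]         # (N, 1) correct-class scores
    margins = np.maximum(0, scores - correct + delta)  # hinge margins
    margins[np.arange(N), y] = 0                       # ignore the correct class
    return margins.sum() / N

def softmax_loss(scores, y):
    """Softmax cross-entropy loss. scores: (N, C), y: (N,) int labels."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    N = scores.shape[0]
    return -log_probs[np.arange(N), y].mean()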
Two-layer neural network: the forward pass computes the class scores, and the backward pass (backpropagation) computes the gradients; a sketch follows.
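A minimal sketch of the forward and backward passes of a two-layer network with a ReLU hidden layer and softmax loss; the layer sizes, random data, and initialization scale are arbitrary choices:

N, D, H, C = 64, 3072, 100, 10           # batch, input dim, hidden dim, classes
X = np.random.randn(N, D)
y = np.random.randint(0, C, size=N)
W1, b1 = 0.01 * np.random.randn(D, H), np.zeros(H)
W2, b2 = 0.01 * np.random.randn(H, C), np.zeros(C)

# Forward pass
h = np.maximum(0, X.dot(W1) + b1)        # ReLU hidden layer, (N, H)
scores = h.dot(W2) + b2                  # class scores, (N, C)
shifted = scores - scores.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
loss = -np.log(probs[np.arange(N), y]).mean()

# Backward pass (backpropagation)
dscores = probs.copy()
dscores[np.arange(N), y] -= 1
dscores /= N                             # gradient of the loss w.r.t. scores
dW2 = h.T.dot(dscores); db2 = dscores.sum(axis=0)
dh = dscores.dot(W2.T)
dh[h <= 0] = 0                           # backprop through the ReLU
dW1 = X.T.dot(dh); db1 = dh.sum(axis=0)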
Code download link: https://download.youkuaiyun.com/download/qq_35494379/12329653