Preface
The second assignment is quite challenging; the main task is to implement a fully connected network of arbitrary depth.
1. Environment Setup
Assignment 2 requires you to set up the environment yourself and install the packages listed in requirements.txt; this is not covered again here.
2. Code Implementation
Multi-Layer Fully Connected Network
In this exercise, you will implement a fully connected network with an arbitrary number of hidden layers.
Let's complete the code in cs231n/classifiers/fc_net.py.
Before that, we need to complete the forward and backward passes and softmax_loss in cs231n/layers.py. These were already implemented in assignment 1, so they are only briefly covered here.
First, the forward and backward passes for the affine (fully connected) layer:
import numpy as np


def affine_forward(x, w, b):
    """Computes the forward pass for an affine (fully connected) layer.

    The input x has shape (N, d_1, ..., d_k) and contains a minibatch of N
    examples, where each example x[i] has shape (d_1, ..., d_k). We will
    reshape each input into a vector of dimension D = d_1 * ... * d_k, and
    then transform it to an output vector of dimension M.

    Inputs:
    - x: A numpy array containing input data, of shape (N, d_1, ..., d_k)
    - w: A numpy array of weights, of shape (D, M)
    - b: A numpy array of biases, of shape (M,)

    Returns a tuple of:
    - out: output, of shape (N, M)
    - cache: (x, w, b)
    """
    # Flatten each example into a row vector of length D, then apply the
    # affine transform.
    x_temp = x.reshape(x.shape[0], -1)
    out = x_temp.dot(w) + b
    cache = (x, w, b)
    return out, cache
def affine_backward(dout, cache):
    """Computes the backward pass for an affine (fully connected) layer.

    Inputs:
    - dout: Upstream derivative, of shape (N, M)
    - cache: Tuple of:
      - x: Input data, of shape (N, d_1, ... d_k)
      - w: Weights, of shape (D, M)
      - b: Biases, of shape (M,)

    Returns a tuple of:
    - dx: Gradient with respect to x, of shape (N, d1, ..., d_k)
    - dw: Gradient with respect to w, of shape (D, M)
    - db: Gradient with respect to b, of shape (M,)
    """
    x, w, b = cache
    x_temp = np.reshape(x, (x.shape[0], -1))
    # keepdims=True gives db shape (1, M), matching the (1, M) bias
    # initialization used later in fc_net.py; np.sum(dout, axis=0) would
    # give the (M,) shape stated in the docstring.
    db = np.sum(dout, axis=0, keepdims=True)
    dw = np.dot(x_temp.T, dout)
    dx = np.dot(dout, w.T)
    dx = np.reshape(dx, x.shape)
    return dx, dw, db
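As a quick sanity check (not part of the assignment files), the analytic gradients from affine_backward can be compared against a numeric gradient. The helper numeric_grad below is a hypothetical stand-in for the eval_numerical_gradient_array utility shipped with the assignment:

import numpy as np

def numeric_grad(f, x, dout, h=1e-5):
    """Central-difference numeric gradient of f at x, contracted with dout."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        old = x[idx]
        x[idx] = old + h
        pos = f(x)                 # forward output with x[idx] nudged up
        x[idx] = old - h
        neg = f(x)                 # forward output with x[idx] nudged down
        x[idx] = old
        grad[idx] = np.sum((pos - neg) * dout) / (2 * h)
        it.iternext()
    return grad

np.random.seed(0)
x = np.random.randn(4, 2, 3)       # N=4 examples, flattened to D=6
w = np.random.randn(6, 5)
b = np.random.randn(5)
dout = np.random.randn(4, 5)

_, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)

dx_num = numeric_grad(lambda x: affine_forward(x, w, b)[0], x, dout)
dw_num = numeric_grad(lambda w: affine_forward(x, w, b)[0], w, dout)
print(np.max(np.abs(dx - dx_num)), np.max(np.abs(dw - dw_num)))

Both differences should be tiny (on the order of 1e-9 or smaller) if the backward pass is correct.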
The ReLU activation function:
def relu_forward(x):
    """Computes the forward pass for a layer of rectified linear units (ReLUs).

    Input:
    - x: Inputs, of any shape

    Returns a tuple of:
    - out: Output, of the same shape as x
    - cache: x
    """
    out = np.maximum(0, x)
    cache = x
    return out, cache
def relu_backward(dout, cache):
    """Computes the backward pass for a layer of rectified linear units (ReLUs).

    Input:
    - dout: Upstream derivatives, of any shape
    - cache: Input x, of same shape as dout

    Returns:
    - dx: Gradient with respect to x
    """
    x = cache
    # Copy dout so the upstream gradient is not modified in place; the
    # gradient only flows through positions where the input was positive.
    dx = dout.copy()
    dx[x <= 0] = 0
    return dx
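A tiny check of the ReLU pair (not part of the assignment files): the backward pass should zero the gradient exactly where the input was non-positive, and it should leave the upstream gradient untouched.

import numpy as np

x = np.array([[-2.0, 0.0, 3.0],
              [ 1.0, -1.0, 4.0]])
dout = np.ones_like(x)

out, cache = relu_forward(x)
dx = relu_backward(dout, cache)
print(out)   # negatives and zeros clamped to 0
print(dx)    # 1 where x > 0, 0 elsewhere
print(dout)  # still all ones, since relu_backward copies dout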
softmax_loss:
def softmax_loss(x, y):
    """Computes the loss and gradient for softmax classification.

    Inputs:
    - x: Input data, of shape (N, C) where x[i, j] is the score for the jth
      class for the ith input.
    - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
      0 <= y[i] < C

    Returns a tuple of:
    - loss: Scalar giving the loss
    - dx: Gradient of the loss with respect to x
    """
    num = x.shape[0]
    # Shift scores by the row-wise max for numerical stability before
    # exponentiating; this does not change the softmax probabilities.
    shifted = x - np.max(x, axis=1, keepdims=True)
    probs = np.exp(shifted) / np.sum(np.exp(shifted), axis=1, keepdims=True)
    # Cross-entropy loss: mean negative log-probability of the correct class.
    loss = -np.sum(np.log(probs[range(num), y])) / num
    # Gradient: softmax probabilities minus 1 at the correct class, averaged.
    dx = probs.copy()
    dx[range(num), y] -= 1
    dx /= num
    return loss, dx
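As a quick sanity check (mirroring the one used in the notebook), small random scores over C classes should give a loss close to log(C), since the predicted distribution is nearly uniform:

import numpy as np

np.random.seed(0)
N, C = 100, 10
scores = 0.001 * np.random.randn(N, C)   # small random scores
labels = np.random.randint(C, size=N)

loss, dx = softmax_loss(scores, labels)
print(loss, np.log(C))   # loss should be close to log(10) ≈ 2.3
print(dx.shape)          # (100, 10), same shape as the scores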
After implementing the code above, load the data.
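In the notebook this is done with the get_CIFAR10_data helper from cs231n.data_utils; a sketch of the typical call (output shapes may vary slightly between assignment versions):

from cs231n.data_utils import get_CIFAR10_data

# Load preprocessed CIFAR-10 (mean-subtracted, split into train/val/test).
data = get_CIFAR10_data()
for k, v in data.items():
    print(k, v.shape)
# Typically prints X_train/y_train, X_val/y_val, X_test/y_test and their shapes.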

Initial Loss and Gradient Check
Next, we need to complete the model initialization; open the fc_net.py file.
First, implement the initialization of the model parameters.
Initialize each W with random values drawn from a standard Gaussian scaled by weight_scale, and initialize each b to zeros. If normalization is used, also initialize the scale parameters (gamma) to 1 and the shift parameters (beta) to 0.
layers_dims = [input_dim] + hidden_dims + [num_classes]
for i in range(self.num_layers):
    self.params[f'W{i+1}'] = np.random.randn(layers_dims[i], layers_dims[i+1]) * weight_scale
    self.params[f'b{i+1}'] = np.zeros(shape=(1, layers_dims[i+1]))
    # gamma/beta are only needed for the hidden layers; index them from 1 so
    # the names line up with W/b and with the loss code below.
    if normalization == 'batchnorm' and i < len(hidden_dims):
        self.params[f'gamma{i + 1}'] = np.ones((1, layers_dims[i + 1]))
        self.params[f'beta{i + 1}'] = np.zeros((1, layers_dims[i + 1]))
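A quick way to verify the initialization is to construct a small network and print the parameter shapes. This is only a sketch: the exact FullyConnectedNet constructor keywords may differ slightly between assignment versions, and the bias shapes reflect the (1, M) initialization used above.

import numpy as np
from cs231n.classifiers.fc_net import FullyConnectedNet

np.random.seed(0)
model = FullyConnectedNet([100, 50], input_dim=3 * 32 * 32,
                          num_classes=10, weight_scale=5e-2, reg=0.0)

for name, p in sorted(model.params.items()):
    print(name, p.shape)
# Expected: W1 (3072, 100), W2 (100, 50), W3 (50, 10),
# and b1 (1, 100), b2 (1, 50), b3 (1, 10).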
Next, fill in the code for the loss method.
Note that no activation is applied when computing the final output scores. The normalization code can be ignored for now.
h, cache1, cache2, cache3, cache4, bn, out = {}, {}, {}, {}, {}, {}, {}
out[0] = X
for i in range(self.num_layers - 1):
    w, b = self.params[f'W{i+1}'], self.params[f'b{i+1}']
    if self.normalization is not None:
        gamma, beta = self.params[f'gamma{i + 1}'], self.params[f'beta{i + 1}']
    h[i], cache1[i] = affine_forward(out[i], w, b)
    if self.normalization == 'batchnorm':
        bn[i], cache2[i] = batchnorm_forward(h[i], gamma, beta, self.bn_params[i])
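The snippet is cut off at this point in the source. As a rough sketch only (not the article's actual code), the hidden-layer loop could be finished with a ReLU on each hidden output, followed by a final affine layer with no activation, the softmax loss, and L2 regularization over the weights; the names follow the dictionaries set up above, and dropout as well as the backward pass (walking the caches in reverse with affine_backward, relu_backward, and batchnorm_backward, adding reg * W to each dW) are omitted:

    # Inside the hidden-layer loop: ReLU on the (optionally normalized) output.
    if self.normalization == 'batchnorm':
        out[i + 1], cache3[i] = relu_forward(bn[i])
    else:
        out[i + 1], cache3[i] = relu_forward(h[i])

# After the loop: the last layer is affine only, with no activation.
w_last = self.params[f'W{self.num_layers}']
b_last = self.params[f'b{self.num_layers}']
scores, cache_scores = affine_forward(out[self.num_layers - 1], w_last, b_last)

if y is None:            # test mode: just return the class scores
    return scores

# Softmax loss plus L2 regularization over every weight matrix.
loss, dscores = softmax_loss(scores, y)
for j in range(1, self.num_layers + 1):
    loss += 0.5 * self.reg * np.sum(self.params[f'W{j}'] ** 2)

With the loss in place, the notebook's initial sanity check on a randomly initialized network with 10 classes and no regularization should report a loss close to log(10) ≈ 2.3, and the gradient checks should show small relative errors.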
