CS231n assignment2 (Part 1)

This post walks through implementing a multi-layer fully connected network in Python, including the forward pass, the backward pass, and the ReLU activation function. The second CS231n assignment focuses on understanding model initialization, loss computation, and gradient checking. By experimenting with the learning rate and the weight initialization scale, the network is made to overfit a small dataset. The post also covers the SGD, Momentum, RMSProp, and Adam optimizers, compares their training speed and final performance, and finally discusses why AdaGrad's updates shrink over time and how it differs from Adam.


Preface

The second assignment is quite challenging; the main task is to implement a fully connected network of arbitrary depth.


1. Environment Setup

For assignment 2 you need to set up your own environment and install the packages listed in requirements.txt; I won't go into the details here.

2. Code Implementation

Multi-Layer Fully Connected Network

In this exercise, you will implement a fully connected network with an arbitrary number of hidden layers.
Let's complete the code in cs231n/classifiers/fc_net.py.
Before that, we need to complete the forward and backward passes as well as softmax_loss in cs231n/layers.py. These were already implemented in assignment 1, so I won't explain them in much detail here.

First, the forward and backward passes for the affine (fully connected) layer:

def affine_forward(x, w, b):
    """Computes the forward pass for an affine (fully connected) layer.

    The input x has shape (N, d_1, ..., d_k) and contains a minibatch of N
    examples, where each example x[i] has shape (d_1, ..., d_k). We will
    reshape each input into a vector of dimension D = d_1 * ... * d_k, and
    then transform it to an output vector of dimension M.

    Inputs:
    - x: A numpy array containing input data, of shape (N, d_1, ..., d_k)
    - w: A numpy array of weights, of shape (D, M)
    - b: A numpy array of biases, of shape (M,)

    Returns a tuple of:
    - out: output, of shape (N, M)
    - cache: (x, w, b)
    """
    out = None
    x_temp = x.reshape(x.shape[0], -1)
    out = x_temp.dot(w) + b
    cache = (x, w, b)
    return out, cache

def affine_backward(dout, cache):
    """Computes the backward pass for an affine (fully connected) layer.

    Inputs:
    - dout: Upstream derivative, of shape (N, M)
    - cache: Tuple of:
      - x: Input data, of shape (N, d_1, ... d_k)
      - w: Weights, of shape (D, M)
      - b: Biases, of shape (M,)

    Returns a tuple of:
    - dx: Gradient with respect to x, of shape (N, d1, ..., d_k)
    - dw: Gradient with respect to w, of shape (D, M)
    - db: Gradient with respect to b, of shape (M,)
    """
    x, w, b = cache
    dx, dw, db = None, None, None
    x_temp = np.reshape(x, (x.shape[0], -1))
    db = np.sum(dout, axis=0, keepdims=True)
    dw = np.dot(x_temp.T, dout)
    dx = np.dot(dout, w.T)
    dx = np.reshape(dx, x.shape)
    return dx, dw, db
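
As a quick sanity check, the analytic gradients can be compared against finite differences. The sketch below assumes NumPy and the two functions above; the num_grad helper is a hypothetical stand-in for the eval_numerical_gradient_array utility shipped with the assignment.

import numpy as np

def num_grad(f, x, df, h=1e-5):
    # Hypothetical helper: centered finite differences of f at x, weighted by the upstream gradient df.
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        old = x[idx]
        x[idx] = old + h; fp = f(x)
        x[idx] = old - h; fm = f(x)
        x[idx] = old
        grad[idx] = np.sum((fp - fm) * df) / (2 * h)
        it.iternext()
    return grad

np.random.seed(0)
x = np.random.randn(4, 3, 5)      # N=4 examples, D=15 after flattening
w = np.random.randn(15, 7)
b = np.random.randn(7)
dout = np.random.randn(4, 7)

out, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)

# Differences between analytic and numeric gradients should be tiny (around 1e-9).
dx_num = num_grad(lambda w_in: affine_forward(w_in, w, b)[0], x, dout)
dw_num = num_grad(lambda w_in: affine_forward(x, w_in, b)[0], w, dout)
print(np.max(np.abs(dx - dx_num)), np.max(np.abs(dw - dw_num)))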

Next, the ReLU activation function:

def relu_forward(x):
    """Computes the forward pass for a layer of rectified linear units (ReLUs).

    Input:
    - x: Inputs, of any shape

    Returns a tuple of:
    - out: Output, of the same shape as x
    - cache: x
    """
    out = None
    out = np.maximum(0, x)
    cache = x
    return out, cache


def relu_backward(dout, cache):
    """Computes the backward pass for a layer of rectified linear units (ReLUs).

    Input:
    - dout: Upstream derivatives, of any shape
    - cache: Input x, of same shape as dout

    Returns:
    - dx: Gradient with respect to x
    """
    dx, x = None, cache
    # Copy dout so the upstream gradient is not modified in place,
    # then zero the positions where the ReLU input was non-positive.
    dx = dout.copy()
    dx[x <= 0] = 0
    return dx
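
In the full network these two pieces are chained into an "affine → ReLU" sandwich. The assignment provides convenience wrappers for this in cs231n/layer_utils.py; a minimal sketch of that pattern, assuming the four functions above are in scope, looks like this:

def affine_relu_forward(x, w, b):
    # Affine transform followed by a ReLU; keep both sub-caches for the backward pass.
    a, fc_cache = affine_forward(x, w, b)
    out, relu_cache = relu_forward(a)
    return out, (fc_cache, relu_cache)

def affine_relu_backward(dout, cache):
    # Undo the ReLU first, then the affine transform.
    fc_cache, relu_cache = cache
    da = relu_backward(dout, relu_cache)
    return affine_backward(da, fc_cache)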

softmax_loss:

def softmax_loss(x, y):
    """Computes the loss and gradient for softmax classification.

    Inputs:
    - x: Input data, of shape (N, C) where x[i, j] is the score for the jth
      class for the ith input.
    - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
      0 <= y[i] < C

    Returns a tuple of:
    - loss: Scalar giving the loss
    - dx: Gradient of the loss with respect to x
    """
    loss, dx = None, None
    num = len(x)
    # Shift scores by each row's max for numerical stability (the softmax is unchanged).
    x = x - np.max(x, axis=1, keepdims=True)
    x_scores = x[range(num), y]
    loss = np.sum(-np.log(np.exp(x_scores) / np.sum(np.exp(x), axis=1))) / num
    dx = np.exp(x) / np.sum(np.exp(x), axis=1, keepdims=True)
    dx[range(num), y] -= 1
    dx /= num
    return loss, dx
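
A quick sanity check, assuming NumPy and the function above: with very small random scores the softmax is nearly uniform, so the loss should be close to log(C), which is about 2.3 for C = 10 classes.

import numpy as np

np.random.seed(231)
N, C = 100, 10
scores = 0.001 * np.random.randn(N, C)   # small random scores, near-uniform softmax
labels = np.random.randint(C, size=N)

loss, dx = softmax_loss(scores, labels)
print(loss, np.log(C))   # both should be close to 2.302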

After implementing the code above, load the CIFAR-10 data.
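
A sketch of the loading step, assuming the dataset has already been downloaded (e.g. via cs231n/datasets/get_datasets.sh) and that get_CIFAR10_data from cs231n/data_utils.py returns the preprocessed splits as a dict:

from cs231n.data_utils import get_CIFAR10_data

# Load the preprocessed train/val/test splits into a dict of arrays.
data = get_CIFAR10_data()
for k, v in data.items():
    print(k, v.shape)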

Initial Loss and Gradient Check

Next we need to complete the model initialization, so open fc_net.py.
First, implement the parameter initialization: initialize each W with random values drawn from a standard Gaussian (scaled by weight_scale) and each b with zeros. If normalization is used, also initialize the scale (gamma) to 1 and the shift (beta) to 0.

layers_dims = [input_dim] + hidden_dims + [num_classes]
for i in range(self.num_layers):
    # Weights: standard Gaussian scaled by weight_scale; biases: zeros.
    self.params[f'W{i+1}'] = np.random.randn(layers_dims[i], layers_dims[i+1]) * weight_scale
    self.params[f'b{i+1}'] = np.zeros(shape=(1, layers_dims[i+1]))
    # Batch norm scale/shift parameters exist only for the hidden layers.
    if normalization == 'batchnorm' and i < len(hidden_dims):
        self.params[f'gamma{i+1}'] = np.ones((1, layers_dims[i+1]))
        self.params[f'beta{i+1}'] = np.zeros((1, layers_dims[i+1]))
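
With the initialization in place, a quick shape check is a useful sanity test. The sketch below assumes the constructor signature used by the assignment's FullyConnectedNet (hidden_dims, input_dim, num_classes, weight_scale, ...):

from cs231n.classifiers.fc_net import FullyConnectedNet

# A 3-layer net: 3*32*32 inputs -> 100 -> 50 -> 10 classes.
model = FullyConnectedNet([100, 50], input_dim=3 * 32 * 32,
                          num_classes=10, weight_scale=1e-2)

for name in sorted(model.params):
    print(name, model.params[name].shape)
# Expected: W1 (3072, 100), W2 (100, 50), W3 (50, 10),
# with biases of shape (1, 100), (1, 50), (1, 10) given the code above.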

Next, complete the loss code.
Note that the final output (the class scores) is computed without an activation; the normalization code can be ignored for now.

h, cache1, cache2, cache3, cache4, bn, out = {}, {}, {}, {}, {}, {}, {}
out[0] = X
for i in range(self.num_layers - 1):
    w, b = self.params[f'W{i+1}'], self.params[f'b{i+1}']
    if self.normalization is not None:
        gamma, beta = self.params[f'gamma{i+1}'], self.params[f'beta{i+1}']
        h[i], cache1[i] = affine_forward(out[i], w, b)
        if self.normalization == 'batchnorm':
            bn[i], cache2[i] = batchnorm_forward(h[i], gamma, beta, self.bn_params[i])