Preface
The second assignment is quite challenging; the main task is to implement a fully connected network of arbitrary depth.
1. Environment Setup
Assignment 2 requires you to set up the environment yourself and install the packages listed in requirements.txt; this is not covered again here.
2. Code Implementation
Multi-Layer Fully Connected Network
In this exercise, you will implement a fully connected network with an arbitrary number of hidden layers.
Let's complete the code in cs231n/classifiers/fc_net.py.
Before that, we need to complete the forward and backward passes and softmax_loss in cs231n/layers.py. These were already implemented in assignment 1, so they are only briefly covered here.
First, the forward and backward passes for the affine (fully connected) layer:
import numpy as np


def affine_forward(x, w, b):
    """Computes the forward pass for an affine (fully connected) layer.

    The input x has shape (N, d_1, ..., d_k) and contains a minibatch of N
    examples, where each example x[i] has shape (d_1, ..., d_k). We will
    reshape each input into a vector of dimension D = d_1 * ... * d_k, and
    then transform it to an output vector of dimension M.

    Inputs:
    - x: A numpy array containing input data, of shape (N, d_1, ..., d_k)
    - w: A numpy array of weights, of shape (D, M)
    - b: A numpy array of biases, of shape (M,)

    Returns a tuple of:
    - out: output, of shape (N, M)
    - cache: (x, w, b)
    """
    # Flatten each example into a row vector of length D, then apply the
    # affine transform.
    x_temp = x.reshape(x.shape[0], -1)
    out = x_temp.dot(w) + b
    cache = (x, w, b)
    return out, cache
def affine_backward(dout, cache):
    """Computes the backward pass for an affine (fully connected) layer.

    Inputs:
    - dout: Upstream derivative, of shape (N, M)
    - cache: Tuple of:
      - x: Input data, of shape (N, d_1, ... d_k)
      - w: Weights, of shape (D, M)
      - b: Biases, of shape (M,)

    Returns a tuple of:
    - dx: Gradient with respect to x, of shape (N, d1, ..., d_k)
    - dw: Gradient with respect to w, of shape (D, M)
    - db: Gradient with respect to b, of shape (M,)
    """
    x, w, b = cache
    x_temp = np.reshape(x, (x.shape[0], -1))
    # keepdims=True gives db shape (1, M), matching the (1, M) bias
    # initialization used later in fc_net.py; np.sum(dout, axis=0) would
    # give the (M,) shape stated in the docstring.
    db = np.sum(dout, axis=0, keepdims=True)
    dw = np.dot(x_temp.T, dout)
    dx = np.dot(dout, w.T)
    dx = np.reshape(dx, x.shape)
    return dx, dw, db
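As a quick sanity check (not part of the assignment files), the analytic gradients from affine_backward can be compared against a numeric gradient. The helper numeric_grad below is a hypothetical stand-in for the eval_numerical_gradient_array utility shipped with the assignment:

import numpy as np

def numeric_grad(f, x, dout, h=1e-5):
    """Central-difference numeric gradient of f at x, contracted with dout."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        old = x[idx]
        x[idx] = old + h
        pos = f(x)                 # forward output with x[idx] nudged up
        x[idx] = old - h
        neg = f(x)                 # forward output with x[idx] nudged down
        x[idx] = old
        grad[idx] = np.sum((pos - neg) * dout) / (2 * h)
        it.iternext()
    return grad

np.random.seed(0)
x = np.random.randn(4, 2, 3)       # N=4 examples, flattened to D=6
w = np.random.randn(6, 5)
b = np.random.randn(5)
dout = np.random.randn(4, 5)

_, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)

dx_num = numeric_grad(lambda x: affine_forward(x, w, b)[0], x, dout)
dw_num = numeric_grad(lambda w: affine_forward(x, w, b)[0], w, dout)
print(np.max(np.abs(dx - dx_num)), np.max(np.abs(dw - dw_num)))

Both differences should be tiny (on the order of 1e-9 or smaller) if the backward pass is correct.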
The ReLU activation function:
def relu_forward(x):
    """Computes the forward pass for a layer of rectified linear units (ReLUs).

    Input:
    - x: Inputs, of any shape

    Returns a tuple of:
    - out: Output, of the same shape as x
    - cache: x
    """
    out = np.maximum(0, x)
    cache = x
    return out, cache
def relu_backward(dout, cache):
    """Computes the backward pass for a layer of rectified linear units (ReLUs).

    Input:
    - dout: Upstream derivatives, of any shape
    - cache: Input x, of same shape as dout

    Returns:
    - dx: Gradient with respect to x
    """
    x = cache
    # Copy dout so the upstream gradient is not modified in place; the
    # gradient only flows through positions where the input was positive.
    dx = dout.copy()
    dx[x <= 0] = 0
    return dx
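A tiny check of the ReLU pair (not part of the assignment files): the backward pass should zero the gradient exactly where the input was non-positive, and it should leave the upstream gradient untouched.

import numpy as np

x = np.array([[-2.0, 0.0, 3.0],
              [ 1.0, -1.0, 4.0]])
dout = np.ones_like(x)

out, cache = relu_forward(x)
dx = relu_backward(dout, cache)
print(out)   # negatives and zeros clamped to 0
print(dx)    # 1 where x > 0, 0 elsewhere
print(dout)  # still all ones, since relu_backward copies dout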
softmax_loss:
def softmax_loss(x, y):
    """Computes the loss and gradient for softmax classification.

    Inputs:
    - x: Input data, of shape (N, C) where x[i, j] is the score for the jth
      class for the ith input.
    - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
      0 <= y[i] < C

    Returns a tuple of:
    - loss: Scalar giving the loss
    - dx: Gradient of the loss with respect to x
    """
    num = x.shape[0]
    # Shift scores by the row-wise max for numerical stability before
    # exponentiating; this does not change the softmax probabilities.
    shifted = x - np.max(x, axis=1, keepdims=True)
    probs = np.exp(shifted) / np.sum(np.exp(shifted), axis=1, keepdims=True)
    # Cross-entropy loss: mean negative log-probability of the correct class.
    loss = -np.sum(np.log(probs[range(num), y])) / num
    # Gradient: softmax probabilities minus 1 at the correct class, averaged.
    dx = probs.copy()
    dx[range(num), y] -= 1
    dx /= num
    return loss, dx
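As a quick sanity check (mirroring the one used in the notebook), small random scores over C classes should give a loss close to log(C), since the predicted distribution is nearly uniform:

import numpy as np

np.random.seed(0)
N, C = 100, 10
scores = 0.001 * np.random.randn(N, C)   # small random scores
labels = np.random.randint(C, size=N)

loss, dx = softmax_loss(scores, labels)
print(loss, np.log(C))   # loss should be close to log(10) ≈ 2.3
print(dx.shape)          # (100, 10), same shape as the scores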
After implementing the code above, load the data.
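In the notebook this is done with the get_CIFAR10_data helper from cs231n.data_utils; a sketch of the typical call (output shapes may vary slightly between assignment versions):

from cs231n.data_utils import get_CIFAR10_data

# Load preprocessed CIFAR-10 (mean-subtracted, split into train/val/test).
data = get_CIFAR10_data()
for k, v in data.items():
    print(k, v.shape)
# Typically prints X_train/y_train, X_val/y_val, X_test/y_test and their shapes.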

Initial Loss and Gradient Check
Next, we need to complete the model initialization; open the fc_net.py file.
First, implement the initialization of the model parameters.
Initialize each W with random values drawn from a standard Gaussian scaled by weight_scale, and initialize each b to zeros. If normalization is used, also initialize the scale parameters (gamma) to 1 and the shift parameters (beta) to 0.
layers_dims = [input_dim] + hidden_dims + [num_classes]
for i in range(self.num_layers):
    self.params[f'W{i+1}'] = np.random.randn(layers_dims[i], layers_dims[i+1]) * weight_scale
    self.params[f'b{i+1}'] = np.zeros(shape=(1, layers_dims[i+1]))
    # gamma/beta are only needed for the hidden layers; index them from 1 so
    # the names line up with W/b and with the loss code below.
    if normalization == 'batchnorm' and i < len(hidden_dims):
        self.params[f'gamma{i + 1}'] = np.ones((1, layers_dims[i + 1]))
        self.params[f'beta{i + 1}'] = np.zeros((1, layers_dims[i + 1]))
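A quick way to verify the initialization is to construct a small network and print the parameter shapes. This is only a sketch: the exact FullyConnectedNet constructor keywords may differ slightly between assignment versions, and the bias shapes reflect the (1, M) initialization used above.

import numpy as np
from cs231n.classifiers.fc_net import FullyConnectedNet

np.random.seed(0)
model = FullyConnectedNet([100, 50], input_dim=3 * 32 * 32,
                          num_classes=10, weight_scale=5e-2, reg=0.0)

for name, p in sorted(model.params.items()):
    print(name, p.shape)
# Expected: W1 (3072, 100), W2 (100, 50), W3 (50, 10),
# and b1 (1, 100), b2 (1, 50), b3 (1, 10).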
Next, fill in the code for the loss method.
Note that no activation is applied when computing the final output scores. The normalization code can be ignored for now.
h, cache1, cache2, cache3, cache4, bn, out = {}, {}, {}, {}, {}, {}, {}
out[0] = X
for i in range(self.num_layers - 1):
    w, b = self.params[f'W{i+1}'], self.params[f'b{i+1}']
    if self.normalization is not None:
        gamma, beta = self.params[f'gamma{i + 1}'], self.params[f'beta{i + 1}']
    h[i], cache1[i] = affine_forward(out[i], w, b)
    if self.normalization == 'batchnorm':
        bn[i], cache2[i] = batchnorm_forward(h[i], gamma, beta, self.bn_params[i])
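The snippet is cut off at this point in the source. As a rough sketch only (not the article's actual code), the hidden-layer loop could be finished with a ReLU on each hidden output, followed by a final affine layer with no activation, the softmax loss, and L2 regularization over the weights; the names follow the dictionaries set up above, and dropout as well as the backward pass (walking the caches in reverse with affine_backward, relu_backward, and batchnorm_backward, adding reg * W to each dW) are omitted:

    # Inside the hidden-layer loop: ReLU on the (optionally normalized) output.
    if self.normalization == 'batchnorm':
        out[i + 1], cache3[i] = relu_forward(bn[i])
    else:
        out[i + 1], cache3[i] = relu_forward(h[i])

# After the loop: the last layer is affine only, with no activation.
w_last = self.params[f'W{self.num_layers}']
b_last = self.params[f'b{self.num_layers}']
scores, cache_scores = affine_forward(out[self.num_layers - 1], w_last, b_last)

if y is None:            # test mode: just return the class scores
    return scores

# Softmax loss plus L2 regularization over every weight matrix.
loss, dscores = softmax_loss(scores, y)
for j in range(1, self.num_layers + 1):
    loss += 0.5 * self.reg * np.sum(self.params[f'W{j}'] ** 2)

With the loss in place, the notebook's initial sanity check on a randomly initialized network with 10 classes and no regularization should report a loss close to log(10) ≈ 2.3, and the gradient checks should show small relative errors.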
