[CS231n Assignment 2 #01] Fully-Connected Neural Network

This post walks through the second assignment of the CS231n course, focusing on the modular design and optimization of fully-connected neural networks. The assignment covers the forward and backward passes of the affine layer, the ReLU activation function, multi-layer network architectures, loss layers, and optimizers such as SGD, Momentum, RMSProp, and Adam. By implementing and training these networks, we improve the model's performance on the dataset.


Assignment Overview
  • Assignment homepage: Assignment 2
  • Purpose: In the previous assignment we implemented a two-layer neural network, but all of its functions lived in a single file. For a simple network that is convenient enough, but it does not scale well once we want larger and deeper networks. In this assignment we therefore learn how to design a neural network in layers and modules, implement the different modules in separate files, and then assemble them into the final network.
  • Official starter code: Assignment 2 code
  • Assignment notebook: FullyConnectedNets.ipynb
1. Fully-Connected Neural Nets Architecture

In this assignment we build our fully-connected neural network in a modular way: for every layer we implement a forward pass forward() and a backward pass backward().

  • forward() receives the input, the weights, and any other required parameters, returns an output, and caches the variables we will need during the backward pass, roughly like this:
def layer_forward(x, w):
  """ Receive inputs x and weights w """
  # Do some computations ...
  z = # ... some intermediate value
  # Do some more computations ...
  out = # the output

  cache = (x, w, z, out) # Values we need to compute gradients

  return out, cache
  • backward() receives the upstream gradient together with the cached variables, and returns the gradients with respect to the inputs and the weights:
def layer_backward(dout, cache):
  """
  Receive dout (derivative of loss with respect to outputs) and cache,
  and compute derivative with respect to inputs.
  """
  # Unpack cache values
  x, w, z, out = cache

  # Use values in cache to compute derivatives
  dx = # Derivative of loss with respect to x
  dw = # Derivative of loss with respect to w

  return dx, dw
2. Setting Up the Assignment Environment
  • Download the dataset
  • Install the required packages
  • Note: gnureadline==6.3.3 is not supported on Windows, so simply skip it. Not all of the other packages are strictly required either; install them selectively, but you do need NumPy, Cython, future, and so on.

cd assignment2
pip install -r requirements.txt
  • Compile the Cython extension: convolutional networks need some highly efficient operations, which the staff have already implemented in Cython (for example im2col.py). All we have to do is compile this first: run setup.py inside the cs231n directory:
python setup.py build_ext --inplace
  • Initialize the Jupyter notebook environment
# As usual, a bit of setup
from __future__ import print_function
import time
import numpy as np
import matplotlib.pyplot as plt
from cs231n.classifiers.fc_net import *
from cs231n.data_utils import get_CIFAR10_data
from cs231n.gradient_check import eval_numerical_gradient, eval_numerical_gradient_array
from cs231n.solver import Solver

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

def rel_error(x, y):
  """ returns relative error """
  return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))
  • Load the data:
# Load the (preprocessed) CIFAR10 data.

data = get_CIFAR10_data()
for k, v in list(data.items()):
  print(('%s: ' % k, v.shape))
('X_val: ', (1000, 3, 32, 32))
('y_test: ', (1000,))
('y_train: ', (49000,))
('X_test: ', (1000, 3, 32, 32))
('X_train: ', (49000, 3, 32, 32))
('y_val: ', (1000,))
3. Implementing the Fully-Connected (Affine) Layer
3.1 Forward Pass
  • Open the file cs231n/layers.py and implement the affine_forward function.
def affine_forward(x, w, b):
    """
    Computes the forward pass for an affine (fully-connected) layer.

    The input x has shape (N, d_1, ..., d_k) and contains a minibatch of N
    examples, where each example x[i] has shape (d_1, ..., d_k). We will
    reshape each input into a vector of dimension D = d_1 * ... * d_k, and
    then transform it to an output vector of dimension M.

    Inputs:
    - x: A numpy array containing input data, of shape (N, d_1, ..., d_k)
    - w: A numpy array of weights, of shape (D, M)
    - b: A numpy array of biases, of shape (M,)

    Returns a tuple of:
    - out: output, of shape (N, M)
    - cache: (x, w, b)
    """
    out = None
    # Flatten each example into a row vector, but keep the original x untouched:
    # the cache must hold x in its original shape so that affine_backward can
    # reshape dx back to (N, d_1, ..., d_k).
    x_rows = x.reshape(x.shape[0], -1)  # [N, D]
    out = np.dot(x_rows, w) + b         # [N, M]
    cache = (x, w, b)
    return out, cache
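As a quick sanity check before running the notebook's official test cell, the output for a multi-dimensional input should match an explicit reshape-and-matmul. The snippet below is my own minimal check (the shapes and seed are arbitrary, not from the notebook), using the rel_error helper defined in the setup cell:

np.random.seed(0)
x = np.random.randn(4, 5, 6)        # N=4 examples of shape (5, 6), so D=30
w = np.random.randn(30, 3)          # D=30, M=3
b = np.random.randn(3)

out, _ = affine_forward(x, w, b)
ref = x.reshape(4, -1).dot(w) + b   # reference computation

print(out.shape)                    # (4, 3)
print(rel_error(out, ref))          # should be ~1e-16, i.e. essentially zero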
3.2 Backward Pass
  • Now implement the affine_backward function and test your implementation using numeric gradient checking.
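Before writing any code, it helps to spell out the gradients. Writing the flattened input as a matrix X of shape (N, D) so that out = XW + b, the chain rule gives the standard matrix-calculus results (this derivation is general, not something specific to the starter code):

$$
dX = d\mathrm{out}\,W^{\top},\qquad dW = X^{\top}\,d\mathrm{out},\qquad db = \sum_{i=1}^{N} d\mathrm{out}_{i,:}
$$

dX then has to be reshaped back to the original input shape (N, d_1, ..., d_k), and db is a sum over the batch rather than an average, because every example adds its own copy of the bias to the scores. The implementation below follows this directly: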
def affine_backward(dout, cache):
    """
    Computes the backward pass for an affine layer.

    Inputs:
    - dout: Upstream derivative, of shape (N, M)
    - cache: Tuple of:
      - x: Input data, of shape (N, d_1, ... d_k)
      - w: Weights, of shape (D, M)
      - b: Biases, of shape (M,)

    Returns a tuple of:
    - dx: Gradient with respect to x, of shape (N, d1, ..., d_k)
    - dw: Gradient with respect to w, of shape (D, M)
    - db: Gradient with respect to b, of shape (M,)
    """
    x, w, b = cache
    x_rows = x.reshape(x.shape[0],-1)
    d_xrows = np.dot(dout,w.T)
    dx = d_xrows.reshape(x.shape)
    dw = np.dot(x_rows.T, dout)
    # Note: db is a sum of the upstream gradients over the batch dimension, not an average.
    db = np.sum(dout, axis=0)
    return dx, dw, db
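To verify affine_backward, the notebook compares these analytic gradients against numeric ones computed by eval_numerical_gradient_array (imported in the setup cell). Roughly, the check looks like the sketch below; the exact shapes used in the notebook may differ:

np.random.seed(231)
x = np.random.randn(10, 2, 3)
w = np.random.randn(6, 5)
b = np.random.randn(5)
dout = np.random.randn(10, 5)

# Numeric gradients of the forward pass with respect to each argument
dx_num = eval_numerical_gradient_array(lambda x: affine_forward(x, w, b)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: affine_forward(x, w, b)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: affine_forward(x, w, b)[0], b, dout)

# Analytic gradients from our backward pass
_, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)

# All relative errors should be around 1e-10 or smaller
print('dx error: ', rel_error(dx_num, dx))
print('dw error: ', rel_error(dw_num, dw))
print('db error: ', rel_error(db_num, db))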
4. ReLU Activation Function
def relu_forward(x):
    """
    Computes the forward pass for a layer of rectified linear units (ReLUs).

    Input:
    - x: Inputs, of any shape

    Returns a tuple of:
    - out: Output, of the same shape as x
    - cache: x
    """
    out = np.maximum(0,x)
    cache = x
    return out, cache


def relu_backward(dout, cache):
    """
    Computes the backward pass for a layer of rectified linear units (ReLUs).

    Input:
    - dout: Upstream derivatives, of any shape
    - cache: Input x, of same shape as dout

    Returns:
    - dx: Gradient with respect to x
    """
    dx, x = None, cache
    dx = (x > 0) * dout
    return dx
5. “Sandwich” layers

We usually follow an FC layer with a ReLU activation, so implement this combination in cs231n/layer_utils.py. It also serves as a small exercise in composing layer modules.

def affine_relu_forward(x, w, b):
    """
    Convenience layer that performs an affine transform followed by a ReLU

    Inputs:
    - x: Input to the affine layer
    - w, b: Weights for the affine layer

    Returns a tuple of:
    - out: Output from the ReLU
    - cache: Object to give to the backward pass
    """
    a, fc_cache = affine_forward(x, w, b)
    out, relu_cache = relu_forward(a)
    cache = (fc_cache, relu_cache)
    return out, cache


def affine_relu_backward(dout, cache):
    """
    Backward pass for the affine-relu convenience layer
    """
    fc_cache, relu_cache = cache
    da = relu_backward(dout, relu_cache)
    dx, dw, db = affine_backward(da, fc_cache)
    return dx, dw, db
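The composite layer can be gradient-checked with the same recipe as before. A compact sketch, with arbitrarily chosen shapes (dw and db can be checked in exactly the same way as dx):

np.random.seed(231)
x = np.random.randn(2, 3, 4)
w = np.random.randn(12, 10)
b = np.random.randn(10)
dout = np.random.randn(2, 10)

out, cache = affine_relu_forward(x, w, b)
dx, dw, db = affine_relu_backward(dout, cache)

dx_num = eval_numerical_gradient_array(lambda x: affine_relu_forward(x, w, b)[0], x, dout)
print('dx error: ', rel_error(dx_num, dx))  # should again be on the order of 1e-10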
6. Loss Layers
  • You implemented these loss functions in the last assignment, so we’ll give them to you for free here. You should still make sure you understand how they work by looking at the implementations in cs231n/layers.py.
  • A freebie for once; still, it is worth comparing these with the versions you implemented yourself in the previous assignment.

def svm_loss(x, y):
    """
    Computes the loss and gradient using for multiclass SVM classification.

    Inputs:
    - x: Input data, of shape (N, C) where x[i, j] is the score for the jth
      class for the ith input.
    - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
      0 <= y[i] < C

    Returns a tuple of:
    - loss: Scalar giving the loss
    - dx: Gradient of the loss with respect to x
    """
    N = x.shape[0]
    correct_class_scores = x[np.arange(N), y]
    margins = np.maximum(0, x - correct_class_scores[:, np.newaxis] + 1.0)
    margins[np.arange(N), y] = 0
    loss = np.sum(margins) / N
    num_pos = np.sum(margins > 0, axis=1)
    dx = np.zeros_like(x)
    dx[margins > 0] = 1
    dx[np.arange(N), y] -= num_pos
    dx /= N
    return loss, dx


def softmax_loss(x, y):
    """
    Computes the loss and gradient for softmax classification.

    Inputs:
    - x: Input data, of shape (N, C) where x[i, j] is the score for the jth
      class for the ith input.
    - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
      0 <= y[i] < C

    Returns a tuple of:
    - loss: Scalar giving the loss
    - dx: Gradient of the loss with respect to x
    """
    shifted_logits = x - np.max(x, axis=1, keepdims=True)
    Z = np.sum(np.exp(shifted_logits), axis=1, keepdims=True)
    log_probs = shifted_logits - np.log(Z)
    probs = np.exp(log_probs)
    N = x.shape[0]
    loss = -np.sum(log_probs[np.arange(N), y]) / N
    dx = probs.copy()
    dx[np.arange(N), y] -= 1
    dx /= N
    return loss, dx
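With the affine, ReLU, and loss modules in place, a full forward/backward pass of a small two-layer network is just a chain of these calls. The sketch below is only an illustration of how the pieces fit together; the hidden size, the tiny random data, and the variable names are made up here, and the real FullyConnectedNet class in cs231n/classifiers/fc_net.py wraps this same pattern up with parameter initialization and regularization:

# Hypothetical sizes: CIFAR-10-like input, hidden dim 100, 10 classes
np.random.seed(0)
N, D, H, C = 5, 3 * 32 * 32, 100, 10
X = np.random.randn(N, 3, 32, 32)
y = np.random.randint(C, size=N)

W1, b1 = 1e-3 * np.random.randn(D, H), np.zeros(H)
W2, b2 = 1e-3 * np.random.randn(H, C), np.zeros(C)

# Forward: affine-relu -> affine -> softmax loss
h, cache1 = affine_relu_forward(X, W1, b1)
scores, cache2 = affine_forward(h, W2, b2)
loss, dscores = softmax_loss(scores, y)
print(loss)  # with tiny random weights this should be close to log(10) ≈ 2.3

# Backward: walk the chain in reverse, reusing the caches
dh, dW2, db2 = affine_backward(dscores, cache2)
dX, dW1, db1 = affine_relu_backward(dh, cache1)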