cs231n assignment1_Q4_two_layer_net

进击的吃恩程sy

Article link: https://blog.youkuaiyun.com/Sean_csy/article/details/89162398

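This blog post walks through the structure of a two-layer fully connected neural network: an input layer, a ReLU activation, and a softmax output layer. It focuses on the case-by-case differentiation needed for the gradient computation and uses the backpropagation algorithm to obtain the gradients of the network weights. The author explains, with examples, how to carry out forward and backward propagation, and also touches on hyperparameter tuning. The post closes with a playful analogy for how different operations affect the gradient during backpropagation.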

A two-layer fully-connected neural network. The net has an input dimension of
D, a hidden layer dimension of H, and performs classification over C classes.
We train the network with a softmax loss function and L2 regularization on the
weight matrices. The network uses a ReLU nonlinearity after the first fully
connected layer. In other words, the network has the following architecture:

input - fully connected layer - ReLU - fully connected layer - softmax

The outputs of the second fully-connected layer are the scores for each class.
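As a rough illustration of the architecture described above, the forward pass can be sketched in a few lines of NumPy. The function name `forward_scores` and the toy sizes are assumptions made for this example only; they are not taken from the assignment code.

```python
import numpy as np

def forward_scores(X, W1, b1, W2, b2):
    """Forward pass: fully connected layer -> ReLU -> fully connected layer."""
    hidden = np.maximum(0, X.dot(W1) + b1)   # shape (N, H)
    scores = hidden.dot(W2) + b2             # shape (N, C), one score per class
    return scores

# Toy sizes (hypothetical): N samples, D input features, H hidden units, C classes
rng = np.random.default_rng(0)
N, D, H, C = 5, 4, 10, 3
X = rng.standard_normal((N, D))
W1, b1 = 0.1 * rng.standard_normal((D, H)), np.zeros(H)
W2, b2 = 0.1 * rng.standard_normal((H, C)), np.zeros(C)

scores = forward_scores(X, W1, b1, W2, b2)

# Softmax turns each row of scores into class probabilities
shifted = scores - scores.max(axis=1, keepdims=True)   # for numerical stability
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
print(scores.shape, probs.sum(axis=1))
```

Subtracting the row maximum before exponentiating does not change the softmax output; it only keeps `np.exp` from overflowing.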

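The hard part of this two-layer network assignment is once again the gradient computation. The two activation functions the assignment calls for are ReLU and softmax. Let's review them.

ReLU: $f(x) = \max(0, x)$

softmax: $p_j = \dfrac{e^{s_j}}{\sum_k e^{s_k}}$, where $s_j$ is the score for class $j$

Differentiating them:

ReLU

$$\frac{\partial f}{\partial x} = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases}$$

softmax

$$L_i = -\log p_{y_i}$$

where

$$p_{y_i} = \frac{e^{s_{y_i}}}{\sum_k e^{s_k}}$$

Applying the chain rule,

$$\frac{\partial L_i}{\partial s_j} = -\frac{1}{p_{y_i}} \frac{\partial p_{y_i}}{\partial s_j}$$

The derivative has to be split into two cases. When j != y_i:

$$\frac{\partial p_{y_i}}{\partial s_j} = -p_{y_i}\, p_j \quad\Rightarrow\quad \frac{\partial L_i}{\partial s_j} = p_j$$

When j == y_i:

$$\frac{\partial p_{y_i}}{\partial s_{y_i}} = p_{y_i}\,(1 - p_{y_i}) \quad\Rightarrow\quad \frac{\partial L_i}{\partial s_{y_i}} = p_{y_i} - 1$$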
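The two cases (j != y_i and j == y_i) can be checked numerically. The sketch below assumes the per-sample loss is the softmax cross-entropy, L_i = -log p_{y_i}; the helper names `softmax_loss_and_grad` and `numeric_grad` are hypothetical and do not appear in the assignment.

```python
import numpy as np

def softmax_loss_and_grad(s, y):
    """Loss and gradient with respect to the scores s for one sample with label y."""
    shifted = s - np.max(s)                  # subtract the max for numerical stability
    p = np.exp(shifted) / np.sum(np.exp(shifted))
    loss = -np.log(p[y])
    grad = p.copy()                          # case j != y: dL/ds_j = p_j
    grad[y] -= 1.0                           # case j == y: dL/ds_y = p_y - 1
    return loss, grad

def numeric_grad(f, s, h=1e-5):
    """Centered finite-difference gradient of a scalar function f at s."""
    g = np.zeros_like(s)
    for j in range(s.size):
        step = np.zeros_like(s)
        step[j] = h
        g[j] = (f(s + step) - f(s - step)) / (2 * h)
    return g

s = np.array([1.0, -2.0, 0.5, 3.0])          # made-up scores for 4 classes
y = 2                                        # made-up correct class
loss, grad = softmax_loss_and_grad(s, y)
num = numeric_grad(lambda v: softmax_loss_and_grad(v, y)[0], s)
max_abs_diff = np.max(np.abs(grad - num))
print(max_abs_diff)
```

The analytic gradient sums to zero across classes, and only the entry for the correct class is negative, which matches the case split.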

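Within the network, we use the backpropagation algorithm to compute the gradients. The formulas below come from https://blog.youkuaiyun.com/yc461515457/article/details/51944683.

Forward propagation:

[image: forward-propagation formulas]

Backpropagation:

[image: backpropagation formulas]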
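For this particular network, forward propagation followed by backpropagation can be sketched as below. This is a minimal illustration, not the author's implementation: the function names are hypothetical, and the regularization term is assumed to be `reg * (sum(W1^2) + sum(W2^2))`, a convention the post does not state. A finite-difference check compares the analytic gradients with numerical ones.

```python
import numpy as np

def two_layer_loss(params, X, y, reg=0.0):
    """Softmax loss and gradients for input - FC - ReLU - FC - softmax."""
    W1, b1, W2, b2 = params['W1'], params['b1'], params['W2'], params['b2']
    N = X.shape[0]

    # Forward pass
    z1 = X.dot(W1) + b1
    h = np.maximum(0, z1)                                   # ReLU
    scores = h.dot(W2) + b2
    shifted = scores - scores.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(N), y]).mean()
    loss += reg * (np.sum(W1 * W1) + np.sum(W2 * W2))       # assumed L2 convention

    # Backward pass
    dscores = probs.copy()
    dscores[np.arange(N), y] -= 1.0                         # p_j - 1 at the correct class
    dscores /= N
    grads = {}
    grads['W2'] = h.T.dot(dscores) + 2 * reg * W2
    grads['b2'] = dscores.sum(axis=0)
    dh = dscores.dot(W2.T)
    dz1 = dh * (z1 > 0)                                     # ReLU passes gradient where z1 > 0
    grads['W1'] = X.T.dot(dz1) + 2 * reg * W1
    grads['b1'] = dz1.sum(axis=0)
    return loss, grads

def numeric_gradient(f, x, h=1e-5):
    """Centered finite differences; perturbs x in place and restores it."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        old = x[idx]
        x[idx] = old + h
        fp = f()
        x[idx] = old - h
        fm = f()
        x[idx] = old
        grad[idx] = (fp - fm) / (2 * h)
    return grad

rng = np.random.default_rng(0)
N, D, H, C = 5, 4, 10, 3                                    # toy sizes
X = rng.standard_normal((N, D))
y = rng.integers(0, C, size=N)
params = {
    'W1': 0.5 * rng.standard_normal((D, H)), 'b1': np.zeros(H),
    'W2': 0.5 * rng.standard_normal((H, C)), 'b2': np.zeros(C),
}
reg = 0.05

loss, grads = two_layer_loss(params, X, y, reg)
max_err = 0.0
for name in params:
    num = numeric_gradient(lambda: two_layer_loss(params, X, y, reg)[0], params[name])
    max_err = max(max_err, np.max(np.abs(num - grads[name])))
print(loss, max_err)
```

Each gradient has the same shape as the parameter it belongs to, which is a quick sanity check to run before the numerical comparison.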

With the method settled, we can start writing the program.

from __future__ import print_function

import numpy as np
import matplotlib.pyplot as plt
from past.builtins import xrange

class TwoLayerNet(object):
  """
  A two-layer fully-connected neural network. The net has an input dimension of
  D, a hidden layer dimension of H, and performs classification over C classes.
  We train the network with a softmax loss function and L2 regularization on the
  weight matrices. The network uses a ReLU nonlinearity after the first fully
  connected layer.

  In other words, the network has the following architecture:

  input - fully connected layer - ReLU - fully connected layer - softmax

  The outputs of the second fully-connected layer are the scores for each class.
  """

  def __init__(self, input_size, hidden_size, output_size, std=1e-4):
    """
    Initialize the model. Weights are initialized to small random values and
    biases are initialized to zero. Weights and biases are stored in the
    variable self.params, which is a dictionary with the following keys:

    W1: First layer weights; has shape (D, H)
    b1: First layer biases; has shape (H,)
    W2: Second layer weights; has shape (H, C)
    b2: Second layer biases; has shape (C,)

    Inputs:
    - input_size: The dimension D of the input data.
    - hidden_size: The number of neurons H in the hidden layer.
    - output_size: The number of classes C.
    """
    self.params = {}
    self.params['W1'] = std * np.random.randn(input_size, hidden_size)
    self.params['b1'] = np.zeros(hidden_size)
    self.params['W2'] = std * np.random.randn(hidden_size, output_size)
    self.params['b2'] = np.zeros(output_size)

  def loss(self, X, y=None, reg=0.0):
    """
    Architecture: input layer (D) - fully connected layer + ReLU (H) - softmax (C).
    Compute the loss and gradients for a two layer fully connected neural
    network.

    Inputs:
    - X: Input data of shape (N, D). Each X[i] is a training sample.
    - y: Vector of training labels. y[i] is the label for X[i], and each y[i] is
      an integer in the range 0 <= y[i] < C. This parameter is optional; if it
      is not passed then we only return scores, and if it is passed then we
      instead return the loss and gradients.
    - reg: Regularization strength.

    Returns:
    If y is None, return a matrix scores of shape (N, C) where scores[i, c] is
    the score for class c on input X[i].

    If y is not None, instead return a tuple of:
    - loss: Loss (data loss and regularization loss) for this batch of training
      samples.
    - grads: Dictionary mapping parameter names to gradients of those parameters
      with respect to the loss function; has the same keys as self.params.
    """
    # Unpack variables from the params dictionary
    W1, b1 = self.params['W1'], self.params['b1']
    W2, b2 = self.params['W2'], self.params['b2']
