Backpropagation Algorithm
Backpropagation is the algorithm used in neural networks to minimize the cost function. Just as gradient descent does for linear regression and logistic regression, in a neural network our goal is to compute $\min_\theta J(\theta)$, that is, to find the set of parameters $\theta$ that minimizes the cost function.
The backpropagation algorithm is as follows:
Training set: $(x^{(1)},y^{(1)}),\dots,(x^{(m)},y^{(m)})$
Set $\Delta_{ij}^{(l)} = 0$ (for all $i$, $j$, $l$)
For $i = 1$ to $m$:
    Set $a^{(1)} = x^{(i)}$
    Perform forward propagation to compute $a^{(l)}$ for $l = 2, 3, \dots, L$
    Using $y^{(i)}$, compute $\delta^{(L)} = a^{(L)} - y^{(i)}$
    Compute $\delta^{(L-1)}, \delta^{(L-2)}, \dots, \delta^{(2)}$
    $\Delta^{(l)}_{ij} := \Delta^{(l)}_{ij} + a^{(l)}_j \delta^{(l+1)}_i$
$D_{ij}^{(l)} := \frac{1}{m}\Delta_{ij}^{(l)} + \lambda\theta_{ij}^{(l)}$ if $j \neq 0$
$D_{ij}^{(l)} := \frac{1}{m}\Delta_{ij}^{(l)}$ if $j = 0$
// $D$ is the partial derivative of $J$ with respect to $\theta$, i.e. $D_{ij}^{(l)} = \frac{\partial}{\partial\theta_{ij}^{(l)}} J(\theta)$
How to compute $\delta$
$\delta^{(L)} = a^{(L)} - y^{(i)}$
($L$ is the number of layers in the network; the error of the last layer is simply the difference between the predicted value and the actual value.)
Then compute $\delta$ for the layers $L-1, L-2, L-3, \dots, 2$, in that order:
$\delta^{(l)} = (\theta^{(l)})^T \delta^{(l+1)} .* a^{(l)} .* (1 - a^{(l)})$
Here $.*$ denotes element-wise multiplication, and the factor $a^{(l)} .* (1 - a^{(l)})$ is the derivative of the sigmoid activation, $g'(z^{(l)})$.
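To make the two formulas above concrete, here is a minimal NumPy sketch of the accumulation loop for a three-layer network (one hidden layer) with sigmoid activations. The names `Theta1`, `Theta2` and `backprop_accumulate` are illustrative only, not the assignment's exact code, and `Y` is assumed to already be one-hot encoded.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_accumulate(Theta1, Theta2, X, Y):
    """Accumulate the gradients Delta1, Delta2 over m examples (L = 3)."""
    m = X.shape[0]
    Delta1 = np.zeros_like(Theta1)      # shape (hidden, input + 1)
    Delta2 = np.zeros_like(Theta2)      # shape (labels, hidden + 1)
    for i in range(m):
        a1 = np.r_[1, X[i]]             # add the bias unit
        a2 = np.r_[1, sigmoid(Theta1 @ a1)]
        a3 = sigmoid(Theta2 @ a2)       # output layer
        delta3 = a3 - Y[i]              # delta^(L) = a^(L) - y^(i)
        # delta^(2) = (Theta2^T delta3) .* a2 .* (1 - a2); drop the bias entry
        delta2 = ((Theta2.T @ delta3) * a2 * (1 - a2))[1:]
        Delta2 += np.outer(delta3, a2)  # Delta_ij += a_j * delta_i
        Delta1 += np.outer(delta2, a1)
    return Delta1 / m, Delta2 / m       # unregularized D^(l)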
Gradient Checking
To verify that a backpropagation implementation is correct, we can use gradient checking: compare the gradients produced by backpropagation against a numerical estimate of the gradient. If the two agree, the backpropagation code is probably correct. When actually training the network, however, gradient checking must be turned off, because it is very expensive to compute.
$$\frac{\partial}{\partial\theta}J(\theta)\approx\frac{J(\theta+\epsilon)-J(\theta-\epsilon)}{2\epsilon}$$
When $\theta$ is a vector, this becomes:
$$\frac{\partial}{\partial\theta_j}J(\theta)\approx\frac{J(\theta_1,\dots,\theta_j+\epsilon,\dots,\theta_n)-J(\theta_1,\dots,\theta_j-\epsilon,\dots,\theta_n)}{2\epsilon}$$
Here $\epsilon$ should be small (for example $\epsilon = 10^{-4}$) so that the estimate is close to the true derivative.
If the gradients from backpropagation are approximately equal to those from gradient checking, there is good reason to believe the backpropagation implementation is correct.
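As a quick, self-contained illustration of the formula (separate from the assignment's computeNumericalGradient.py below), the sketch below checks the gradient of the simple cost $J(\theta) = \theta^T\theta$, whose analytical gradient is $2\theta$; the function name `numerical_gradient` is made up for this example.

import numpy as np

def numerical_gradient(J, theta, eps=1e-4):
    """Two-sided finite-difference estimate of dJ/dtheta_j for every j."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        perturb = np.zeros_like(theta)
        perturb[j] = eps
        grad[j] = (J(theta + perturb) - J(theta - perturb)) / (2 * eps)
    return grad

theta = np.array([1.0, -2.0, 3.0])
J = lambda t: np.dot(t, t)           # J(theta) = theta^T theta
print(numerical_gradient(J, theta))  # approximately [ 2. -4.  6.]
print(2 * theta)                     # analytical gradient, for comparison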
Random Initialization
In a neural network, initializing $\theta$ to 0 (or to any other identical value) does not work, because the symmetry is never broken and all hidden units end up computing the same function, so the parameters must be initialized randomly. One approach is sketched below:
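A minimal sketch of such an initialization, assuming $\epsilon_{init} = 0.12$ (a commonly used value; the assignment's randInitializeWeights.py further below does essentially the same thing):

import numpy as np

def rand_initialize_weights(L_in, L_out, epsilon_init=0.12):
    """Weights for a layer with L_in inputs (plus a bias unit) and L_out outputs,
    drawn uniformly from [-epsilon_init, epsilon_init] to break symmetry."""
    return np.random.rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init

Theta1 = rand_initialize_weights(400, 25)  # e.g. input layer (400 units) -> hidden layer (25 units)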
This makes every $\theta_{ij}^{(l)}$ fall within $[-\epsilon, \epsilon]$.
Programming Assignment
checkNNGradients.py
from debugInitializeWeights import *
from nnCostFunction import *
from computeNumericalGradient import *
from numpy import linalg as la


def checkNNGradients(my_lambda):
    # CHECKNNGRADIENTS Creates a small neural network to check the
    # backpropagation gradients
    # CHECKNNGRADIENTS(lambda) Creates a small neural network to check the
    # backpropagation gradients, it will output the analytical gradients
    # produced by your backprop code and the numerical gradients (computed
    # using computeNumericalGradient). These two gradient computations should
    # result in very similar values
    input_layer_size = 3
    hidden_layer_size = 25
    num_labels = 3
    m = 6
    # We generate some 'random' test data
    Theta1 = debugInitializeWeights(hidden_layer_size, input_layer_size)  ## (25, 4)
    Theta2 = debugInitializeWeights(num_labels, hidden_layer_size)  ## (3, 26)
    # Reusing debugInitializeWeights to generate X
    X = debugInitializeWeights(m, input_layer_size - 1)  ## (6, 3)
    y = 1 + np.mod(np.arange(1, m + 1), num_labels)
    # Unroll parameters
    nn_params = np.append(Theta1, Theta2)

    # Short hand for cost function
    def cost_func(p):
        return nnCostFunction(p, input_layer_size, hidden_layer_size, num_labels, X, y, my_lambda)

    cost, grad = cost_func(nn_params)
    numgrad = computeNumericalGradient(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, my_lambda)
    # Visually examine the two gradient computations. The two columns
    # you get should be very similar.
    ans = np.c_[numgrad, grad]
    print(ans)
    print('The above two columns you get should be very similar.'
          '(Left-Your Numerical Gradient, Right-Analytical Gradient)')
    '''
    # Evaluate the norm of the difference between the two solutions.
    # If you have a correct implementation, and assuming you used EPSILON = 0.0001
    # in computeNumericalGradient.py, then diff below should be less than 1e-9
    numgrad = numgrad.reshape((numgrad.size, -1))
    grad = grad.reshape((grad.size, -1))
    diff = la.norm(numgrad - grad) / la.norm(numgrad + grad)
    ## la.norm returns the 2-norm of the column vector here
    print('If your backpropagation implementation is correct, then '
          'the relative difference will be small (less than 1e-9). '
          'Relative Difference: {}'.format(diff))
    '''
computeNumericalGradient.py
import numpy as np
from nnCostFunction import nnCostFunction
def computeNumericalGradient(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, my_lambda):
    # COMPUTENUMERICALGRADIENT Computes the gradient using "finite differences"
    # and gives us a numerical estimate of the gradient.
    e = 0.0001
    numgrad = np.zeros(nn_params.size)
    print('Number of parameters to check: {}'.format(nn_params.size))
    print('Shape of numgrad: {}'.format(numgrad.shape))
    for i in range(nn_params.size):
        perturb = np.zeros(nn_params.size)
        perturb[i] = e
        loss1, grad1 = nnCostFunction(nn_params - perturb, input_layer_size, hidden_layer_size, num_labels, X, y, my_lambda)
        loss2, grad2 = nnCostFunction(nn_params + perturb, input_layer_size, hidden_layer_size, num_labels, X, y, my_lambda)
        # Compute the two-sided finite-difference estimate for parameter i
        numgrad[i] = (loss2 - loss1) / (2 * e)
        perturb[i] = 0
    return numgrad
debugInitializeWeights.py
import numpy as np
def debugInitializeWeights(fan_out, fan_in):
    # DEBUGINITIALIZEWEIGHTS Initialize the weights of a layer with fan_in
    # incoming connections and fan_out outgoing connections using a fixed
    # strategy, this will help you later in debugging
    #   W = DEBUGINITIALIZEWEIGHTS(fan_out, fan_in) initializes the weights
    #   of a layer with fan_in incoming connections and fan_out outgoing
    #   connections using a fixed set of values
    #
    #   Note that W is a matrix of size (fan_out, 1 + fan_in), as the first
    #   column of W handles the "bias" terms
    #
    # Set W to zeros
    W = np.zeros(fan_out * (fan_in + 1))
    # Initialize W using "sin"; this ensures that W always contains the same
    # values, which is useful for debugging
    for i in range(W.size):
        W[i] = np.sin(i) / 10
    W = W.reshape(fan_out, fan_in + 1)
    return W
displayData.py
import numpy as np
import matplotlib.pyplot as plt
#### This helper follows another blog post; it closely mirrors the MATLAB
#### displayData provided with the assignment, just ported to Python.
#### The idea: reshape each example vector back into a 20x20 block and tile
#### the blocks into one big image for display.
def display_data(x):
    (m, n) = x.shape  # 100 * 400
    example_width = np.round(np.sqrt(n)).astype(int)  # width of each displayed example, rounded to an int
    example_height = np.floor(n / example_width).astype(int)  # height of each displayed example
    # Display layout: 100 examples arranged in a 10 x 10 grid
    display_rows = np.floor(np.sqrt(m)).astype(int)
    display_cols = np.ceil(m / display_rows).astype(int)
    # Padding between the displayed images
    pad = 1
    # Display array, initialized to -1
    display_array = - np.ones((pad + display_rows * (example_height + pad),
                               pad + display_cols * (example_width + pad)))
    # Copy each example into a patch on the display array
    curr_ex = 0
    for j in range(display_rows):
        for i in range(display_cols):
            if curr_ex >= m:  ## do not display more than the m available examples
                break
            # Copy the patch
            # Get the max value of the patch
            max_val = np.max(np.abs(x[curr_ex]))
            display_array[pad + j * (example_height + pad) + np.arange(example_height),
                          pad + i * (example_width + pad) + np.arange(example_width)[:, np.newaxis]] = \
                x[curr_ex].reshape((example_height, example_width)) / max_val  ## normalize each patch by its max absolute value
            curr_ex += 1
        if curr_ex >= m:
            break
    # Show the image
    plt.figure()
    plt.imshow(display_array, cmap='gray', extent=[-1, 1, -1, 1])
    plt.axis('off')
    plt.show()
ex4.py
import scipy.io as scio
import numpy as np
from displayData import display_data
from nnCostFunction import *
from randInitializeWeights import *
from checkNNGradients import *
## Machine Learning Online Class - Exercise 4 Neural Network Learning
## =========== Part 1: Loading and Visualizing Data =============
# We start the exercise by first loading and visualizing the dataset.
# You will be working with a dataset that contains handwritten digits.
#
# Load Training Data
print('Loading and Visualizing Data ...')
data = scio.loadmat('D:\课程相关\吴恩达机器学习\Andrew-NG-Meachine-Learning-master\Andrew-NG-Meachine-Learning'
'-master\machine-learning-ex4\machine-learning-ex4\ex4\ex4data1.mat')
X = data['X']
y = data['y'].flatten()
m = y.size
# Randomly select 100 data points to display
rand_indics = np.random.permutation(range(m))
sel = X[rand_indics[0:100], :]
display_data(sel)
input('Program paused. Press enter to continue.')
## ================ Part 2: Loading Parameters ================
# In this part of the exercise, we load some pre-initialized
# neural network parameters.
print('Loading Saved Neural Network Parameters ...')
# Load the weights into variables Theta1 and Theta2
Theta = scio.loadmat('D:\课程相关\吴恩达机器学习\Andrew-NG-Meachine-Learning-master\Andrew-NG-Meachine-Learnin'
'g-master\machine-learning-ex4\machine-learning-ex4\ex4\ex4weights.mat')
Theta1 = Theta['Theta1']
Theta2 = Theta['Theta2']
# Unroll parameters
nn_params = np.append(Theta1, Theta2)
## ================ Part 3: Compute Cost (Feedforward) ================
# To the neural network, you should first start by implementing the
# feedforward part of the neural network that returns the cost only. You
# should complete the code in nnCostFunction.m to return cost. After
# implementing the feedforward to compute the cost, you can verify that
# your implementation is correct by verifying that you get the same cost
# as us for the fixed debugging parameters.
#
# We suggest implementing the feedforward cost *without* regularization
# first so that it will be easier for you to debug. Later, in part 4, you
# will get to implement the regularized cost.
#
# This part checks that the cost computed without regularization is correct
print('Feedforward Using Neural Network ...')
# Weight regularization parameter (we set this to 0 here).
my_lambda = 0
## Setup the parameters you will use for this exercise
input_layer_size = 400 # 20x20 Input Images of Digits
hidden_layer_size = 25 # 25 hidden units
num_labels = 10 # 10 labels, from 1 to 10
# (note that we have mapped "0" to label 10)
J = nnCostFunction(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, my_lambda)[0]
print('Cost at parameters (loaded from ex4weights): {} (this value should be about 0.287629)'.format(J))
input('Program paused. Press enter to continue.')
## =============== Part 4: Implement Regularization ===============
# Once your cost function implementation is correct, you should now
# continue to implement the regularization with the cost.
#
print('Checking Cost Function (w/ Regularization) ... ')
# Weight regularization parameter (we set this to 1 here).
my_lambda = 1
J = nnCostFunction(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, my_lambda)[0]
print('Cost at parameters (loaded from ex4weights): {} (this value should be about 0.383770)'.format(J))
input('Program paused. Press enter to continue.')
## ================ Part 5: Sigmoid Gradient ================
# Before you start implementing the neural network, you will first
# implement the gradient for the sigmoid function. You should complete the
# code in the sigmoidGradient.m file.
#
print('Evaluating sigmoid gradient...')
g = sigmoidGradient(np.array([-1, -0.5, 0, 0.5, 1]))
print('Sigmoid gradient evaluated at [-1 -0.5 0 0.5 1]:{}'.format(g))
input('Program paused. Press enter to continue.')
## ================ Part 6: Initializing Pameters ================
# In this part of the exercise, you will be starting to implment a two
# layer neural network that classifies digits. You will start by
# implementing a function to initialize the weights of the neural network
# (randInitializeWeights.m)
print('Initializing Neural Network Parameters ...')
initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size)
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels)
# Unroll parameters
initial_nn_params = np.append(initial_Theta1, initial_Theta2)
'''
## =============== Part 7: Implement Backpropagation ===============
# Once your cost matches up with ours, you should proceed to implement the
# backpropagation algorithm for the neural network. You should add to the
# code you've written in nnCostFunction.m to return the partial
# derivatives of the parameters.
#
print('Checking Backpropagation... ')
# Check gradients by running checkNNGradients
input('Program paused. Press enter to continue.')
## =============== Part 8: Implement Regularization ===============
# Once your backpropagation implementation is correct, you should now
# continue to implement the regularization with the cost and gradient.
#
print('Checking Backpropagation (w/ Regularization) ... ')
# Check gradients by running checkNNGradients
my_lambda = 3
checkNNGradients(my_lambda)
'''
my_lambda = 3
# Also output the costFunction debugging values
debug_J = nnCostFunction(nn_params, input_layer_size,
hidden_layer_size, num_labels, X, y, my_lambda)[0]
print('Cost at (fixed) debugging parameters (w/ lambda = {}):'.format(my_lambda))
print('{}(for lambda = 3, this value should be about 0.576051)'.format(debug_J))
input('Program paused. Press enter to continue.')
## =================== Part 8: Training NN ===================
# You have now implemented all the code necessary to train a neural
# network. To train your neural network, we will now use "fmincg", which
# is a function which works similarly to "fminunc". Recall that these
# advanced optimizers are able to train our cost functions efficiently as
# long as we provide them with the gradient computations.
#
print('Training Neural Network... ')
# After you have completed the assignment, change the MaxIter to a larger
# value to see how more training helps.
# You should also try different values of lambda
mld = 1
# Create "short hand" for the cost function to be minimized
def costFunction(p):
    return nnCostFunction(p, input_layer_size, hidden_layer_size, num_labels, X, y, mld)[0]
def gradFunction(p):
    return nnCostFunction(p, input_layer_size, hidden_layer_size, num_labels, X, y, mld)[1]
# Now, costFunction is a function that takes in only one argument (the neural network parameters)
from scipy.optimize import fmin_cg
#nn_params, cost, *unused = fmin_cg(f=costFunction, fprime=gradFunction,
# x0=initial_nn_params, full_output=True, disp=False,maxiter=50)
'''
input_layer_size = 400 # 20x20 Input Images of Digits
hidden_layer_size = 25 # 25 hidden units
num_labels = 10 # 10 labels, from 1 to 10
'''
#nn_params = fmin_cg(f=costFunction, fprime=gradFunction, x0=initial_nn_params, maxiter=50, disp=True, full_output=True)
nn_params = fmin_cg(costFunction, fprime=gradFunction, x0=initial_nn_params, maxiter=50, disp=True)
# Obtain Theta1 and Theta2 back from nn_params
Theta1 = nn_params[:(input_layer_size + 1) * hidden_layer_size].reshape((hidden_layer_size, (input_layer_size + 1)))
Theta2 = nn_params[(input_layer_size + 1) * hidden_layer_size:].reshape((num_labels, (hidden_layer_size + 1)))
input('Program paused. Press enter to continue.')
## ================= Part 9: Visualize Weights =================
# You can now "visualize" what the neural network is learning by
# displaying the hidden units to see what features they are capturing in
# the data.
print('Visualizing Neural Network... ')
display_data(Theta1[:, 1:])
input('Program paused. Press enter to continue.')
## ================= Part 10: Implement Predict =================
# After training the neural network, we would like to use it to predict
# the labels. You will now implement the "predict" function to use the
# neural network to predict the labels of the training set. This lets
# you compute the training set accuracy.
from predict import *
pred = predict(Theta1, Theta2, X)
print('Training Set Accuracy: {}'.format(np.mean(pred == y) * 100))
nnCostFunction.py
import numpy as np
def h(theta, X):
    X = np.c_[np.ones(X.shape[0]), X]
    z = np.dot(X, theta.T)
    return 1 / (1 + np.exp(-z))


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def sigmoidGradient(z):
    g = sigmoid(z)
    return g * (1 - g)


## The labels in y take values 1..10 (the digit 0 is mapped to label 10)
def nnCostFunction(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, my_lambda):
    # ========= your code here =======
    # Instructions: you should complete the code by working through the following parts
    # Part 1: feedforward the neural network and return the cost in the variable J.
    Theta1 = nn_params[:(input_layer_size + 1) * hidden_layer_size].reshape((hidden_layer_size, (input_layer_size + 1)))
    Theta2 = nn_params[(input_layer_size + 1) * hidden_layer_size:].reshape((num_labels, (hidden_layer_size + 1)))
    # Setup some useful variables
    m = y.size
    a2 = h(Theta1, X)
    a3 = h(Theta2, a2)  # a3 is the network's predicted output
    ## Convert the labels into a one-hot matrix
    Y = np.zeros((m, num_labels))
    for i in range(m):
        Y[i, y[i] - 1] = 1
    ## Unregularized cost function:
    # J = -1 / m * np.sum(np.multiply(Y, np.log(a3)) + np.multiply(1 - Y, np.log(1 - a3)))
    ## The first column of each Theta (which multiplies the added bias column of X) is not regularized
    reg_theta1 = Theta1[:, 1:]
    reg_theta2 = Theta2[:, 1:]
    ## Regularized cost function; with my_lambda = 0 it reduces to the unregularized cost.
    ## Note that this cost value is not used directly when computing grad below.
    J = -1 / m * np.sum(np.multiply(Y, np.log(a3)) + np.multiply(1 - Y, np.log(1 - a3))) + my_lambda / (2 * m) * (np.sum(
        np.multiply(reg_theta1, reg_theta1)) + np.sum(np.multiply(reg_theta2, reg_theta2)))
    # cost function
    # Part 2: Implement the backpropagation algorithm to compute the gradients
    # Theta1_grad and Theta2_grad. You should return the partial derivatives of
    # the cost function with respect to Theta1 and Theta2 in Theta1_grad and
    # Theta2_grad, respectively. After implementing Part 2, you can check
    # that your implementation is correct by running checkNNGradients
    # Note: The vector y passed into the function is a vector of labels
    #       containing values from 1..K. You need to map this vector into a
    #       binary vector of 1's and 0's to be used with the neural network
    #       cost function.
    #
    # Hint: We recommend implementing backpropagation using a for-loop
    #       over the training examples if you are implementing it for the
    #       first time.
    # a2 and a3 have already been computed above
    a2 = np.c_[np.ones(a2.shape[0]), a2]
    delta3 = a3 - Y  ## 5000 * 10
    delta2 = np.dot(delta3, Theta2) * a2 * (1 - a2)  ## 5000 * 26
    ## drop the bias column
    delta2 = delta2[:, 1:]  ## 5000 * 25
    X = np.c_[np.ones(X.shape[0]), X]
    Delta1 = np.dot(delta2.T, X)  ## 25, 401
    Delta2 = np.dot(delta3.T, a2)  ## 10, 26
    reg_Delta1 = my_lambda / m * np.c_[np.zeros(hidden_layer_size), reg_theta1]
    reg_Delta2 = my_lambda / m * np.c_[np.zeros(num_labels), reg_theta2]
    grad1 = (1 / m) * Delta1 + reg_Delta1
    grad2 = (1 / m) * Delta2 + reg_Delta2
    grad = np.append(grad1, grad2)
    return J, grad
predict.py
import numpy as np
from nnCostFunction import h
## Return the predicted label for each example
def predict(Theta1, Theta2, X):
    a2 = h(Theta1, X)
    a3 = h(Theta2, a2)
    ans = np.argmax(a3, axis=1)  # index of the highest probability in each row
    ans += 1  # labels are 1-based (digit 0 is mapped to label 10)
    return ans
randInitializeWeights.py
'''
import numpy as np
# RANDINITIALIZEWEIGHTS Randomly initialize the weights of a layer with L_in
# incoming connections and L_out outgoing connections
#   W = RANDINITIALIZEWEIGHTS(L_in, L_out) randomly initializes the weights
#   of a layer with L_in incoming connections and L_out outgoing
#   connections.
#
#   Note that W should be set to a matrix of size (L_out, 1 + L_in) as
#   the first column of W handles the "bias" terms
#
# ====================== YOUR CODE HERE ======================
# Instructions: Initialize W randomly so that we break the symmetry while
#               training the neural network.
#
# Note: The first column of W corresponds to the parameters for the bias unit
#
def randInitializeWeights(L_in, L_out):
    # You need to return the following variables correctly
    W = np.zeros((L_out, 1 + L_in))
    theta_init = 0.12
    ### This value is chosen based on the number of units in the adjacent layers;
    ### a good choice is sqrt(6) / sqrt(L_in + L_out)
    return np.random.rand(L_out, L_in + 1) * (2 * theta_init) - theta_init
'''
import numpy as np
def randInitializeWeights(l_in, l_out):
    # You need to return the following variable correctly
    w = np.zeros((l_out, 1 + l_in))
    # ===================== Your Code Here =====================
    # Instructions: Initialize w randomly so that we break the symmetry while
    #               training the neural network
    #
    # Note: The first column of w corresponds to the parameters for the bias unit
    #
    ep_init = 0.08
    w = np.random.rand(l_out, 1 + l_in) * (2 * ep_init) - ep_init
    # ===========================================================
    return w