【Machine Learning】【Andrew Ng】- Programming Exercise (Week 5)

This post walks through training a neural network: random weight initialization, forward propagation, computing the cost function, and backpropagation, illustrated with code for the regularized cost function and its gradient.


Training a neural network:

  1. Randomly initialize weights
    For gradient descent and the advanced optimization methods we need an initial value for Theta.
    Why initializing all weights to zero is bad:
    After each update, the parameters corresponding to the inputs going into each of two hidden units are identical, so the hidden units keep computing the same function and symmetry is never broken.
% Randomly initialize the weights to small values
epsilon_init = 0.12;
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;
  2. Implement forward propagation to get h(x(i)) for every training example
  3. Implement code to compute the cost function J(Theta)
  4. Implement backpropagation to compute the partial derivatives of J(Theta)
  5. Use gradient checking to compare the analytical gradient from backpropagation with a numerical estimate of the gradient
  6. Use gradient descent or an advanced optimization method together with backpropagation to minimize the cost function J(Theta) (see the sketch below).
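
For step 6, the pieces are tied together roughly as follows. This is only a sketch, assuming the exercise's helper functions randInitializeWeights, nnCostFunction and fmincg; MaxIter = 50 and lambda = 1 are example settings, not prescribed values.

% Randomly initialize the weights for each layer and unroll them into one vector
initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);
initial_nn_params = [initial_Theta1(:); initial_Theta2(:)];

% Minimize the cost with the advanced optimizer fmincg
options = optimset('MaxIter', 50);
lambda = 1;
costFunction = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                                   num_labels, X, y, lambda);
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);

% Reshape the optimized vector back into the two weight matrices
Theta1 = reshape(nn_params(1:hidden_layer_size*(input_layer_size+1)), ...
                 hidden_layer_size, input_layer_size+1);
Theta2 = reshape(nn_params((1+hidden_layer_size*(input_layer_size+1)):end), ...
                 num_labels, hidden_layer_size+1);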

Exercise:
Part 1: Feedforward and Cost Function:

    X = [ones(size(X,1),1) X];          % prepend the bias column to X
    for i = 1:m
        Z2(:,i) = Theta1*X(i,:)';       % hidden-layer pre-activation
        temp(:,i) = sigmoid(Z2(:,i));
        a2(:,i) = [1; temp(:,i)];       % hidden-layer activation plus bias unit
        Z3(:,i) = Theta2*a2(:,i);       % output-layer pre-activation
        a3(:,i) = sigmoid(Z3(:,i));     % output-layer activation (hypothesis)
        yy = zeros(1,num_labels);       % create a new vector to encode y(i)
        yy(y(i)) = 1;                   % one-hot encode y(i), e.g. label 10 -> 0000000001
        yyy(i,:) = yy;                  % yyy stores all labels after encoding
        J = J + sum(-yy'.*log(a3(:,i)) - (1-yy').*log(1-a3(:,i)));
    end
    J = J/m;                            % average the accumulated cost over the m examples
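
The same cost can also be computed without the per-example loop. A minimal vectorized sketch, assuming X already carries the bias column as above (Y_enc, A2, A3 and J_vec are just illustrative names):

I = eye(num_labels);
Y_enc = I(y,:);                                   % m x num_labels one-hot matrix (same as yyy)
A2 = [ones(m,1) sigmoid(X * Theta1')];            % hidden-layer activations with bias column
A3 = sigmoid(A2 * Theta2');                       % output-layer activations
J_vec = (1/m) * sum(sum(-Y_enc.*log(A3) - (1-Y_enc).*log(1-A3)));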

Part 2: Regularized Cost Function

% Add the regularization term over all weights ...
J = J + lambda/(2*m)*(sum(sum(Theta1.^2))+sum(sum(Theta2.^2)));
% ... then subtract the contribution of the bias-column weights, which are not regularized
J = J - lambda/(2*m)*(sum(Theta1(:,1).^2)+sum(Theta2(:,1).^2));
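
An equivalent alternative to the two lines above (not to be combined with them) is to exclude the bias column directly:

J = J + lambda/(2*m)*(sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2)));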

Part 3: Sigmoid Gradient

g = sigmoid(z).*(1-sigmoid(z));   % g'(z) = g(z)*(1 - g(z)), computed element-wise
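
A quick sanity check, assuming the line above lives in sigmoidGradient.m: the gradient should peak at 0.25 for z = 0 and apply element-wise to vectors and matrices.

sigmoidGradient(0)                    % expected: 0.25
sigmoidGradient([-1 -0.5 0 0.5 1])    % element-wise, symmetric around z = 0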

Part 4: Neural Network Gradient (Backpropagation)

% Backpropagation
% Note: be careful to use the a and Z values that belong to the right layer and example
Analytical Gradient:
D1 = 0;
D2 = 0;
for i = 1:m
    delta3 = a3(:,i) - yyy(i,:)';                           % output-layer error
    delta2 = Theta2'*delta3.*sigmoidGradient([1;Z2(:,i)]);  % propagate the error back through Theta2
    delta2 = delta2(2:end);                                 % drop the bias-unit component
    D2 = D2 + delta3*transpose(a2(:,i));                    % accumulate the gradient for Theta2
    D1 = D1 + delta2*(X(i,:));                              % accumulate the gradient for Theta1
end
Theta1_grad = D1/m;
Theta2_grad = D2/m;
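
The same accumulation can be done without the loop. A minimal vectorized sketch, reusing the one-hot matrix yyy built in Part 1 (X again carries the bias column; A2v, A3v, d2 and d3 are illustrative names):

A2v = [ones(m,1) sigmoid(X * Theta1')];               % m x (hidden+1)
A3v = sigmoid(A2v * Theta2');                         % m x num_labels
d3  = A3v - yyy;                                      % output-layer errors for all examples
d2  = d3 * Theta2;                                    % back through Theta2, m x (hidden+1)
d2  = d2(:,2:end) .* sigmoidGradient(X * Theta1');    % drop the bias column
Theta1_grad = (d2' * X) / m;
Theta2_grad = (d3' * A2v) / m;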

Numerical Gradient:

sigma = 1e-7;   % perturbation size for the centered (two-sided) difference
for k = 1:size(Theta1,1)            % perturb each entry of Theta1 in turn
    for j = 1:size(Theta1,2)
        Theta1_1 = Theta1;
        Theta1_1(k,j) = Theta1(k,j)+ sigma;   % Theta1 with entry (k,j) perturbed up
        Theta1_2 = Theta1;
        Theta1_2(k,j) = Theta1(k,j)- sigma;   % Theta1 with entry (k,j) perturbed down
        J2 = 0;
        J3 = 0;
        for i = 1:m
             a2 = sigmoid(X(i,:)*Theta1_1');
             a2 = [ones(1,size(a2,1)) a2];
             a3 = sigmoid(a2*Theta2');
             yy = zeros(1,num_labels);
             yy(y(i)) = 1;
             J2 = J2 +sum(-yy.*log(a3)-(1-yy).*log(1-a3));
             a2 = sigmoid(X(i,:)*Theta1_2');
             a2 = [ones(1,size(a2,1)) a2];
             a3 = sigmoid(a2*Theta2');
             J3 = J3 +sum(-yy.*log(a3)-(1-yy).*log(1-a3));
        end
        J2 = 1/m*J2;
        J3 = 1/m*J3;
        Theta1_grad(k,j) = (J2-J3)/(2*sigma);   % centered-difference approximation
        if j>1   % add the regularization term analytically (J2 and J3 above are unregularized)
            Theta1_grad(k,j) = Theta1_grad(k,j)+ lambda/m*Theta1(k,j);
        end
    end
end

for k = 1:size(Theta2,1)            % perturb each entry of Theta2 in turn (same structure as above)
    for j = 1:size(Theta2,2)
        Theta2_1 = Theta2;
        Theta2_1(k,j) = Theta2(k,j)+ sigma;
        Theta2_2 = Theta2;
        Theta2_2(k,j) = Theta2(k,j)- sigma;
        J2 = 0;
        J3 = 0;
        for i = 1:m
             a2 = sigmoid(X(i,:)*Theta1');
             a2 = [ones(1,size(a2,1)) a2];
             a3 = sigmoid(a2*Theta2_1');
             yy = zeros(1,num_labels);
             yy(y(i)) = 1;
             J2 = J2 +sum(-yy.*log(a3)-(1-yy).*log(1-a3));
             a2 = sigmoid(X(i,:)*Theta1');
             a2 = [ones(1,size(a2,1)) a2];
             a3 = sigmoid(a2*Theta2_2');
             J3 = J3 +sum(-yy.*log(a3)-(1-yy).*log(1-a3));
        end
        J2 = 1/m*J2;
        J3 = 1/m*J3;
        Theta2_grad(k,j) = (J2-J3)/(2*sigma);   % centered-difference approximation
        if j>1   % add the regularization term analytically, skipping the bias column
            Theta2_grad(k,j) = Theta2_grad(k,j)+ lambda/m*Theta2(k,j);
        end
    end
end
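
To carry out the check in step 5, the two estimates are compared via their relative difference. A short sketch, assuming numgrad and anagrad hold the unrolled numerical and analytical gradients (the exercise's checkNNGradients helper performs the same comparison on a small debug network):

reldiff = norm(numgrad - anagrad) / norm(numgrad + anagrad);
fprintf('Relative difference: %g\n', reldiff);   % around 1e-9 or smaller indicates correct backpropagation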

Part 5: Regularized Gradient

% Regularized gradient: add lambda/m * Theta to every column except the bias column
Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + lambda/m*Theta1(:,2:end);
Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + lambda/m*Theta2(:,2:end);
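
Finally, nnCostFunction returns the cost together with the gradient matrices unrolled into a single vector, which is what the optimizer in step 6 consumes:

grad = [Theta1_grad(:); Theta2_grad(:)];   % unrolled gradient returned alongside J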

I will come back and add the figures when I have time.
