Training a neural network:
- Randomly initialize weights
For gradient descent and the advanced optimization methods, we need an initial value for Theta.
Why initializing all weights to zero is bad:
After each update, the parameters corresponding to the inputs going into each of the hidden units are identical, so every hidden unit computes the same function and the symmetry is never broken.
% Randomly initialize the weights to small values
epsilon_init = 0.12;
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;
- Implement forward propagation
- Implement code to compute the cost function
- Implement backpropagation to compute the partial derivatives
- Use gradient checking to compare the analytical estimate of the gradient with a numerical estimate
- Use gradient descent or an advanced optimization method with backpropagation to minimize the cost function J (a sketch follows below)
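A minimal sketch of putting these steps together, assuming the helpers shipped with the exercise (randInitializeWeights, nnCostFunction, fmincg) and the usual layer-size variables (input_layer_size, hidden_layer_size, num_labels); treat the exact names and the MaxIter value as assumptions:
initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);
initial_nn_params = [initial_Theta1(:); initial_Theta2(:)]; % unroll into one vector
options = optimset('MaxIter', 50);
lambda = 1;
costFunction = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, num_labels, X, y, lambda);
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);
% reshape the optimized parameter vector back into the two weight matrices
Theta1 = reshape(nn_params(1:hidden_layer_size*(input_layer_size+1)), hidden_layer_size, input_layer_size+1);
Theta2 = reshape(nn_params(hidden_layer_size*(input_layer_size+1)+1:end), num_labels, hidden_layer_size+1);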
Exercise:
Part 1: Feedforward and Cost Function:
X = [ones(size(X,1),1) X];
for i = 1:m
Z2(:,i) = Theta1*X(i,:)';
temp(:,i) = sigmoid(Z2(:,i));
a2(:,i) = [1; temp(:,i)];
Z3(:,i) = Theta2*a2(:,i);
a3(:,i) = sigmoid(Z3(:,i));
yy = zeros(1,num_labels); % create a new vector to encode y(i)
yy(y(i)) = 1; % encode y(i), e.g. label 10 becomes 0 0 0 0 0 0 0 0 0 1
yyy(i,:) = yy; % yyy collects all the encoded labels (m x num_labels)
J = J + sum(-yy'.*log(a3(:,i)) - (1-yy').*log(1-a3(:,i)));
end
J = J/m; % average the cost over the m training examples
Part 2: Regularized Cost Function
J = J + lambda/(2*m)*(sum(sum(Theta1.^2))+sum(sum(Theta2.^2)));
J = J - lambda/(2*m)*(sum(Theta1(:,1).^2)+sum(Theta2(:,1).^2));
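For reference, a vectorized sketch of the same regularized cost, assuming X already carries the bias column added in Part 1 and yyy is the m x num_labels encoding built in the loop above; it should match the J computed above:
A2 = [ones(m,1) sigmoid(X*Theta1')]; % m x (number of hidden units + 1)
A3 = sigmoid(A2*Theta2'); % m x num_labels
J_vec = (1/m)*sum(sum(-yyy.*log(A3) - (1-yyy).*log(1-A3))) ...
      + lambda/(2*m)*(sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2)));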
Part 3: Sigmoid Gradient
g = sigmoid(z).*(1-sigmoid(z));
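A quick sanity check (a sketch, assuming sigmoidGradient is saved as a function, as it is used below): the gradient peaks at 0.25 for z = 0 and approaches 0 for large |z|.
sigmoidGradient([-10 -1 0 1 10]) % ≈ 0.0000 0.1966 0.2500 0.1966 0.0000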
Part 4: Neural Network Gradient (Backpropagation)
%%%%%%%%% backpropagation
%%%%% note: be careful to use the a and Z that belong to the right layer and training example
Analytical Gradient:
D1 = zeros(size(Theta1));
D2 = zeros(size(Theta2));
for i = 1:m
delta3 = a3(:,i) - yyy(i,:)';
delta2 = Theta2'*delta3.*sigmoidGradient([1;Z2(:,i)]);
delta2 = delta2(2:end);
D2 = D2 + delta3*transpose(a2(:,i));
D1 = D1 + delta2*(X(i,:));
end
Theta1_grad = D1/m;
Theta2_grad = D2/m;
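A vectorized sketch of the same accumulation (assuming X with the bias column and yyy from Part 1); it should reproduce D1/m and D2/m without the per-example loop:
Z2v = X*Theta1'; % m x number of hidden units
A2v = [ones(m,1) sigmoid(Z2v)]; % m x (hidden units + 1)
A3v = sigmoid(A2v*Theta2'); % m x num_labels
Delta3 = A3v - yyy; % output-layer error
tmp = Delta3*Theta2; % m x (hidden units + 1)
Delta2 = tmp(:,2:end) .* sigmoidGradient(Z2v); % hidden-layer error, bias column dropped
Theta1_grad_v = (Delta2'*X)/m; % same as D1/m
Theta2_grad_v = (Delta3'*A2v)/m; % same as D2/m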
Numerical Gradient (an alternative check on the analytical gradient above):
sigma = 1e-7;
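% Centered difference: perturb each parameter by +sigma and -sigma, recompute
% the unregularized cost, and divide the difference by 2*sigma; the derivative
% of the regularization term, lambda/m*Theta(k,j), is added analytically for
% the non-bias columns (j > 1).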
for k = 1:size(Theta1,1)
for j = 1:size(Theta1,2)
Theta1_1 = Theta1;
Theta1_1(k,j) = Theta1(k,j)+ sigma;
Theta1_2 = Theta1;
Theta1_2(k,j) = Theta1(k,j)- sigma;
J2 = 0;
J3 = 0;
for i = 1:m
a2 = sigmoid(X(i,:)*Theta1_1');
a2 = [ones(1,size(a2,1)) a2];
a3 = sigmoid(a2*Theta2');
yy = zeros(1,num_labels);
yy(y(i)) = 1;
J2 = J2 +sum(-yy.*log(a3)-(1-yy).*log(1-a3));
a2 = sigmoid(X(i,:)*Theta1_2');
a2 = [ones(1,size(a2,1)) a2];
a3 = sigmoid(a2*Theta2');
J3 = J3 +sum(-yy.*log(a3)-(1-yy).*log(1-a3));
end
J2 = 1/m*J2;
J3 = 1/m*J3;
Theta1_grad(k,j) = (J2-J3)/(2*sigma);
if j>1
Theta1_grad(k,j) = Theta1_grad(k,j)+ lambda/m*Theta1(k,j);
end
end
end
for k = 1:size(Theta2,1)
for j = 1:size(Theta2,2)
Theta2_1 = Theta2;
Theta2_1(k,j) = Theta2(k,j)+ sigma;
Theta2_2 = Theta2;
Theta2_2(k,j) = Theta2(k,j)- sigma;
J2 = 0;
J3 = 0;
for i = 1:m
a2 = sigmoid(X(i,:)*Theta1');
a2 = [ones(1,size(a2,1)) a2];
a3 = sigmoid(a2*Theta2_1');
yy = zeros(1,num_labels);
yy(y(i)) = 1;
J2 = J2 +sum(-yy.*log(a3)-(1-yy).*log(1-a3));
a2 = sigmoid(X(i,:)*Theta1');
a2 = [ones(1,size(a2,1)) a2];
a3 = sigmoid(a2*Theta2_2');
J3 = J3 +sum(-yy.*log(a3)-(1-yy).*log(1-a3));
end
J2 = 1/m*J2;
J3 = 1/m*J3;
Theta2_grad(k,j) = (J2-J3)/(2*sigma);
if j>1
Theta2_grad(k,j) = Theta2_grad(k,j)+ lambda/m*Theta2(k,j);
end
end
end
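A sketch of comparing the two estimates (assuming lambda = 0 here, so the regularization terms added inside the loops above vanish): stack both gradients into vectors and check the relative difference, which should be around 1e-9 or smaller when backpropagation is correct.
numgrad = [Theta1_grad(:); Theta2_grad(:)]; % numerical estimate from the loops above
anagrad = [D1(:)/m; D2(:)/m]; % analytical estimate from backpropagation
fprintf('Relative difference: %g\n', norm(numgrad - anagrad)/norm(numgrad + anagrad));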
Part 5: Regularized Gradient
%%%%%%%%% regularize the backpropagation gradient (skip the bias columns)
Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + lambda/m*Theta1(:,2:end);
Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + lambda/m*Theta2(:,2:end);
TODO: come back and add the figures when there is time.