When computing the loss, only the true (target) class of each example contributes to the loss; during backpropagation, however, the gradient also has to account for the incorrect classes.
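In the UFLDL tutorial's notation, with K classes and the parameters of the K-th class fixed at zero, the objective and its gradient are

J(\theta) = -\sum_{i=1}^{m} \log \frac{\exp(\theta_{y^{(i)}}^{\top} x^{(i)})}{\sum_{k=1}^{K} \exp(\theta_{k}^{\top} x^{(i)})}

\nabla_{\theta_k} J(\theta) = -\sum_{i=1}^{m} x^{(i)} \left( 1\{y^{(i)} = k\} - P(y^{(i)} = k \mid x^{(i)}; \theta) \right)

Only the true class y^{(i)} appears in the numerator of the loss, but the gradient of every class k contains the term 1{y^{(i)} = k} - P(y^{(i)} = k | x^{(i)}), which is nonzero for the incorrect classes as well.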
Reached Maximum Number of Iterations
Optimization took 69.236617 seconds.
Training accuracy: 94.4%
Test accuracy: 92.1%
function [f,g] = softmax_regression(theta, X,y)
%
% Arguments:
% theta - A vector containing the parameter values to optimize.
%           minFunc passes theta in as a long vector, so we reshape it
%           into an n-by-(num_classes-1) matrix.
% Recall that we assume theta(:,num_classes) = 0.
%
% X - The examples stored in a matrix.
% X(i,j) is the i'th coordinate of the j'th example.
% y - The label for each example. y(j) is the j'th example's label.
%
m=size(X,2); % number of training examples
n=size(X,1); % number of features (input dimension)
% theta arrives as a long vector; reshape it into an n x (num_classes-1) matrix
% (the column for the last class is fixed at zero).
theta=reshape(theta, n, []);
num_classes=size(theta,2)+1;
% initialize objective value and gradient.
f = 0;
g = zeros(size(theta));
%
% TODO: Compute the softmax objective function and gradient using vectorized code.
% Store the objective function value in 'f', and the gradient in 'g'.
% Before returning g, make sure you form it back into a vector with g=g(:);
%
%%% YOUR CODE HERE %%%
theta = [theta, zeros(n,1)];         % append the fixed all-zero column for the last class
scores = theta' * X;                 % num_classes x m matrix of class scores
scores = bsxfun(@minus, scores, max(scores,[],1)); % shift each column for numerical stability
predict = exp(scores);
predict = bsxfun(@rdivide, predict, sum(predict,1)); % column-wise softmax probabilities
I = sub2ind(size(predict), y, 1:m);  % linear indices of P(y^(i) | x^(i))
f = f - sum(log(predict(I)));        % negative log-likelihood of the true classes
groundTruth = full(sparse(y, 1:m, 1, num_classes, m)); % indicator matrix, 1{y^(i) = k}
delta = groundTruth - predict;
g = -X * delta';                     % n x num_classes gradient
g = g(:, 1:num_classes-1);           % drop the column of the fixed last class
g=g(:); % make gradient a vector for minFunc
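For completeness, here is a minimal sketch of how this objective could be driven from minFunc and then evaluated. The variable names train.X, train.y, test.X, test.y, the options struct, and num_classes = 10 are placeholders assumed for illustration, not taken from the code above:

num_classes = 10;                                 % e.g. the ten MNIST digit classes
n = size(train.X, 1);                             % input dimension
theta0 = 0.005 * randn(n * (num_classes-1), 1);   % small random initialization
options = struct('MaxIter', 200);
theta_opt = minFunc(@softmax_regression, theta0, options, train.X, train.y);

% Predict by scoring every class; the last class's parameters are implicitly zero.
theta_mat = [reshape(theta_opt, n, num_classes-1), zeros(n, 1)];
[~, pred] = max(theta_mat' * test.X, [], 1);
fprintf('Test accuracy: %.1f%%\n', 100 * mean(pred(:) == test.y(:)));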