The UFLDL softmax chapter derives the algorithm in detail. It ultimately condenses into two key formulas:
the loss function and the derivative of the loss with respect to the parameters:
1. Computing the loss function J(θ):
Taking sample i as an example:
1) Compute the score vector θx(i), one entry per class.
2) Subtract the maximum entry of (1) from every entry, to prevent overflow.
3) Take the exp of each entry of (2).
4) Normalize (3), giving the probability of each class.
5) Take the log.
The result is the (log-likelihood) probability of sample i for each class; running steps 1–5 for every sample and summing the entries picked out by the true labels gives the loss function.
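Written out, the steps above compute the UFLDL softmax loss with weight decay (m samples, k classes):

```latex
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{k}
  1\{y^{(i)}=j\}\,
  \log\frac{e^{\theta_j^{\top} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{\top} x^{(i)}}}
  + \frac{\lambda}{2}\sum_{i,j}\theta_{ij}^{2}
```

Subtracting the column maximum in step 2 does not change this value, since the constant cancels in the normalized ratio.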
2. The derivative with respect to θj. Intuitively, the gradient is a linear combination of the samples, where each coefficient depends on the class:
1) Compute the probability that sample i belongs to class j.
2) If sample i actually belongs to class j, the coefficient for that sample is 1 minus this probability; otherwise it is minus the probability.
3) The linear combination of all samples with these coefficients gives the gradient of θj.
Here θj is a vector; the other classes are computed in the same way.
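In formula form, the gradient described by these steps is the standard UFLDL expression (including the weight-decay term):

```latex
\nabla_{\theta_j} J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}
  x^{(i)}\left(1\{y^{(i)}=j\} - p\!\left(y^{(i)}=j \mid x^{(i)};\theta\right)\right)
  + \lambda\,\theta_j
```

The indicator 1{y(i)=j} minus the predicted probability is exactly the class-dependent coefficient from step 2.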
The main code is as follows:
softmaxCost.m
function [cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, data, labels)
% numClasses - the number of classes
% inputSize - the size N of the input vector
% lambda - weight decay parameter
% data - the N x M input matrix, where each column data(:, i) corresponds to
% a single test set
% labels - an M x 1 matrix containing the labels corresponding for the input data
%
% Unroll the parameters from theta
theta = reshape(theta, numClasses, inputSize); % weights; row i holds all weights for class i
numCases = size(data, 2); % number of training samples
groundTruth = full(sparse(labels, 1:numCases, 1)); % ground-truth label matrix
cost = 0;
thetagrad = zeros(numClasses, inputSize); % weight gradients; row i is the gradient for class i
%% ---------- YOUR CODE HERE --------------------------------------
% Instructions: Compute the cost and gradient for softmax regression.
% You need to compute thetagrad and cost.
% The groundTruth matrix might come in handy.
M = bsxfun(@minus, theta*data, max(theta*data, [], 1)); % subtract the column max to prevent overflow
M = exp(M);
p = bsxfun(@rdivide, M, sum(M)); % normalize each column to probabilities
cost = -1/numCases * groundTruth(:)' * log(p(:)) + lambda/2 * sum(theta(:) .^ 2);
thetagrad = -1/numCases * (groundTruth - p) * data' + lambda * theta;
% ------------------------------------------------------------------
% Unroll the gradient matrices into a vector for minFunc
grad = thetagrad(:);
end
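For comparison, here is a NumPy sketch of the same cost/gradient computation. This is a hypothetical port, not part of the UFLDL starter code: the function name, the 0-based labels, and the argument order are my own choices.

```python
import numpy as np

def softmax_cost(theta, num_classes, input_size, lam, data, labels):
    """NumPy sketch of the MATLAB softmaxCost above.

    theta  : flat parameter vector of length num_classes * input_size
    data   : (input_size, m) matrix, one column per sample
    labels : (m,) integer class labels in 0 .. num_classes-1
    """
    theta = theta.reshape(num_classes, input_size)
    m = data.shape[1]

    # ground-truth indicator matrix, one column per sample
    ground_truth = np.zeros((num_classes, m))
    ground_truth[labels, np.arange(m)] = 1.0

    scores = theta @ data                        # (num_classes, m) score matrix
    scores -= scores.max(axis=0, keepdims=True)  # subtract column max to prevent overflow
    p = np.exp(scores)
    p /= p.sum(axis=0, keepdims=True)            # normalize columns to probabilities

    cost = -np.sum(ground_truth * np.log(p)) / m + lam / 2 * np.sum(theta ** 2)
    grad = -(ground_truth - p) @ data.T / m + lam * theta
    return cost, grad.ravel()
```

A quick sanity check is to compare `grad` against a centered finite-difference approximation of the cost, which mirrors the gradient check the UFLDL exercise recommends.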
softmaxPredict.m
function [pred] = softmaxPredict(softmaxModel, data)
% softmaxModel - model trained using softmaxTrain
% data - the N x M input matrix, where each column data(:, i) corresponds to
% a single test set
%
% Your code should produce the prediction matrix
% pred, where pred(i) is argmax_c P(y(c) | x(i)).
% Unroll the parameters from theta
theta = softmaxModel.optTheta; % this provides a numClasses x inputSize matrix
pred = zeros(1, size(data, 2));
%% ---------- YOUR CODE HERE --------------------------------------
% Instructions: Compute pred using theta assuming that the labels start
% from 1.
[~, pred] = max(theta * data); % index of the highest score per column
% ---------------------------------------------------------------------
end
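The prediction step translates to one NumPy line (again a sketch with my own function name). Note that exp and normalization are monotone, so the argmax of the raw scores equals the argmax of the probabilities, which is why the MATLAB code above skips them:

```python
import numpy as np

def softmax_predict(theta, data):
    # theta: (num_classes, input_size); data: (input_size, m)
    # argmax over unnormalized scores per column (sample)
    return np.argmax(theta @ data, axis=0)
```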
This post walked through how the softmax algorithm in UFLDL works, explained the loss function and the parameter-gradient derivation via two key formulas, and provided sample MATLAB code implementing the algorithm.