The UFLDL softmax chapter derives the algorithm in detail. It ultimately condenses into two key formulas:
the loss function and the derivative of the loss with respect to the parameters:
1. Computing the loss function J(θ):
Taking sample i as an example:
1) Compute the score vector θx(i), one entry per class.
2) Subtract the maximum entry of (1) from every entry, to prevent overflow.
3) Take the exp of each entry of (2).
4) Normalize (3), giving the probability of each class.
5) Take the log.
The result is the (log-likelihood) probability of sample i for each class; running steps 1–5 for every sample and summing the entries picked out by the true labels gives the loss function.
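Written out, the steps above compute the UFLDL softmax loss with weight decay (m samples, k classes):

```latex
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{k}
  1\{y^{(i)}=j\}\,
  \log\frac{e^{\theta_j^{\top} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{\top} x^{(i)}}}
  + \frac{\lambda}{2}\sum_{i,j}\theta_{ij}^{2}
```

Subtracting the column maximum in step 2 does not change this value, since the constant cancels in the normalized ratio.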
2. The derivative with respect to θj. Intuitively, the gradient is a linear combination of the samples, where each coefficient depends on the class:
1) Compute the probability that sample i belongs to class j.
2) If sample i actually belongs to class j, the coefficient for that sample is 1 minus this probability; otherwise it is minus the probability.
3) The linear combination of all samples with these coefficients gives the gradient of θj.
Here θj is a vector; the other classes are computed in the same way.
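In formula form, the gradient described by these steps is the standard UFLDL expression (including the weight-decay term):

```latex
\nabla_{\theta_j} J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}
  x^{(i)}\left(1\{y^{(i)}=j\} - p\!\left(y^{(i)}=j \mid x^{(i)};\theta\right)\right)
  + \lambda\,\theta_j
```

The indicator 1{y(i)=j} minus the predicted probability is exactly the class-dependent coefficient from step 2.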
The main code is as follows:
softmaxCost.m
function [cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, data, labels)
% numClasses - the number of classes
% inputSize - the size N of the input vector
% lambda - weight decay parameter
% data - the N x M input matrix, where each column data(:, i) corresponds to
% a single test set
% labels - an M x 1 matrix containing the labels corresponding for the input data
%
% Unroll the parameters from theta
theta = reshape(theta, numClasses, inputSize); % weights; row i holds all weights for class i
numCases = size(data, 2); % number of training samples
groundTruth = full(sparse(labels, 1:numCases, 1)); % ground-truth label matrix
cost = 0;
thetagrad = zeros(numClasses, inputSize); % weight gradients; row i is the gradient for class i
%% ---------- YOUR CODE HERE --------------------------------------
% Instructions: Compute the cost and gradient for softmax regression.
% You need to compute thetagrad and cost.
% The groundTruth matrix might come in handy.
M = bsxfun(@minus, theta*data, max(theta*data, [], 1)); % subtract the column max to prevent overflow
M = exp(M);
p = bsxfun(@rdivide, M, sum(M)); % normalize each column to probabilities
cost = -1/numCases * groundTruth(:)' * log(p(:)) + lambda/2 * sum(theta(:) .^ 2);
thetagrad = -1/numCases * (groundTruth - p) * data' + lambda * theta;
% ------------------------------------------------------------------
% Unroll the gradient matrices into a vector for minFunc
grad = thetagrad(:);
end
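For comparison, here is a NumPy sketch of the same cost/gradient computation. This is a hypothetical port, not part of the UFLDL starter code: the function name, the 0-based labels, and the argument order are my own choices.

```python
import numpy as np

def softmax_cost(theta, num_classes, input_size, lam, data, labels):
    """NumPy sketch of the MATLAB softmaxCost above.

    theta  : flat parameter vector of length num_classes * input_size
    data   : (input_size, m) matrix, one column per sample
    labels : (m,) integer class labels in 0 .. num_classes-1
    """
    theta = theta.reshape(num_classes, input_size)
    m = data.shape[1]

    # ground-truth indicator matrix, one column per sample
    ground_truth = np.zeros((num_classes, m))
    ground_truth[labels, np.arange(m)] = 1.0

    scores = theta @ data                        # (num_classes, m) score matrix
    scores -= scores.max(axis=0, keepdims=True)  # subtract column max to prevent overflow
    p = np.exp(scores)
    p /= p.sum(axis=0, keepdims=True)            # normalize columns to probabilities

    cost = -np.sum(ground_truth * np.log(p)) / m + lam / 2 * np.sum(theta ** 2)
    grad = -(ground_truth - p) @ data.T / m + lam * theta
    return cost, grad.ravel()
```

A quick sanity check is to compare `grad` against a centered finite-difference approximation of the cost, which mirrors the gradient check the UFLDL exercise recommends.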
softmaxPredict.m
function [pred] = softmaxPredict(softmaxModel, data)
% softmaxModel - model trained using softmaxTrain
% data - the N x M input matrix, where each column data(:, i) corresponds to
% a single test set
%
% Your code should produce the prediction matrix
% pred, where pred(i) is argmax_c P(y(c) | x(i)).
% Unroll the parameters from theta
theta = softmaxModel.optTheta; % this provides a numClasses x inputSize matrix
pred = zeros(1, size(data, 2));
%% ---------- YOUR CODE HERE --------------------------------------
% Instructions: Compute pred using theta assuming that the labels start
% from 1.
[~, pred] = max(theta * data); % index of the highest score per column
% ---------------------------------------------------------------------
end
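The prediction step translates to one NumPy line (again a sketch with my own function name). Note that exp and normalization are monotone, so the argmax of the raw scores equals the argmax of the probabilities, which is why the MATLAB code above skips them:

```python
import numpy as np

def softmax_predict(theta, data):
    # theta: (num_classes, input_size); data: (input_size, m)
    # argmax over unnormalized scores per column (sample)
    return np.argmax(theta @ data, axis=0)
```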
This post walked through how the softmax algorithm in UFLDL works, explained the loss function and the parameter-gradient derivation via two key formulas, and provided sample MATLAB code implementing the algorithm.