Tutorial: http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial
Exercise: http://deeplearning.stanford.edu/wiki/index.php/Exercise:Learning_color_features_with_Sparse_Autoencoders
Code
sparseAutoencoderCost.m and computeNumericalGradient.m (from Exercise: Sparse Autoencoder)
Step 0: Initialization (code provided)
Step 1: Modify your sparse autoencoder to use a linear decoder (sparseAutoencoderLinearCost.m)
Copy sparseAutoencoderCost.m into the directory for this exercise, rename it sparseAutoencoderLinearCost.m, and then modify the cost and gradient according to the linear decoder formulas.
Only a3 and delta3 need to change, as shown below.
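With a linear decoder the output activation is the identity, f(z) = z, so f'(z) = 1. Relative to the original sparse autoencoder, only the output-layer activation and its error term change:

a^{(3)} = z^{(3)}                                   (instead of a^{(3)} = sigmoid(z^{(3)}))
\delta^{(3)} = -(y - a^{(3)}) .* f'(z^{(3)}) = -(y - a^{(3)})   (the sigmoid-derivative factor a3.*(1-a3) drops out)

Everything else (sparsity penalty, weight decay, delta2) stays the same as in sparseAutoencoderCost.m.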
%% ---------- YOUR CODE HERE --------------------------------------
% Instructions: Compute the cost/optimization objective J_sparse(W,b) for the Sparse Autoencoder,
% and the corresponding gradients W1grad, W2grad, b1grad, b2grad.
%
% W1grad, W2grad, b1grad and b2grad should be computed using backpropagation.
% Note that W1grad has the same dimensions as W1, b1grad has the same dimensions
% as b1, etc. Your code should set W1grad to be the partial derivative of J_sparse(W,b) with
% respect to W1. I.e., W1grad(i,j) should be the partial derivative of J_sparse(W,b)
% with respect to the input parameter W1(i,j). Thus, W1grad should be equal to the term
% [(1/m) \Delta W^{(1)} + \lambda W^{(1)}] in the last block of pseudo-code in Section 2.2
% of the lecture notes (and similarly for W2grad, b1grad, b2grad).
%
% Stated differently, if we were using batch gradient descent to optimize the parameters,
% the gradient descent update to W1 would be W1 := W1 - alpha * W1grad, and similarly for W2, b1, b2.
%
numSample = size(data,2);
%% Forward propagation: compute the activations of each layer
z2 = W1 * data + repmat(b1,1,numSample);
a2 = sigmoid(z2); % hidden layer, sigmoid activation
z3 = W2 * a2 + repmat(b2,1,numSample);
a3 = z3; % output layer, linear (identity) activation
% Average activation of the hidden units
averageActiv = (1/numSample)*sum(a2,2); % sum(a2,2) sums each row of a2 into a column vector
% Sparsity penalty: KL divergence between sparsityParam and the average activations
sparsityPenalty = sum(sparsityParam.*log(sparsityParam./averageActiv)+(1-sparsityParam).*log((1-sparsityParam)./(1-averageActiv)));
% Extra term that the sparsity penalty contributes to delta2
sparsityDelta = -(sparsityParam./averageActiv)+(1-sparsityParam)./(1-averageActiv);
% Error terms (residuals)
delta3 = -(data-a3); % output layer, linear decoder: f'(z3) = 1
delta2 = ((W2'*delta3)+beta*repmat(sparsityDelta,1,numSample)).*(a2.*(1-a2)); % hidden layer
% Overall cost: squared-error term + weight decay term + sparsity penalty
cost = (1/numSample)*(1/2)*sum(sum((a3-data).*(a3-data))) + (lambda/2)*(sum(sum(W1.^2))+sum(sum(W2.^2))) + beta*sparsityPenalty;
% Partial derivatives: set W1grad to the partial derivative of J_sparse(W,b)
% with respect to W1 (and similarly for W2grad, b1grad, b2grad)
W1grad = (1/numSample)* (delta2 * data') + lambda.*W1;
W2grad = (1/numSample)* (delta3 * a2') + lambda.*W2;
b1grad = (1/numSample)* sum(delta2,2);
b2grad = (1/numSample)* sum(delta3,2);
%-------------------------------------------------------------------
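Before training, it is worth checking the analytic gradient of sparseAutoencoderLinearCost against computeNumericalGradient, just as in the earlier exercise. A minimal sketch, assuming initializeParameters.m from Exercise: Sparse Autoencoder is on the path and that sparseAutoencoderLinearCost keeps the same argument order as sparseAutoencoderCost; the debug sizes and hyperparameter values below are illustrative, not the ones from the exercise script:

debugHiddenSize  = 5;
debugVisibleSize = 8;
patches = rand(debugVisibleSize, 10);      % small random data, only for gradient checking
theta   = initializeParameters(debugHiddenSize, debugVisibleSize);

[cost, grad] = sparseAutoencoderLinearCost(theta, debugVisibleSize, debugHiddenSize, ...
                    0.01, 0.035, 5, patches);   % lambda, sparsityParam, beta
numGrad = computeNumericalGradient(@(x) sparseAutoencoderLinearCost(x, debugVisibleSize, ...
                    debugHiddenSize, 0.01, 0.035, 5, patches), theta);

disp(norm(numGrad - grad)/norm(numGrad + grad));  % should be very small (on the order of 1e-9)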
Step 2: Learn features on small patches (code provided)
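For reference, the learning in this step amounts to running minFunc (L-BFGS) on sparseAutoencoderLinearCost over the ZCA-whitened color patches. A sketch under assumed sizes and hyperparameters (the exercise script supplies the actual values and the whitened patches):

visibleSize = 8*8*3;                 % 8x8 RGB patches
hiddenSize  = 400;
lambda = 3e-3; sparsityParam = 0.035; beta = 5;
patches = rand(visibleSize, 1000);   % placeholder; the exercise uses ZCA-whitened STL patches

addpath minFunc/
options = struct('Method', 'lbfgs', 'maxIter', 400, 'display', 'on');
theta = initializeParameters(hiddenSize, visibleSize);
[optTheta, cost] = minFunc(@(p) sparseAutoencoderLinearCost(p, visibleSize, hiddenSize, ...
                        lambda, sparsityParam, beta, patches), theta, options);
W1 = reshape(optTheta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);   % learned color features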