1. Cost function (regularized logistic regression)
% temp is theta with its first element zeroed, so the bias term is not regularized:
% temp = theta; temp(1) = 0;
J = -1 * sum( y .* log( sigmoid(X*theta) ) + (1 - y) .* log( 1 - sigmoid(X*theta) ) ) / m + lambda/(2*m) * temp' * temp;
grad = ( X' * (sigmoid(X*theta) - y) ) / m + lambda/m * temp;
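The two Octave lines above can be sketched in NumPy as follows; the data and the names `cost_grad` and `lam` are illustrative, not from the original:

```python
# Hypothetical NumPy translation of the regularized cost/gradient above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_grad(theta, X, y, lam):
    m = len(y)
    h = sigmoid(X @ theta)
    temp = theta.copy()
    temp[0] = 0.0                      # do not regularize the bias term
    J = -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m \
        + lam / (2 * m) * temp @ temp
    grad = X.T @ (h - y) / m + lam / m * temp
    return J, grad
```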
2. Back-propagation: computing the partial derivatives with respect to theta(l)(ij)
% initialize the gradient accumulators
set Δ(l)(ij) = 0 for all l, i, j
For i = 1 to m          % loop over the m training examples
    set a(1) = x(i)
    % forward propagation to compute the activations a(l)
    perform forward propagation to compute a(l) for l = 2, 3, ..., L
    % error at the output layer
    using y(i), compute delta(L) = a(L) - y(i)
    % errors at the hidden layers
    compute delta(L-1), delta(L-2), ..., delta(2)
    % accumulate the errors into Δ
    Δ(l)(ij) := Δ(l)(ij) + a(l)(j) * delta(l+1)(i)
% partial-derivative matrices
D(l)(ij) = 1/m * Δ(l)(ij) + lambda/m * θ(l)(ij) ....(j ≠ 0)
D(l)(ij) = 1/m * Δ(l)(ij) ..........................(j = 0)
3. Pre-processing theta: unrolling the weight matrices into one vector
thetaVec = [Theta1(:); Theta2(:); Theta3(:)];  % pass the unrolled vector into function(thetaVec)
Theta1 = reshape(thetaVec(1:110), 10, 11);     % inside the function, rebuild the matrices for back-propagation and the theta partial derivatives
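A NumPy sketch of the unroll/reshape round trip above; the matrix sizes are made up to match the `1:110` slice, and `order='F'` mirrors Octave's column-major `(:)` and `reshape`:

```python
# Unroll two weight matrices into one vector, then recover them.
import numpy as np

Theta1 = np.arange(110.0).reshape(10, 11)   # made-up 10x11 weight matrix
Theta2 = np.arange(11.0).reshape(1, 11)     # made-up 1x11 weight matrix

# thetaVec = [Theta1(:); Theta2(:)]
thetaVec = np.concatenate([Theta1.ravel(order='F'), Theta2.ravel(order='F')])

# Theta1 = reshape(thetaVec(1:110), 10, 11)
Theta1_back = thetaVec[:110].reshape(10, 11, order='F')
Theta2_back = thetaVec[110:].reshape(1, 11, order='F')
```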
4. Gradient checking: numerically verify the gradient of J(θ), so errors in back-propagation can be caught
octave: gradApprox = (J(theta + EPSILON) - J(theta - EPSILON)) / (2*EPSILON)
for i = 1:n
    thetaPlus = theta;
    thetaPlus(i) = thetaPlus(i) + EPSILON;
    thetaMinus = theta;
    thetaMinus(i) = thetaMinus(i) - EPSILON;
    gradApprox(i) = (J(thetaPlus) - J(thetaMinus)) / (2*EPSILON);
end;
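The checking loop above can be sketched in NumPy like this; the toy cost `J` and the helper name `grad_approx` are illustrative assumptions:

```python
# Two-sided numerical gradient, element by element, as in the loop above.
import numpy as np

EPSILON = 1e-4

def grad_approx(J, theta):
    n = len(theta)
    approx = np.zeros(n)
    for i in range(n):
        thetaPlus = theta.copy()
        thetaPlus[i] += EPSILON
        thetaMinus = theta.copy()
        thetaMinus[i] -= EPSILON
        approx[i] = (J(thetaPlus) - J(thetaMinus)) / (2 * EPSILON)
    return approx

J = lambda t: np.sum(t ** 2)        # toy cost; analytic gradient is 2*theta
theta = np.array([1.0, -2.0, 3.0])
```

Each `gradApprox(i)` should then be close to the corresponding back-prop derivative.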
Check that gradApprox ≈ DVec (the gradient from back-prop); if they agree, the cost function and DVec implementation are correct.