Why the regression-coefficient update formula `weights = weights + alpha × dataMatrix.transpose() × error` works
Definition of the loss function
$$J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}(\widehat{y_i}-y_i)^{2}=\frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x_i)-y_i)^{2}$$
That is, the (scaled) sum of squared differences between the predicted values $\widehat{y}$ and the true values $y$, where $\theta$ is the regression-coefficient vector currently used by the hypothesis $h_\theta$.
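As a quick sketch of this definition (assuming a linear hypothesis $h_\theta(x)=\theta^\top x$; the data values below are illustrative), the loss can be computed with NumPy:

```python
import numpy as np

def loss(theta, X, y):
    """J(theta) = 1/(2m) * sum_i (h_theta(x_i) - y_i)^2."""
    m = X.shape[0]                 # number of samples
    residual = X @ theta - y       # h_theta(x_i) - y_i for every sample
    return (residual @ residual) / (2 * m)

# Tiny example: two samples, two features
X = np.array([[1.0, 2.0], [1.0, 3.0]])
y = np.array([5.0, 7.0])
theta = np.array([1.0, 2.0])       # predicts [5, 7] exactly, so J = 0
print(loss(theta, X, y))
```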
Regression-coefficient update formula for gradient ascent (descent)
The gradient-ascent iteration formula given in the book is:
$$w=w\pm\alpha\nabla_w f(w)$$
That is:
$$\theta_j=\theta_j\pm\alpha\frac{\partial}{\partial\theta_j}J(\theta)$$
Deriving $\frac{\partial}{\partial\theta_j}J(\theta)$ (carried out for a single sample, so the factor $\frac{1}{m}$ and the sum over samples are dropped here and restored in the batch update at the end):
$$\begin{aligned}
\frac{\partial}{\partial\theta_j}J(\theta)&=\frac{\partial}{\partial\theta_j}\frac{1}{2}(h_\theta(x)-y)^{2}\\
&=\frac{1}{2}\times2\times(h_\theta(x)-y) \times \frac {\partial}{\partial\theta_j}(h_\theta(x)-y)\\
&=(h_\theta(x)-y)\times \frac{\partial}{\partial \theta_j}((\theta_0x_0+\theta_1x_1+\cdots+\theta_nx_n)-y)\\
&=(h_\theta(x)-y) \times x_j
\end{aligned}$$
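The result $\frac{\partial J}{\partial\theta_j}=(h_\theta(x)-y)\,x_j$ can be sanity-checked numerically with central finite differences (a sketch; the sample values and step size `eps` are illustrative):

```python
import numpy as np

def J_single(theta, x, y):
    """Per-sample loss 1/2 * (h_theta(x) - y)^2 with h_theta(x) = theta . x."""
    return 0.5 * (theta @ x - y) ** 2

x = np.array([1.0, 2.0, -1.0])
y = 0.5
theta = np.array([0.3, -0.4, 0.2])

# Analytic gradient from the derivation: (h_theta(x) - y) * x_j for all j
analytic = (theta @ x - y) * x

# Central-difference approximation of each partial derivative
eps = 1e-6
numeric = np.array([
    (J_single(theta + eps * e, x, y) - J_single(theta - eps * e, x, y)) / (2 * eps)
    for e in np.eye(3)
])
print(np.max(np.abs(analytic - numeric)))  # close to 0
```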
Then the regression coefficient $\theta_j$ updates as:
$$\theta_j = \theta_j \pm\alpha \times (h_\theta (x) - y)x_j$$
and in vector form, the regression vector $\theta$:
$$\theta = \theta \pm\alpha \times (h_\theta (x) - y)x$$
Here $\theta$ (the regression-coefficient vector) is what the book calls `weights`, $\alpha$ is the step size, and $x$ is the input data `dataMatrix`.
Taking $\theta=\begin{bmatrix}\theta_0 \\ \theta_1 \end{bmatrix}$ as an example, the batch update is:
$$\begin{bmatrix}\theta_0 \\ \theta_1 \end{bmatrix}\to\begin{bmatrix}\theta_0\pm\alpha\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x_i)-y_i)x_{i,0} \\ \theta_1\pm\alpha\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x_i)-y_i)x_{i,1} \end{bmatrix}$$
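Putting it together: this batch update has exactly the shape of the book's one-liner `weights = weights + alpha * dataMatrix.transpose() * error`. Below is a gradient-*descent* sketch on a toy linear-regression problem (the book's version is gradient ascent with `error = y - h`, which flips the sign but is structurally the same; the learning rate, iteration count, and toy data here are illustrative):

```python
import numpy as np

def grad_descent(dataMatrix, labels, alpha=0.1, num_iter=500):
    """Batch gradient descent on the squared loss J(theta)."""
    m, n = dataMatrix.shape
    weights = np.zeros(n)
    for _ in range(num_iter):
        h = dataMatrix @ weights          # h_theta(x_i) for all samples at once
        error = h - labels                # h_theta(x_i) - y_i
        # Vectorized update: theta <- theta - alpha * (1/m) * X^T (h - y)
        weights = weights - alpha * (dataMatrix.T @ error) / m
    return weights

# Toy data generated exactly from y = 2*x0 + 3*x1 (no noise)
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = X @ np.array([2.0, 3.0])
w = grad_descent(X, y)
print(w)  # approaches [2, 3]
```

The key point is that `dataMatrix.T @ error` computes, for every coefficient $\theta_j$ simultaneously, the sum $\sum_i (h_\theta(x_i)-y_i)\,x_{i,j}$ from the update above.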