Part Ⅱ

2.1 The words list

  • linear regression 线性回归
  • gradient descent 梯度下降
  • learning rate 学习率
  • stochastic gradient descent 随机梯度下降法
  • matrix derivatives 矩阵求导

2.2 Gradient descent concept

Gradient descent is an iterative method that can be used to solve least squares problems, both linear and nonlinear. It is one of the most commonly used methods for estimating model parameters in machine learning, that is, for solving unconstrained optimization problems; the other common approach is the least squares method. When we want to minimize a loss function, gradient descent approaches the minimum and the corresponding model parameters step by step; conversely, if we need the maximum of a function, we use gradient ascent. In machine learning, two variants are built on the basic idea of gradient descent: batch gradient descent and stochastic gradient descent.

The fundamental formula:

$\theta_{j} := \theta_{j} - \alpha\frac{\partial}{\partial\theta_{j}}J(\theta)$.

The cost function $J(\theta)$ is defined as:

$J(\theta) = \frac{1}{2}\bigl(h_{\theta}(x) - y\bigr)^2$.

Because $h_{\theta}(x) = \sum_{i=1}^{n}\theta_{i}x_{i}$,

taking the partial derivative of $J(\theta)$ with respect to $\theta_{j}$ (expanded below) and substituting it into the fundamental formula gives, for a single training example,

$\theta_{j} := \theta_{j} + \alpha\bigl(y^{(i)} - h_{\theta}(x^{(i)})\bigr)x^{(i)}_{j}$.
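The derivative step, written out for a single training example using the definitions above:

$$\begin{aligned} \frac{\partial}{\partial\theta_{j}}J(\theta) &= \frac{\partial}{\partial\theta_{j}}\,\frac{1}{2}\bigl(h_{\theta}(x) - y\bigr)^2 \\ &= \bigl(h_{\theta}(x) - y\bigr)\cdot\frac{\partial}{\partial\theta_{j}}\Bigl(\sum_{i=1}^{n}\theta_{i}x_{i} - y\Bigr) \\ &= \bigl(h_{\theta}(x) - y\bigr)x_{j}. \end{aligned}$$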

Repeat until convergence {

$\theta_{j} := \theta_{j} + \alpha\sum_{i=1}^{m}\bigl(y^{(i)} - h_{\theta}(x^{(i)})\bigr)x^{(i)}_{j}$ (for every $j$)

}

This algorithm is called batch gradient descent.
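As a minimal sketch (not from the original notes), batch gradient descent for linear regression can be written in NumPy as follows; the function name `batch_gradient_descent`, the default learning rate `alpha`, and the iteration count `n_iters` are illustrative assumptions:

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.001, n_iters=1000):
    """Batch gradient descent for linear regression.

    X is the (m, n) design matrix, y the (m,) target vector.
    Every iteration uses ALL m examples:
        theta_j := theta_j + alpha * sum_i (y_i - h_theta(x_i)) * x_ij
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        errors = y - X @ theta                  # residuals y_i - h_theta(x_i), shape (m,)
        theta = theta + alpha * (X.T @ errors)  # sum of per-example gradients
    return theta
```

Since the update sums over all $m$ examples without averaging, a fairly small `alpha` is usually needed for the iteration to converge.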

Loop {

for i = 1 to m {

$\theta_{j} := \theta_{j} + \alpha\bigl(y^{(i)} - h_{\theta}(x^{(i)})\bigr)x^{(i)}_{j}$ (for every $j$)

}

}

This algorithm is called stochastic gradient descent.
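A corresponding sketch of stochastic gradient descent, which updates $\theta$ using one example at a time; the random visiting order, the epoch count `n_epochs`, and the `seed` parameter are illustrative choices, not part of the original notes:

```python
import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.01, n_epochs=10, seed=0):
    """Stochastic gradient descent: one parameter update per training example."""
    m, n = X.shape
    theta = np.zeros(n)
    rng = np.random.default_rng(seed)
    for _ in range(n_epochs):
        for i in rng.permutation(m):        # visit the m examples in random order
            error = y[i] - X[i] @ theta     # scalar residual for example i
            theta = theta + alpha * error * X[i]
    return theta
```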

2.3 The normal equation of the least squares method

Define the design matrix $X$ to be the matrix that contains the training examples' input values in its rows:

$X = \left[ \begin{matrix} —(x^{(1)})^{T}— \\ —(x^{(2)})^{T}— \\ \vdots \\ —(x^{(m)})^{T}— \end{matrix}\right]$.

Also, let $\vec{y}$ be the $m$-dimensional vector containing all the target values from the training set:

$\vec{y} = \left[ \begin{matrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{matrix}\right]$.

Now, since $h_{\theta}(x^{(i)}) = (x^{(i)})^{T}\theta$, we can easily verify that

$X\theta - \vec{y} = \left[ \begin{matrix} (x^{(1)})^{T}\theta \\ \vdots \\ (x^{(m)})^{T}\theta \end{matrix}\right] - \left[ \begin{matrix} y^{(1)} \\ \vdots \\ y^{(m)} \end{matrix}\right] = \left[ \begin{matrix} h_{\theta}(x^{(1)}) - y^{(1)} \\ \vdots \\ h_{\theta}(x^{(m)}) - y^{(m)} \end{matrix}\right]$.

Thus, using the fact that for a vector $z$ we have $z^{T}z = \sum_{i} z_{i}^{2}$:

$\frac{1}{2}(X\theta - \vec{y})^{T}(X\theta - \vec{y}) = \frac{1}{2}\sum_{i=1}^{m}\bigl(h_{\theta}(x^{(i)}) - y^{(i)}\bigr)^2 = J(\theta)$.

Setting the gradient of $J(\theta)$ with respect to $\theta$ to zero, we obtain the normal equation:

$X^{T}X\theta = X^{T}\vec{y}$.

$\theta = (X^{T}X)^{-1}X^{T}\vec{y}$.
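As a sketch of solving the normal equation numerically: the toy data below is invented for illustration, and the linear system is solved with `np.linalg.solve` rather than by forming the explicit inverse, which is the numerically preferable choice.

```python
import numpy as np

# Toy data (invented for illustration): an intercept column of ones plus one feature,
# with targets generated by y = 1 + 2x.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Solve X^T X theta = X^T y directly instead of computing (X^T X)^{-1}.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)   # approximately [1. 2.]
```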

PS: There is an essential difference between the least squares method and gradient descent: the least squares (normal equation) method computes the global optimal solution in closed form, while gradient descent iterates step by step and in general only finds a local optimum.

2.4 Summary

The main content of this section is the difference between gradient descent and the least squares method. For gradient descent, the two main variants are batch gradient descent and stochastic gradient descent, and they perform differently in different settings. Meanwhile, we should keep in mind the essential difference between the least squares method and gradient descent.
