Gradient descent algorithm:
Repeat {
$$\theta_j = \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1,\dots,\theta_n)$$
} (simultaneously update for every $j = 0, 1, \dots, n$)
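To get the linear-regression form of this update, assume the usual squared-error cost (not written out in these notes):

$$J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)^2,\qquad h_\theta(x)=\theta^Tx.$$

Differentiating with respect to $\theta_j$ (the square contributes a factor of 2 that cancels the $\frac{1}{2}$, and $\partial h_\theta(x)/\partial\theta_j = x_j$) gives

$$\frac{\partial}{\partial\theta_j}J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)x_j^{(i)},$$

which is exactly what the update below plugs in.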
$$\theta_j=\theta_j-\alpha\frac{1}{m}\sum_{i=1}^{m}\bigl(h_{\theta}(x^{(i)})-y^{(i)}\bigr)x_j^{(i)}$$
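A minimal NumPy sketch of this batch update. It assumes `X` is an $m \times (n+1)$ design matrix whose first column is all ones and `y` is a length-$m$ target vector; the function name, default values, and fixed iteration count are illustrative, not from the notes:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for linear regression.
    X: (m, n+1) design matrix whose first column is all ones.
    y: (m,) target vector.
    Returns theta of shape (n+1,)."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        errors = X @ theta - y                       # h_theta(x^(i)) - y^(i), all i at once
        theta = theta - alpha / m * (X.T @ errors)   # simultaneous update of every theta_j
    return theta
```

Computing `X.T @ errors` in one shot performs the sum over all $m$ examples for every $j$ at once, which is what makes the update "simultaneous".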
Feature Scaling and Mean Normalization: bring features into comparable ranges, e.g. $x_j \leftarrow (x_j-\mu_j)/s_j$, so that gradient descent converges faster.
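A sketch of mean normalization under those definitions, taking $s_j$ to be the standard deviation here (the range max − min is another common choice); the function name is illustrative:

```python
import numpy as np

def mean_normalize(X_raw):
    """Mean normalization: x_j <- (x_j - mu_j) / s_j.
    Apply to the raw features only, BEFORE adding the bias column of
    ones (whose standard deviation is zero). Here s_j is the standard
    deviation; the range max - min also works."""
    mu = X_raw.mean(axis=0)
    sigma = X_raw.std(axis=0)
    return (X_raw - mu) / sigma, mu, sigma
```

Keep `mu` and `sigma` around: new inputs must be normalized with the same statistics before prediction.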
If $\alpha$ is too small: slow convergence.
If $\alpha$ is too large: $J(\theta)$ may not decrease on every iteration, and may not converge.
To choose $\alpha$, try different values and plot $J(\theta)$ against the number of iterations.
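A sketch of that diagnostic, reusing the gradient-descent loop above but recording $J(\theta)$ each iteration; the toy data, learning rates, and names are purely illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt

def cost(X, y, theta):
    """J(theta) = (1/2m) * sum_i (h_theta(x^(i)) - y^(i))^2."""
    errors = X @ theta - y
    return errors @ errors / (2 * len(y))

def cost_history(X, y, alpha, num_iters=200):
    """Run gradient descent, recording J(theta) at every iteration."""
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(num_iters):
        history.append(cost(X, y, theta))
        theta = theta - alpha / len(y) * (X.T @ (X @ theta - y))
    return history

# Toy data: y = 3 + 2x plus noise (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
X = np.column_stack([np.ones_like(x), x])
y = 3 + 2 * x + rng.normal(0, 1, 50)

# With a well-chosen alpha the curve decreases steadily; too large a
# value would make it oscillate or increase.
for alpha in (0.001, 0.01, 0.05):
    plt.plot(cost_history(X, y, alpha), label=f"alpha={alpha}")
plt.xlabel("iteration"); plt.ylabel("J(theta)"); plt.legend(); plt.show()
```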
Polynomial regression: fit non-linear data by using powers of a feature ($x, x^2, x^3, \dots$) as extra features in the same linear model.
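A sketch of building such a design matrix (degree and names illustrative). Scaling is essential here because $x$, $x^2$, and $x^3$ have wildly different ranges:

```python
import numpy as np

def poly_design_matrix(x, degree=3):
    """Columns [x, x^2, ..., x^degree], mean-normalized per column,
    with a bias column of ones prepended after scaling."""
    P = np.column_stack([x ** d for d in range(1, degree + 1)])
    P = (P - P.mean(axis=0)) / P.std(axis=0)
    return np.column_stack([np.ones(len(x)), P])
```

The result plugs straight into the gradient-descent sketch above.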
Normal equation: solve $\frac{\partial}{\partial\theta_j}J(\theta)=0$ for every $j$.
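The notes only state the zero-derivative condition; for the squared-error cost that condition yields the standard closed-form solution $\theta = (X^TX)^{-1}X^Ty$, sketched here:

```python
import numpy as np

def normal_equation(X, y):
    """theta = (X^T X)^{-1} X^T y. Using pinv instead of inv copes
    with a singular X^T X (redundant features, or m <= n)."""
    return np.linalg.pinv(X.T @ X) @ (X.T @ y)
```

No learning rate and no iterations are needed, but the solve costs roughly $O(n^3)$.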
Pros and cons of Gradient Descent vs. the Normal Equation: gradient descent requires choosing $\alpha$ and running many iterations, but scales well to large $n$; the normal equation needs no $\alpha$ and no iterations, but computing $(X^TX)^{-1}$ is roughly $O(n^3)$, so it becomes slow when $n$ is large.