Machine Learning 05: Regularization
1 Regularization for Linear Regression
1.1 Cost Function
$$J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)^2+\lambda\sum_{j=1}^n\theta_j^2\right]$$

Note that the regularization sum starts at $j=1$: by convention the bias term $\theta_0$ is not penalized, which is why the gradient-descent update below treats $\theta_0$ separately.
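As a concrete illustration, the cost can be evaluated in a few lines of NumPy. This is a minimal sketch, not code from the original notes; the function name `regularized_cost` and the convention that `X` carries a leading column of ones (so `theta[0]` is the bias) are assumptions.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear-regression cost J(theta).

    X is the m x (n+1) design matrix whose first column is all ones,
    y is the m-vector of targets; theta[0] (the bias) is not penalized.
    """
    m = len(y)
    residual = X @ theta - y                # h_theta(x^(i)) - y^(i) for every example
    sq_error = residual @ residual          # sum of squared errors
    penalty = lam * np.sum(theta[1:] ** 2)  # regularization skips theta_0
    return (sq_error + penalty) / (2 * m)
```

With `lam = 0` this reduces to the ordinary least-squares cost; larger `lam` adds a growing penalty on the non-bias weights.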
1.2 Gradient Descent
Repeat:

$$\begin{aligned}\theta_0&:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x_0^{(i)}\\\theta_j&:=\theta_j-\alpha\left[\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}+\frac{\lambda}{m}\theta_j\right]\\&:=\theta_j\left(1-\alpha\frac{\lambda}{m}\right)-\alpha\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}\end{aligned}$$

Since $0<1-\alpha\frac{\lambda}{m}<1$ for reasonable $\alpha$ and $\lambda$, each update first shrinks $\theta_j$ slightly toward zero and then applies the usual unregularized gradient step.
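The update rule above can be sketched as a single vectorized step. This is an illustrative sketch under the same assumptions as before (`X` has a leading column of ones); the function name `gradient_step` is hypothetical.

```python
import numpy as np

def gradient_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update for linear regression.

    Implements the shrinkage form
        theta_j <- theta_j * (1 - alpha*lam/m) - alpha * grad_j,
    with theta_0 (the bias) left unshrunk.
    """
    m = len(y)
    grad = X.T @ (X @ theta - y) / m                 # unregularized gradient, all j
    shrink = np.full_like(theta, 1 - alpha * lam / m)
    shrink[0] = 1.0                                  # theta_0 is not regularized
    return theta * shrink - alpha * grad
```

Iterating this step drives $J(\theta)$ down for a suitable learning rate `alpha`.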
1.3 Normal Equation
$$X=\begin{bmatrix}(x^{(1)})^T\\\vdots\\(x^{(m)})^T\end{bmatrix},\quad y=\begin{bmatrix}y^{(1)}\\\vdots\\y^{(m)}\end{bmatrix}$$
Solving yields:
$$\theta=\left(X^TX+\lambda\begin{bmatrix}0&0&\cdots&0\\0&1&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&1\end{bmatrix}\right)^{-1}X^Ty$$
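The closed form translates directly into code: build the $(n+1)\times(n+1)$ identity with its top-left entry zeroed and solve the linear system (using `np.linalg.solve` rather than an explicit inverse, for numerical stability). A minimal sketch; the function name is an assumption.

```python
import numpy as np

def regularized_normal_equation(X, y, lam):
    """Closed-form solution theta = (X^T X + lam*L)^(-1) X^T y,
    where L is the identity with its (0,0) entry zeroed so that
    the bias theta_0 is not penalized."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```

With `lam = 0` this recovers the ordinary normal equation; increasing `lam` shrinks the non-bias coefficients.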
1.4 Resolving the Non-Invertibility Problem
After regularization, the non-invertibility problem in linear regression no longer arises: one can show that for any $\lambda>0$, the matrix $X^TX+\lambda\,\mathrm{diag}(0,1,\dots,1)$ is always invertible, even when $X^TX$ itself is singular (e.g. when $m\le n$ or features are linearly dependent).
2 Regularization for Logistic Regression
2.1 Cost Function
$$J(\theta)=-\frac{1}{m}\left[\sum_{i=1}^m y^{(i)}\log h_\theta(x^{(i)})+(1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right]+\frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2$$
2.2 Gradient Descent
Repeat:

$$\theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x_0^{(i)}\\\theta_j:=\theta_j-\alpha\left[\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}+\frac{\lambda}{m}\theta_j\right]$$

The update looks identical to the linear-regression case; the difference is hidden in the hypothesis $h_\theta$.
where:
$$h_\theta(x)=\frac{1}{1+e^{-\theta^Tx}}$$
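The logistic update can be sketched the same way as the linear one, swapping in the sigmoid hypothesis. A minimal illustrative sketch; the function names `sigmoid` and `logistic_gradient_step` are assumptions, and `X` is again assumed to carry a leading column of ones.

```python
import numpy as np

def sigmoid(z):
    """The sigmoid hypothesis h_theta: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update for logistic regression.

    Same form as the linear-regression update, but the predictions
    come from sigmoid(X @ theta); theta_0 is not regularized.
    """
    m = len(y)
    grad = X.T @ (sigmoid(X @ theta) - y) / m   # unregularized gradient
    reg = (lam / m) * theta                     # lambda/m * theta_j term
    reg[0] = 0.0                                # no penalty on the bias
    return theta - alpha * (grad + reg)
```

Despite the identical-looking rule, the gradient here descends the cross-entropy cost of section 2.1, not the squared-error cost.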