Class Notes
Machine Learning: Basic Algorithms I
https://blog.youkuaiyun.com/fan2312/article/details/100854485
Linear Regression
- Gaussian distribution
- Maximum likelihood estimation
- The essence of least squares: assuming Gaussian noise, maximizing the likelihood is equivalent to minimizing the sum of squared errors (a short derivation follows this list)
- Objective function of linear regression: $J(\theta) = \frac{1}{2} \sum^{m}_{i=1}(h_{\theta}(x^{(i)})-y^{(i)})^2$
- Add a squared-sum penalty on the parameters to the objective function:
  $J(\theta) = \frac{1}{2} \sum^{m}_{i=1}(h_{\theta}(x^{(i)})-y^{(i)})^2 + \lambda \sum^{n}_{j=1}\theta^2_j$, where the $\lambda$ term is the complexity penalty factor
- L2 regularization: prevents overfitting
- Ridge:
  - $\lambda \sum^{n}_{j=1}\theta^2_j$
- LASSO:
  - L1 regularization
  - $\lambda \sum^{n}_{j=1}|\theta_j|$
  - Can be used for feature selection: features whose final $\theta_j$ is smaller (pushed toward zero) are less important
- Elastic Net:
  - $\lambda\left(\rho\cdot\sum^n_{j=1}|\theta_j| + (1-\rho)\cdot\sum^n_{j=1}\theta^2_j\right)$
- Cross-validation: used to select the hyperparameter $\lambda$ (see the sketches after this list)
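
To spell out the link between the first three bullets, here is the standard derivation that least squares is maximum likelihood under Gaussian noise (notation follows the objective above):

```latex
% Model: y^{(i)} = \theta^T x^{(i)} + \varepsilon^{(i)}, noise \varepsilon^{(i)} \sim N(0, \sigma^2).
% Log-likelihood of the m samples:
\ell(\theta) = \sum_{i=1}^{m} \log\!\left[
    \frac{1}{\sqrt{2\pi}\,\sigma}
    \exp\!\left(-\frac{\bigl(y^{(i)} - \theta^{T} x^{(i)}\bigr)^{2}}{2\sigma^{2}}\right)
  \right]
  = m \log\frac{1}{\sqrt{2\pi}\,\sigma}
  - \frac{1}{\sigma^{2}} \cdot
    \underbrace{\frac{1}{2}\sum_{i=1}^{m} \bigl(y^{(i)} - \theta^{T} x^{(i)}\bigr)^{2}}_{J(\theta)}
% The first term and \sigma do not depend on \theta, so maximizing \ell(\theta)
% is exactly minimizing J(\theta): least squares.
```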
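For the Ridge / LASSO / Elastic Net penalties with a cross-validated $\lambda$, a minimal sketch using scikit-learn (the synthetic dataset and all parameter values are illustrative assumptions; scikit-learn calls the penalty weight `alpha` rather than $\lambda$):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV

# Synthetic data: only 5 of the 20 features actually matter.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

alphas = np.logspace(-3, 3, 50)  # candidate lambda values

ridge = RidgeCV(alphas=alphas).fit(X, y)                           # L2 penalty
lasso = LassoCV(alphas=alphas, cv=5).fit(X, y)                     # L1 penalty
enet = ElasticNetCV(alphas=alphas, l1_ratio=0.5, cv=5).fit(X, y)   # L1/L2 mix

print("ridge lambda:", ridge.alpha_)
print("lasso lambda:", lasso.alpha_)
# LASSO drives unimportant coefficients to exactly zero -> feature selection.
print("features kept by LASSO:", np.flatnonzero(lasso.coef_))
```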
Gradient Descent
$J(\theta) = \frac{1}{2} \sum^{m}_{i=1}(h_{\theta}(x^{(i)})-y^{(i)})^2$
- Initialize $\theta$
- Iterate along the negative gradient direction, so that each updated $\theta$ makes $J(\theta)$ smaller
- $\theta = \theta - \alpha\cdot\frac{\partial J(\theta)}{\partial \theta}$
- $\alpha$: learning rate (step size)
- Gradient direction (per-sample derivation below; a numerical check follows it)
- $\frac{\partial}{\partial \theta_j}J(\theta) = \frac{\partial}{\partial \theta_j}\frac{1}{2}(h_{\theta}(x) - y)^2$
  $= 2\cdot\frac{1}{2}(h_{\theta}(x)-y)\cdot\frac{\partial}{\partial\theta_j}(h_{\theta}(x)-y)$
  $= (h_{\theta}(x)-y)\cdot\frac{\partial}{\partial\theta_j}\left(\sum^{n}_{i=0}\theta_i x_i - y\right)$
  $= (h_{\theta}(x)-y)\,x_j$
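The final line $(h_{\theta}(x)-y)x_j$ is easy to verify numerically; a small sketch with made-up data (nothing here comes from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)       # one sample with n=5 features (x_0 bias folded in)
y = 2.0
theta = rng.normal(size=5)

def J(theta):
    """Single-sample squared-error loss: 1/2 * (h_theta(x) - y)^2."""
    return 0.5 * (x @ theta - y) ** 2

# Analytic gradient from the derivation above: (h_theta(x) - y) * x_j
grad_analytic = (x @ theta - y) * x

# Central finite differences for comparison
eps = 1e-6
grad_numeric = np.array([
    (J(theta + eps * e) - J(theta - eps * e)) / (2 * eps)
    for e in np.eye(5)
])

print(np.allclose(grad_analytic, grad_numeric, atol=1e-6))  # True
```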
- Batch gradient descent (BGD)
  - Repeat until convergence {
    $\theta_j := \theta_j + \alpha\sum^m_{i=1}(y^{(i)}-h_\theta(x^{(i)}))x_j^{(i)}$
    } (each update uses all $m$ samples)
  - Any local minimum it finds is the global minimum, since the squared-error objective is convex (see the sketch below)
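A compact numpy sketch of BGD on the least-squares objective (the data, learning rate, and iteration count are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 100, 3
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])  # prepend x_0 = 1 bias column
theta_true = np.array([4.0, 2.0, -3.0, 0.5])
y = X @ theta_true + rng.normal(scale=0.1, size=m)

theta = np.zeros(n + 1)
alpha = 0.01
for _ in range(5000):                 # repeat until (approximate) convergence
    grad = X.T @ (X @ theta - y)      # full-batch gradient of J(theta)
    theta -= alpha * grad / m         # step along the negative gradient (1/m rescales alpha)
print(theta)  # close to theta_true: the objective is convex, so GD reaches the global minimum
```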
- Stochastic gradient descent (SGD)
  - Loop {
    for i = 1 to m {
      $\theta_j := \theta_j+\alpha(y^{(i)}-h_{\theta}(x^{(i)}))x_j^{(i)}$
    }
    } (each update uses a single sample)
  - The noisy updates may jump out of local minima
- SGD is usually preferred: it is fast, and it may jump out of local minima
- Compromise: mini-batch GD
  - Each gradient update uses a small batch of samples (both SGD and mini-batch GD are sketched below)
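A sketch contrasting plain SGD (batch size 1) with mini-batch GD; the data, batch sizes, learning rate, and epoch count are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 100, 3
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])  # bias column x_0 = 1
y = X @ np.array([4.0, 2.0, -3.0, 0.5]) + rng.normal(scale=0.1, size=m)

def minibatch_gd(batch_size, alpha=0.01, epochs=200):
    """batch_size=1 is plain SGD; 1 < batch_size < m is mini-batch GD."""
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = rng.permutation(m)                  # shuffle samples each epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            grad = Xb.T @ (Xb @ theta - yb) / len(idx)
            theta -= alpha * grad                   # one update per (mini-)batch
    return theta

print(minibatch_gd(batch_size=1))    # SGD: one sample per update, noisy but fast
print(minibatch_gd(batch_size=16))   # mini-batch: smoother updates, still cheap
```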