L1 Regularization and Sparsity

License: CC 4.0 BY-SA

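This article explains why L1 regularization makes model parameters sparse from three angles: geometry, calculus, and Bayesian priors. Geometrically, when the unconstrained optimum lies outside the feasible region, the constrained optimum lies on the boundary of that region; because the L1 ball has corners on the coordinate axes, this boundary optimum tends to land on a corner, where some parameters are exactly zero. From the calculus angle, the derivative of the objective shows that the L1 term makes the objective monotonically decreasing to the left of zero and monotonically increasing to the right of zero (whenever the derivative of the loss is smaller in magnitude than the regularization coefficient), so the minimum is at the origin. From the Bayesian angle, L1 regularization corresponds to a Laplace prior, which has a sharp peak at 0 and therefore favors zero-valued parameters more strongly than the Gaussian prior behind L2 regularization.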

Question (164): Why does L1 regularization make model parameters sparse?

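Angles for the answer:

Geometry, i.e. the shape of the solution space
Calculus: differentiate the objective function that includes the L1 penalty
Bayesian priors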

Shape of the solution space

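Step 1. Equivalence of the regularized (penalty) form and the constrained form: for convex $L$, minimizing $L(\bm \theta) + c \|\bm \theta\|_1$ corresponds to minimizing $L(\bm \theta)$ subject to $\|\bm \theta\|_1 \le C$ for a suitable $C$.
Step 2. Geometric shapes of the L1 and L2 norm balls: the L1 ball is a diamond whose corners lie on the coordinate axes, while the L2 ball is round.
Step 3. If the optimum of the original objective does not lie inside the feasible region, then the optimum under the constraint must lie on the boundary of the feasible region.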
$\textcolor{red}{\text{[Review: KKT conditions, complementary slackness]}}$
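Steps 1 and 3 both follow from the KKT conditions. A compact version, assuming $L$ is convex and differentiable and writing $C$ for the radius of the constraint set:

$$
\begin{aligned}
&\min_{\bm \theta}\ L(\bm \theta) \quad \text{s.t.} \quad \|\bm \theta\|_1 \le C \\
&\mathcal{L}(\bm \theta, \lambda) = L(\bm \theta) + \lambda \left( \|\bm \theta\|_1 - C \right), \qquad \lambda \ge 0 \\
&\text{Stationarity: } \mathbf{0} \in \nabla L(\bm \theta^\star) + \lambda^\star \, \partial \|\bm \theta^\star\|_1 \\
&\text{Complementary slackness: } \lambda^\star \left( \|\bm \theta^\star\|_1 - C \right) = 0
\end{aligned}
$$

Since $-\lambda C$ does not depend on $\bm \theta$, the stationarity condition is exactly the optimality condition for $\min_{\bm \theta} L(\bm \theta) + \lambda^\star \|\bm \theta\|_1$, so the constrained problem with radius $C$ and the penalized problem with $c = \lambda^\star$ share the minimizer $\bm \theta^\star$ (Step 1). If no unconstrained minimizer of $L$ lies in the feasible region, then $\lambda^\star = 0$ is impossible (it would make $\bm \theta^\star$ an unconstrained minimizer), so $\lambda^\star > 0$ and complementary slackness forces $\|\bm \theta^\star\|_1 = C$: the solution lies on the boundary (Step 3).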
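Steps 2 and 3 can be checked numerically in two dimensions. The sketch below assumes a hypothetical isotropic quadratic loss $L(\bm \theta) = \|\bm \theta - \bm \theta^\ast\|_2^2$ whose unconstrained optimum $\bm \theta^\ast = (2, 0.5)$ lies outside both unit balls. For this particular loss the constrained minimizer is just the Euclidean projection of $\bm \theta^\ast$ onto the feasible ball, so the code projects onto the L1 ball (with the standard sort-and-soft-threshold projection) and onto the L2 ball (by rescaling). The function names are illustrative, not from any library.

```python
import math

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {x : ||x||_1 <= radius} (sort-and-threshold method)."""
    if sum(abs(x) for x in v) <= radius:
        return list(v)  # already feasible: the unconstrained optimum is the answer
    u = sorted((abs(x) for x in v), reverse=True)
    running, tau = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        running += uj
        t = (running - radius) / j
        if uj > t:
            tau = t  # the last j satisfying u_j > t determines the threshold
    # Soft-threshold every coordinate by tau; small coordinates become exactly 0
    return [math.copysign(max(abs(x) - tau, 0.0), x) for x in v]

def project_l2_ball(v, radius=1.0):
    """Euclidean projection of v onto {x : ||x||_2 <= radius}: rescale toward the origin."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= radius:
        return list(v)
    return [x * radius / norm for x in v]

theta_star = [2.0, 0.5]  # hypothetical unconstrained optimum, outside both unit balls
print("L1-constrained:", project_l1_ball(theta_star))  # [1.0, 0.0]: a corner of the diamond
print("L2-constrained:", project_l2_ball(theta_star))  # both coordinates stay non-zero
```

With these numbers the L1-constrained solution is $(1, 0)$, a corner of the diamond, whereas the L2-constrained solution $\bm \theta^\ast / \|\bm \theta^\ast\|_2$ keeps both coordinates non-zero. A non-isotropic loss would need a general constrained solver rather than a plain projection.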

Calculus: superposition of functions
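The one-dimensional argument of this section can be checked on a concrete case. As an illustrative assumption (not from the original), take a single coordinate with the quadratic loss $L(\theta) = \frac{1}{2}(\theta - a)^2$, so $L'(\theta) = \theta - a$, which stays inside $(-c, c)$ near the origin when $|a| < c$. For this loss the minimizer of $J(\theta) = L(\theta) + c|\theta|$ has the known closed form $\operatorname{sign}(a)\max(|a| - c, 0)$ (soft-thresholding), which is exactly $0$ whenever $|a| \le c$. The sketch compares that formula with a brute-force grid search; all function names are hypothetical.

```python
import math

def objective(theta, a, c):
    # J(theta) = L(theta) + c*|theta| with the illustrative loss L(theta) = 0.5*(theta - a)**2
    return 0.5 * (theta - a) ** 2 + c * abs(theta)

def closed_form_minimiser(a, c):
    # Soft-thresholding: shrink a toward 0 by c, returning exactly 0.0 when |a| <= c
    shrunk = max(abs(a) - c, 0.0)
    return math.copysign(shrunk, a) if shrunk > 0 else 0.0

def grid_minimiser(a, c, lo=-3.0, hi=3.0, steps=6001):
    # Brute-force search on a uniform grid (step 0.001) that contains theta = 0 exactly
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda t: objective(t, a, c))

c = 1.0
# The first two cases have |a| < c (minimiser at the origin);
# the last has |a| > c (minimiser shrunk from 2.5 to 1.5)
for a in (0.4, -0.7, 2.5):
    print(a, closed_form_minimiser(a, c), grid_minimiser(a, c))
```

The grid search agrees with the closed form: whenever $|a| < c$ the minimum sits at $\theta = 0$, and otherwise the L1 term only shrinks the parameter by $c$ instead of zeroing it.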

After the L1 penalty is added to the loss, the objective becomes $J(\bm \theta) = L(\bm \theta) + c \|\bm \theta\|_1$. Consider one parameter $\theta$ at a time (the penalty is separable across coordinates). When $\theta > 0$, the derivative of $c|\theta|$ equals $c$; when $\theta < 0$, it equals $-c$. Therefore, if the derivative of $L(\theta)$ lies within $(-c, c)$, the derivative of $J(\theta)$ is always negative for $\theta < 0$, so $J(\theta)$ is monotonically decreasing on the left of the origin, and always positive for $\theta > 0$, so $J(\theta)$ is monotonically increasing on the right of the origin. Therefore, the minimum takes place at