LASSO by Proximal Gradient Descent
Prepare:
```python
from itertools import cycle
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import lasso_path, enet_path
from sklearn import datasets
from copy import deepcopy

X = np.random.randn(100, 10)
y = np.dot(X, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
```
Proximal Gradient Descent Framework
- Randomly set $\beta^{(0)}$ for iteration 0.
- For the $k$th iteration:
    - Compute the gradient $\nabla f(\beta^{(k-1)})$.
    - Set $z = \beta^{(k-1)} - \frac{1}{L} \nabla f(\beta^{(k-1)})$.
    - Update $\beta^{(k)} = \operatorname{sgn}(z) \cdot \max\left[|z| - \frac{\lambda}{L},\ 0\right]$ (elementwise soft-thresholding).
    - Check convergence: if converged, stop; otherwise continue updating.
- End for.
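The update step above is the elementwise soft-thresholding operator. A minimal sketch (the function name `soft_threshold` and the sample values are illustrative, not from the original):

```python
import numpy as np

def soft_threshold(z, t):
    # sgn(z) * max(|z| - t, 0), applied elementwise:
    # shrinks every entry toward zero by t, and zeroes out entries with |z| <= t
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
out = soft_threshold(z, 1.0)
print(out)
```

Entries with magnitude below the threshold become exactly zero, which is what makes the LASSO iterate sparse.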
Here $f(\beta) = \frac{1}{2N}(Y - X\beta)^T (Y - X\beta)$ and $\nabla f(\beta) = -\frac{1}{N} X^T (Y - X\beta)$,
where $X$, $Y$, $\beta$ have sizes $N \times p$, $N \times 1$, $p \times 1$ respectively, i.e. $N$ samples and $p$ features. The parameter $L \ge 1$ can be chosen, and $\frac{1}{L}$ can be regarded as the step size.
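Putting the framework together, here is a minimal NumPy sketch run on the `X`, `y` from the Prepare section. The choice of $L$ as the largest eigenvalue of $X^T X / N$, the value `lam = 0.1`, the iteration cap, and the tolerance `1e-8` are illustrative assumptions, not prescribed by the framework:

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(100, 10)
y = np.dot(X, np.arange(1, 11))  # true coefficients 1..10, noiseless

N, p = X.shape
# Step size 1/L: here L is taken as the Lipschitz constant of the gradient,
# i.e. the largest eigenvalue of X^T X / N (any larger L also works).
L = np.linalg.eigvalsh(X.T @ X / N).max()
lam = 0.1  # regularization strength (hypothetical value)

beta = np.zeros(p)  # beta^(0)
for k in range(1000):
    grad = -X.T @ (y - X @ beta) / N                 # gradient of f at beta^(k-1)
    z = beta - grad / L                              # gradient step
    beta_new = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    if np.max(np.abs(beta_new - beta)) < 1e-8:       # convergence check
        beta = beta_new
        break
    beta = beta_new

print(np.round(beta, 2))
```

Because the data are noiseless, the recovered coefficients land close to $1, \dots, 10$, shrunk slightly toward zero by the $\ell_1$ penalty.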
Proximal Gradient Descent Details
Consider the optimization problem:
$$\min_x\ f(x) + \lambda \cdot g(x),$$
where $x \in \mathbb{R}^{p \times 1}$ and $f(x) \in \mathbb{R}$. Here $f(x)$ is a differentiable convex function, and $g(x)$ is convex but not necessarily differentiable.
For $f(x)$, assume its derivative is Lipschitz continuous: there exists a constant $L$ such that for all $x, y$,
$$|f'(y) - f'(x)| \le L |y - x|.$$
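For a quadratic $f$ such as the LASSO loss above, the smallest such $L$ is the largest eigenvalue of $X^T X / N$. A quick numerical check of the inequality (the random data and seeds are illustrative):

```python
import numpy as np

np.random.seed(1)
N, p = 100, 10
X = np.random.randn(N, p)
y = np.random.randn(N)

def grad_f(beta):
    # gradient of f(beta) = (1/2N) * ||y - X beta||^2
    return -X.T @ (y - X @ beta) / N

# Lipschitz constant of grad_f: largest eigenvalue of X^T X / N
L = np.linalg.eigvalsh(X.T @ X / N).max()

# verify ||grad_f(b1) - grad_f(b2)|| <= L * ||b1 - b2|| on random pairs
rng = np.random.default_rng(2)
for _ in range(100):
    b1, b2 = rng.standard_normal(p), rng.standard_normal(p)
    lhs = np.linalg.norm(grad_f(b1) - grad_f(b2))
    rhs = L * np.linalg.norm(b1 - b2)
    assert lhs <= rhs + 1e-12
```

The check passes with equality (up to floating point) when $b_1 - b_2$ aligns with the top eigenvector, which is why no smaller constant works.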
This problem can then be solved using proximal gradient descent.
Denote $x^{(k)}$ as the $k$th update of $x$; then for $x \to x^{(k)}$