无约束问题的最小化

最新推荐文章于 2025-03-20 15:06:27 发布

原创最新推荐文章于 2025-03-20 15:06:27 发布 · 679 阅读

0 ·

CC 4.0 BY-SA版权

文章标签：

#优化

优化专栏收录该内容

1 篇文章

订阅专栏

本文介绍了无约束凸优化问题的解决方法，特别是强凸函数的性质，包括梯度下降、最速下降法和牛顿法。通过分析梯度和Hessian矩阵，探讨了条件数在优化中的作用，并解释了如何利用条件数来理解优化问题的几何特性。此外，文章还提出了下降算法的求解框架。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

最近在看"Machine Learning A Probability Perspective"的逻辑回归，因为涉及些优化的算法，一起看书的伙伴们就决定把"Convex Optimization"优化的内容看一下。
在阅读以下内容之前，要记住问题的出发点，即给定无约束的凸问题该用什么方法求得最优解（这里假设问题是求最小值）？各种方法之间有什么性质。
也许最先想到的方法就是对变量求梯度，让它等于 $0$ 的变量值就是最优解。没错，这样的方法很简单，但是当求导过程过于复杂的时候该怎么办呢？也许你还知道梯度下降，牛顿法等等。如果你对这些方法没有听说过，或者还想知道更多，那接着阅读下去吧，当然最好还是建议你看一下原书。
最近正在忙毕设（好慌，QAQ！），看的内容比较少，这次内容包括：

强凸及其性质
下降方法求解的框架
梯度下降算法收敛性分析
最速下降法
牛顿方法

强凸及其性质

函数强凸是说在凸集 $S\rm{S}$ 中，对于 $∀x∈S\forall \mathbf{x}\in \rm S$ 均有下式成立：
$\nabla^2 f(\mathbf{x}) \succcurlyeq m \, \mathbf I$
其中 $m > 0$ ，将上式带入到函数的泰勒展开式
$f(\mathbf y)=f(\mathbf{x})+\nabla f(\mathbf x)^T(\mathbf y-\mathbf x)+\frac{1}{2}(\mathbf y-\mathbf x)^T\nabla^2f(\mathbf x)(\mathbf y-\mathbf x)$
得到
$f(\mathbf y)\geq f(\mathbf x)+\nabla f(\mathbf x)^T(\mathbf y-\mathbf x)+\frac{m}{2}\Vert\mathbf y-\mathbf x\Vert_2^2 \tag 1$
在等式右边对 $y\mathbf y$ 求梯度，得到：
$\mathbf y - \mathbf x=-\frac{1}{m}\nabla f(\mathbf x)$
带入得到
$f(\mathbf y)\geq f(\mathbf x)-\frac{1}{2m}\Vert\nabla f(\mathbf x)\Vert^2_2 \tag{2}$
令 $y=x∗\mathbf y=\mathbf x^*$ ，可以得到 $∇f(x)\nabla f(\mathbf x)$ 的下界
$\Vert \nabla f(\mathbf x)\Vert_2^2\geq 2m(f(\mathbf x)-p^*)$
可以看到，当某一点的梯度越小时，其越接近最优点，而这与最优点梯度为0的常识相符合。因此也可以用 $∥∇f(x)∥22≤ϵ\Vert \nabla f(\mathbf x)\Vert_2^2\leq\epsilon$ 当作收敛条件。

将 $x∗\mathbf x^*$ 带入到(1)中，再利用柯西施瓦茨不等式得到：
$f(\mathbf x^*)\geq f(\mathbf x)-\Vert \nabla f(\mathbf x) \Vert_2\Vert\mathbf x^*-\mathbf x \Vert_2+\frac{m}{2}\Vert\mathbf x^* - \mathbf x\Vert_2^2$
可以得到
$\Vert \mathbf x - \mathbf x^*\Vert_2 \leq \frac{2}{m}\Vert \nabla f(\mathbf x)\Vert_2$
即通过 $∥∇f(x)∥2\Vert \nabla f(\mathbf x)\Vert_2$ 可以得到 $x\mathbf x$ 和 $x∗\mathbf x^*$ 的差距上界。

一般而言 $S\rm S$ 被当做是下子集，是有界的，故存在 $M > 0$ 使得 $∀x∈S\forall \mathbf x\in \rm S$ 满足
$\nabla^2f(\mathbf x)\preceq M\mathbf I$
使用相同的方法，可以得到：
$f(\mathbf y)\preceq f(x)+\nabla f(\mathbf x)^T(\mathbf y-\mathbf x)+\frac{M}{2}\Vert\mathbf y- \mathbf x \Vert_2^2\\ p^* \preceq f(\mathbf x)-\frac{1}{2M}\Vert \nabla f(\mathbf x)\Vert_2^2$
定义 $κ=Mm\kappa = \frac{M}{m}$ 为 $S\rm S$ 的 $condition number\textit condition \, number$ ， $condition number\textit condition \, number$ 会在之后的收敛性分析中扮演重要的角色。 $∇2f(x)\nabla^2 f(\mathbf x)$ 是实对称矩阵，可以看出 $condition number\textit condition\, number$ 可以看作 $∇2f(x)\nabla^2f(\mathbf x)$ 在 $∀x∈S\forall \mathbf x \in \rm S$ 中最大特征值和最小特征值的比值。~~关于条件数和几何解释的衔接还在考虑怎么写~~

下面对 $condition number\textit condition \, number$ 进行几何解释，设凸集合 $C$ 的沿某一方向上的宽度为
$W(C,q)=\sup_{z\in C}q^Tz-\inf_{z\in C}q^Tz$
集合 $C$ 在不同方向上的最大最小宽度如下：
$W_{\min}=\inf_{\vert q\vert_2=1}W(C,q)\\ W_{\max}=\sup_{\vert q\vert_2=1}W(C,q)$
而 $condition numbercondition\, number$ 为 $W_{max}^2/W_{min}^2$ ，从而可以看出 $condition numbercondition\, number$ 越小说明集合 $C$ 越圆，反之则其各项异性比较大。

对于 $α\alpha$ -sublevel子集 $Cα={x∣f(x)≤α}C_{\alpha}=\{\mathbf x\vert f(\mathbf x)\leq\alpha\}$ ，根据(2)和类似的式子可得
$p^*+\frac{M}{2}\Vert\mathbf x^*-\mathbf y \Vert_2^2\geq f(\mathbf y)\geq p^*+\frac{m}{2}\Vert\mathbf x^*-\mathbf y \Vert_2^2$
再利用 $α≥f(y)\alpha \geq f(\mathbf y)$ ，则 $α≥p∗+m2∥x∗−y∥22\alpha \geq p^*+\frac{m}{2}\Vert\mathbf x^*-\mathbf y \Vert_2^2$ 。当 $y\mathbf y$ 趋近于 $x∗\mathbf x^*$ 时， $p∗+M2∥x∗−y∥22p^*+\frac{M}{2}\Vert\mathbf x^*-\mathbf y \Vert_2^2$ 趋近于 $p^*$ ，此时 $α≥p∗+M2∥x∗−y∥22\alpha \geq p^*+\frac{M}{2}\Vert\mathbf x^*-\mathbf y \Vert_2^2$ 可以求得部分 $y\mathbf y$ 的范围。上述两个不等式可以得到：
$B_{inner}=\{\mathbf y\vert \Vert \mathbf y-\mathbf x^* \Vert_2^2\leq \frac{2}{M}(\alpha-p^*)\}\\ B_{outer}=\{\mathbf y\vert \Vert \mathbf y-\mathbf x^* \Vert_2^2\leq \frac{2}{m}(\alpha-p^*)\}$
可见 $α\alpha$ 下子集在 $B_{inner}$ 和 $B_{outer}$ 之间，则根据几何解释 $cond(Cα)≤Mm\textbf{cond}(C_{\alpha})\leq \frac{M}{m}$ 。这样， $\, number$ 和下子集的形状联系了起来。

当 $α\alpha$ 趋近于 $x∗\mathbf x^*$ 时，根据泰勒展开，其梯度可以认为是0，得到下子集 $CαC_\alpha$
$C_\alpha = \{\mathbf y|(\mathbf y - \mathbf x^*)^T\nabla^2f(\mathbf x)(\mathbf y - \mathbf x^*)\leq 2(\alpha - p^*)\}$
可以看到此时的下子集 $CαC_\alpha$ 接近为 $∇2f(x∗)\nabla^2 f(\mathbf x^*)$ 的椭球，此时的 $cond(Cα)=κ∇2f(x∗)\textbf{cond}(C_{\alpha})=\kappa{\nabla^2 f(\mathbf x^*)}$ 。