Chapter 1, the background of optimization: robustness and precision are the key points.
Chapter 2, a brief introduction to line search methods (steepest descent uses the negative gradient as the direction; Newton's method uses the Hessian, rate of convergence is Q-quadratic; quasi-Newton methods use an approximation to the Hessian, rate of convergence is Q-superlinear; in the algorithm implementation, try step length alpha = 1 first), the trust region method (approximate the function within some region), and the necessary prerequisites.
Basic conditions: the Armijo condition (guarantees sufficient decrease), the Wolfe conditions (rule out unacceptably short step lengths).
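The Armijo condition can be enforced by simple backtracking. A minimal sketch (the function names `f`, `grad`, `backtracking` and the parameter values are illustrative choices, not from the source):

```python
import numpy as np

def backtracking(f, grad, x, p, alpha=1.0, rho=0.5, c=1e-4):
    """Backtracking line search enforcing the Armijo (sufficient decrease)
    condition: f(x + a*p) <= f(x) + c * a * grad(x)^T p."""
    fx, slope = f(x), grad(x) @ p
    while f(x + alpha * p) > fx + c * alpha * slope:
        alpha *= rho  # shrink the step until sufficient decrease holds
    return alpha

# Usage: one steepest-descent step on f(x) = x1^2 + 10*x2^2.
f = lambda x: x[0]**2 + 10 * x[1]**2
grad = lambda x: np.array([2 * x[0], 20 * x[1]])
x = np.array([1.0, 1.0])
a = backtracking(f, grad, x, -grad(x))
```

Starting from alpha = 1 and halving matches the "try step length 1 first" rule above.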
The main steps of a line search (LS) algorithm: prediction (set the direction), set the step length (in Newton-type methods, try alpha = 1 first), and correction (obtain the next iterate by iterating).
Newton method with Hessian modification: to make the matrix positive definite, add a scalar multiple of the identity. The scalar is the absolute value of the most negative eigenvalue plus epsilon.
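The modification described above can be sketched as follows (the function name `modify_hessian` and the value of `eps` are illustrative assumptions):

```python
import numpy as np

def modify_hessian(H, eps=1e-3):
    """If H is not positive definite, return H + tau*I with
    tau = |most negative eigenvalue| + eps, so the result is PD."""
    lam_min = np.linalg.eigvalsh(H)[0]  # eigvalsh returns ascending eigenvalues
    if lam_min <= 0:
        H = H + (abs(lam_min) + eps) * np.eye(H.shape[0])
    return H

H = np.array([[1.0, 0.0], [0.0, -2.0]])  # indefinite
B = modify_hessian(H)
```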
How to choose the step length: consider the one-dimensional function phi(alpha) = f(x_k + alpha * p_k), and use a cubic polynomial to interpolate this function through the known points.
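One way to realize the cubic interpolation step: fit a cubic to phi(0), phi'(0) and two trial values phi(a0), phi(a1), then take the cubic's minimizer as the next trial step. A sketch under the assumption that the fitted leading coefficient is positive (function name `cubic_step` is my own):

```python
import numpy as np

def cubic_step(phi0, dphi0, a0, phi_a0, a1, phi_a1):
    """Fit p(a) = c3*a^3 + c2*a^2 + dphi0*a + phi0 through the two trial
    points, then return the minimizer of p (assumes c3 > 0)."""
    A = np.array([[a0**3, a0**2], [a1**3, a1**2]])
    b = np.array([phi_a0 - phi0 - dphi0 * a0,
                  phi_a1 - phi0 - dphi0 * a1])
    c3, c2 = np.linalg.solve(A, b)
    # Minimizer: root of p'(a) = 3*c3*a^2 + 2*c2*a + dphi0 with p'' > 0.
    disc = 4 * c2**2 - 12 * c3 * dphi0
    return (-2 * c2 + np.sqrt(disc)) / (6 * c3)
```

For example, with data from phi(a) = 0.1*a^3 + (a - 1)^2, the fit recovers the cubic exactly and returns its true minimizer near 0.883.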
Rates of convergence: two types, quotient (Q) and root (R). If the limit of the quotient of successive errors is zero, the rate is Q-superlinear. If the errors are dominated by a sequence that converges Q-linearly, the rate is R-linear.
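A small numeric illustration of the quotient test (the sequences are my own examples, not from the source): e_k = 2^-k is Q-linear, while e_k = 2^-(2^k) is Q-quadratic.

```python
# For Q-linear convergence the quotient e_{k+1}/e_k tends to a constant
# r in (0, 1); for Q-quadratic convergence e_{k+1}/e_k^2 stays bounded
# (hence e_{k+1}/e_k -> 0, i.e. the rate is also Q-superlinear).
linear = [0.5 ** k for k in range(1, 6)]           # e_k = 2^-k
quadratic = [0.5 ** (2 ** k) for k in range(1, 6)]  # e_k = 2^-(2^k)

lin_ratios = [b / a for a, b in zip(linear, linear[1:])]
quad_ratios = [b / a**2 for a, b in zip(quadratic, quadratic[1:])]
```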
Condition number and the uniformly bounded condition number: kappa(B) = ||B|| * ||B^{-1}||; "uniform" means bounded over all iteration indices k.
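Checking the definition numerically (the matrix is an arbitrary example of mine; for the spectral norm this product agrees with NumPy's built-in condition number):

```python
import numpy as np

B = np.array([[4.0, 0.0], [0.0, 1.0]])
# kappa(B) = ||B|| * ||B^{-1}||, here in the spectral (2-) norm.
kappa = np.linalg.norm(B, 2) * np.linalg.norm(np.linalg.inv(B), 2)
```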
Chapter 3, line search methods: for convergence, the key is to find a good direction and try step length 1 first, then investigate global convergence. Use some modifications to make B positive definite or semidefinite. How to choose the initial step length.
Chapter 4, trust region method, Cauchy point: use a quadratic model to approximate the function, and set the rule for selecting the radius of the region.
Notation in (TR): the direction is denoted by p, since we not only approximate the function by the quadratic model but also replace its Hessian by an approximation B_k.
Key point: use the quadratic model to approximate the original function, and use a linear model to find the Cauchy point. The choice of the trust-region radius is based on the previous iteration: consider the ratio of actual reduction to predicted reduction.
If the ratio is negative, we reject the step. If the ratio is close to 0, we shrink the radius. If the ratio lies in a middle segment such as (1/4, 3/4), nothing is done to the radius. The radius is expanded if the ratio is close to 1.
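The radius rule above can be sketched as a small function (the thresholds follow the segment quoted in the notes; the name `update_radius` and the cap `delta_max` are my own assumptions):

```python
def update_radius(rho, delta, step_norm, delta_max=10.0):
    """Trust-region radius update based on the ratio rho of actual to
    predicted reduction. Negative or small rho: shrink; middle range:
    keep; rho near 1 with a boundary step: expand."""
    if rho < 0.25:                        # poor model agreement (incl. rho < 0)
        return 0.25 * delta
    if rho > 0.75 and abs(step_norm - delta) < 1e-12:
        return min(2.0 * delta, delta_max)  # step hit the boundary: expand
    return delta                          # ratio in the middle segment: keep
```

Note that a negative ratio additionally means the step itself is rejected; the function only handles the radius.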
Cauchy point: use the linear model to get a direction that guarantees sufficient decrease. Global convergence is ensured if the reduction in the model is a positive multiple of the decrease attained by the Cauchy point. (No matrix factorization is needed, so the cost is not expensive.)
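The Cauchy point has a closed form along the steepest-descent direction, which shows why no factorization is needed (the function name `cauchy_point` is my own; `g`, `B`, `delta` stand for the gradient, the Hessian approximation, and the radius):

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Minimizer of the quadratic model along -g within radius delta:
    p_C = -tau * (delta / ||g||) * g."""
    gnorm = np.linalg.norm(g)
    gBg = g @ B @ g
    if gBg <= 0:
        tau = 1.0  # model decreases all the way: go to the boundary
    else:
        tau = min(gnorm**3 / (delta * gBg), 1.0)
    return -tau * (delta / gnorm) * g

g = np.array([1.0, 0.0])
B = np.eye(2)
p = cauchy_point(g, B, delta=2.0)  # interior minimizer at -g
```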
Model: use the quadratic model m_k(p) = f_k + g_k^T p + (1/2) p^T B_k p, substituting the Hessian matrix with an approximation B_k.
The condition for a global solution of this trust-region subproblem with respect to the quadratic model: a vector (direction) p* is a global solution if it is feasible (||p*|| <= Delta) and there exists a scalar lambda >= 0 such that (B + lambda I) p* = -g, lambda (Delta - ||p*||) = 0, and B + lambda I is positive semidefinite.
For the quadratic model, we have 3 strategies to solve the subproblem. First is the dogleg method; the second is two-dimensional subspace minimization. The last is the conjugate gradient (CG) method.
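A minimal sketch of the first strategy, the dogleg step, assuming B is positive definite (the function name `dogleg` and the test matrices are my own):

```python
import numpy as np

def dogleg(g, B, delta):
    """Dogleg path: steepest-descent minimizer pU, then bend toward the
    full (quasi-)Newton step pB, truncated at the trust-region boundary."""
    pB = -np.linalg.solve(B, g)                # full Newton-like step
    if np.linalg.norm(pB) <= delta:
        return pB                              # Newton step already inside
    pU = -(g @ g) / (g @ B @ g) * g            # model minimizer along -g
    if np.linalg.norm(pU) >= delta:
        return delta * pU / np.linalg.norm(pU)  # truncate the first leg
    # Otherwise find t in [0, 1] with ||pU + t*(pB - pU)|| = delta.
    d = pB - pU
    a, b, c = d @ d, 2 * pU @ d, pU @ pU - delta**2
    t = (-b + np.sqrt(b**2 - 4 * a * c)) / (2 * a)
    return pU + t * d
```

The step interpolates between pure steepest descent (small radius) and the pure Newton step (large radius).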
Chapter 5: not yet done.