Convergence Rate of Newton's Method

Excerpted from 《数值最优化方法》 (Numerical Optimization Methods)
Suppose $f(x)$ has continuous second-order partial derivatives and the current iterate is $x_{k}$. The Taylor expansion of $f(x)$ at $x_{k}$ is (taking the basic Newton method, i.e. step length $\alpha=1$, as the example)

$$f(x_{k+1})=f(x_{k}+d)=f(x_{k})+g_{k}^{T}d+\frac{1}{2}d^{T}G_{k}d+o(\|d\|^{2}).$$

In a neighborhood of $x_{k}$, the quadratic function

$$q_{k}(d)\mathop{=}\limits^{\Delta}f(x_{k})+g_{k}^{T}d+\frac{1}{2}d^{T}G_{k}d$$

is used to approximate $f(x_{k}+d)$, and we solve the problem

$$\min\ q_{k}(d).$$

If $G_{k}$ is positive definite, the linear system

$$G_{k}d=-g_{k}$$

has the solution $d_{k}=-G_{k}^{-1}g_{k}$, and the resulting direction is called the Newton direction. As long as $G_{k}$ is positive definite (and $g_{k}\neq 0$), the Newton direction $d_{k}$ is a descent direction, that is, $g_{k}^{T}d_{k}=-g_{k}^{T}G_{k}^{-1}g_{k}<0$.
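The computation of the Newton direction can be sketched in a few lines. This is only an illustration under stated assumptions: NumPy is available, and the test function $f(x)=x_{1}^{4}+x_{1}x_{2}+(1+x_{2})^{2}$, the point $(1,1)$, and the helper names `newton_direction`, `grad_f`, `hess_f` are hypothetical choices that do not come from the source.

```python
import numpy as np

def newton_direction(grad, hess):
    """Return d solving hess @ d = -grad, assuming hess is symmetric positive definite."""
    # np.linalg.cholesky raises LinAlgError when the matrix is not positive
    # definite, so it doubles as a check of the assumption on G_k.
    chol = np.linalg.cholesky(hess)       # hess = chol @ chol.T
    y = np.linalg.solve(chol, -grad)      # solve chol @ y = -grad
    return np.linalg.solve(chol.T, y)     # solve chol.T @ d = y

# Hypothetical test function: f(x) = x0^4 + x0*x1 + (1 + x1)^2
def grad_f(x):
    return np.array([4.0 * x[0] ** 3 + x[1], x[0] + 2.0 * (1.0 + x[1])])

def hess_f(x):
    return np.array([[12.0 * x[0] ** 2, 1.0], [1.0, 2.0]])

x_k = np.array([1.0, 1.0])
g_k = grad_f(x_k)
G_k = hess_f(x_k)
d_k = newton_direction(g_k, G_k)
slope = g_k @ d_k                         # g_k^T d_k, negative for a descent direction

print("Newton direction:", d_k)
print("g_k^T d_k =", slope)
```

Going through a Cholesky factorization rather than forming $G_{k}^{-1}$ explicitly is a design choice of this sketch: the factorization fails exactly when the positive-definiteness assumption is violated.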
Convergence of the basic Newton method. $\color{#F00}{\text{Theorem.}}$ Suppose $f(x)\in C^{2}$ and the Hessian matrix $G(x)$ of $f(x)$ satisfies a Lipschitz condition, i.e. there exists $L>0$ such that for all $x$ and $y$ we have $\|G(x)-G(y)\|\leq L\|x-y\|$. If $x_{0}$ is sufficiently close to a local minimizer $x^{*}$ of $f(x)$ and $G^{*}=G(x^{*})$ is positive definite, then the Newton iteration is well defined for all $k$ and converges to $x^{*}$ at a quadratic rate, and the sequence of gradient norms $\{\|\nabla f_{k}\|\}$ converges quadratically to zero.


Notation: $G_{k} = \nabla^{2}f_{k}$ and $g_{k} = \nabla f_{k}$, where the subscript $k$ denotes evaluation at $x_{k}$.


Proof of the convergence rate. From the definition of the basic Newton method and the optimality condition $\nabla f_{*}=0$ we obtain

$$x_{k}+d_{k}-x^{*}=x_{k}-x^{*}-\nabla^{2}f_{k}^{-1}\nabla f_{k}=\nabla^{2}f_{k}^{-1}\left[\nabla^{2}f_{k}(x_{k}-x^{*})-(\nabla f_{k}-\nabla f_{*})\right].$$

Because

$$\begin{aligned}
\nabla f_{k}-\nabla f_{*}&=\int_{0}^{1}\nabla^{2}f(x_{k}+t(x^{*}-x_{k}))(x_{k}-x^{*})\,dt\\
&=\int_{0}^{1}\nabla^{2}f(x_{k}+t(x^{*}-x_{k}))\,d(t(x_{k}-x^{*}))\\
&=-\int_{x_{k}}^{x^{*}}\nabla^{2}f(u)\,du
\end{aligned}$$

(the last integral is taken along the line segment from $x_{k}$ to $x^{*}$), we have

$$\begin{aligned}
&\|\nabla^{2}f(x_{k})(x_{k}-x^{*})-(\nabla f_{k}-\nabla f(x^{*}))\|\\
&\quad=\left\|\int_{0}^{1}\left(\nabla^{2}f(x_{k})-\nabla^{2}f(x_{k}+t(x^{*}-x_{k}))\right)(x_{k}-x^{*})\,dt\right\|\\
&\quad\leq\int_{0}^{1}\left\|\left(\nabla^{2}f(x_{k})-\nabla^{2}f(x_{k}+t(x^{*}-x_{k}))\right)(x_{k}-x^{*})\right\|dt\\
&\quad\leq\|x_{k}-x^{*}\|\int_{0}^{1}\left\|\nabla^{2}f(x_{k})-\nabla^{2}f(x_{k}+t(x^{*}-x_{k}))\right\|dt\\
&\quad\leq\|x_{k}-x^{*}\|^{2}\int_{0}^{1}Lt\,dt=\frac{1}{2}L\|x_{k}-x^{*}\|^{2}.
\end{aligned}$$

Therefore

$$\|x_{k+1}-x^{*}\|\leq\frac{1}{2}L\|x_{k}-x^{*}\|^{2}\,\|\nabla^{2}f_{k}^{-1}\|.$$

$\color{#F00}{\text{Note:}}$ since $\nabla^{2}f(x^{*})$ is nonsingular and $\nabla^{2}f_{k}\to\nabla^{2}f(x^{*})$ as $x_{k}\to x^{*}$, for $x_{k}$ sufficiently close to $x^{*}$ we have $\|\nabla^{2}f_{k}^{-1}\|\leq2\|\nabla^{2}f(x^{*})^{-1}\|$ ($\color{#F00}{\text{bounded}}$). Substituting this bound gives $\|x_{k+1}-x^{*}\|\leq L\|\nabla^{2}f(x^{*})^{-1}\|\,\|x_{k}-x^{*}\|^{2}$. Hence, when the starting point is sufficiently close to $x^{*}$, the sequence $\{x_{k}\}$ converges to $x^{*}$ and Newton's method converges quadratically.
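The bound $\|x_{k+1}-x^{*}\|\leq C\|x_{k}-x^{*}\|^{2}$ can be observed numerically. The following is a minimal sketch, not part of the source: it assumes NumPy and uses a hypothetical test function $f(x)=e^{x_{1}}-x_{1}+\cosh(x_{2})$, whose minimizer is $x^{*}=(0,0)$ and whose Hessian is positive definite everywhere. The names `grad_f`, `hess_f`, and `newton_errors` are illustrative.

```python
import numpy as np

# Hypothetical test function: f(x) = exp(x0) - x0 + cosh(x1), minimizer x* = (0, 0)
def grad_f(x):
    # expm1 avoids cancellation in exp(x0) - 1 near the minimizer
    return np.array([np.expm1(x[0]), np.sinh(x[1])])

def hess_f(x):
    return np.diag([np.exp(x[0]), np.cosh(x[1])])

def newton_errors(x0, x_star, steps):
    """Run the basic Newton method (alpha = 1) and record ||x_k - x*||."""
    x = np.asarray(x0, dtype=float)
    errors = [np.linalg.norm(x - x_star)]
    for _ in range(steps):
        d = np.linalg.solve(hess_f(x), -grad_f(x))   # Newton direction
        x = x + d                                    # full step, alpha = 1
        errors.append(np.linalg.norm(x - x_star))
    return errors

x_star = np.zeros(2)
errors = newton_errors([0.5, 0.5], x_star, steps=4)

# For quadratic convergence the ratios e_{k+1} / e_k^2 stay bounded.
ratios = [errors[k + 1] / errors[k] ** 2 for k in range(4)]
for k, (e, r) in enumerate(zip(errors[1:], ratios), start=1):
    print(f"k={k}  error={e:.3e}  error/prev_error^2={r:.3f}")
```

Only four steps are taken because beyond that the error reaches the level of floating-point rounding and the ratio is no longer meaningful.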
From the condition $\nabla f_{k}+\nabla^{2}f_{k}(x_{k+1}-x_{k})=0$ (which follows from the choice of the Newton direction with $\alpha=1$; whether $\alpha\neq1$ changes the result is a question to consider), we get

$$\begin{aligned}
\|\nabla f_{k+1}\|&=\|\nabla f_{k+1}-\nabla f_{k}-\nabla^{2}f_{k}(x_{k+1}-x_{k})\|\\
&=\left\|\int_{0}^{1}\left(\nabla^{2}f(x_{k}+t(x_{k+1}-x_{k}))-\nabla^{2}f(x_{k})\right)(x_{k+1}-x_{k})\,dt\right\|\\
&\leq\frac{1}{2}L\|x_{k+1}-x_{k}\|^{2}\\
&\leq\frac{1}{2}L\|\nabla^{2}f_{k}^{-1}\|^{2}\|\nabla f_{k}\|^{2}\\
&\leq2L\|\nabla^{2}f(x^{*})^{-1}\|^{2}\|\nabla f_{k}\|^{2}.
\end{aligned}$$

This shows that the gradient norms $\|\nabla f_{k}\|$ converge quadratically to zero. ($\color{#F00}{\text{Why? Why is it not proved that}}$ $2L\|\nabla^{2}f(x^{*})^{-1}\|^{2}<1$?) The constant does not need to be below 1: a quadratic rate only requires $\|\nabla f_{k+1}\|\leq M\|\nabla f_{k}\|^{2}$ for some constant $M$, and $\nabla f_{k}\to0$ already follows from $x_{k}\to x^{*}$ and the continuity of $\nabla f$.
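The question of whether $\alpha\neq1$ matters can be explored numerically. The sketch below is an illustration only, assuming NumPy and the hypothetical test function $f(x)=e^{x_{1}}-x_{1}+\cosh(x_{2})$ (not from the source); `gradient_norms` is an illustrative helper. It compares the full step $\alpha=1$ with a fixed damped step $\alpha=0.5$, for which the gradient norm is expected to shrink only by a factor of about $1-\alpha$ per iteration.

```python
import numpy as np

# Hypothetical test function: f(x) = exp(x0) - x0 + cosh(x1), minimizer x* = (0, 0)
def grad_f(x):
    return np.array([np.expm1(x[0]), np.sinh(x[1])])

def hess_f(x):
    return np.diag([np.exp(x[0]), np.cosh(x[1])])

def gradient_norms(x0, alpha, steps):
    """Iterate x_{k+1} = x_k + alpha * d_k and record ||grad f(x_k)||."""
    x = np.asarray(x0, dtype=float)
    norms = [np.linalg.norm(grad_f(x))]
    for _ in range(steps):
        d = np.linalg.solve(hess_f(x), -grad_f(x))   # Newton direction
        x = x + alpha * d
        norms.append(np.linalg.norm(grad_f(x)))
    return norms

# Full Newton step: ||g_{k+1}|| / ||g_k||^2 stays bounded (quadratic rate).
full = gradient_norms([0.5, 0.5], alpha=1.0, steps=4)
quad_ratios = [full[k + 1] / full[k] ** 2 for k in range(4)]

# Fixed damped step: ||g_{k+1}|| / ||g_k|| settles near 1 - alpha (linear rate).
damped = gradient_norms([0.5, 0.5], alpha=0.5, steps=20)
linear_ratio = damped[-1] / damped[-2]

print("alpha=1.0  g_next/g^2 ratios:", [f"{r:.3f}" for r in quad_ratios])
print("alpha=0.5  last g_next/g ratio:", f"{linear_ratio:.4f}")
```

In this experiment a fixed $\alpha<1$ loses the quadratic rate, which is consistent with the proof relying on the identity $\nabla f_{k}+\nabla^{2}f_{k}(x_{k+1}-x_{k})=0$ that holds only for the full step.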
