Excerpted from 《数值最优化方法》 (Numerical Optimization Methods)
\qquad
Let $f(x)$ have continuous second-order partial derivatives, and let the current iterate be $x_{k}$. Write $g_{k}=\nabla f(x_{k})$ and $G_{k}=\nabla^{2}f(x_{k})$. The Taylor expansion of $f(x)$ at $x_{k}$ is (taking the basic Newton method, i.e. step length $\alpha=1$, as the example)
$$f(x_{k+1})=f(x_{k}+d)=f(x_{k})+g_{k}^{T}d+\frac{1}{2}d^{T}G_{k}d+o(\|d\|^{2})$$
\qquad
In a neighborhood of $x_{k}$, approximate $f(x_{k}+d)$ by the quadratic function
$$q_{k}(d)\mathop{=}\limits^{\Delta}f(x_{k})+g_{k}^{T}d+\frac{1}{2}d^{T}G_{k}d$$
and solve the subproblem $\min_{d}\ q_{k}(d)$.
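As a rough numerical illustration of the quadratic model, the sketch below compares $q_{k}(d)$ with $f(x_{k}+d)$ for shrinking steps. The one-dimensional objective $f(x)=e^{x}-2x$ and the point $x_{k}=1$ are assumptions chosen only for this example; they do not come from the book.

```python
import math

# Illustrative 1-D objective (an assumption, not taken from the source text).
def f(x):
    return math.exp(x) - 2.0 * x

def gradient(x):   # plays the role of g_k
    return math.exp(x) - 2.0

def hessian(x):    # plays the role of G_k
    return math.exp(x)

def quadratic_model(xk, d):
    """q_k(d) = f(x_k) + g_k d + (1/2) G_k d^2 in one dimension."""
    return f(xk) + gradient(xk) * d + 0.5 * hessian(xk) * d * d

xk = 1.0
steps = [0.1, 0.01, 0.001]
# Model error divided by d^2; it should shrink as d goes to 0.
ratios = [abs(f(xk + d) - quadratic_model(xk, d)) / d**2 for d in steps]

for d, r in zip(steps, ratios):
    print(f"d = {d:<6} |f(xk + d) - q_k(d)| / d^2 = {r:.3e}")
```

The printed ratios decrease with $d$, which is what one expects when the model error is of smaller order than $\|d\|^{2}$.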
\qquad
If $G_{k}$ is positive definite, the linear system
$$G_{k}d=-g_{k}$$
has the unique solution $d_{k}=-G_{k}^{-1}g_{k}$, which is called the Newton direction. As long as $G_{k}$ is positive definite and $g_{k}\neq0$, the Newton direction $d_{k}$ is a descent direction, that is,
$$g_{k}^{T}d_{k}=-g_{k}^{T}G_{k}^{-1}g_{k}<0.$$
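The Newton direction and the descent test can be sketched in a few lines. The two-variable objective $f(x,y)=e^{x+y}+x^{2}+2y^{2}$, the point $(1,1)$ and the helper names are hypothetical choices for illustration; the $2\times2$ system is solved with Cramer's rule so that only the standard library is needed.

```python
import math

# Illustrative objective (an assumption): f(x, y) = exp(x + y) + x^2 + 2*y^2.
def gradient(x, y):
    e = math.exp(x + y)
    return (e + 2.0 * x, e + 4.0 * y)

def hessian(x, y):
    e = math.exp(x + y)
    return ((e + 2.0, e), (e, e + 4.0))

def newton_direction(g, G):
    """Solve G d = -g for a 2x2 matrix G using Cramer's rule."""
    (a, b), (c, d) = G
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("Hessian is singular; Newton direction undefined")
    r1, r2 = -g[0], -g[1]
    return ((r1 * d - b * r2) / det, (a * r2 - c * r1) / det)

g = gradient(1.0, 1.0)
G = hessian(1.0, 1.0)
d_k = newton_direction(g, G)

# Directional derivative g_k^T d_k; negative means d_k is a descent direction.
slope = g[0] * d_k[0] + g[1] * d_k[1]
print("Newton direction:", d_k)
print("g_k^T d_k =", slope)
```

For this objective the Hessian has determinant $6e^{x+y}+8>0$ and positive trace, so it is positive definite everywhere and the printed slope is negative.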
\qquad
Convergence of the basic Newton method
$\color{#F00}{\text{Theorem}}$ Suppose $f(x)\in C^{2}$ and the Hessian matrix $G(x)$ of $f(x)$ satisfies a Lipschitz condition, i.e. there exists $\beta>0$ such that for all $x$ and $y$,
$$\|G(x)-G(y)\|\leq\beta\|x-y\|$$
(the proof below writes this constant as $L$). If $x_{0}$ is sufficiently close to a local minimizer $x^{*}$ of $f(x)$ and $G^{*}=G(x^{*})$ is positive definite, then the Newton iteration is well defined for all $k$ and converges to $x^{*}$ at a quadratic rate, and the sequence of gradient norms $\{\|\nabla f_{k}\|\}$ converges quadratically to zero.
Notation: $G_{k} = \nabla^{2}f_{k}\qquad g_{k} = \nabla f_{k}$
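The requirement that $x_{0}$ be sufficiently close to $x^{*}$ is essential. A small sketch, assuming the illustrative objective $f(x)=\sqrt{1+x^{2}}$ (not from the book), makes this visible: its second derivative is positive everywhere and its minimizer is $x^{*}=0$, yet the pure Newton step reduces to $x_{k+1}=-x_{k}^{3}$, which converges only when $|x_{0}|<1$.

```python
import math

# Illustrative objective (an assumption): f(x) = sqrt(1 + x^2), minimizer x* = 0.
def newton_step(x):
    first = x / math.sqrt(1.0 + x * x)   # f'(x)
    second = (1.0 + x * x) ** -1.5       # f''(x), positive for every x
    return x - first / second            # algebraically equal to -x**3

def iterate(x0, steps=4):
    """Return the list [x_0, x_1, ..., x_steps] of pure Newton iterates."""
    xs = [x0]
    for _ in range(steps):
        xs.append(newton_step(xs[-1]))
    return xs

near = iterate(0.5)   # starts close to x*: iterates shrink rapidly
far = iterate(1.5)    # starts too far away: iterates blow up

print("x0 = 0.5:", near)
print("x0 = 1.5:", far)
```

In this particular example the convergence from $x_{0}=0.5$ is even faster than quadratic, because the third derivative of $f$ vanishes at the minimizer; the point of the sketch is only the contrast between the two starting points.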
\qquad
Proof of the convergence rate. By the definition of the basic Newton method and the optimality condition $\nabla f_{*}=0$,
$$x_{k}+d-x^{*}=x_{k}-x^{*}-\nabla^{2}f_{k}^{-1}\nabla f_{k}=\nabla^{2}f_{k}^{-1}\left[\nabla^{2}f_{k}(x_{k}-x^{*})-(\nabla f_{k}-\nabla f_{*})\right]$$
Since, integrating along the segment from $x_{k}$ to $x^{*}$,
$$\begin{aligned}\nabla f_{k}-\nabla f_{*}&=\int_{0}^{1}\nabla^{2}f(x_{k}+t(x^{*}-x_{k}))(x_{k}-x^{*})\,dt\\ &=\int_{0}^{1}\nabla^{2}f(x_{k}+t(x^{*}-x_{k}))\,d(t(x_{k}-x^{*}))\\ &=-\int_{x_{k}}^{x^{*}}\nabla^{2}f(u)\,du\end{aligned}$$
and moreover
$$\begin{aligned}&\|\nabla^{2}f(x_{k})(x_{k}-x^{*})-(\nabla f_{k}-\nabla f(x^{*}))\|\\ &\quad=\left\|\int_{0}^{1}\left(\nabla^{2}f(x_{k})-\nabla^{2}f(x_{k}+t(x^{*}-x_{k}))\right)(x_{k}-x^{*})\,dt\right\|\\ &\quad\leq\int_{0}^{1}\left\|\left(\nabla^{2}f(x_{k})-\nabla^{2}f(x_{k}+t(x^{*}-x_{k}))\right)(x_{k}-x^{*})\right\|dt\\ &\quad\leq\|x_{k}-x^{*}\|\int_{0}^{1}\left\|\nabla^{2}f(x_{k})-\nabla^{2}f(x_{k}+t(x^{*}-x_{k}))\right\|dt\\ &\quad\leq\|x_{k}-x^{*}\|^{2}\int_{0}^{1}Lt\,dt=\frac{1}{2}L\|x_{k}-x^{*}\|^{2}\end{aligned}$$
it follows that
$$\|x_{k+1}-x^{*}\|\leq\frac{1}{2}L\|x_{k}-x^{*}\|^{2}\|\nabla^{2}f_{k}^{-1}\|$$
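The error bound can be checked numerically on a small case. The sketch below assumes the illustrative objective $f(x)=e^{x}-2x$ with minimizer $x^{*}=\ln 2$ and start $x_{0}=1$ (choices made for this example only). On $[0,1]$, which contains every iterate here, the second derivative $e^{x}$ is Lipschitz with constant $L=e$.

```python
import math

# Illustrative setup (an assumption): f(x) = exp(x) - 2x, so f'' = exp(x).
X_STAR = math.log(2.0)   # minimizer, where f'(x) = exp(x) - 2 = 0
LIPSCHITZ = math.e       # |f'''(x)| = exp(x) <= e on [0, 1]

def newton_step(x):
    return x - (math.exp(x) - 2.0) / math.exp(x)

x = 1.0
checks = []  # pairs (actual next error, bound from the inequality)
for _ in range(3):
    x_next = newton_step(x)
    error_next = abs(x_next - X_STAR)
    # (1/2) * L * |x_k - x*|^2 * |G_k^{-1}| with G_k = exp(x_k)
    bound = 0.5 * LIPSCHITZ * (x - X_STAR) ** 2 / math.exp(x)
    checks.append((error_next, bound))
    x = x_next

for error_next, bound in checks:
    print(f"|x_(k+1) - x*| = {error_next:.3e} <= bound = {bound:.3e}")
```

Only three steps are taken on purpose: after that the error reaches the level of floating-point rounding, where such a comparison is no longer meaningful.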
\qquad
$\color{#F00}{!!!}$ When $\nabla^{2}f(x^{*})$ is nonsingular and $\nabla^{2}f_{k}\to\nabla^{2}f(x^{*})$, for all sufficiently large $k$ we have $\|\nabla^{2}f_{k}^{-1}\|\leq2\|\nabla^{2}f(x^{*})^{-1}\|$ ($\color{#F00}{\text{bounded}}$). This follows from the continuity of the matrix inverse, for instance via the Banach perturbation lemma once $\|\nabla^{2}f(x^{*})^{-1}\|\,\|\nabla^{2}f_{k}-\nabla^{2}f(x^{*})\|\leq\frac{1}{2}$. Hence, when the starting point is sufficiently close to $x^{*}$, the sequence $\{x_{k}\}$ converges to $x^{*}$ and Newton's method converges quadratically.
\qquad
From the condition $\nabla f_{k}+\nabla^{2}f_{k}(x_{k+1}-x_{k})=0$ (which follows from the choice of the Newton direction; whether a step length $\alpha\neq1$ would change the argument is worth considering) we obtain:
$$\begin{aligned}\|\nabla f_{k+1}\|&=\|\nabla f_{k+1}-\nabla f_{k}-\nabla^{2}f_{k}(x_{k+1}-x_{k})\|\\ &=\left\|\int_{0}^{1}\left(\nabla^{2}f(x_{k}+t(x_{k+1}-x_{k}))-\nabla^{2}f(x_{k})\right)(x_{k+1}-x_{k})\,dt\right\|\\ &\leq\frac{1}{2}L\|x_{k+1}-x_{k}\|^{2}\\ &\leq\frac{1}{2}L\|\nabla^{2}f_{k}^{-1}\|^{2}\|\nabla f_{k}\|^{2}\\ &\leq2L\|\nabla^{2}f(x^{*})^{-1}\|^{2}\|\nabla f_{k}\|^{2}\end{aligned}$$
This proves that the gradient norm $\|\nabla f_{k}\|$ converges quadratically to zero. ($\color{#F00}{\text{Why? Why is it not shown that}}$ $2L\|\nabla^{2}f(x^{*})^{-1}\|^{2}<1$?) The constant itself does not need to be below 1: since $x_{k}\to x^{*}$ implies $\nabla f_{k}\to0$, eventually $2L\|\nabla^{2}f(x^{*})^{-1}\|^{2}\|\nabla f_{k}\|<1$, and from then on the gradient norms decrease, which is all that quadratic convergence requires.
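The quadratic decay of the gradient norm can be observed directly. The following sketch runs the pure Newton iteration on the illustrative objective $f(x,y)=e^{x+y}+x^{2}+2y^{2}$ from the start $(1,1)$ (both are assumptions made for this example) and prints the ratio $\|\nabla f_{k+1}\|/\|\nabla f_{k}\|^{2}$, which should stay bounded.

```python
import math

# Illustrative objective (an assumption): f(x, y) = exp(x + y) + x^2 + 2*y^2.
def gradient(x, y):
    e = math.exp(x + y)
    return (e + 2.0 * x, e + 4.0 * y)

def hessian(x, y):
    e = math.exp(x + y)
    return ((e + 2.0, e), (e, e + 4.0))

def newton_step(x, y):
    """One pure Newton step (alpha = 1); the 2x2 system uses Cramer's rule."""
    g1, g2 = gradient(x, y)
    (a, b), (c, d) = hessian(x, y)
    det = a * d - b * c            # equals 6*exp(x + y) + 8 > 0 for this f
    d1 = (-g1 * d + b * g2) / det
    d2 = (-a * g2 + c * g1) / det
    return x + d1, y + d2

point = (1.0, 1.0)
norms = [math.hypot(*gradient(*point))]
for _ in range(5):
    point = newton_step(*point)
    norms.append(math.hypot(*gradient(*point)))

# Ratios ||grad f_{k+1}|| / ||grad f_k||^2 for the first four steps.
ratios = [norms[k + 1] / norms[k] ** 2 for k in range(4)]

for k, (n, r) in enumerate(zip(norms, ratios)):
    print(f"k = {k}: ||grad f_k|| = {n:.3e}, ratio = {r:.3e}")
```

The ratios are computed only for the first four steps; later gradient norms are close to machine precision, so their ratios would be dominated by rounding error.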