Conjugate vectors and their properties
- Let $A$ be an $n \times n$ symmetric positive definite matrix. The directions $d^{(0)}, d^{(1)}, d^{(2)}, \dots, d^{(m)}$ are said to be conjugate with respect to $A$ (A-conjugate) if $\boldsymbol{[d^{(i)}]^T A d^{(j)} = 0}$ for all $i \neq j$.
- Let $A$ be an $n \times n$ symmetric positive definite matrix. If the nonzero vectors $d^{(0)}, d^{(1)}, d^{(2)}, \dots, d^{(m)}$ are conjugate with respect to $A$, then they are linearly independent.
  Proof:

  $$
  \begin{aligned}
  &\text{Suppose } \alpha_0 d^{(0)} + \alpha_1 d^{(1)} + \alpha_2 d^{(2)} + \dots + \alpha_m d^{(m)} = 0. \\
  &\text{Left-multiplying by } [d^{(j)}]^T A \text{ and using conjugacy gives } \alpha_j [d^{(j)}]^T A d^{(j)} = 0. \\
  &\text{Since } A \text{ is positive definite and } d^{(j)} \neq 0, \text{ we have } [d^{(j)}]^T A d^{(j)} > 0, \text{ so } \alpha_j = 0 \text{ for every } j. \\
  &\text{Therefore } d^{(0)}, d^{(1)}, d^{(2)}, \dots, d^{(m)} \text{ are linearly independent.}
  \end{aligned}
  $$
- Let $A$ be an $n \times n$ symmetric positive definite matrix. There are at most $n$ nonzero vectors that are mutually conjugate with respect to $A$ (an $n$-dimensional space has bases of $n$ vectors, so it contains at most $n$ linearly independent vectors).
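The definition and the independence property above can be checked numerically. The following is a minimal sketch in plain Python, not part of the original text: the helper names `dot`, `matvec`, and `a_conjugate_directions` and the 2x2 matrix are illustrative assumptions. It builds A-conjugate directions from the standard basis by Gram-Schmidt in the inner product $u^T A v$.

```python
def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in A]


def a_conjugate_directions(A, vectors):
    """Turn linearly independent vectors into A-conjugate directions.

    This is Gram-Schmidt with the inner product <u, v> = u^T A v
    (an illustrative construction, assumed here for the example).
    """
    dirs = []
    for v in vectors:
        d = list(v)
        for p in dirs:
            Ap = matvec(A, p)
            coeff = dot(v, Ap) / dot(p, Ap)  # component of v along p in the A inner product
            d = [di - coeff * pi for di, pi in zip(d, p)]
        dirs.append(d)
    return dirs


# Hypothetical symmetric positive definite matrix for the demonstration.
A = [[4.0, 1.0], [1.0, 3.0]]
d0, d1 = a_conjugate_directions(A, [[1.0, 0.0], [0.0, 1.0]])

conjugacy = dot(d0, matvec(A, d1))         # should vanish: [d0]^T A d1
determinant = d0[0] * d1[1] - d0[1] * d1[0]  # nonzero means d0, d1 are independent
print(d0, d1, conjugacy, determinant)
```

A nonzero determinant confirms that the two conjugate directions are linearly independent, as the proof predicts.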
Basic principle of the conjugate direction method
$$
\begin{aligned}
&\text{Assume the objective is } \min f(x) = \frac{1}{2} x^T H x + x^T b + a \\
&\text{The gradient of } f \text{ at } x^{(k)} \text{ is } g(x^{(k)}) = H x^{(k)} + b \ \text{(the gradient generalizes the first derivative)} \\
&\text{The Hessian of } f \text{ at } x^{(k)} \text{ is } h(x^{(k)}) = H
\end{aligned}
$$
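If the vectors $d^{(0)}, d^{(1)}, d^{(2)}, \dots, d^{(n-1)}$ are conjugate with respect to $H$, they are linearly independent and therefore form a basis of the $n$-dimensional space.
Let $x^*$ denote the minimizer we eventually want to find. The difference $x^* - x^{(0)}$ still lies in the $n$-dimensional space, so it can be expanded in this basis.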
Therefore,

$$
x^* - x^{(0)} = \sum_{i=0}^{n-1} \alpha^{(i)} d^{(i)} \ \Longrightarrow \ x^* = x^{(0)} + \sum_{i=0}^{n-1} \alpha^{(i)} d^{(i)}
$$
Given an initial point $x^{(0)}$ and conjugate vectors $d^{(0)}, d^{(1)}, d^{(2)}, \dots, d^{(n-1)}$, knowing the step length $\alpha^{(i)}$ for every step is enough to obtain the minimizer $x^*$.
$$
\begin{aligned}
x^* - x^{(0)} &= \sum_{i=0}^{n-1} \alpha^{(i)} d^{(i)} \\
[d^{(k)}]^T H [x^* - x^{(0)}] &= \sum_{i=0}^{n-1} \alpha^{(i)} [d^{(k)}]^T H d^{(i)} = \alpha^{(k)} [d^{(k)}]^T H d^{(k)} \\
\alpha^{(k)} &= \frac{[d^{(k)}]^T H [x^* - x^{(0)}]}{[d^{(k)}]^T H d^{(k)}}
\end{aligned}
$$
Here $x^*$ is the unknown minimizer we are solving for, so this formula cannot be evaluated directly. To eliminate $x^*$, introduce the general iterate $x^{(k)} = x^{(0)} + \sum_{i=0}^{k-1} \alpha^{(i)} d^{(i)}$.
$$
\begin{aligned}
x^{(k)} &= x^{(0)} + \sum_{i=0}^{k-1} \alpha^{(i)} d^{(i)} \\
[d^{(k)}]^T H [x^{(k)} - x^{(0)}] &= \sum_{i=0}^{k-1} \alpha^{(i)} [d^{(k)}]^T H d^{(i)} = 0 \\
[d^{(k)}]^T H x^{(k)} &= [d^{(k)}]^T H x^{(0)} \\
\alpha^{(k)} = \frac{[d^{(k)}]^T H [x^* - x^{(0)}]}{[d^{(k)}]^T H d^{(k)}} \ &\Longrightarrow \ \alpha^{(k)} = \frac{[d^{(k)}]^T H [x^* - x^{(\textcolor{blue}{k})}]}{[d^{(k)}]^T H d^{(k)}}
\end{aligned}
$$
Recall that the gradient of $f$ at $x^{(k)}$ is $g(x^{(k)}) = H x^{(k)} + b$, and that the Hessian of $f$ at $x^{(k)}$ is $h(x^{(k)}) = H$.
$$
\begin{aligned}
H[x^* - x^{(k)}] &= [g(x^*) - b] - [g(x^{(k)}) - b] = g(x^*) - g(x^{(k)}) \\
&\textcolor{red}{\text{The gradient vanishes at the minimizer, so } g(x^*) = 0} \\
H[x^* - x^{(k)}] &= -g(x^{(k)}) \\
\alpha^{(k)} = \frac{[d^{(k)}]^T H [x^* - x^{(k)}]}{[d^{(k)}]^T H d^{(k)}} \ &\Longrightarrow \ \alpha^{(k)} = -\frac{[d^{(k)}]^T g(x^{(k)})}{[d^{(k)}]^T H d^{(k)}}
\end{aligned}
$$
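To make the step-length formula concrete, here is a minimal sketch in plain Python. The function name `conjugate_direction_minimize`, the matrix `H`, the vector `b`, the starting point, and the pre-computed conjugate pair are all assumptions chosen for illustration. It applies $\alpha^{(k)} = -\frac{[d^{(k)}]^T g(x^{(k)})}{[d^{(k)}]^T H d^{(k)}}$ once per direction and reaches the minimizer of the quadratic after $n = 2$ steps.

```python
def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in A]


def conjugate_direction_minimize(H, b, x0, directions):
    """Minimize 0.5 x^T H x + x^T b by one exact step along each H-conjugate direction."""
    x = list(x0)
    for d in directions:
        g = [hx + bi for hx, bi in zip(matvec(H, x), b)]   # gradient g(x) = Hx + b
        alpha = -dot(d, g) / dot(d, matvec(H, d))          # step length from the derivation
        x = [xi + alpha * di for xi, di in zip(x, d)]
    return x


# Hypothetical problem data; the two directions satisfy d0^T H d1 = 0.
H = [[4.0, 1.0], [1.0, 3.0]]
b = [-1.0, -2.0]
directions = [[1.0, 0.0], [-0.25, 1.0]]

x_star = conjugate_direction_minimize(H, b, [2.0, 1.0], directions)
residual = [hx + bi for hx, bi in zip(matvec(H, x_star), b)]  # gradient at the result
print(x_star, residual)
```

For this data the minimizer solves $Hx = -b$, which gives $x^* = (1/11, 7/11)$, so the printed gradient should be zero up to rounding.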
Expanding subspace theorem
$$
\begin{aligned}
&\text{For all } k:\ [g^{(k+1)}]^T d^{(k)} = 0 \ \Longrightarrow \ f(x^{(k+1)}) = \min_{\alpha} f(x^{(k)} + \alpha d^{(k)}) \\
&\text{For all } k \text{ and } i \text{ with } 0 \le k \le n-1,\ 0 \le i \le k:\ [g^{(k+1)}]^T d^{(i)} = 0 \ \Longrightarrow \ f(x^{(k+1)}) = \min_{\alpha^{(0)}, \dots, \alpha^{(k)}} f\Big(x^{(0)} + \sum_{i=0}^{k} \alpha^{(i)} d^{(i)}\Big)
\end{aligned}
$$
In other words, every update finds the minimum along the current direction.
$$
\begin{aligned}
&f(x^{(k+1)}) = \min_{\alpha^{(0)}, \dots, \alpha^{(k)}} f\Big(x^{(0)} + \sum_{i=0}^{k} \alpha^{(i)} d^{(i)}\Big) \\
&\text{Define } V_k = x^{(0)} + \operatorname{span}[d^{(0)}, d^{(1)}, \dots, d^{(k)}]
\end{aligned}
$$
$$
f(x^{(k+1)}) = \min_{x \in V_k} f(x)
$$

As $k$ grows, the subspace $\operatorname{span}[d^{(0)}, d^{(1)}, \dots, d^{(k)}]$ keeps "expanding" until it fills all of $R^n$. Once $k$ is large enough (at the latest $k = n-1$), $x^*$ lies in $V_k$. This is the expanding subspace theorem.
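The orthogonality condition $[g^{(k+1)}]^T d^{(i)} = 0$ for $0 \le i \le k$ can be observed numerically. The sketch below is plain Python under stated assumptions: the 3x3 matrix, the vector `b`, the starting point, and the helper `conjugate_directions` (Gram-Schmidt in the $H$ inner product) are all hypothetical choices for the demonstration. It records the largest value of $|[g^{(k+1)}]^T d^{(i)}|$ seen during the iteration.

```python
def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in A]


def conjugate_directions(H, n):
    """H-conjugate directions built from the standard basis (illustrative Gram-Schmidt)."""
    dirs = []
    for j in range(n):
        e = [1.0 if i == j else 0.0 for i in range(n)]
        d = list(e)
        for p in dirs:
            Hp = matvec(H, p)
            c = dot(e, Hp) / dot(p, Hp)
            d = [di - c * pi for di, pi in zip(d, p)]
        dirs.append(d)
    return dirs


def gradient(H, b, x):
    """Gradient of 0.5 x^T H x + x^T b, i.e. Hx + b."""
    return [hx + bi for hx, bi in zip(matvec(H, x), b)]


# Hypothetical symmetric positive definite problem.
H = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
b = [-1.0, 0.0, 1.0]
x = [1.0, 1.0, 1.0]
dirs = conjugate_directions(H, 3)

max_violation = 0.0
for k, d in enumerate(dirs):
    g = gradient(H, b, x)
    alpha = -dot(d, g) / dot(d, matvec(H, d))
    x = [xi + alpha * di for xi, di in zip(x, d)]
    g_next = gradient(H, b, x)
    # g^(k+1) should be orthogonal to every direction used so far.
    for i in range(k + 1):
        max_violation = max(max_violation, abs(dot(g_next, dirs[i])))

final_gradient = gradient(H, b, x)
print(max_violation, final_gradient)
```

After the last step the gradient is orthogonal to a full basis of $R^3$, so it must vanish, which matches the statement that $x^*$ lies in $V_{n-1}$.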
Conjugate gradient method
$\min\ f(x) = \frac{1}{2} x^T H x + x^T b + a$
1. Set $k = 0$ and choose an initial point $x^{(0)}$.
2. Compute $g^{(0)} = \nabla f(x^{(0)})$. If $g^{(0)} = 0$, stop; otherwise set $d^{(0)} = -g^{(0)}$.
3. Compute $\alpha^{(k)} = -\frac{[d^{(k)}]^T g(x^{(k)})}{[d^{(k)}]^T H d^{(k)}}$.
4. Compute $x^{(k+1)} = x^{(k)} + \alpha^{(k)} d^{(k)}$.
5. Compute $g^{(k+1)} = \nabla f(x^{(k+1)})$. If $g^{(k+1)} = 0$, stop.
6. Compute $\beta^{(k)} = \frac{[g(x^{(k+1)})]^T H d^{(k)}}{[d^{(k)}]^T H d^{(k)}}$ ($\beta$ is used to obtain the next conjugate direction).
7. Compute $d^{(k+1)} = -g^{(k+1)} + \beta^{(k)} d^{(k)}$.
8. Set $k = k + 1$ and return to step 3.
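The steps above can be sketched in plain Python as follows. This is an illustrative sketch rather than a reference implementation: the function name `conjugate_gradient`, the tolerance `tol` (used in place of the exact test $g = 0$, which is rarely met in floating point), the cap of $n$ iterations, and the sample data are assumptions.

```python
def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in A]


def conjugate_gradient(H, b, x0, tol=1e-10):
    """Minimize 0.5 x^T H x + x^T b for symmetric positive definite H."""
    x = list(x0)                                           # step 1
    g = [hx + bi for hx, bi in zip(matvec(H, x), b)]       # step 2: g = Hx + b
    d = [-gi for gi in g]
    for _ in range(len(b)):                                # at most n steps in exact arithmetic
        if dot(g, g) ** 0.5 <= tol:                        # assumed stopping rule: ||g|| <= tol
            break
        Hd = matvec(H, d)
        dHd = dot(d, Hd)
        alpha = -dot(d, g) / dHd                           # step 3
        x = [xi + alpha * di for xi, di in zip(x, d)]      # step 4
        g = [hx + bi for hx, bi in zip(matvec(H, x), b)]   # step 5
        beta = dot(g, Hd) / dHd                            # step 6
        d = [-gi + beta * di for gi, di in zip(g, d)]      # step 7
    return x


# Hypothetical problem data; the minimizer solves Hx = -b.
H = [[4.0, 1.0], [1.0, 3.0]]
b = [-1.0, -2.0]
x_min = conjugate_gradient(H, b, [2.0, 1.0])
print(x_min)
```

Unlike the conjugate direction sketch, no directions are supplied in advance: each new direction is generated from the current gradient and the previous direction.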
Conjugate gradient method for non-quadratic problems
In a non-quadratic problem the Hessian changes from point to point. How can $H$ be removed from the algorithm?
1. Set $k = 0$ and choose an initial point $x^{(0)}$.
2. Compute $g^{(0)} = \nabla f(x^{(0)})$. If $g^{(0)} = 0$, stop; otherwise set $d^{(0)} = -g^{(0)}$.
3. Compute $\alpha^{(k)}$. The quadratic-case formula $\alpha^{(k)} = -\frac{[d^{(k)}]^T g(x^{(k)})}{[d^{(k)}]^T H d^{(k)}}$ is equivalent to $\textcolor{red}{f(x^{(k+1)}) = \min_{\alpha} f(x^{(k)} + \alpha d^{(k)})}$, so $\alpha^{(k)}$ can be obtained by a one-dimensional exact line search instead.
4. Compute $x^{(k+1)} = x^{(k)} + \alpha^{(k)} d^{(k)}$.
5. Compute $g^{(k+1)} = \nabla f(x^{(k+1)})$. If $g^{(k+1)} = 0$, stop.
6. Compute $\beta^{(k)} = \textcolor{red}{\frac{[g^{(k+1)}]^T g^{(k+1)}}{[g^{(k)}]^T g^{(k)}}}$. This follows from the quadratic-case formula:

   $$
   \begin{aligned}
   &\beta^{(k)} = \frac{[g^{(k+1)}]^T H d^{(k)}}{[d^{(k)}]^T H d^{(k)}} \\
   &x^{(k+1)} = x^{(k)} + \alpha^{(k)} d^{(k)} \ \Longrightarrow \ H x^{(k+1)} + b = H(x^{(k)} + \alpha^{(k)} d^{(k)}) + b \\
   &g^{(k+1)} = g^{(k)} + \alpha^{(k)} H d^{(k)} \ \Longrightarrow \ H d^{(k)} = \frac{g^{(k+1)} - g^{(k)}}{\alpha^{(k)}} \ \Longrightarrow \ \textcolor{red}{\beta^{(k)} = \frac{[g^{(k+1)}]^T [g^{(k+1)} - g^{(k)}]}{[d^{(k)}]^T [g^{(k+1)} - g^{(k)}]}} \\
   &d^{(k)} = -g^{(k)} + \beta^{(k-1)} d^{(k-1)} \ \Longrightarrow \ [g^{(k)}]^T d^{(k)} = -[g^{(k)}]^T g^{(k)} + \beta^{(k-1)} [g^{(k)}]^T d^{(k-1)} \\
   &[g^{(k)}]^T d^{(k)} = -[g^{(k)}]^T g^{(k)} \ \big(\text{using } \textcolor{blue}{[g^{(k)}]^T d^{(k-1)} = 0} \text{ and } [g^{(k+1)}]^T d^{(k)} = 0\big) \ \Longrightarrow \ \textcolor{red}{\beta^{(k)} = \frac{[g^{(k+1)}]^T [g^{(k+1)} - g^{(k)}]}{[g^{(k)}]^T g^{(k)}}} \\
   &d^{(k)} = -g^{(k)} + \beta^{(k-1)} d^{(k-1)} \ \Longrightarrow \ [g^{(k+1)}]^T d^{(k)} = -[g^{(k+1)}]^T g^{(k)} + \beta^{(k-1)} [g^{(k+1)}]^T d^{(k-1)} \\
   &\text{Both inner products with } d \text{ vanish, so } -[g^{(k+1)}]^T g^{(k)} = 0 \ \Longrightarrow \ \textcolor{red}{\beta^{(k)} = \frac{[g^{(k+1)}]^T g^{(k+1)}}{[g^{(k)}]^T g^{(k)}}}
   \end{aligned}
   $$
7. Compute $d^{(k+1)} = -g^{(k+1)} + \beta^{(k)} d^{(k)}$.
8. Set $k = k + 1$ and return to step 3.
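In general, the conjugate gradient method converges faster than steepest descent, and unlike Newton's method it does not need the Hessian or its inverse.
However, as the iteration count grows, the newly constructed conjugate directions become less accurate because errors accumulate (the objective is not quadratic, so the conjugacy relations hold only approximately). The directions may even stop being descent directions, and convergence can become extremely slow. An effective remedy is to set the coefficient $\beta = 0$ every $n$ iterations, which restarts the iteration from the steepest descent direction.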
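The non-quadratic algorithm and the restart rule can be sketched together in plain Python. Several parts of this sketch are assumptions not taken from the text: it swaps the exact one-dimensional search for a simple backtracking (Armijo) line search, it falls back to $-g$ whenever the current direction is not a descent direction, it stops on $\|g\| \le$ `tol`, and the test function `f` with its gradient `grad` is a hypothetical convex example whose minimizer is $(1, -2)$.

```python
def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def fletcher_reeves(f, grad, x0, tol=1e-6, max_iter=2000):
    """Nonlinear conjugate gradient with beta = g_new^T g_new / g^T g and a restart every n steps."""
    n = len(x0)
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]
    for k in range(max_iter):
        if dot(g, g) ** 0.5 <= tol:           # assumed stopping rule
            break
        if dot(g, d) >= 0.0:                  # safeguard (assumption): not a descent direction
            d = [-gi for gi in g]
        # Backtracking line search, used here instead of an exact 1-D search.
        alpha, fx, slope = 1.0, f(x), dot(g, d)
        while alpha > 1e-16 and f([xi + alpha * di for xi, di in zip(x, d)]) > fx + 1e-4 * alpha * slope:
            alpha *= 0.5
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = grad(x)
        if (k + 1) % n == 0:
            beta = 0.0                        # restart from the steepest descent direction
        else:
            beta = dot(g_new, g_new) / dot(g, g)
        d = [-gi + beta * di for gi, di in zip(g_new, d)]
        g = g_new
    return x


def f(x):
    """Hypothetical smooth convex test function with minimizer (1, -2)."""
    u, v = x[0] - 1.0, x[1] + 2.0
    return u ** 4 + u ** 2 + 2.0 * v ** 2 + u * v


def grad(x):
    """Gradient of the test function above."""
    u, v = x[0] - 1.0, x[1] + 2.0
    return [4.0 * u ** 3 + 2.0 * u + v, 4.0 * v + u]


x_min = fletcher_reeves(f, grad, [3.0, 0.0])
print(x_min)
```

Note that no Hessian appears anywhere: the step length comes from function values along the direction, and $\beta$ comes from two consecutive gradients.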