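Marginalization in graph optimization is both central and difficult in SLAM. The theory is intricate, and the relationship between the Hessian matrix and the covariance matrix is especially confusing. This article explains marginalization in graph optimization from a probabilistic viewpoint and from a matrix-elimination viewpoint, and shows mathematically that the two viewpoints are consistent with each other.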
Marginalization in Graph Optimization
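From an information-fusion viewpoint, marginalization reduces the dimension of the problem to be solved: only the variables of interest are solved for, the others are removed, and the information carried by the removed variables is retained. In short, marginalization integrates information effectively while keeping the set of system variables compact and manageable. It admits two interpretations.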
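1. Probabilistic interpretation
Marginalization means obtaining the marginal distribution of a subset of variables from the joint distribution of all variables. It is typically used to remove variables that are no longer needed without losing information.
Below we derive the joint distribution of the full increment and then the marginal distribution of part of the increment.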
From the preceding discussion, we have
$$
\mathbf{x}^{\ast} =\underset{\mathbf{x}}{\arg\min}\,\sum_{(\mathbf{x}_{i},\mathbf{x}_{j})\in{C}}\mathbf{e}_{ij}(\mathbf{x})^{T}\boldsymbol\Omega_{ij}\mathbf{e}_{ij}(\mathbf{x})
$$
The expression above implicitly assumes that the error vector $\mathbf{e}_{ij}(\mathbf{x})$ follows a normal distribution with mean $\mathbf{0}$ and covariance matrix $\boldsymbol\Omega_{ij}^{-1}$, that is,
$$
\mathbf{e}_{ij}(\mathbf{x})\sim\mathcal{N}(\mathbf{0},\boldsymbol\Omega_{ij}^{-1})\tag{12}
$$
and then solves the maximum-likelihood estimation problem
$$
\mathbf{x}^{\ast} =\underset{\mathbf{x}}{\arg\max}\,\prod_{(\mathbf{x}_{i},\mathbf{x}_{j})\in{C}}p(\mathbf{e}_{ij}(\mathbf{x}))\tag{13}
$$
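The step from (12) and (13) back to the least-squares problem can be made explicit. The following is a short sketch, assuming that the information matrices $\boldsymbol\Omega_{ij}$ do not depend on $\mathbf{x}$ and writing $m_{ij}$ (a symbol introduced here for illustration) for the dimension of $\mathbf{e}_{ij}$. Taking the negative logarithm of the likelihood gives

$$
-\ln\prod_{(\mathbf{x}_{i},\mathbf{x}_{j})\in{C}}p(\mathbf{e}_{ij}(\mathbf{x}))
=\frac{1}{2}\sum_{(\mathbf{x}_{i},\mathbf{x}_{j})\in{C}}\mathbf{e}_{ij}(\mathbf{x})^{T}\boldsymbol\Omega_{ij}\mathbf{e}_{ij}(\mathbf{x})
+\sum_{(\mathbf{x}_{i},\mathbf{x}_{j})\in{C}}\left(\frac{m_{ij}}{2}\ln 2\pi-\frac{1}{2}\ln\det\boldsymbol\Omega_{ij}\right)
$$

The second sum is constant in $\mathbf{x}$, so maximizing the likelihood in (13) is the same as minimizing the weighted sum of squared errors.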
In each iteration, $\mathbf{x}_{k}$ is updated with an increment $\Delta\mathbf{x}_{k}$. We now derive the probability distribution $p(\Delta\mathbf{x}_{k})$ of $\Delta\mathbf{x}_{k}$.
Expand $\mathbf{e}_{ij}(\mathbf{x}_{k}\oplus\Delta\mathbf{x}_{k})$ in a first-order Taylor series around the current state $\mathbf{x}_{k}$:
$$
\mathbf{e}_{ij}(\mathbf{x}_{k}\oplus\Delta\mathbf{x}_{k}) \approx\mathbf{e}_{ij}(\mathbf{x}_{k})+\mathbf{J}_{ij}\Delta\mathbf{x}_{k}
$$
Here and below, the subscript $k$ is dropped from some quantities (for example $\mathbf{J}_{ij}$).
By the expansion above, the uncertainty of $\mathbf{e}_{ij}(\mathbf{x}_{k}\oplus\Delta\mathbf{x}_{k})$ comes entirely from $\Delta\mathbf{x}_{k}$. Conversely, the uncertainty of $\Delta\mathbf{x}_{k}$ can be expressed through the uncertainty of $\mathbf{e}_{ij}(\mathbf{x}_{k}\oplus\Delta\mathbf{x}_{k})$:
$$
p(\Delta\mathbf{x}_{k})=\prod_{(\mathbf{x}_{i},\mathbf{x}_{j})\in{C}}p(\mathbf{e}_{ij}(\mathbf{x}_{k}\oplus\Delta\mathbf{x}_{k}))=\eta\prod_{(\mathbf{x}_{i},\mathbf{x}_{j})\in{C}}\exp\left(-\frac{1}{2}\left(\mathbf{J}_{ij}\Delta\mathbf{x}_{k}+\mathbf{e}_{ij}(\mathbf{x}_{k})\right)^{T}\boldsymbol\Omega_{ij}\left(\mathbf{J}_{ij}\Delta\mathbf{x}_{k}+\mathbf{e}_{ij}(\mathbf{x}_{k})\right)\right)
$$
where $\eta$ is a normalization constant.
By the normalized-product theorem for Gaussian probability density functions (the normalized product of Gaussian densities is again Gaussian), we obtain
$$
\begin{align*}
p(\Delta\mathbf{x}_{k})
&=\mathcal{N}\left(-\left(\sum_{(\mathbf{x}_{i},\mathbf{x}_{j})\in{C}}\mathbf{J}_{ij}^{T}\boldsymbol\Omega_{ij}\mathbf{J}_{ij}\right)^{-1}\left(\sum_{(\mathbf{x}_{i},\mathbf{x}_{j})\in{C}}\mathbf{J}_{ij}^{T}\boldsymbol\Omega_{ij}\mathbf{e}_{ij}\right),\left(\sum_{(\mathbf{x}_{i},\mathbf{x}_{j})\in{C}}\mathbf{J}_{ij}^{T}\boldsymbol\Omega_{ij}\mathbf{J}_{ij}\right)^{-1}\right)\\
&=\mathcal{N}(\mathbf{H}^{-1}\mathbf{b},\mathbf{H}^{-1})
\end{align*}\tag{14}
$$
where $\mathbf{H}=\sum_{(\mathbf{x}_{i},\mathbf{x}_{j})\in{C}}\mathbf{J}_{ij}^{T}\boldsymbol\Omega_{ij}\mathbf{J}_{ij}$ and $\mathbf{b}=-\sum_{(\mathbf{x}_{i},\mathbf{x}_{j})\in{C}}\mathbf{J}_{ij}^{T}\boldsymbol\Omega_{ij}\mathbf{e}_{ij}$. The minus sign comes from completing the square in the exponent; with this sign convention the mean satisfies $\mathbf{H}\Delta\mathbf{x}_{k}=\mathbf{b}$.
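As a numerical sanity check of (14), the sketch below accumulates $\mathbf{H}$ and $\mathbf{b}$ from a handful of random edges and confirms that $\mathbf{H}^{-1}\mathbf{b}$ is a stationary point of the linearized cost. The dimensions, the random data and the names `edges` and `cost` are illustrative assumptions, and the sign convention $\mathbf{b}=-\sum\mathbf{J}_{ij}^{T}\boldsymbol\Omega_{ij}\mathbf{e}_{ij}$ is assumed so that $\mathbf{H}\Delta\mathbf{x}=\mathbf{b}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4            # dimension of the increment (hypothetical)
m = 3            # dimension of each residual e_ij (hypothetical)
num_edges = 6    # number of constraints in C (hypothetical)

H = np.zeros((n, n))
b = np.zeros(n)
edges = []
for _ in range(num_edges):
    J = rng.standard_normal((m, n))        # Jacobian J_ij
    A = rng.standard_normal((m, m))
    Omega = A @ A.T + m * np.eye(m)        # symmetric positive definite information matrix
    e = rng.standard_normal(m)             # residual e_ij(x_k)
    H += J.T @ Omega @ J                   # H = sum J^T Omega J
    b -= J.T @ Omega @ e                   # b = -sum J^T Omega e (assumed sign convention)
    edges.append((J, Omega, e))

def cost(dx):
    """Linearized cost: sum of (J dx + e)^T Omega (J dx + e) over all edges."""
    return sum((J @ dx + e) @ Omega @ (J @ dx + e) for J, Omega, e in edges)

mean = np.linalg.solve(H, b)               # mean of p(dx) in (14)
cov = np.linalg.inv(H)                     # covariance of p(dx) in (14)

# Gradient of the linearized cost at the mean; it should vanish.
grad = sum(2.0 * J.T @ Omega @ (J @ mean + e) for J, Omega, e in edges)
print(np.linalg.norm(grad))
```

The mean is obtained with `np.linalg.solve` rather than by forming the inverse; the explicit inverse is computed only because (14) names it as the covariance.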
Without loss of generality, suppose we keep the first $p$ components of $\Delta\mathbf{x}_{k}$, denoted $\Delta\mathbf{x}_{p\times1}$, and marginalize out the last $q=N-p$ components, denoted $\Delta\mathbf{x}_{q\times1}$, where $N$ is the dimension of $\Delta\mathbf{x}_{k}$. In other words, we seek the marginal distribution $p(\Delta\mathbf{x}_{p\times1})$ of $\Delta\mathbf{x}_{p\times1}$.
First, write $(14)$ in block form:
$$
p(\Delta\mathbf{x}_{p\times1},\Delta\mathbf{x}_{q\times1})=\mathcal{N}\left(\begin{bmatrix} \mathbf{H}_{p\times{p}}&\mathbf{H}_{p\times{q}}\\ \mathbf{H}_{p\times{q}}^{T}&\mathbf{H}_{q\times{q}} \end{bmatrix}^{-1}\begin{bmatrix} \mathbf{b}_{p\times{1}}\\ \mathbf{b}_{q\times{1}} \end{bmatrix}, \begin{bmatrix} \mathbf{H}_{p\times{p}}&\mathbf{H}_{p\times{q}}\\ \mathbf{H}_{p\times{q}}^{T}&\mathbf{H}_{q\times{q}} \end{bmatrix}^{-1} \right)
$$
Applying the Gaussian marginalization results derived in the appendix at the end of this article, the marginal distribution of $\Delta\mathbf{x}_{p\times1}$ is
$$
p(\Delta\mathbf{x}_{p\times1})= \mathcal{N}\left(\left(\mathbf{H}_{p\times{p}}-\mathbf{H}_{p\times{q}}\mathbf{H}^{-1}_{q\times{q}}\mathbf{H}^{T}_{p\times{q}}\right)^{-1}\left(\mathbf{b}_{p\times1}-\mathbf{H}_{p\times{q}}\mathbf{H}^{-1}_{q\times{q}}\mathbf{b}_{q\times1}\right),\left(\mathbf{H}_{p\times{p}}-\mathbf{H}_{p\times{q}}\mathbf{H}^{-1}_{q\times{q}}\mathbf{H}^{T}_{p\times{q}}\right)^{-1}\right)\tag{15}
$$
The mean of $\Delta\mathbf{x}_{p\times1}$ in the expression above can be taken as its estimate. With $\Delta\mathbf{x}_{p\times1}$ fixed, the conditional distribution is
$$
p(\Delta\mathbf{x}_{q\times1}|\Delta\mathbf{x}_{p\times1})= \mathcal{N}\left(\mathbf{H}^{-1}_{q\times{q}}(\mathbf{b}_{q\times1}-\mathbf{H}^{T}_{p\times{q}}\Delta\mathbf{x}_{p\times1}),\mathbf{H}^{-1}_{q\times{q}}\right)\tag{16}
$$
The mean of $\Delta\mathbf{x}_{q\times1}$ in the expression above can likewise be taken as its estimate.
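The marginal in (15) can be checked numerically against the direct route of inverting the full matrix and reading off the leading block. The sketch below is a minimal illustration under assumed data: the helper name `marginalize`, the sizes and the random positive-definite $\mathbf{H}$ are hypothetical.

```python
import numpy as np

def marginalize(H, b, p):
    """Return the Schur-complement pair (H_marg, b_marg) for the first p variables.

    H_marg = Hpp - Hpq Hqq^{-1} Hpq^T and b_marg = bp - Hpq Hqq^{-1} bq,
    so the marginal in (15) is N(H_marg^{-1} b_marg, H_marg^{-1}).
    """
    Hpp, Hpq, Hqq = H[:p, :p], H[:p, p:], H[p:, p:]
    bp, bq = b[:p], b[p:]
    # K = Hpq Hqq^{-1}, computed with a solve (Hqq is symmetric).
    K = np.linalg.solve(Hqq, Hpq.T).T
    return Hpp - K @ Hpq.T, bp - K @ bq

rng = np.random.default_rng(1)
N, p = 6, 2
A = rng.standard_normal((N, N))
H = A @ A.T + N * np.eye(N)     # symmetric positive definite
b = rng.standard_normal(N)

H_marg, b_marg = marginalize(H, b, p)

# Direct route: joint covariance and joint mean from (14).
Sigma = np.linalg.inv(H)
mu = Sigma @ b

print(np.allclose(np.linalg.inv(H_marg), Sigma[:p, :p]))
print(np.allclose(np.linalg.solve(H_marg, b_marg), mu[:p]))
```

The point of the Schur-complement route is that only the $q\times q$ block is inverted, which is cheap when that block has sparse or block-diagonal structure.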
2. Interpretation via elimination in the increment equation
Marginalization can also be explained in terms of the increment equation
$$
\begin{bmatrix} \mathbf{H}_{p\times{p}}&\mathbf{H}_{p\times{q}}\\ \mathbf{H}_{p\times{q}}^{T}&\mathbf{H}_{q\times{q}} \end{bmatrix}\begin{bmatrix} \Delta\mathbf{x}_{p\times{1}}\\ \Delta\mathbf{x}_{q\times{1}} \end{bmatrix}=\begin{bmatrix} \mathbf{b}_{p\times{1}}\\ \mathbf{b}_{q\times{1}} \end{bmatrix}\tag{17}
$$
Marginalizing out $\Delta\mathbf{x}_{q\times1}$ and obtaining the mean of the marginal distribution of $\Delta\mathbf{x}_{p\times1}$ is equivalent to performing Gaussian elimination on the Hessian matrix so as to eliminate its upper-right block $\mathbf{H}_{p\times{q}}$. That is, both sides of the equation above are left-multiplied by the Gaussian elimination matrix:
$$
\begin{bmatrix} \mathbf{I}&-\mathbf{H}_{p\times{q}}\mathbf{H}^{-1}_{q\times{q}}\\ \mathbf{O}&\mathbf{I} \end{bmatrix}\begin{bmatrix} \mathbf{H}_{p\times{p}}&\mathbf{H}_{p\times{q}}\\ \mathbf{H}_{p\times{q}}^{T}&\mathbf{H}_{q\times{q}} \end{bmatrix}\begin{bmatrix} \Delta\mathbf{x}_{p\times{1}}\\ \Delta\mathbf{x}_{q\times{1}} \end{bmatrix}= \begin{bmatrix} \mathbf{I}&-\mathbf{H}_{p\times{q}}\mathbf{H}^{-1}_{q\times{q}}\\ \mathbf{O}&\mathbf{I} \end{bmatrix}\begin{bmatrix} \mathbf{b}_{p\times{1}}\\ \mathbf{b}_{q\times{1}} \end{bmatrix}
$$
Simplifying gives
$$
\begin{bmatrix} \mathbf{H}_{p\times{p}}-\mathbf{H}_{p\times{q}}\mathbf{H}^{-1}_{q\times{q}}\mathbf{H}^{T}_{p\times{q}}&\mathbf{O}\\ \mathbf{H}_{p\times{q}}^{T}&\mathbf{H}_{q\times{q}} \end{bmatrix}\begin{bmatrix} \Delta\mathbf{x}_{p\times{1}}\\ \Delta\mathbf{x}_{q\times{1}} \end{bmatrix}=\begin{bmatrix} \mathbf{b}_{p\times{1}}-\mathbf{H}_{p\times{q}}\mathbf{H}^{-1}_{q\times{q}}\mathbf{b}_{q\times{1}}\\ \mathbf{b}_{q\times{1}} \end{bmatrix}\tag{18}
$$
The first row of the equation above no longer involves $\Delta\mathbf{x}_{q\times{1}}$. Extracting it gives
$$
\left(\mathbf{H}_{p\times{p}}-\mathbf{H}_{p\times{q}}\mathbf{H}^{-1}_{q\times{q}}\mathbf{H}^{T}_{p\times{q}}\right)\Delta\mathbf{x}_{p\times{1}}=\mathbf{b}_{p\times{1}}-\mathbf{H}_{p\times{q}}\mathbf{H}^{-1}_{q\times{q}}\mathbf{b}_{q\times{1}}\tag{19}
$$
Solving this equation yields
$$
\Delta\mathbf{x}_{p\times{1}}=\left(\mathbf{H}_{p\times{p}}-\mathbf{H}_{p\times{q}}\mathbf{H}^{-1}_{q\times{q}}\mathbf{H}^{T}_{p\times{q}}\right)^{-1}(\mathbf{b}_{p\times{1}}-\mathbf{H}_{p\times{q}}\mathbf{H}^{-1}_{q\times{q}}\mathbf{b}_{q\times{1}})\tag{20}
$$
which agrees with the mean in $(15)$.
Fixing $\Delta\mathbf{x}_{p\times1}$ and obtaining the mean of the conditional distribution of $\Delta\mathbf{x}_{q\times1}$ given $\Delta\mathbf{x}_{p\times1}$ is equivalent to fixing the $\Delta\mathbf{x}_{p\times{1}}$ solved from $(19)$ and then solving the second row of $(18)$:
$$
\Delta\mathbf{x}_{q\times{1}}=\mathbf{H}^{-1}_{q\times{q}}(\mathbf{b}_{q\times{1}}-\mathbf{H}_{p\times{q}}^{T}\Delta\mathbf{x}_{p\times{1}})\tag{21}
$$
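This result agrees with the mean in $(16)$.
In summary, the probabilistic interpretation of marginalization and the matrix-elimination interpretation are consistent: both yield the same estimates for $\Delta\mathbf{x}_{p\times1}$ and $\Delta\mathbf{x}_{q\times1}$.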
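The elimination steps (17) to (21) can be traced numerically. The following sketch builds the elimination matrix explicitly, checks that the upper-right block vanishes, and compares the two-stage solution with a direct solve of the full system. The function name `eliminate_and_solve`, the sizes and the random data are assumptions made for illustration only.

```python
import numpy as np

def eliminate_and_solve(H, b, p):
    """Solve H dx = b by eliminating the upper-right block, then back-substituting."""
    q = H.shape[0] - p
    Hpq, Hqq = H[:p, p:], H[p:, p:]
    K = np.linalg.solve(Hqq, Hpq.T).T                  # K = Hpq Hqq^{-1}
    # Gaussian elimination matrix [[I, -K], [O, I]] applied to both sides of (17).
    E = np.block([[np.eye(p), -K],
                  [np.zeros((q, p)), np.eye(q)]])
    H_elim = E @ H                                     # left-hand side of (18)
    b_elim = E @ b                                     # right-hand side of (18)
    dx_p = np.linalg.solve(H_elim[:p, :p], b_elim[:p]) # equation (20)
    dx_q = np.linalg.solve(Hqq, b[p:] - Hpq.T @ dx_p)  # equation (21)
    return H_elim, dx_p, dx_q

rng = np.random.default_rng(2)
N, p = 7, 3
A = rng.standard_normal((N, N))
H = A @ A.T + N * np.eye(N)      # symmetric positive definite
b = rng.standard_normal(N)

H_elim, dx_p, dx_q = eliminate_and_solve(H, b, p)
dx_full = np.linalg.solve(H, b)  # reference: solve the full system at once

print(np.abs(H_elim[:p, p:]).max())
print(np.allclose(np.concatenate([dx_p, dx_q]), dx_full))
```

Forming `E` as a dense matrix is done here only to mirror the derivation; in practice one would compute the Schur complement blocks directly.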
§. Separating the marginal (prior) and conditional distributions from a joint distribution
Consider two random vectors $\mathbf{x}\in\mathbb{R}^{p\times1}$ and $\mathbf{y}\in\mathbb{R}^{q\times1}$ that are jointly Gaussian. Starting from their joint distribution $p(\mathbf{x},\mathbf{y})$, we derive the marginal distributions $p(\mathbf{x})$ and $p(\mathbf{y})$ and the conditional distributions $p(\mathbf{x}|\mathbf{y})$ and $p(\mathbf{y}|\mathbf{x})$.
To simplify the derivation, assume the means are $\mathbf{0}$. If the actual means are nonzero, apply the substitutions $\mathbf{x}\leftarrow\mathbf{x}-\boldsymbol\mu_{p\times1}$ and $\mathbf{y}\leftarrow\mathbf{y}-\boldsymbol\mu_{q\times1}$; these substitutions do not change the covariance matrix.
§1. Separating the marginal and conditional distributions from a joint distribution expressed with a covariance matrix
Suppose the joint distribution is expressed with a covariance matrix:
$$
p(\mathbf{x},\mathbf{y})=\eta\exp\left(-\frac{1}{2} \begin{bmatrix} \mathbf{x}\\ \mathbf{y} \end{bmatrix}^{T} \begin{bmatrix} \boldsymbol\Sigma_{p\times{p}}&\boldsymbol\Sigma_{p\times{q}}\\ \boldsymbol\Sigma_{p\times{q}}^{T}&\boldsymbol\Sigma_{q\times{q}} \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{x}\\ \mathbf{y} \end{bmatrix} \right)\tag{A4}
$$
Denote the covariance matrix by $\boldsymbol\Sigma$. Because $\boldsymbol\Sigma_{q\times{q}}$ is invertible, $\boldsymbol\Sigma$ can be factored as
$$
\boldsymbol\Sigma= \begin{bmatrix} \boldsymbol\Sigma_{p\times{p}}&\boldsymbol\Sigma_{p\times{q}}\\ \boldsymbol\Sigma_{p\times{q}}^{T}&\boldsymbol\Sigma_{q\times{q}} \end{bmatrix} =\begin{bmatrix} \mathbf{I}&\boldsymbol\Sigma_{p\times{q}}\boldsymbol\Sigma^{-1}_{q\times{q}}\\ \mathbf{O}&\mathbf{I} \end{bmatrix} \begin{bmatrix} \boldsymbol\Sigma_{p\times{p}}-\boldsymbol\Sigma_{p\times{q}}\boldsymbol\Sigma^{-1}_{q\times{q}}\boldsymbol\Sigma^{T}_{p\times{q}}&\mathbf{O}\\ \mathbf{O}&\boldsymbol\Sigma_{q\times{q}} \end{bmatrix} \begin{bmatrix} \mathbf{I}&\mathbf{O}\\ \boldsymbol\Sigma^{-1}_{q\times{q}}\boldsymbol\Sigma^{T}_{p\times{q}}&\mathbf{I} \end{bmatrix}
$$
Inverting each factor and reversing the order of the product, the inverse of the covariance matrix is
$$
\boldsymbol\Sigma^{-1} =\begin{bmatrix} \mathbf{I}&\mathbf{O}\\ -\boldsymbol\Sigma^{-1}_{q\times{q}}\boldsymbol\Sigma^{T}_{p\times{q}}&\mathbf{I} \end{bmatrix} \begin{bmatrix} (\boldsymbol\Sigma_{p\times{p}}-\boldsymbol\Sigma_{p\times{q}}\boldsymbol\Sigma^{-1}_{q\times{q}}\boldsymbol\Sigma^{T}_{p\times{q}})^{-1}&\mathbf{O}\\ \mathbf{O}&\boldsymbol\Sigma^{-1}_{q\times{q}} \end{bmatrix} \begin{bmatrix} \mathbf{I}&-\boldsymbol\Sigma_{p\times{q}}\boldsymbol\Sigma^{-1}_{q\times{q}}\\ \mathbf{O}&\mathbf{I} \end{bmatrix}
$$
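Block factorizations like this one are easy to get wrong in the placement of the off-diagonal blocks, so a numerical check is useful. The sketch below, using an assumed random positive-definite covariance and hypothetical variable names, rebuilds $\boldsymbol\Sigma$ from its three factors and compares the factored inverse with `np.linalg.inv`.

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 2, 3
A = rng.standard_normal((p + q, p + q))
Sigma = A @ A.T + (p + q) * np.eye(p + q)     # symmetric positive definite covariance

Spp, Spq, Sqq = Sigma[:p, :p], Sigma[:p, p:], Sigma[p:, p:]
K = Spq @ np.linalg.inv(Sqq)                  # Sigma_pq Sigma_qq^{-1}  (p x q)
S = Spp - K @ Spq.T                           # Schur complement of Sigma_qq

# Upper unit-triangular factor U = [[I, K], [O, I]] and block-diagonal middle factor D.
U = np.block([[np.eye(p), K],
              [np.zeros((q, p)), np.eye(q)]])
D = np.block([[S, np.zeros((p, q))],
              [np.zeros((q, p)), Sqq]])
Sigma_rebuilt = U @ D @ U.T                   # U.T = [[I, O], [Sigma_qq^{-1} Sigma_pq^T, I]]

# Inverse: invert each factor and reverse the order of the product.
U_inv = np.block([[np.eye(p), -K],
                  [np.zeros((q, p)), np.eye(q)]])
D_inv = np.block([[np.linalg.inv(S), np.zeros((p, q))],
                  [np.zeros((q, p)), np.linalg.inv(Sqq)]])
Sigma_inv = U_inv.T @ D_inv @ U_inv

print(np.allclose(Sigma_rebuilt, Sigma))
print(np.allclose(Sigma_inv, np.linalg.inv(Sigma)))
```

Note that the lower-left block of the first inverse factor is $q\times p$, so it must be $-\boldsymbol\Sigma^{-1}_{q\times{q}}\boldsymbol\Sigma^{T}_{p\times{q}}$; the shapes alone rule out the transposed placement.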
Substituting this factorization into $\mathrm{(A4)}$ gives
$$
p(\mathbf{x},\mathbf{y})=\eta\exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol\Sigma_{p\times{q}}\boldsymbol\Sigma^{-1}_{q\times{q}}\mathbf{y})^{T}(\boldsymbol\Sigma_{p\times{p}}-\boldsymbol\Sigma_{p\times{q}}\boldsymbol\Sigma^{-1}_{q\times{q}}\boldsymbol\Sigma^{T}_{p\times{q}})^{-1}(\mathbf{x}-\boldsymbol\Sigma_{p\times{q}}\boldsymbol\Sigma^{-1}_{q\times{q}}\mathbf{y})\right)\exp\left(-\frac{1}{2}\mathbf{y}^{T}\boldsymbol\Sigma^{-1}_{q\times{q}}\mathbf{y}\right)
$$
The second factor depends only on $\mathbf{y}$ and the first is a Gaussian in $\mathbf{x}$ for fixed $\mathbf{y}$, so
$$
\begin{align*} p(\mathbf{y})&=\mathcal{N}(\mathbf{0},\boldsymbol\Sigma_{q\times{q}})\tag{A5}\\ p(\mathbf{x}|\mathbf{y})&=\mathcal{N}(\boldsymbol\Sigma_{p\times{q}}\boldsymbol\Sigma^{-1}_{q\times{q}}\mathbf{y},\boldsymbol\Sigma_{p\times{p}}-\boldsymbol\Sigma_{p\times{q}}\boldsymbol\Sigma^{-1}_{q\times{q}}\boldsymbol\Sigma^{T}_{p\times{q}})\tag{A6} \end{align*}
$$
Similarly, because $\boldsymbol\Sigma_{p\times{p}}$ is invertible, $\boldsymbol\Sigma$ can also be factored as
$$
\boldsymbol\Sigma= \begin{bmatrix} \boldsymbol\Sigma_{p\times{p}}&\boldsymbol\Sigma_{p\times{q}}\\ \boldsymbol\Sigma_{p\times{q}}^{T}&\boldsymbol\Sigma_{q\times{q}} \end{bmatrix} =\begin{bmatrix} \mathbf{I}&\mathbf{O}\\ \boldsymbol\Sigma_{p\times{q}}^{T}\boldsymbol\Sigma^{-1}_{p\times{p}}&\mathbf{I} \end{bmatrix} \begin{bmatrix} \boldsymbol\Sigma_{p\times{p}}&\mathbf{O}\\ \mathbf{O}&\boldsymbol\Sigma_{q\times{q}}-\boldsymbol\Sigma^{T}_{p\times{q}}\boldsymbol\Sigma^{-1}_{p\times{p}}\boldsymbol\Sigma_{p\times{q}} \end{bmatrix} \begin{bmatrix} \mathbf{I}&\boldsymbol\Sigma^{-1}_{p\times{p}}\boldsymbol\Sigma_{p\times{q}}\\ \mathbf{O}&\mathbf{I} \end{bmatrix}
$$
Hence the inverse of the covariance matrix can be written as
$$
\boldsymbol\Sigma^{-1} =\begin{bmatrix} \mathbf{I}&-\boldsymbol\Sigma_{p\times{p}}^{-1}\boldsymbol\Sigma_{p\times{q}}\\ \mathbf{O}&\mathbf{I} \end{bmatrix} \begin{bmatrix} \boldsymbol\Sigma^{-1}_{p\times{p}}&\mathbf{O}\\ \mathbf{O}&(\boldsymbol\Sigma_{q\times{q}}-\boldsymbol\Sigma^{T}_{p\times{q}}\boldsymbol\Sigma^{-1}_{p\times{p}}\boldsymbol\Sigma_{p\times{q}})^{-1} \end{bmatrix} \begin{bmatrix} \mathbf{I}&\mathbf{O}\\ -\boldsymbol\Sigma^{T}_{p\times{q}}\boldsymbol\Sigma^{-1}_{p\times{p}}&\mathbf{I} \end{bmatrix}
$$
Substituting this factorization into $\mathrm{(A4)}$ gives
$$
p(\mathbf{x},\mathbf{y})=\eta\exp\left(-\frac{1}{2}\mathbf{x}^{T}\boldsymbol\Sigma^{-1}_{p\times{p}}\mathbf{x}\right)\exp\left(-\frac{1}{2}(\mathbf{y}-\boldsymbol\Sigma_{p\times{q}}^{T}\boldsymbol\Sigma^{-1}_{p\times{p}}\mathbf{x})^{T}(\boldsymbol\Sigma_{q\times{q}}-\boldsymbol\Sigma^{T}_{p\times{q}}\boldsymbol\Sigma^{-1}_{p\times{p}}\boldsymbol\Sigma_{p\times{q}})^{-1}(\mathbf{y}-\boldsymbol\Sigma_{p\times{q}}^{T}\boldsymbol\Sigma^{-1}_{p\times{p}}\mathbf{x})\right)
$$
Therefore
$$
\begin{align*} p(\mathbf{x})&=\mathcal{N}(\mathbf{0},\boldsymbol\Sigma_{p\times{p}})\tag{A7}\\ p(\mathbf{y}|\mathbf{x})&=\mathcal{N}(\boldsymbol\Sigma_{p\times{q}}^{T}\boldsymbol\Sigma^{-1}_{p\times{p}}\mathbf{x},\boldsymbol\Sigma_{q\times{q}}-\boldsymbol\Sigma^{T}_{p\times{q}}\boldsymbol\Sigma^{-1}_{p\times{p}}\boldsymbol\Sigma_{p\times{q}})\tag{A8} \end{align*}
$$
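A scalar special case makes the covariance-form results concrete. As an illustration only, take $p=q=1$ and write the covariance with standard deviations $\sigma_{x},\sigma_{y}$ and correlation coefficient $\rho$ (symbols introduced here, not used elsewhere in the article):

$$
\boldsymbol\Sigma=\begin{bmatrix} \sigma_{x}^{2}&\rho\sigma_{x}\sigma_{y}\\ \rho\sigma_{x}\sigma_{y}&\sigma_{y}^{2} \end{bmatrix}
$$

The marginal and conditional distributions of the covariance form then reduce to the familiar bivariate-normal formulas

$$
\begin{align*}
p(x)&=\mathcal{N}(0,\sigma_{x}^{2}),&
p(y|x)&=\mathcal{N}\left(\rho\frac{\sigma_{y}}{\sigma_{x}}x,\ \sigma_{y}^{2}(1-\rho^{2})\right)\\
p(y)&=\mathcal{N}(0,\sigma_{y}^{2}),&
p(x|y)&=\mathcal{N}\left(\rho\frac{\sigma_{x}}{\sigma_{y}}y,\ \sigma_{x}^{2}(1-\rho^{2})\right)
\end{align*}
$$

In covariance form, marginalizing is trivial (read off a diagonal block), while conditioning requires a Schur complement and shrinks the variance by the factor $1-\rho^{2}$.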
§2. Separating the marginal and conditional distributions from a joint distribution expressed with an information matrix
Suppose the joint distribution is expressed with an information matrix:
$$
p(\mathbf{x},\mathbf{y})=\eta\exp\left(-\frac{1}{2} \begin{bmatrix} \mathbf{x}\\ \mathbf{y} \end{bmatrix}^{T} \begin{bmatrix} \boldsymbol\Lambda_{p\times{p}}&\boldsymbol\Lambda_{p\times{q}}\\ \boldsymbol\Lambda_{p\times{q}}^{T}&\boldsymbol\Lambda_{q\times{q}} \end{bmatrix} \begin{bmatrix} \mathbf{x}\\ \mathbf{y} \end{bmatrix} \right)\tag{A9}
$$
Denote the information matrix by $\boldsymbol\Lambda$. Because $\boldsymbol\Lambda_{q\times{q}}$ is invertible, $\boldsymbol\Lambda$ can be factored as
$$
\boldsymbol\Lambda= \begin{bmatrix} \boldsymbol\Lambda_{p\times{p}}&\boldsymbol\Lambda_{p\times{q}}\\ \boldsymbol\Lambda_{p\times{q}}^{T}&\boldsymbol\Lambda_{q\times{q}} \end{bmatrix} =\begin{bmatrix} \mathbf{I}&\boldsymbol\Lambda_{p\times{q}}\boldsymbol\Lambda^{-1}_{q\times{q}}\\ \mathbf{O}&\mathbf{I} \end{bmatrix} \begin{bmatrix} \boldsymbol\Lambda_{p\times{p}}-\boldsymbol\Lambda_{p\times{q}}\boldsymbol\Lambda^{-1}_{q\times{q}}\boldsymbol\Lambda^{T}_{p\times{q}}&\mathbf{O}\\ \mathbf{O}&\boldsymbol\Lambda_{q\times{q}} \end{bmatrix} \begin{bmatrix} \mathbf{I}&\mathbf{O}\\ \boldsymbol\Lambda^{-1}_{q\times{q}}\boldsymbol\Lambda^{T}_{p\times{q}}&\mathbf{I} \end{bmatrix}
$$
Substituting this factorization into $\mathrm{(A9)}$ gives
$$
p(\mathbf{x},\mathbf{y})=\eta\exp\left(-\frac{1}{2}\mathbf{x}^{T}(\boldsymbol\Lambda_{p\times{p}}-\boldsymbol\Lambda_{p\times{q}}\boldsymbol\Lambda^{-1}_{q\times{q}}\boldsymbol\Lambda^{T}_{p\times{q}})\mathbf{x}\right) \exp\left(-\frac{1}{2}(\mathbf{y}+\boldsymbol\Lambda^{-1}_{q\times{q}}\boldsymbol\Lambda^{T}_{p\times{q}}\mathbf{x})^{T}\boldsymbol\Lambda_{q\times{q}}(\mathbf{y}+\boldsymbol\Lambda^{-1}_{q\times{q}}\boldsymbol\Lambda^{T}_{p\times{q}}\mathbf{x})\right)
$$
Therefore
$$
\begin{align*} p(\mathbf{x})&=\mathcal{N}\left(\mathbf{0},(\boldsymbol\Lambda_{p\times{p}}-\boldsymbol\Lambda_{p\times{q}}\boldsymbol\Lambda^{-1}_{q\times{q}}\boldsymbol\Lambda^{T}_{p\times{q}})^{-1}\right)\tag{A10}\\ p(\mathbf{y}|\mathbf{x})&=\mathcal{N}(-\boldsymbol\Lambda^{-1}_{q\times{q}}\boldsymbol\Lambda^{T}_{p\times{q}}\mathbf{x},\boldsymbol\Lambda_{q\times{q}}^{-1})\tag{A11} \end{align*}
$$
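Since $\boldsymbol\Lambda=\boldsymbol\Sigma^{-1}$, the information-form results for $p(\mathbf{x})$ and $p(\mathbf{y}|\mathbf{x})$ must describe the same distributions as the covariance-form ones. The sketch below checks this on an assumed random covariance; all variable names are hypothetical. The covariance-form conditional is taken to have mean coefficient $\boldsymbol\Sigma^{T}_{p\times{q}}\boldsymbol\Sigma^{-1}_{p\times{p}}$ and covariance $\boldsymbol\Sigma_{q\times{q}}-\boldsymbol\Sigma^{T}_{p\times{q}}\boldsymbol\Sigma^{-1}_{p\times{p}}\boldsymbol\Sigma_{p\times{q}}$.

```python
import numpy as np

rng = np.random.default_rng(4)
p, q = 2, 3
A = rng.standard_normal((p + q, p + q))
Sigma = A @ A.T + (p + q) * np.eye(p + q)   # symmetric positive definite covariance
Lam = np.linalg.inv(Sigma)                  # information matrix

Spp, Spq, Sqq = Sigma[:p, :p], Sigma[:p, p:], Sigma[p:, p:]
Lpp, Lpq, Lqq = Lam[:p, :p], Lam[:p, p:], Lam[p:, p:]

# Marginal p(x): information form (Schur complement, then invert) vs covariance block.
cov_x_info = np.linalg.inv(Lpp - Lpq @ np.linalg.solve(Lqq, Lpq.T))

# Conditional p(y|x): the mean is (gain @ x) in both forms.
gain_cov = Spq.T @ np.linalg.inv(Spp)            # covariance form
gain_info = -np.linalg.solve(Lqq, Lpq.T)         # information form

# Conditional covariance in both forms.
cond_cov_cov = Sqq - Spq.T @ np.linalg.solve(Spp, Spq)
cond_cov_info = np.linalg.inv(Lqq)

print(np.allclose(cov_x_info, Spp))
print(np.allclose(gain_cov, gain_info))
print(np.allclose(cond_cov_cov, cond_cov_info))
```

The check also shows the duality between the two forms: marginalizing needs a Schur complement in information form but only a block read-off in covariance form, and conditioning is the other way round.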
Similarly, because $\boldsymbol\Lambda_{p\times{p}}$ is invertible, $\boldsymbol\Lambda$ can also be factored as
$$
\boldsymbol\Lambda= \begin{bmatrix} \boldsymbol\Lambda_{p\times{p}}&\boldsymbol\Lambda_{p\times{q}}\\ \boldsymbol\Lambda_{p\times{q}}^{T}&\boldsymbol\Lambda_{q\times{q}} \end{bmatrix} =\begin{bmatrix} \mathbf{I}&\mathbf{O}\\ \boldsymbol\Lambda_{p\times{q}}^{T}\boldsymbol\Lambda^{-1}_{p\times{p}}&\mathbf{I} \end{bmatrix} \begin{bmatrix} \boldsymbol\Lambda_{p\times{p}}&\mathbf{O}\\ \mathbf{O}&\boldsymbol\Lambda_{q\times{q}}-\boldsymbol\Lambda^{T}_{p\times{q}}\boldsymbol\Lambda^{-1}_{p\times{p}}\boldsymbol\Lambda_{p\times{q}} \end{bmatrix} \begin{bmatrix} \mathbf{I}&\boldsymbol\Lambda^{-1}_{p\times{p}}\boldsymbol\Lambda_{p\times{q}}\\ \mathbf{O}&\mathbf{I} \end{bmatrix}
$$
Substituting this factorization into $\mathrm{(A9)}$ gives
$$
p(\mathbf{x},\mathbf{y})=\eta\exp\left(-\frac{1}{2}(\mathbf{x}+\boldsymbol\Lambda^{-1}_{p\times{p}}\boldsymbol\Lambda_{p\times{q}}\mathbf{y})^{T}\boldsymbol\Lambda_{p\times{p}}(\mathbf{x}+\boldsymbol\Lambda^{-1}_{p\times{p}}\boldsymbol\Lambda_{p\times{q}}\mathbf{y})\right) \exp\left(-\frac{1}{2}\mathbf{y}^{T}(\boldsymbol\Lambda_{q\times{q}}-\boldsymbol\Lambda^{T}_{p\times{q}}\boldsymbol\Lambda^{-1}_{p\times{p}}\boldsymbol\Lambda_{p\times{q}})\mathbf{y}\right)
$$
Therefore
$$
\begin{align*} p(\mathbf{y})&=\mathcal{N}\left(\mathbf{0},(\boldsymbol\Lambda_{q\times{q}}-\boldsymbol\Lambda^{T}_{p\times{q}}\boldsymbol\Lambda^{-1}_{p\times{p}}\boldsymbol\Lambda_{p\times{q}})^{-1}\right)\tag{A12}\\ p(\mathbf{x}|\mathbf{y})&=\mathcal{N}(-\boldsymbol\Lambda^{-1}_{p\times{p}}\boldsymbol\Lambda_{p\times{q}}\mathbf{y},\boldsymbol\Lambda_{p\times{p}}^{-1})\tag{A13} \end{align*}
$$
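The zero-mean results $\mathrm{(A10)}$ and $\mathrm{(A11)}$ connect back to $(15)$ and $(16)$ once the mean is restored. The following is a sketch of that step, assuming the identification $\boldsymbol\Lambda=\mathbf{H}$ and writing $\boldsymbol\mu=\mathbf{H}^{-1}\mathbf{b}$ with blocks $\boldsymbol\mu_{p}$, $\boldsymbol\mu_{q}$ (notation introduced here). The blocks of $\mathbf{H}\boldsymbol\mu=\mathbf{b}$ read

$$
\mathbf{H}_{p\times{p}}\boldsymbol\mu_{p}+\mathbf{H}_{p\times{q}}\boldsymbol\mu_{q}=\mathbf{b}_{p\times1},\qquad
\mathbf{H}_{p\times{q}}^{T}\boldsymbol\mu_{p}+\mathbf{H}_{q\times{q}}\boldsymbol\mu_{q}=\mathbf{b}_{q\times1}
$$

Applying $\mathrm{(A10)}$ and $\mathrm{(A11)}$ to $\mathbf{x}=\Delta\mathbf{x}_{p\times1}-\boldsymbol\mu_{p}$ and $\mathbf{y}=\Delta\mathbf{x}_{q\times1}-\boldsymbol\mu_{q}$ gives

$$
\begin{align*}
p(\Delta\mathbf{x}_{p\times1})&=\mathcal{N}\left(\boldsymbol\mu_{p},(\mathbf{H}_{p\times{p}}-\mathbf{H}_{p\times{q}}\mathbf{H}^{-1}_{q\times{q}}\mathbf{H}^{T}_{p\times{q}})^{-1}\right)\\
p(\Delta\mathbf{x}_{q\times1}|\Delta\mathbf{x}_{p\times1})&=\mathcal{N}\left(\boldsymbol\mu_{q}-\mathbf{H}^{-1}_{q\times{q}}\mathbf{H}^{T}_{p\times{q}}(\Delta\mathbf{x}_{p\times1}-\boldsymbol\mu_{p}),\mathbf{H}^{-1}_{q\times{q}}\right)
\end{align*}
$$

Eliminating $\boldsymbol\mu_{q}$ from the two block rows yields

$$
(\mathbf{H}_{p\times{p}}-\mathbf{H}_{p\times{q}}\mathbf{H}^{-1}_{q\times{q}}\mathbf{H}^{T}_{p\times{q}})\boldsymbol\mu_{p}=\mathbf{b}_{p\times1}-\mathbf{H}_{p\times{q}}\mathbf{H}^{-1}_{q\times{q}}\mathbf{b}_{q\times1}
$$

which is the mean in $(15)$. The second block row gives $\boldsymbol\mu_{q}+\mathbf{H}^{-1}_{q\times{q}}\mathbf{H}^{T}_{p\times{q}}\boldsymbol\mu_{p}=\mathbf{H}^{-1}_{q\times{q}}\mathbf{b}_{q\times1}$, so the conditional mean simplifies to

$$
\mathbf{H}^{-1}_{q\times{q}}(\mathbf{b}_{q\times1}-\mathbf{H}^{T}_{p\times{q}}\Delta\mathbf{x}_{p\times1})
$$

which is the mean in $(16)$.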