Given a vector $f$ in a Hilbert space and a set of vectors $D=\{x_1,\dots,x_n\}$ called a dictionary, with $\|x_i\|=1$, we want to approximate $f$ as closely as possible by a combination of elements of $D$ — equivalently, to decompose $f$ into a combination of elements of $D$.
Matching Pursuit
The idea of MP [1] is to build the approximation greedily and incrementally. When approximating with a single atom $x_i$, MP takes only the orthogonal projection of $f$ onto $x_i$, so $f$ splits as:

$$\begin{aligned} f &= \hat f + Rf \\ &= \langle f, x_i\rangle x_i + Rf \end{aligned}$$

where $\hat f$ is the approximation, $\langle\cdot,\cdot\rangle$ is the inner product, and $Rf$ is the residual, i.e. the error between $\hat f$ and $f$.
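As a quick sanity check (a minimal numpy sketch with made-up data, not from [1]), a single projection step leaves a residual orthogonal to the chosen atom, with the norms satisfying Pythagoras:

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(8)
x_i = rng.standard_normal(8)
x_i /= np.linalg.norm(x_i)      # dictionary atoms are unit-norm

f_hat = np.dot(f, x_i) * x_i    # <f, x_i> x_i, projection of f onto x_i
Rf = f - f_hat                  # residual

assert abs(np.dot(Rf, x_i)) < 1e-10                                 # Rf ⊥ x_i
assert abs(np.dot(f, x_i)**2 + np.dot(Rf, Rf) - np.dot(f, f)) < 1e-10  # Pythagoras
```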

For the best single-step approximation, one should pick the $x_i$ that maximises $|\langle f, x_i\rangle|$. As noted in [1], this condition sometimes has to be relaxed to
$$|\langle f, x_i\rangle| \geq \alpha \cdot \sup_j |\langle f, x_j\rangle|, \quad 0<\alpha<1$$
Having chosen the optimal $x_i$, the residual $Rf$ can in turn be decomposed over $D$:
$$Rf = \langle Rf, x_{i'}\rangle x_{i'} + R(Rf)$$
so the original approximation is refined to:
$$\begin{aligned} f &= \langle f,x_i\rangle x_i + Rf \\ &= \underbrace{\langle f,x_i\rangle x_i + \langle Rf,x_{i'}\rangle x_{i'}}_{\hat f} + R(Rf) \end{aligned}$$
By Pythagoras,
$$\|Rf\|^2 = |\langle Rf, x_{i'}\rangle|^2 + \|R(Rf)\|^2 \geq \|R(Rf)\|^2$$
i.e. the approximation error does not increase. This yields an iterative algorithm:
$$f \approx \hat f_m = \sum_{i=1}^m \langle R_{i-1}f, x_{\sigma_i}\rangle x_{\sigma_i}, \qquad f = \hat f_m + R_m f$$
where $x_{\sigma_i}$ is the atom selected at step $i$, $R_i f$ is the residual after step $i$, and $R_0 f = f$.
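The iteration above fits in a few lines of numpy (a hypothetical `mp` function for illustration, not the implementation from [1]); note the invariant $f = \hat f + Rf$ holds at every step:

```python
import numpy as np

def mp(D, f, m):
    """Matching Pursuit: greedily pick the atom with the largest |<Rf, x>|.
    D: [n, d] dictionary with unit-norm rows; f: [d]; m: number of steps."""
    Rf = f.copy()
    coef = np.zeros(D.shape[0])
    for _ in range(m):
        scores = D @ Rf                # <Rf, x_j> for every atom
        j = np.argmax(np.abs(scores))
        coef[j] += scores[j]           # atoms may be selected more than once
        Rf = Rf - scores[j] * D[j]     # subtract the projection onto x_j
    return coef, Rf

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 5))
D /= np.linalg.norm(D, axis=1, keepdims=True)
f = rng.standard_normal(5)
coef, Rf = mp(D, f, 50)
assert np.linalg.norm(Rf) < np.linalg.norm(f)   # residual shrinks
assert np.allclose(D.T @ coef + Rf, f)          # f = f_hat + Rf at every step
```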
Let $V=\overline{\mathrm{span}\{x_1,\dots,x_n\}}$ be the closed linear span of the dictionary vectors. When $V=H$, $f$ is recoverable from $D$ in the limit:
$$\lim_{m\to+\infty}\hat f_m = f, \qquad \lim_{m\to+\infty} R_m f = 0$$
Note that in MP each $x_i$ may be selected multiple times.
Orthogonal Matching Pursuit
OMP [2] points out that MP only guarantees $R_i f \perp x_{\sigma_i}$, so convergence may take many iterations; with a limited iteration budget $m$, the MP result is not optimal.
OMP adds a stronger constraint on $R_i f$: it must be orthogonal to all selected atoms $x_{\sigma_j}\ (j=1,\dots,i)$ (full backward orthogonality).
Suppose the decomposition at step $k$ is:
$$f=\sum_{i=1}^k a^k_i x_{\sigma_i} + R_k f \tag{1}$$
where the $a^k_i$ are the combination coefficients at step $k$, and $R_k f$ satisfies full backward orthogonality: $R_k f \perp x_{\sigma_j},\ j=1,\dots,k$. At step $k+1$, select $x_{\sigma_{k+1}}$ as in MP; the new decomposition is:
$$f=\sum_{i=1}^{k+1} a^{k+1}_i x_{\sigma_i} + R_{k+1} f \tag{2}$$
with $R_{k+1} f$ again required to satisfy full backward orthogonality. To determine the coefficients $a^{k+1}_i$, first decompose $x_{\sigma_{k+1}}$ itself over $\{x_{\sigma_1},\dots,x_{\sigma_k}\}$:
$$x_{\sigma_{k+1}}=\sum_{i=1}^k b^k_i x_{\sigma_i} + \gamma_k \tag{3}$$
where $b^k_i$ and $\gamma_k$ are computed by some auxiliary procedure. $\gamma_k$ is the component of $x_{\sigma_{k+1}}$ that cannot be expressed by $\{x_{\sigma_1},\dots,x_{\sigma_k}\}$, hence $\gamma_k \perp x_{\sigma_j},\ j=1,\dots,k$. Substituting (3) into (2) and comparing coefficients with (1):
$$\begin{aligned} f &= \underbrace{\sum_{i=1}^k a^k_i x_{\sigma_i}}_{1} + \underbrace{R_k f}_{2} \\ &= \sum_{i=1}^{k} a^{k+1}_i x_{\sigma_i} + a^{k+1}_{k+1} x_{\sigma_{k+1}} + R_{k+1} f \\ &= \sum_{i=1}^{k} a^{k+1}_i x_{\sigma_i} + a^{k+1}_{k+1}\left[\sum_{j=1}^k b^k_j x_{\sigma_j} + \gamma_k\right] + R_{k+1} f \\ &= \underbrace{\sum_{i=1}^{k}\left[a^{k+1}_i + a^{k+1}_{k+1}\cdot b^k_i\right] x_{\sigma_i}}_{1} + \underbrace{a^{k+1}_{k+1}\gamma_k + R_{k+1} f}_{2} \end{aligned}$$
Therefore:
$$a^{k+1}_i = a^k_i - a^{k+1}_{k+1}\cdot b^k_i, \qquad R_{k+1} f = R_k f - a^{k+1}_{k+1}\cdot\gamma_k$$
It remains to determine $a^{k+1}_{k+1}$. From the constraint $R_{k+1} f \perp x_{\sigma_{k+1}}$:
$$\begin{aligned} 0 &= \langle R_{k+1} f, x_{\sigma_{k+1}}\rangle \\ &= \langle R_k f, x_{\sigma_{k+1}}\rangle - a^{k+1}_{k+1}\cdot\langle \gamma_k, x_{\sigma_{k+1}}\rangle \end{aligned}$$
so $a^{k+1}_{k+1} = \dfrac{\langle R_k f, x_{\sigma_{k+1}}\rangle}{\langle \gamma_k, x_{\sigma_{k+1}}\rangle}$.
This enforces $R_{k+1} f \perp x_{\sigma_{k+1}}$, and since $R_k f \perp x_{\sigma_j}$ and $\gamma_k \perp x_{\sigma_j}$, we also have $R_{k+1} f = R_k f - a^{k+1}_{k+1}\cdot\gamma_k \perp x_{\sigma_j},\ j=1,\dots,k$, so $R_{k+1} f$ again satisfies full backward orthogonality.
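Full backward orthogonality is exactly the normal-equations condition of a least-squares fit over the selected atoms, so in practice each OMP step can simply re-solve the coefficients with `lstsq` instead of running the recursion above explicitly (a minimal numpy sketch, not the exact recursion of [2]):

```python
import numpy as np

def omp(D, f, m):
    """Orthogonal Matching Pursuit.
    D: [n, d] dictionary with unit-norm rows; f: [d]; m: max # of atoms."""
    Rf = f.copy()
    active = []
    a = np.zeros(0)
    for _ in range(m):
        scores = D @ Rf
        j = int(np.argmax(np.abs(scores)))
        if j in active:          # Rf is already ⊥ to the active atoms
            break
        active.append(j)
        A = D[active].T          # [d, k], selected atoms as columns
        # least squares over the active set == full backward orthogonality
        a, *_ = np.linalg.lstsq(A, f, rcond=None)
        Rf = f - A @ a
    return active, a, Rf

rng = np.random.default_rng(1)
D = rng.standard_normal((20, 8))
D /= np.linalg.norm(D, axis=1, keepdims=True)
f = rng.standard_normal(8)
active, a, Rf = omp(D, f, 4)
# the residual is orthogonal to every selected atom
assert np.max(np.abs(D[np.array(active)] @ Rf)) < 1e-8
```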
Non-negative Orthogonal Matching Pursuit
NOMP [3] constrains the combination coefficients to be non-negative, which reportedly improves stability.
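The non-negative coefficient solve at the heart of each NOMP step is a standard NNLS problem; a tiny demo with made-up data (here the unconstrained optimum is already non-negative, so NNLS recovers it exactly):

```python
import numpy as np
from scipy.optimize import nnls

# toy data: columns of A are dictionary atoms, b is the target
rng = np.random.default_rng(0)
A = rng.random((8, 3))
b = A @ np.array([1.0, 0.0, 2.0])   # non-negative ground-truth weights

x, rnorm = nnls(A, b)               # solve min ||Ax - b|| s.t. x >= 0
assert np.all(x >= 0)
assert rnorm < 1e-8
assert np.allclose(x, [1.0, 0.0, 2.0], atol=1e-6)
```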
[4] is a reference implementation. Below is my rewritten version; I have cross-checked it against [4] and the results match.
# nomp.py
import numpy as np
import scipy.linalg
import scipy.optimize
from sklearn.preprocessing import normalize

"""Non-negative Orthogonal Matching Pursuit
ref:
- Stable and efficient representation learning with nonnegativity constraints
- Supplementary Materials: Compositional Zero-Shot Learning via Fine-Grained Dense Feature Composition
- https://github.com/hbdat/neurIPS20_CompositionZSL/blob/main/omp/omp.py#L43
"""

def to_pos(X):
    """X -> [max(0, X), max(0, -X)]
    Map vectors with negative entries to all-non-negative ones,
    using twice the dimensions to compensate for the negative part.
    """
    X_inv = - X
    X_pos = np.concatenate([
        np.where(X > 0, X, 0),
        np.where(X_inv > 0, X_inv, 0),
    ], axis=1)
    return X_pos

def cos(A, B=None):
    """pairwise cosine similarity between rows"""
    An = normalize(A, norm='l2', axis=1)
    if (B is None) or (B is A):
        return np.dot(An, An.T)
    Bn = normalize(B, norm='l2', axis=1)
    return np.dot(An, Bn.T)

def spherical_kmeans(X, k=-1):
    """select a subset for the dictionary using spherical k-means
    X: [n, d]
    k: # of clusters
    - https://github.com/jasonlaska/spherecluster
    """
    from spherecluster import SphericalKMeans
    n = X.shape[0]
    if (k <= 0) or (k > n):
        k = max(1, n // 2)
    skm = SphericalKMeans(n_clusters=k)
    skm.fit(X)
    return skm.cluster_centers_

def norm2(x):
    """RMS norm of a [d] vector"""
    return np.linalg.norm(x) / np.sqrt(len(x))

def NOMP(D, f, k=-1, non_neg=True, max_iter=200, tol_res=1e-3, tol_coef=1e-12, verbose=False):
    """Non-negative Orthogonal Matching Pursuit
    Input:
        D: [m, d], dictionary (assumed to have been l2-normalised)
        f: [d], feature to decompose
        k: max # of selected components from the dictionary,
            defaults to the whole dictionary size
        non_neg: enforce non-negative coefficients
        tol_res: residual tolerance,
            exit if ||Rf|| < tol_res * ||f|| (error is small enough)
        tol_coef: score threshold,
            exit if score_i < tol_coef * ||f|| for all i
    Output:
        active: [n] (n <= k), ids of the selected components in the dictionary
        coef: [m], composition weights (non-zero only on selected components)
        f_hat: [d], approximation of f
        Rf: [d], residual vector
    """
    assert (2 == D.ndim) and (1 == f.ndim)
    assert D.shape[1] == f.shape[0], "dimension mismatch"
    m = D.shape[0]
    if (k <= 0) or (k > m):
        k = m
    f_norm = norm2(f)
    if verbose:
        print("initial residual:", f_norm)
    # tol_res = tol_res * f_norm
    tol_coef = tol_coef * f_norm
    Rf = f  # (initial) residual vector
    D_col = D.T  # [d, m], atoms arranged as column vectors
    active = []  # ids of the selected components
    coef = np.zeros([m], dtype=np.float32)
    # degenerate solution: f is (numerically) zero
    if f_norm < tol_res:
        return active, coef, np.zeros_like(f), Rf
    if non_neg:
        get_candidate = lambda score: np.argmax(score)
        calc_coef = lambda _D, _f: scipy.optimize.nnls(_D, _f)[0]
    else:
        get_candidate = lambda score: np.argmax(np.abs(score))
        calc_coef = lambda _D, _f: scipy.linalg.lstsq(_D, _f)[0]
    for _it in range(max_iter):
        # cos() l2-normalises both sides; since D is already unit-norm this
        # only rescales all scores by ||Rf||, which does not change the argmax
        score = cos(Rf[np.newaxis, :], D)[0]  # [m]
        d_id = get_candidate(score)
        max_score = score[d_id]
        if max_score < tol_coef:
            break
        if d_id not in active:
            active.append(d_id)
        # re-solve the coefficients over all active atoms (OMP-style refit)
        _coef = calc_coef(D_col[:, active], f)
        coef[active] = _coef
        f_hat = D_col[:, active].dot(_coef).flatten()
        Rf = f - f_hat
        err_rate = norm2(Rf) / f_norm
        if err_rate < tol_res:
            break
        if len(active) >= k:
            break
    f_hat = D_col[:, active].dot(coef[active]).flatten()
    Rf = f - f_hat
    err = norm2(Rf)
    err_rate = err / f_norm
    if verbose:
        print("final residual: {:.4f}, error rate: {:.4f}".format(err, err_rate))
    return active, coef, f_hat, Rf
