SMO Algorithm Principles
In the preceding derivations we repeatedly arrived at the following optimization problem:
$$
\begin{aligned}
\min_{\boldsymbol{\alpha}} \quad &\frac{1}{2} \sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j \kappa(\mathbf{x}_i,\mathbf{x}_j) - \sum_{i=1}^N \alpha_i \\
\text{s.t.} \quad &\sum_{i=1}^N \alpha_i y_i = 0 \\
&0 \le \alpha_i \le C, \quad i=1,2,\cdots,N
\end{aligned}
$$
We need to find the $N$-dimensional vector $\boldsymbol{\alpha}^*$ that minimizes this objective. The problem is complex and hard to optimize directly, so a heuristic method, the SMO algorithm, is generally used.
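The kernel values $\kappa(\mathbf{x}_i, \mathbf{x}_j)$ appearing in the objective are typically precomputed as a Gram matrix before running SMO. Here is a minimal sketch using a Gaussian (RBF) kernel; the kernel choice, the function name, and the `gamma` value are illustrative assumptions, not specified by the text:

```python
import numpy as np

def rbf_gram_matrix(X, gamma=0.5):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    return np.exp(-gamma * np.maximum(d2, 0.0))      # clamp tiny negatives from rounding
```

Any positive-definite kernel works here; SMO itself only ever reads entries $K_{ij}$ of this matrix.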
Basic Idea of the SMO Algorithm
SMO optimizes only two variables at a time and treats all the others as constants. For example, take $\alpha_1$ and $\alpha_2$ as variables and $\alpha_3, \alpha_4, \cdots, \alpha_N$ as constants; the constant terms can then be dropped from the objective, and the optimization problem becomes
$$
\begin{aligned}
\min_{\alpha_1,\alpha_2} \quad &\frac{1}{2} K_{11} \alpha_1^2 + \frac{1}{2} K_{22} \alpha_2^2 + y_1 y_2 K_{12} \alpha_1 \alpha_2 - (\alpha_1 + \alpha_2) + y_1 \alpha_1 \sum_{i=3}^N y_i \alpha_i K_{i1} + y_2 \alpha_2 \sum_{i=3}^N y_i \alpha_i K_{i2} \\
\text{s.t.} \quad &\alpha_1 y_1 + \alpha_2 y_2 = -\sum_{i=3}^N \alpha_i y_i = \varsigma \\
&0 \le \alpha_i \le C, \quad i=1,2,\cdots,N
\end{aligned}
$$
where $K_{ij}=\kappa(\mathbf{x}_i,\mathbf{x}_j)$. Since $y_1^2 = 1$ and $y_2^2 = 1$, these factors are omitted from the objective.
Optimizing the SMO Objective
First, consider the constraints

$$
\alpha_1 y_1 + \alpha_2 y_2 = \varsigma \\
0 \le \alpha_i \le C, \quad i=1,2
$$
Since $y_1, y_2$ can only take the values $1$ or $-1$, the constraint $\alpha_1 y_1 + \alpha_2 y_2 = \varsigma$ takes one of four forms:

$$
\begin{aligned}
&\alpha_1 + \alpha_2 = \varsigma \\
&\alpha_1 + \alpha_2 = -\varsigma \\
&\alpha_1 - \alpha_2 = \varsigma \\
&\alpha_1 - \alpha_2 = -\varsigma
\end{aligned}
$$
Together with the constraints $0 \le \alpha_1 \le C$ and $0 \le \alpha_2 \le C$, the values of $\alpha_1, \alpha_2$ are confined to the box $[0,C] \times [0,C]$.
As shown in the figure above, $\alpha_1, \alpha_2$ are restricted to a line segment inside the box, so one variable can be expressed in terms of the other, and the two-variable optimization reduces to a single-variable problem; without loss of generality, take $\alpha_2$ as the variable.
We use a heuristic iterative method. Denote the solution from the previous iteration by $\alpha_1^{old}, \alpha_2^{old}$, the solution obtained without the box constraint by $\alpha_2^{new,unc}$, and the solution of the current iteration after clipping to the box by $\alpha_1^{new}, \alpha_2^{new}$.
$\alpha_2^{new}$ must satisfy the segment constraint inside the box shown above. Let $L$ and $H$ be the lower and upper bounds of $\alpha_2^{new}$, so that

$$
L \le \alpha_2^{new} \le H
$$
- For $y_1 \ne y_2$ (so $\varsigma = \alpha_1 - \alpha_2$): if $\varsigma > 0$, then $0 \le \alpha_2^{new} \le C - \varsigma$; if $\varsigma < 0$, then $-\varsigma \le \alpha_2^{new} \le C$. Hence
$$
L = \max(0, -\varsigma), \quad H = \min(C, C - \varsigma)
$$
Substituting $\varsigma = \alpha_1^{old} - \alpha_2^{old}$ gives
$$
L = \max(0, \alpha_2^{old} - \alpha_1^{old}), \quad H = \min(C, C + \alpha_2^{old} - \alpha_1^{old})
$$
- For $y_1 = y_2$ (so $\varsigma = \alpha_1 + \alpha_2$): if $\varsigma > C$, then $\varsigma - C \le \alpha_2^{new} \le C$; if $\varsigma \le C$, then $0 \le \alpha_2^{new} \le \varsigma$. Hence
$$
L = \max(0, \varsigma - C), \quad H = \min(C, \varsigma)
$$
Substituting $\varsigma = \alpha_1^{old} + \alpha_2^{old}$ gives
$$
L = \max(0, \alpha_1^{old} + \alpha_2^{old} - C), \quad H = \min(C, \alpha_1^{old} + \alpha_2^{old})
$$
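The two cases above can be collected into a small helper; a sketch, where the function name and argument order are my own:

```python
def compute_bounds(alpha1_old, alpha2_old, y1, y2, C):
    """Endpoints [L, H] of the feasible segment for alpha_2^new."""
    if y1 != y2:  # varsigma = alpha1 - alpha2
        L = max(0.0, alpha2_old - alpha1_old)
        H = min(C, C + alpha2_old - alpha1_old)
    else:         # varsigma = alpha1 + alpha2
        L = max(0.0, alpha1_old + alpha2_old - C)
        H = min(C, alpha1_old + alpha2_old)
    return L, H
```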
Given $\alpha_2^{new,unc}$ obtained by differentiation, $\alpha_2^{new}$ follows from
$$
\alpha_2^{new} = \left\{ \begin{aligned} H &, \quad \alpha_2^{new,unc} > H \\ \alpha_2^{new,unc} &, \quad L \le \alpha_2^{new,unc} \le H \\ L &, \quad \alpha_2^{new,unc} < L \end{aligned} \right.
$$
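In code the clipping step is a one-liner; a minimal sketch:

```python
def clip_alpha(alpha_unc, L, H):
    """Clip the unconstrained solution alpha_2^{new,unc} onto [L, H]."""
    if alpha_unc > H:
        return H
    if alpha_unc < L:
        return L
    return alpha_unc
```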
So how do we solve for $\alpha_2^{new,unc}$? Simply take the partial derivative of the objective with respect to $\alpha_2$.
Since

$$
g(\mathbf{x}) = {\mathbf{w}^*}^T \phi(\mathbf{x}) + b^* = \sum_{i=1}^N \alpha_i^* y_i \kappa(\mathbf{x}_i, \mathbf{x}) + b^*
$$
to simplify notation, let

$$
\begin{aligned} v_j &= \sum_{i=3}^N y_i \alpha_i K_{ij} = \sum_{i=3}^N y_i \alpha_i \kappa(\mathbf{x}_i, \mathbf{x}_j) \\ &= g(\mathbf{x}_j) - \sum_{i=1}^2 y_i \alpha_i \kappa(\mathbf{x}_i, \mathbf{x}_j) - b \\ &= g(\mathbf{x}_j) - \sum_{i=1}^2 y_i \alpha_i K_{ij} - b \end{aligned}
$$
The objective then simplifies to

$$
W(\alpha_1, \alpha_2) = \frac{1}{2} K_{11} \alpha_1^2 + \frac{1}{2} K_{22} \alpha_2^2 + y_1 y_2 K_{12} \alpha_1 \alpha_2 - (\alpha_1 + \alpha_2) + y_1 \alpha_1 v_1 + y_2 \alpha_2 v_2
$$
Since $\alpha_1 y_1 + \alpha_2 y_2 = \varsigma$ and $y_1, y_2 \in \{1,-1\}$, we get

$$
\alpha_1 = y_1 (\varsigma - \alpha_2 y_2)
$$
Substituting this into the objective to eliminate $\alpha_1$,
$$
\begin{aligned} W(\alpha_2) =& \frac{1}{2} y_1^2 K_{11} (\varsigma - \alpha_2 y_2)^2 + \frac{1}{2} K_{22} \alpha_2^2 + y_1^2 y_2 K_{12} (\varsigma - \alpha_2 y_2) \alpha_2 \\ &- y_1 (\varsigma - \alpha_2 y_2) - \alpha_2 + y_1^2 (\varsigma - \alpha_2 y_2) v_1 + y_2 \alpha_2 v_2 \\ =& \frac{1}{2} K_{11} (\varsigma - \alpha_2 y_2)^2 + \frac{1}{2} K_{22} \alpha_2^2 + y_2 K_{12} (\varsigma - \alpha_2 y_2) \alpha_2 \\ &- y_1 (\varsigma - \alpha_2 y_2) - \alpha_2 + (\varsigma - \alpha_2 y_2) v_1 + y_2 \alpha_2 v_2 \end{aligned}
$$
Taking the partial derivative of the objective with respect to $\alpha_2$,
$$
\frac{\partial W}{\partial \alpha_2} = K_{11} \alpha_2 + K_{22} \alpha_2 - 2 K_{12} \alpha_2 - y_2 K_{11} \varsigma + y_2 K_{12} \varsigma + y_1 y_2 - 1 - y_2 v_1 + y_2 v_2 = 0
$$
Rearranging:
$$
\begin{aligned} &(K_{11} + K_{22} - 2 K_{12}) \alpha_2 \\ =& y_2 K_{11} \varsigma - y_2 K_{12} \varsigma - y_1 y_2 + 1 + y_2 v_1 - y_2 v_2 \\ =& y_2 K_{11} \varsigma - y_2 K_{12} \varsigma - y_1 y_2 + y_2^2 + y_2 v_1 - y_2 v_2 \\ =& y_2 (K_{11} \varsigma - K_{12} \varsigma - y_1 + y_2 + v_1 - v_2) \\ =& y_2 \{K_{11} \varsigma - K_{12} \varsigma - y_1 + y_2 + [g(\mathbf{x}_1) - \sum_{i=1}^2 y_i \alpha_i K_{i1} - b] - [g(\mathbf{x}_2) - \sum_{i=1}^2 y_i \alpha_i K_{i2} - b]\} \\ =& y_2 [(K_{11} - K_{12}) \varsigma - y_1 + y_2 + g(\mathbf{x}_1) - g(\mathbf{x}_2) - \sum_{i=1}^2 y_i \alpha_i K_{i1} + \sum_{i=1}^2 y_i \alpha_i K_{i2}] \end{aligned}
$$
Substituting $\varsigma = \alpha_1^{old} y_1 + \alpha_2^{old} y_2$ into the above gives
$$
\begin{aligned} &(K_{11} + K_{22} - 2 K_{12}) \alpha_2^{new,unc} \\ =& y_2 [(K_{11} - K_{12}) (\alpha_1^{old} y_1 + \alpha_2^{old} y_2) - y_1 + y_2 + g(\mathbf{x}_1) - g(\mathbf{x}_2) - \sum_{i=1}^2 y_i \alpha_i^{old} K_{i1} + \sum_{i=1}^2 y_i \alpha_i^{old} K_{i2}] \\ =& y_2 \{y_2 (K_{11} + K_{22} - 2K_{12}) \alpha_2^{old} + [g(\mathbf{x}_1) - y_1] - [g(\mathbf{x}_2) - y_2]\} \\ =& (K_{11} + K_{22} - 2K_{12}) \alpha_2^{old} + y_2 (E_1 - E_2) \end{aligned}
$$
where $E_i = g(\mathbf{x}_i) - y_i$, $i=1,2$.
This yields the final expression for $\alpha_2^{new,unc}$:
$$
\alpha_2^{new,unc} = \alpha_2^{old} + \frac{y_2 (E_1 - E_2)}{K_{11} + K_{22} - 2 K_{12}}
$$
Using the relation given earlier,

$$
\alpha_2^{new} = \left\{ \begin{aligned} H &, \quad \alpha_2^{new,unc} > H \\ \alpha_2^{new,unc} &, \quad L \le \alpha_2^{new,unc} \le H \\ L &, \quad \alpha_2^{new,unc} < L \end{aligned} \right.
$$
we obtain $\alpha_2^{new}$, and then $\alpha_1^{new} = y_1 (\varsigma - \alpha_2^{new} y_2)$.
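The update chain just derived (unconstrained step, clipping, then recovering $\alpha_1$) can be sketched as a single function. The `eta` variable denotes $K_{11}+K_{22}-2K_{12}$, and the non-positive-`eta` guard handles a degenerate case the derivation implicitly assumes away; the function name and interface are my own:

```python
def smo_pair_update(alpha1, alpha2, y1, y2, E1, E2, K11, K12, K22, L, H):
    """One SMO pair update; returns (alpha1_new, alpha2_new)."""
    eta = K11 + K22 - 2.0 * K12
    if eta <= 0:
        return alpha1, alpha2                    # degenerate case: skip this pair
    alpha2_new = alpha2 + y2 * (E1 - E2) / eta   # unconstrained optimum
    alpha2_new = min(H, max(L, alpha2_new))      # clip onto [L, H]
    # varsigma = alpha1*y1 + alpha2*y2 is conserved, which fixes alpha1
    alpha1_new = alpha1 + y1 * y2 * (alpha2 - alpha2_new)
    return alpha1_new, alpha2_new
```

Note that $\alpha_1^{new} = y_1(\varsigma - \alpha_2^{new} y_2)$ rearranges to the increment form used in the last line.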
Choosing the Two Variables in SMO
SMO must choose a suitable pair of variables to optimize at each iteration, treating the rest as constants. How are these two variables chosen?
Choosing the first variable
SMO calls the selection of the first variable the outer loop; it should pick the training sample that violates the KKT conditions most severely. For each sample, the KKT conditions are:
$$
\begin{aligned}
\alpha_i^* = 0 &\Rightarrow y_i g(\mathbf{x}_i) \ge 1 \\
0 < \alpha_i^* < C &\Rightarrow y_i g(\mathbf{x}_i) = 1 \\
\alpha_i^* = C &\Rightarrow y_i g(\mathbf{x}_i) \le 1
\end{aligned}
$$
In general, we first prefer points violating the condition $0 < \alpha_i^* < C \Rightarrow y_i g(\mathbf{x}_i) = 1$, and then points violating $\alpha_i^* = 0 \Rightarrow y_i g(\mathbf{x}_i) \ge 1$ or $\alpha_i^* = C \Rightarrow y_i g(\mathbf{x}_i) \le 1$.
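The outer-loop test can be sketched as a violation-scoring function; the tolerance handling is my own convention, not part of the text:

```python
def kkt_violation(alpha_i, y_i, g_xi, C, tol=1e-3):
    """How badly sample i violates its KKT condition (0.0 means satisfied)."""
    r = y_i * g_xi - 1.0
    if alpha_i < tol:            # alpha_i = 0  requires  y_i g(x_i) >= 1
        return max(0.0, -r)
    if alpha_i > C - tol:        # alpha_i = C  requires  y_i g(x_i) <= 1
        return max(0.0, r)
    return abs(r)                # 0 < alpha_i < C  requires  y_i g(x_i) = 1
```

The outer loop would then pick the sample with the largest score, scanning the interior points ($0 < \alpha_i < C$) first as the text suggests.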
Choosing the second variable
SMO calls the selection of the second variable the inner loop. Suppose the outer loop has found $\alpha_1$; the criterion for the second variable $\alpha_2$ is that $|E_1 - E_2|$ should be as large as possible. Once $\alpha_1$ is fixed, $E_1$ is determined, so to maximize $|E_1 - E_2|$ we choose the smallest $E_i$ as $E_2$ when $E_1$ is positive, and the largest $E_i$ when $E_1$ is negative; caching all the $E_i$ speeds up iteration.
If the point found by the inner loop does not decrease the objective sufficiently, we can traverse the support-vector points as candidates for $\alpha_2$ until the objective decreases enough; if no support vector gives a sufficient decrease, we break out of the loop and reselect $\alpha_1$.
Computing the intercept $b$ and the errors $E_i$
After each pair update we must recompute the intercept $b$. When $0 < \alpha_1^{new} < C$, the KKT conditions give:

$$
\sum_{i=1}^N \alpha_i y_i K_{i1} + b = y_1
$$
so the new $b_1^{new}$ is:

$$
b_1^{new} = y_1 - \sum_{i=3}^N \alpha_i y_i K_{i1} - \alpha_1^{new} y_1 K_{11} - \alpha_2^{new} y_2 K_{21}
$$
Since

$$
\begin{aligned} &E_1 = g(\mathbf{x}_1) - y_1 = \sum_{i=3}^N \alpha_i y_i K_{i1} + \alpha_1^{old} y_1 K_{11} + \alpha_2^{old} y_2 K_{21} + b^{old} - y_1 \\ \Rightarrow\; &y_1 - \sum_{i=3}^N \alpha_i y_i K_{i1} = -E_1 + \alpha_1^{old} y_1 K_{11} + \alpha_2^{old} y_2 K_{21} + b^{old} \end{aligned}
$$
it follows that

$$
b_1^{new} = -E_1 - y_1 K_{11} (\alpha_1^{new} - \alpha_1^{old}) - y_2 K_{21} (\alpha_2^{new} - \alpha_2^{old}) + b^{old}
$$
Similarly, if $0 < \alpha_2^{new} < C$, then

$$
b_2^{new} = -E_2 - y_1 K_{12} (\alpha_1^{new} - \alpha_1^{old}) - y_2 K_{22} (\alpha_2^{new} - \alpha_2^{old}) + b^{old}
$$
The final $b^{new}$ is:

$$
b^{new} = \frac{b_1^{new} + b_2^{new}}{2}
$$
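The intercept update, in the averaged form above, as a sketch (`d1`, `d2` denote $\alpha^{new}-\alpha^{old}$; Platt's original implementation prefers $b_1^{new}$ or $b_2^{new}$ alone when the corresponding $\alpha$ lies strictly inside $(0, C)$, while the unconditional average here follows the text's formula):

```python
def update_intercept(b_old, E1, E2, y1, y2, d1, d2, K11, K12, K22):
    """b^new as the average of the two per-variable estimates."""
    b1 = -E1 - y1 * K11 * d1 - y2 * K12 * d2 + b_old  # uses K21 = K12 (symmetric kernel)
    b2 = -E2 - y1 * K12 * d1 - y2 * K22 * d2 + b_old
    return (b1 + b2) / 2.0
```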
Having obtained $b^{new}$, we update the $E_i$:

$$
E_i = \sum_{j \in S} y_j \alpha_j \kappa(\mathbf{x}_j, \mathbf{x}_i) + b^{new} - y_i
$$
where $S$ is the set of all support vectors $\mathbf{x}_j$.
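The error-cache refresh over the support set $S$ can be vectorized; a sketch assuming a precomputed Gram matrix `K`:

```python
import numpy as np

def refresh_errors(alpha, y, K, b_new):
    """E_i = g(x_i) - y_i, summing only over support vectors (alpha_j > 0)."""
    S = alpha > 0
    g = (alpha[S] * y[S]) @ K[S, :] + b_new   # decision values g(x_i) for all i
    return g - y
```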
SMO Algorithm Summary
Input: $N$ training samples $(\mathbf{x}_i, y_i)$, $i=1,2,\cdots,N$, where $\mathbf{x}_i$ is an $m$-dimensional feature vector and $y_i \in \{1,-1\}$ is the label; tolerance $e$.
Output: approximate solution $\boldsymbol{\alpha}$.
- Initialize $\boldsymbol{\alpha}^0 = 0$, $k = 0$;
- Select $\alpha_1^k$ and $\alpha_2^k$ and compute the new $\alpha_2^{new,unc}$:
$$
\alpha_2^{new,unc} = \alpha_2^k + \frac{y_2 (E_1 - E_2)}{K_{11} + K_{22} - 2 K_{12}}
$$
- Compute $\alpha_2^{k+1}$ by
$$
\alpha_2^{k+1} = \left\{ \begin{aligned} H &, \quad \alpha_2^{new,unc} > H \\ \alpha_2^{new,unc} &, \quad L \le \alpha_2^{new,unc} \le H \\ L &, \quad \alpha_2^{new,unc} < L \end{aligned} \right.
$$
- Compute $\alpha_1^{k+1} = y_1 (\varsigma - \alpha_2^{k+1} y_2)$;
- Compute $b^{k+1}$ and the $E_i$;
- Check, within tolerance $e$, whether the following stopping conditions hold:
$$
\sum_{i=1}^N \alpha_i y_i = 0 \\
0 \le \alpha_i \le C, \quad i=1,2,\cdots,N \\
\alpha_i^{k+1} = 0 \Rightarrow y_i g(\mathbf{x}_i) \ge 1 \\
0 < \alpha_i^{k+1} < C \Rightarrow y_i g(\mathbf{x}_i) = 1 \\
\alpha_i^{k+1} = C \Rightarrow y_i g(\mathbf{x}_i) \le 1
$$
- If satisfied, stop and return $\boldsymbol{\alpha}^{k+1}$; otherwise set $k = k + 1$ and go to step 2.
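The whole procedure can be tied together in a deliberately simplified sketch: a linear kernel and a random second variable stand in for the selection heuristics described above, so this illustrates the update mechanics rather than the full strategy; all names and defaults are illustrative:

```python
import numpy as np

def smo_simple(X, y, C=1.0, tol=1e-3, max_passes=20, seed=0):
    """Simplified SMO: linear kernel, random choice of the second variable."""
    rng = np.random.default_rng(seed)
    N = len(y)
    K = X @ X.T                        # linear-kernel Gram matrix
    alpha = np.zeros(N)
    b = 0.0
    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(N):
            Ei = (alpha * y) @ K[:, i] + b - y[i]
            # outer-loop stand-in: take i if it violates KKT beyond tol
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = int(rng.integers(N - 1))
                if j >= i:
                    j += 1             # random j != i
                Ej = (alpha * y) @ K[:, j] + b - y[j]
                ai, aj = alpha[i], alpha[j]
                if y[i] != y[j]:       # bounds [L, H] from the box constraint
                    L, H = max(0.0, aj - ai), min(C, C + aj - ai)
                else:
                    L, H = max(0.0, ai + aj - C), min(C, ai + aj)
                eta = K[i, i] + K[j, j] - 2.0 * K[i, j]
                if L >= H or eta <= 0:
                    continue
                alpha[j] = np.clip(aj + y[j] * (Ei - Ej) / eta, L, H)
                if abs(alpha[j] - aj) < 1e-8:
                    continue           # negligible progress
                alpha[i] = ai + y[i] * y[j] * (aj - alpha[j])
                # intercept update, averaged as in the text
                b1 = b - Ei - y[i] * K[i, i] * (alpha[i] - ai) - y[j] * K[i, j] * (alpha[j] - aj)
                b2 = b - Ej - y[i] * K[i, j] * (alpha[i] - ai) - y[j] * K[j, j] * (alpha[j] - aj)
                b = (b1 + b2) / 2.0
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b
```

On a well-separated toy dataset this reaches a solution whose decision values $g(\mathbf{x}_i)$ have the correct sign for every training point.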