Principles of the SMO Algorithm

In the preceding derivations we repeatedly arrived at the following optimization problem:

$$
\begin{aligned}
\min_{\boldsymbol{\alpha}} \quad & \frac{1}{2} \sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j \kappa(\mathbf{x}_i,\mathbf{x}_j) - \sum_{i=1}^N \alpha_i \\
\text{s.t.} \quad & \sum_{i=1}^N \alpha_i y_i = 0 \\
& 0 \le \alpha_i \le C, \quad i=1,2,\cdots,N
\end{aligned}
$$

We need the $N$-dimensional vector $\boldsymbol{\alpha}^*$ that minimizes this objective. The problem is difficult to attack directly, so it is usually solved with a heuristic method: the SMO (Sequential Minimal Optimization) algorithm.

Basic Idea of the SMO Algorithm

SMO optimizes only two variables at a time, treating all the others as constants.

For example, take $\alpha_1$ and $\alpha_2$ as the variables and regard $\alpha_3, \alpha_4, \cdots, \alpha_N$ as constants. Dropping the constant terms from the objective, the problem becomes

$$
\begin{aligned}
\min_{\alpha_1,\alpha_2} \quad & \frac{1}{2} K_{11}\alpha_1^2 + \frac{1}{2} K_{22}\alpha_2^2 + y_1 y_2 K_{12}\alpha_1\alpha_2 - (\alpha_1+\alpha_2) + y_1\alpha_1 \sum_{i=3}^N y_i\alpha_i K_{i1} + y_2\alpha_2 \sum_{i=3}^N y_i\alpha_i K_{i2} \\
\text{s.t.} \quad & \alpha_1 y_1 + \alpha_2 y_2 = -\sum_{i=3}^N \alpha_i y_i = \varsigma \\
& 0 \le \alpha_i \le C, \quad i=1,2,\cdots,N
\end{aligned}
$$

where $K_{ij} = \kappa(\mathbf{x}_i, \mathbf{x}_j)$. Since $y_1^2 = 1$ and $y_2^2 = 1$, these factors are omitted from the objective.

Optimizing the SMO Objective

First consider the constraints:

$$
\alpha_1 y_1 + \alpha_2 y_2 = \varsigma, \qquad 0 \le \alpha_i \le C, \quad i=1,2
$$

Since $y_1, y_2$ can only take the values $1$ or $-1$, the equality $\alpha_1 y_1 + \alpha_2 y_2 = \varsigma$ has four possible forms:

$$
\begin{aligned}
&\alpha_1 + \alpha_2 = \varsigma \\
&\alpha_1 + \alpha_2 = -\varsigma \\
&\alpha_1 - \alpha_2 = \varsigma \\
&\alpha_1 - \alpha_2 = -\varsigma
\end{aligned}
$$

Together with $0 \le \alpha_1 \le C$ and $0 \le \alpha_2 \le C$, this restricts $(\alpha_1, \alpha_2)$ to the box $[0,C] \times [0,C]$.

[Figure: the equality constraint confines $(\alpha_1, \alpha_2)$ to a line segment inside the box $[0,C] \times [0,C]$]
As the figure shows, $(\alpha_1, \alpha_2)$ is restricted to a line segment inside the box. One variable can therefore be expressed in terms of the other, reducing the two-variable problem to a one-variable problem; we take $\alpha_2$ as that variable.

SMO is an iterative heuristic. Let $\alpha_1^{old}, \alpha_2^{old}$ denote the solution from the previous iteration, $\alpha_2^{new,unc}$ the solution obtained while ignoring the box constraint, and $\alpha_1^{new}, \alpha_2^{new}$ the solution of the current iteration after clipping to the box.

$\alpha_2^{new}$ must lie on the line segment inside the box. Let $L$ and $H$ be the lower and upper bounds of $\alpha_2^{new}$ on that segment:

$$
L \le \alpha_2^{new} \le H
$$

  • If $y_1 \ne y_2$, the constraint reads $\alpha_1 - \alpha_2 = \varsigma$. If $\varsigma > 0$ then $0 \le \alpha_2^{new} \le C - \varsigma$; if $\varsigma < 0$ then $-\varsigma \le \alpha_2^{new} \le C$. Hence
    $$L = \max(0, -\varsigma), \qquad H = \min(C, C - \varsigma)$$
    Substituting $\varsigma = \alpha_1^{old} - \alpha_2^{old}$:
    $$L = \max(0, \alpha_2^{old} - \alpha_1^{old}), \qquad H = \min(C, C + \alpha_2^{old} - \alpha_1^{old})$$

  • If $y_1 = y_2$, the constraint reads $\alpha_1 + \alpha_2 = \varsigma$ (so $\varsigma \ge 0$). If $\varsigma > C$ then $\varsigma - C \le \alpha_2^{new} \le C$; if $\varsigma \le C$ then $0 \le \alpha_2^{new} \le \varsigma$. Hence
    $$L = \max(0, \varsigma - C), \qquad H = \min(C, \varsigma)$$
    Substituting $\varsigma = \alpha_1^{old} + \alpha_2^{old}$:
    $$L = \max(0, \alpha_1^{old} + \alpha_2^{old} - C), \qquad H = \min(C, \alpha_1^{old} + \alpha_2^{old})$$
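The two cases above can be sketched as a small helper; the function name and argument order are illustrative:

```python
def compute_bounds(alpha1_old, alpha2_old, y1, y2, C):
    """Feasible interval [L, H] for alpha2 on the constraint segment."""
    if y1 != y2:  # alpha1 - alpha2 is held constant
        L = max(0.0, alpha2_old - alpha1_old)
        H = min(C, C + alpha2_old - alpha1_old)
    else:         # alpha1 + alpha2 is held constant
        L = max(0.0, alpha1_old + alpha2_old - C)
        H = min(C, alpha1_old + alpha2_old)
    return L, H
```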

Once $\alpha_2^{new,unc}$ has been obtained by differentiation, $\alpha_2^{new}$ follows by clipping:

$$
\alpha_2^{new} =
\begin{cases}
H, & \alpha_2^{new,unc} > H \\
\alpha_2^{new,unc}, & L \le \alpha_2^{new,unc} \le H \\
L, & \alpha_2^{new,unc} < L
\end{cases}
$$
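The clipping rule is a one-liner in code; `clip_alpha` is an illustrative name:

```python
def clip_alpha(alpha_unc, L, H):
    """Clip the unconstrained solution back into [L, H]."""
    if alpha_unc > H:
        return H
    if alpha_unc < L:
        return L
    return alpha_unc
```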
So how do we obtain $\alpha_2^{new,unc}$?

Simply take the partial derivative of the objective with respect to $\alpha_2$ and set it to zero.

Recall that

$$
g(\mathbf{x}) = {\mathbf{w}^*}^T \phi(\mathbf{x}) + b^* = \sum_{i=1}^N \alpha_i^* y_i \kappa(\mathbf{x}_i, \mathbf{x}) + b^*
$$

For brevity, define

$$
\begin{aligned}
v_j &= \sum_{i=3}^N y_i \alpha_i K_{ij} = \sum_{i=3}^N y_i \alpha_i \kappa(\mathbf{x}_i, \mathbf{x}_j) \\
&= g(\mathbf{x}_j) - \sum_{i=1}^2 y_i \alpha_i \kappa(\mathbf{x}_i, \mathbf{x}_j) - b \\
&= g(\mathbf{x}_j) - \sum_{i=1}^2 y_i \alpha_i K_{ij} - b
\end{aligned}
$$
The objective then simplifies to

$$
W(\alpha_1, \alpha_2) = \frac{1}{2} K_{11}\alpha_1^2 + \frac{1}{2} K_{22}\alpha_2^2 + y_1 y_2 K_{12}\alpha_1\alpha_2 - (\alpha_1 + \alpha_2) + y_1\alpha_1 v_1 + y_2\alpha_2 v_2
$$
Since $\alpha_1 y_1 + \alpha_2 y_2 = \varsigma$ and $y_1, y_2 \in \{1, -1\}$,

$$
\alpha_1 = y_1(\varsigma - \alpha_2 y_2)
$$

Substituting into the objective eliminates $\alpha_1$:

$$
\begin{aligned}
W(\alpha_2) =& \frac{1}{2} y_1^2 K_{11}(\varsigma - \alpha_2 y_2)^2 + \frac{1}{2} K_{22}\alpha_2^2 + y_1^2 y_2 K_{12}(\varsigma - \alpha_2 y_2)\alpha_2 \\
&- y_1(\varsigma - \alpha_2 y_2) - \alpha_2 + y_1^2(\varsigma - \alpha_2 y_2)v_1 + y_2\alpha_2 v_2 \\
=& \frac{1}{2} K_{11}(\varsigma - \alpha_2 y_2)^2 + \frac{1}{2} K_{22}\alpha_2^2 + y_2 K_{12}(\varsigma - \alpha_2 y_2)\alpha_2 \\
&- y_1(\varsigma - \alpha_2 y_2) - \alpha_2 + (\varsigma - \alpha_2 y_2)v_1 + y_2\alpha_2 v_2
\end{aligned}
$$
Setting the partial derivative with respect to $\alpha_2$ to zero:

$$
\frac{\partial W}{\partial \alpha_2} = K_{11}\alpha_2 + K_{22}\alpha_2 - 2K_{12}\alpha_2 - y_2 K_{11}\varsigma + y_2 K_{12}\varsigma + y_1 y_2 - 1 - y_2 v_1 + y_2 v_2 = 0
$$

Rearranging:

$$
\begin{aligned}
(K_{11} + K_{22} - 2K_{12})\alpha_2 &= y_2 K_{11}\varsigma - y_2 K_{12}\varsigma - y_1 y_2 + 1 + y_2 v_1 - y_2 v_2 \\
&= y_2 K_{11}\varsigma - y_2 K_{12}\varsigma - y_1 y_2 + y_2^2 + y_2 v_1 - y_2 v_2 \\
&= y_2(K_{11}\varsigma - K_{12}\varsigma - y_1 + y_2 + v_1 - v_2) \\
&= y_2\Big\{K_{11}\varsigma - K_{12}\varsigma - y_1 + y_2 + \Big[g(\mathbf{x}_1) - \sum_{i=1}^2 y_i\alpha_i K_{i1} - b\Big] - \Big[g(\mathbf{x}_2) - \sum_{i=1}^2 y_i\alpha_i K_{i2} - b\Big]\Big\} \\
&= y_2\Big[(K_{11} - K_{12})\varsigma - y_1 + y_2 + g(\mathbf{x}_1) - g(\mathbf{x}_2) - \sum_{i=1}^2 y_i\alpha_i K_{i1} + \sum_{i=1}^2 y_i\alpha_i K_{i2}\Big]
\end{aligned}
$$
ς = α 1 y 1 + α 2 y 2 \varsigma = \alpha_1 y_1 + \alpha_2 y_2 ς=α1y1+α2y2代入上式有
( K 11 + K 22 − 2 K 12 ) α 2 n e w , u n c = y 2 [ ( K 11 − K 12 ) ( α 1 o l d y 1 + α 2 o l d y 2 ) − y 1 + y 2 + g ( x 1 ) − g ( x 2 ) − ∑ i = 1 2 y i α i o l d K i 1 + ∑ i = 1 2 y i α i o l d K i 2 ] = y 2 { y 2 ( K 11 + K 22 − 2 K 12 ) α 2 o l d + [ g ( x 1 ) − y 1 ] − [ g ( x 2 ) − y 2 ] } = ( K 11 + K 22 − 2 K 12 ) α 2 o l d + y 2 ( E 1 − E 2 ) \begin{aligned} &(K_{11} + K_{22} - 2 K_{12}) \alpha_2^{new,unc} \\ =& y_2 [(K_{11} - K_{12}) (\alpha_1^{old} y_1 + \alpha_2^{old} y_2) - y_1 + y_2 + g(\mathbf{x}_1) - g(\mathbf{x}_2) - \sum_{i=1}^2 y_i \alpha_i^{old} K_{i1} + \sum_{i=1}^2 y_i \alpha_i^{old} K_{i2}] \\ =& y_2 \{y_2 (K_{11} + K_{22} - 2K_{12}) \alpha_2^{old} + [g(\mathbf{x}_1) - y_1] - [g(\mathbf{x}_2) - y_2]\} \\ =& (K_{11} + K_{22} - 2K_{12}) \alpha_2^{old} + y_2 (E_1 - E_2) \end{aligned} ===(K11+K222K12)α2new,uncy2[(K11K12)(α1oldy1+α2oldy2)y1+y2+g(x1)g(x2)i=12yiαioldKi1+i=12yiαioldKi2]y2{y2(K11+K222K12)α2old+[g(x1)y1][g(x2)y2]}(K11+K222K12)α2old+y2(E1E2)
其中, E i = g ( x i ) − y i , i = 1 , 2 E_i = g(\mathbf{x}_i) - y_i, \quad i=1,2 Ei=g(xi)yi,i=1,2

This gives the expression for $\alpha_2^{new,unc}$:

$$
\alpha_2^{new,unc} = \alpha_2^{old} + \frac{y_2(E_1 - E_2)}{K_{11} + K_{22} - 2K_{12}}
$$
Applying the clipping rule from above,

$$
\alpha_2^{new} =
\begin{cases}
H, & \alpha_2^{new,unc} > H \\
\alpha_2^{new,unc}, & L \le \alpha_2^{new,unc} \le H \\
L, & \alpha_2^{new,unc} < L
\end{cases}
$$

yields $\alpha_2^{new}$, and then $\alpha_1^{new} = y_1(\varsigma - \alpha_2^{new} y_2)$.
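The complete two-variable update (denominator $\eta$, unclipped solution, clipping, recovery of $\alpha_1$) can be sketched as follows. All inputs are assumed precomputed (kernel entries $K_{11}, K_{22}, K_{12}$ and errors $E_1, E_2$); the function name is illustrative:

```python
def smo_step(alpha1_old, alpha2_old, y1, y2, E1, E2, K11, K22, K12, C, eps=1e-12):
    """One two-variable SMO update: unclipped solution, clipping, alpha1 recovery."""
    eta = K11 + K22 - 2.0 * K12      # second derivative of W along the segment
    if eta < eps:                    # degenerate direction: skip this pair
        return alpha1_old, alpha2_old
    # feasible interval [L, H] for alpha2
    if y1 != y2:
        L, H = max(0.0, alpha2_old - alpha1_old), min(C, C + alpha2_old - alpha1_old)
    else:
        L, H = max(0.0, alpha1_old + alpha2_old - C), min(C, alpha1_old + alpha2_old)
    alpha2_unc = alpha2_old + y2 * (E1 - E2) / eta
    alpha2_new = min(H, max(L, alpha2_unc))
    # alpha1 follows from the equality constraint:
    # y1*(varsigma - alpha2_new*y2) = alpha1_old + y1*y2*(alpha2_old - alpha2_new)
    alpha1_new = alpha1_old + y1 * y2 * (alpha2_old - alpha2_new)
    return alpha1_new, alpha2_new
```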

Choosing the Two Variables in SMO

SMO must choose a suitable pair of variables to optimize at each iteration, holding the rest constant. How are these two variables chosen?

Choosing the first variable

Choosing the first variable is called the outer loop: it selects the training sample that violates the KKT conditions most severely.

The KKT conditions that each sample must satisfy are:

$$
\begin{aligned}
\alpha_i^* = 0 &\Rightarrow y_i g(\mathbf{x}_i) \ge 1 \\
0 < \alpha_i^* < C &\Rightarrow y_i g(\mathbf{x}_i) = 1 \\
\alpha_i^* = C &\Rightarrow y_i g(\mathbf{x}_i) \le 1
\end{aligned}
$$

In general, we first look for points violating $0 < \alpha_i^* < C \Rightarrow y_i g(\mathbf{x}_i) = 1$, and then for points violating $\alpha_i^* = 0 \Rightarrow y_i g(\mathbf{x}_i) \ge 1$ or $\alpha_i^* = C \Rightarrow y_i g(\mathbf{x}_i) \le 1$.
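As a sketch, the KKT check for a single sample could look like the following. Since exact equality is never tested in floating point, an assumed numerical tolerance `tol` is used; the function name is illustrative:

```python
def violates_kkt(alpha_i, y_i, g_xi, C, tol=1e-3):
    """Check whether a sample violates its KKT condition within tolerance tol."""
    m = y_i * g_xi                 # functional margin y_i * g(x_i)
    if alpha_i < tol:              # alpha = 0   =>  margin >= 1
        return m < 1 - tol
    if alpha_i > C - tol:          # alpha = C   =>  margin <= 1
        return m > 1 + tol
    return abs(m - 1) > tol        # 0 < alpha < C  =>  margin == 1
```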

Choosing the second variable

Choosing the second variable is called the inner loop. Given $\alpha_1$ from the outer loop, the second variable $\alpha_2$ is chosen so that $|E_1 - E_2|$ is as large as possible, which makes the update step large. Since fixing $\alpha_1$ determines $E_1$, maximizing $|E_1 - E_2|$ amounts to: if $E_1 > 0$, pick the smallest $E_i$ as $E_2$; if $E_1 < 0$, pick the largest $E_i$ as $E_2$. Caching all the $E_i$ values speeds up the iterations.

If the point found by the inner loop does not decrease the objective sufficiently, we can traverse the support-vector points as candidates for $\alpha_2$ until the objective decreases enough; if no support vector works either, we abandon this pair and reselect $\alpha_1$.
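The max-$|E_1 - E_2|$ heuristic can be sketched as follows, assuming the errors $E_i$ are cached in a list (a fuller implementation would restrict candidates to non-bound samples; `select_second` is an illustrative name):

```python
def select_second(E, i1):
    """Heuristic inner-loop choice: maximize |E1 - E2| over cached errors."""
    E1 = E[i1]
    best, best_gap = -1, -1.0
    for i2, E2 in enumerate(E):
        if i2 == i1:
            continue                 # must pick a different index
        gap = abs(E1 - E2)
        if gap > best_gap:
            best, best_gap = i2, gap
    return best
```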

Computing the Intercept $b$ and the Errors $E_i$

After each two-variable update, the intercept $b$ must be recomputed. When $0 < \alpha_1^{new} < C$, the KKT conditions give:

$$
\sum_{i=1}^N \alpha_i y_i K_{i1} + b = y_1
$$

so the new $b_1^{new}$ is:

$$
b_1^{new} = y_1 - \sum_{i=3}^N \alpha_i y_i K_{i1} - \alpha_1^{new} y_1 K_{11} - \alpha_2^{new} y_2 K_{21}
$$

Since

$$
\begin{aligned}
&E_1 = g(\mathbf{x}_1) - y_1 = \sum_{i=3}^N \alpha_i y_i K_{i1} + \alpha_1^{old} y_1 K_{11} + \alpha_2^{old} y_2 K_{21} + b^{old} - y_1 \\
\Rightarrow\ & y_1 - \sum_{i=3}^N \alpha_i y_i K_{i1} = -E_1 + \alpha_1^{old} y_1 K_{11} + \alpha_2^{old} y_2 K_{21} + b^{old}
\end{aligned}
$$

we obtain

$$
b_1^{new} = -E_1 - y_1 K_{11}(\alpha_1^{new} - \alpha_1^{old}) - y_2 K_{21}(\alpha_2^{new} - \alpha_2^{old}) + b^{old}
$$

Likewise, if $0 < \alpha_2^{new} < C$,

$$
b_2^{new} = -E_2 - y_1 K_{12}(\alpha_1^{new} - \alpha_1^{old}) - y_2 K_{22}(\alpha_2^{new} - \alpha_2^{old}) + b^{old}
$$

The final $b^{new}$ is:

$$
b^{new} = \frac{b_1^{new} + b_2^{new}}{2}
$$
With $b^{new}$ in hand, we update the errors $E_i$:

$$
E_i = \sum_{j \in S} y_j \alpha_j \kappa(\mathbf{x}_j, \mathbf{x}_i) + b^{new} - y_i
$$

where $S$ is the index set of the support vectors $\mathbf{x}_j$.
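The intercept update can be sketched as below. The text averages $b_1^{new}$ and $b_2^{new}$; a common refinement (used in Platt's original pseudocode, and going slightly beyond the text) keeps $b_1^{new}$ or $b_2^{new}$ alone when the corresponding $\alpha$ is strictly inside $(0, C)$, where the formula is exact, and falls back to the average otherwise. Names are illustrative; `d1`, `d2` are the deltas $\alpha^{new} - \alpha^{old}$, and the kernel is assumed symmetric ($K_{21} = K_{12}$):

```python
def update_b(b_old, E1, E2, y1, y2, d1, d2, K11, K12, K22, a1_new, a2_new, C):
    """Recompute the intercept after updating (alpha1, alpha2)."""
    b1 = b_old - E1 - y1 * K11 * d1 - y2 * K12 * d2   # exact if 0 < a1_new < C
    b2 = b_old - E2 - y1 * K12 * d1 - y2 * K22 * d2   # exact if 0 < a2_new < C
    if 0 < a1_new < C:
        return b1
    if 0 < a2_new < C:
        return b2
    return 0.5 * (b1 + b2)                            # the text's averaging rule
```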

Summary of the SMO Algorithm

Input: $N$ training samples $(\mathbf{x}_i, y_i)$, $i = 1, 2, \cdots, N$, where $\mathbf{x}_i$ is an $m$-dimensional feature vector and $y_i \in \{1, -1\}$ is the label; a tolerance $e$.

Output: an approximate solution $\boldsymbol{\alpha}$.

  1. Initialize $\boldsymbol{\alpha}^0 = \mathbf{0}$, $k = 0$.

  2. Select $\alpha_1^k$ and $\alpha_2^k$ as described above, and compute the new unclipped value:
    $$\alpha_2^{new,unc} = \alpha_2^k + \frac{y_2(E_1 - E_2)}{K_{11} + K_{22} - 2K_{12}}$$

  3. Compute $\alpha_2^{k+1}$ by clipping:
    $$\alpha_2^{k+1} = \begin{cases} H, & \alpha_2^{new,unc} > H \\ \alpha_2^{new,unc}, & L \le \alpha_2^{new,unc} \le H \\ L, & \alpha_2^{new,unc} < L \end{cases}$$

  4. Compute $\alpha_1^{k+1} = y_1(\varsigma - \alpha_2^{k+1} y_2)$.

  5. Compute $b^{k+1}$ and the errors $E_i$.

  6. Check, within tolerance $e$, whether the stopping conditions hold:
    $$\sum_{i=1}^N \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C, \quad i = 1, 2, \cdots, N$$
    $$\alpha_i^{k+1} = 0 \Rightarrow y_i g(\mathbf{x}_i) \ge 1, \qquad 0 < \alpha_i^{k+1} < C \Rightarrow y_i g(\mathbf{x}_i) = 1, \qquad \alpha_i^{k+1} = C \Rightarrow y_i g(\mathbf{x}_i) \le 1$$

  7. If they hold, stop and return $\boldsymbol{\alpha}^{k+1}$; otherwise set $k \leftarrow k + 1$ and return to step 2.
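The steps above can be sketched end-to-end. The following is a minimal, self-contained version of the *simplified* SMO variant (a random inner-loop choice instead of the $|E_1 - E_2|$ heuristic, and a linear kernel); `smo_train` and all parameter names are illustrative, not a production solver:

```python
import numpy as np

def smo_train(X, y, C=1.0, tol=1e-3, max_passes=20):
    """Simplified SMO (random second variable) with a linear kernel; returns (alpha, b)."""
    N = len(y)
    K = X @ X.T                                  # Gram matrix, K[i, j] = <x_i, x_j>
    alpha, b = np.zeros(N), 0.0
    rng = np.random.default_rng(0)
    passes = 0
    for _ in range(1000):                        # hard cap on sweeps
        changed = 0
        for i in range(N):
            Ei = (alpha * y) @ K[:, i] + b - y[i]
            # outer loop: does sample i violate its KKT condition (within tol)?
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = int(rng.integers(N - 1))
                j += j >= i                      # uniform random j != i
                Ej = (alpha * y) @ K[:, j] + b - y[j]
                ai_old, aj_old = alpha[i], alpha[j]
                if y[i] != y[j]:
                    L, H = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    L, H = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                eta = K[i, i] + K[j, j] - 2.0 * K[i, j]
                if L == H or eta <= 0:
                    continue
                aj_new = float(np.clip(aj_old + y[j] * (Ei - Ej) / eta, L, H))
                if abs(aj_new - aj_old) < 1e-7:  # negligible progress: skip the pair
                    continue
                alpha[j] = aj_new
                alpha[i] = ai_old + y[i] * y[j] * (aj_old - aj_new)
                # intercept update (b1/b2 exact when the alpha is strictly interior)
                b1 = b - Ei - y[i] * K[i, i] * (alpha[i] - ai_old) - y[j] * K[i, j] * (alpha[j] - aj_old)
                b2 = b - Ej - y[i] * K[i, j] * (alpha[i] - ai_old) - y[j] * K[j, j] * (alpha[j] - aj_old)
                if 0 < alpha[i] < C:
                    b = b1
                elif 0 < alpha[j] < C:
                    b = b2
                else:
                    b = 0.5 * (b1 + b2)
                changed += 1
        passes = passes + 1 if changed == 0 else 0
        if passes >= max_passes:                 # max_passes clean sweeps: converged
            break
    return alpha, b
```

On a tiny separable set such as `X = [[2,2],[3,3],[-2,-2],[-3,-3]]` with labels `[1,1,-1,-1]`, the learned classifier `sign((alpha*y) @ K + b)` should separate the two classes, and the equality constraint $\sum_i \alpha_i y_i = 0$ is preserved by every update.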
