Algorithm description:
Linear Hard-Margin SVM Algorithm |
---|
1. $Q = \begin{bmatrix} 0 & 0_d^T \\ 0_d & I_d \end{bmatrix};\ p = 0_{d+1};\ a_n^T = y_n \begin{bmatrix} 1 & x_n^T \end{bmatrix};\ c_n = 1$ |
2. $\begin{bmatrix} b \\ \omega \end{bmatrix} \leftarrow QP(Q, p, A, c)$ |
3. Return $b$ and $\omega$ as the parameters of $g_{SVM}$ |
Derivation:
For convenience, the zeroth component is split off and written separately, that is,
$$
\begin{array}{l}
b = \omega_0 \\
\begin{bmatrix} | \\ \omega \\ | \end{bmatrix} = \begin{bmatrix} \omega_1 \\ \vdots \\ \omega_d \end{bmatrix}; \quad \begin{bmatrix} | \\ x \\ | \end{bmatrix} = \begin{bmatrix} x_1 \\ \vdots \\ x_d \end{bmatrix}
\end{array}
$$
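Among all hyperplanes that classify the training samples correctly, SVM looks for the best parameters $(b, \omega)$. "Best" is measured by the distance from the hyperplane to the training points closest to it (the support vectors).
SVM prefers a larger distance: the larger it is, the more noise the classifier can tolerate and the less prone it is to overfitting.
Hard-Margin: all samples must be separated perfectly, that is,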
$$y_n(\omega^T x_n + b) > 0$$
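The hard-margin condition can be checked directly for any candidate $(b, \omega)$. The snippet below is a minimal sketch; the function name `separates` and the four-point toy dataset are assumptions made for illustration and do not come from the text above.

```python
def separates(X, y, w, b):
    """Return True if y_n * (w^T x_n + b) > 0 holds for every sample."""
    return all(
        yn * (sum(wi * xi for wi, xi in zip(w, xn)) + b) > 0
        for xn, yn in zip(X, y)
    )

# Toy 2-D data (assumed for illustration only).
X = [(0.0, 0.0), (2.0, 2.0), (2.0, 0.0), (3.0, 0.0)]
y = [-1, -1, 1, 1]

# The hyperplane x1 - x2 - 1 = 0 separates the two classes.
ok = separates(X, y, w=(1.0, -1.0), b=-1.0)
# The hyperplane x2 - 1 = 0 misclassifies the point (2, 2).
bad = separates(X, y, w=(0.0, 1.0), b=-1.0)
```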
The SVM problem can therefore be stated as:
$$
\begin{array}{ll}
\max\limits_{b,\omega} & \mathrm{margin}(b,\omega) \\
\text{s.t.} & \text{every } y_n(\omega^T x_n + b) > 0 \\
& \mathrm{margin}(b,\omega) = \min\limits_{n=1,2,\cdots,N} \mathrm{distance}(x_n, b, \omega)
\end{array}
$$
The distance from a point $x$ to the hyperplane $\omega^T x + b = 0$ is:
$$\mathrm{distance}(x, b, \omega) = \frac{1}{\|\omega\|}\left|\omega^T x + b\right|$$
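As a quick numerical check of the distance formula, the following sketch evaluates it in plain Python. The function name `distance` mirrors the notation above; the sample points are made up for illustration.

```python
import math

def distance(x, b, w):
    """Distance from point x to the hyperplane w^T x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    if norm_w == 0.0:
        raise ValueError("w must be non-zero")
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w

# |3*3 + 4*4 - 5| / 5 = 20 / 5
d1 = distance(x=(3.0, 4.0), b=-5.0, w=(3.0, 4.0))

# Scaling (b, w) by the same positive factor leaves the distance unchanged.
d2 = distance(x=(1.0, 1.0), b=-1.0, w=(1.0, -1.0))
d2_scaled = distance(x=(1.0, 1.0), b=-10.0, w=(10.0, -10.0))
```

The scale invariance shown in the last two calls is what makes it possible to fix the scale of $(b, \omega)$ freely.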
Rescaling $(b, \omega)$ by a positive constant does not change the hyperplane, so we may assume $\min\limits_{n=1,2,\cdots,N} y_n(\omega^T x_n + b) = 1$, which gives
$$\mathrm{margin}(b,\omega) = \frac{1}{\|\omega\|}$$
Maximizing this margin under the condition $\min_n y_n(\omega^T x_n + b) = 1$ is hard to solve directly, so consider relaxing the condition to
$$y_n(\omega^T x_n + b) \ge 1 \quad \text{for all } n$$
The relaxation is valid. Suppose the optimal $(b, \omega)$ of the relaxed problem satisfied $y_n(\omega^T x_n + b) \ge c$ for all $n$ with some $c > 1$. Then $(\frac{b}{c}, \frac{\omega}{c})$ would still satisfy $y_n(\omega^T x_n + b) \ge 1$ for all $n$ while having a smaller $\|\omega\|$, so it would be "more optimal" than $(b, \omega)$, which is a contradiction. Hence the minimum of $y_n(\omega^T x_n + b)$ equals 1 at the optimum, and the relaxed problem has the same solution as the original one.
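The rescaling argument can be illustrated numerically. The sketch below starts from a separating $(b, \omega)$ whose smallest value of $y_n(\omega^T x_n + b)$ is $c = 3$, divides by $c$, and confirms that the margin equals $1/\|\omega\|$ afterwards. The helper names `dot` and `normalize` and the toy data are assumptions for illustration.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def normalize(X, y, w, b):
    """Rescale (b, w) so that min_n y_n (w^T x_n + b) equals 1."""
    c = min(yn * (dot(w, xn) + b) for xn, yn in zip(X, y))
    if c <= 0:
        raise ValueError("(b, w) does not separate the data")
    return tuple(wi / c for wi in w), b / c, c

# Toy 2-D data (assumed for illustration only).
X = [(0.0, 0.0), (2.0, 2.0), (2.0, 0.0), (3.0, 0.0)]
y = [-1, -1, 1, 1]

w_raw, b_raw = (3.0, -3.0), -3.0
w, b, c = normalize(X, y, w_raw, b_raw)

# Margin computed from the definition (smallest point-to-plane distance) ...
margin_def = min(abs(dot(w_raw, xn) + b_raw) for xn in X) / math.sqrt(dot(w_raw, w_raw))
# ... and from the closed form that holds after normalization.
margin_closed = 1.0 / math.sqrt(dot(w, w))
```

Both margins agree, and the rescaled $\omega$ has a norm that is $c$ times smaller than the original one.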
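Taking the reciprocal of the objective (which turns maximization into minimization), dropping the square root in $\|\omega\| = \sqrt{\omega^T\omega}$, and adding a factor of $\frac{1}{2}$ (to match the standard form) gives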
$$
\begin{array}{ll}
\min\limits_{b,\omega} & \frac{1}{2}\omega^T\omega \\
\text{s.t.} & y_n(\omega^T x_n + b) \ge 1 \quad \text{for all } n
\end{array}
$$
This is a standard quadratic programming (QP) problem, so it can be solved with any QP solver.
The standard QP problem:
$$
\begin{array}{l}
\text{optimal } u \leftarrow QP(Q, p, A, c) \\
\quad \min\limits_u \ \frac{1}{2}u^T Q u + p^T u \\
\quad \text{s.t.} \ a_m^T u \ge c_m \quad \text{for } m = 1,2,\cdots,M
\end{array}
$$
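To make the roles of $Q$, $p$, $A$ and $c$ concrete, the sketch below evaluates the QP objective and checks the constraints for a given $u$. It does not solve the QP; in practice that step would be delegated to a solver library. The function names and the small example problem are assumptions for illustration.

```python
def qp_objective(Q, p, u):
    """Return (1/2) u^T Q u + p^T u."""
    n = len(u)
    quad = sum(u[i] * Q[i][j] * u[j] for i in range(n) for j in range(n))
    lin = sum(pi * ui for pi, ui in zip(p, u))
    return 0.5 * quad + lin

def qp_feasible(A, c, u, tol=1e-9):
    """Return True if a_m^T u >= c_m holds for every row a_m of A."""
    return all(
        sum(a * ui for a, ui in zip(a_m, u)) >= c_m - tol
        for a_m, c_m in zip(A, c)
    )

# A small made-up problem: minimize u1^2 + u2^2 - 2*u1 - 5*u2 with u >= 0.
Q = [[2.0, 0.0], [0.0, 2.0]]
p = [-2.0, -5.0]
A = [[1.0, 0.0], [0.0, 1.0]]
c = [0.0, 0.0]

u = [1.0, 2.5]                 # unconstrained minimizer, which is also feasible
obj = qp_objective(Q, p, u)    # 0.5 * 14.5 - 14.5
feasible = qp_feasible(A, c, u)
infeasible = qp_feasible(A, c, [-1.0, 0.0])
```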
Comparing the primal SVM with the QP gives
$$
\begin{array}{l}
u = \begin{bmatrix} b \\ \omega \end{bmatrix}; \quad Q = \begin{bmatrix} 0 & 0_d^T \\ 0_d & I_d \end{bmatrix}; \quad p = 0_{d+1} \\ \\
a_n^T = y_n \begin{bmatrix} 1 & x_n^T \end{bmatrix}; \quad c_n = 1; \quad M = N
\end{array}
$$
Therefore, the primal SVM can be solved with a QP solver.
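The mapping from the primal SVM to the QP inputs can be sketched in plain Python. The function name `build_svm_qp`, the toy data, and the candidate solution are assumptions for illustration; the snippet only builds $Q$, $p$, $A$, $c$ and checks a candidate $u = [b, \omega]$, while the actual `QP(Q, p, A, c)` call is left to whichever solver is available.

```python
def build_svm_qp(X, y):
    """Build (Q, p, A, c) for the hard-margin primal SVM with u = [b, w]."""
    d = len(X[0])
    # Q has a zero in the b position and the identity in the w block,
    # so that (1/2) u^T Q u = (1/2) w^T w.
    Q = [[0.0] * (d + 1) for _ in range(d + 1)]
    for i in range(1, d + 1):
        Q[i][i] = 1.0
    p = [0.0] * (d + 1)
    # Row n of A is a_n^T = y_n * [1, x_n^T].
    A = [[float(yn)] + [yn * xi for xi in xn] for xn, yn in zip(X, y)]
    c = [1.0] * len(X)
    return Q, p, A, c

# Toy 2-D data (assumed for illustration only).
X = [(0.0, 0.0), (2.0, 2.0), (2.0, 0.0), (3.0, 0.0)]
y = [-1, -1, 1, 1]
Q, p, A, c = build_svm_qp(X, y)

# Candidate u = [b, w1, w2] for this toy data.
u = [-1.0, 1.0, -1.0]
objective = 0.5 * sum(u[i] * Q[i][j] * u[j] for i in range(3) for j in range(3))
lhs = [sum(a * ui for a, ui in zip(a_n, u)) for a_n in A]  # a_n^T u per sample
```

For this candidate every constraint $a_n^T u \ge 1$ holds, and the first three hold with equality, which marks those points as the ones lying on the margin.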
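SVM after a nonlinear transform
The SVM model built above is linear, i.e. it is formulated entirely in terms of the original features $x_n$. After applying a feature transform (FT) $z_n = \phi(x_n)$ to the original features, the SVM becomes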
$$
\begin{array}{ll}
\min\limits_{b,\omega} & \frac{1}{2}\omega^T\omega \\
\text{s.t.} & y_n(\omega^T \underbrace{z_n}_{\phi(x_n)} + b) \ge 1, \quad \text{for } n = 1,2,\cdots,N
\end{array}
$$
Non-Linear Hard-Margin SVM Algorithm |
---|
1. $Q = \begin{bmatrix} 0 & 0_{\tilde d}^T \\ 0_{\tilde d} & I_{\tilde d} \end{bmatrix};\ p = 0_{\tilde d+1};\ a_n^T = y_n \begin{bmatrix} 1 & z_n^T \end{bmatrix};\ c_n = 1$ |
2. $\begin{bmatrix} b \\ \omega \end{bmatrix} \leftarrow QP(Q, p, A, c)$ |
3. Return $b \in \mathbb{R}$ and $\omega \in \mathbb{R}^{\tilde d}$ as the parameters of $g_{SVM}(x) = \mathrm{sign}(\omega^T\phi(x) + b)$ |
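
The nonlinear variant only changes what goes into the constraint rows: $z_n = \phi(x_n)$ replaces $x_n$. The sketch below assumes a second-order polynomial transform for 2-D inputs, which is one possible choice of $\phi$ and is not prescribed by the text; the names `phi`, `constraint_row` and `g_svm` and the example parameters are likewise illustrative.

```python
def phi(x):
    """Assumed second-order polynomial transform for 2-D input (d~ = 5)."""
    x1, x2 = x
    return (x1, x2, x1 * x1, x1 * x2, x2 * x2)

def constraint_row(x, y):
    """Row a_n^T = y_n * [1, z_n^T] with z_n = phi(x_n)."""
    return [float(y)] + [y * zi for zi in phi(x)]

def g_svm(x, w, b):
    """Predict sign(w^T phi(x) + b); a score of exactly 0 is mapped to -1 here."""
    score = sum(wi * zi for wi, zi in zip(w, phi(x))) + b
    return 1 if score > 0 else -1

z = phi((1.0, 2.0))
row = constraint_row((1.0, 2.0), -1)

# Example parameters describing the circle x1^2 + x2^2 = 1 in the original space.
w, b = (0.0, 0.0, 1.0, 0.0, 1.0), -1.0
inside = g_svm((0.0, 0.0), w, b)
outside = g_svm((2.0, 0.0), w, b)
```

A hyperplane in the transformed space corresponds to a curved boundary in the original space, a circle in this example, while the QP itself keeps the same form with $\tilde d + 1$ variables.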