Problem
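Given samples $[x_1, x_2, \ldots, x_N]$ with labels $[y_1, y_2, \ldots, y_N]$, $y_i \in \{-1, +1\}$ (labels supplied as 0/1 are first remapped to $-1$/$+1$, which the margin constraint used in this derivation requires), use the SVM method to find the hyperplane that best separates the samples, and write out the computation.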
1 Constructing the optimization problem
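Solution: Suppose there exists a hyperplane $w \cdot x + b = 0$ that separates the samples perfectly. By rescaling $w$ and $b$, we can always find two parallel hyperplanes $w \cdot x + b = -1$ and $w \cdot x + b = 1$ such that every sample lies on one of them or on its outer side, as shown in the figure below.
That is, every sample satisfies:
$$y_i (w \cdot x_i + b) \ge 1 \tag{1}$$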
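Constraint (1) can be checked numerically for any candidate hyperplane. The sketch below is illustrative only: the three-point dataset, the candidate `w` and `b`, and the helper name `functional_margins` are assumptions chosen for this example, not part of the original problem.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def functional_margins(w, b, xs, ys):
    """Return y_i * (w . x_i + b) for every sample, the left side of (1)."""
    return [y * (dot(w, x) + b) for x, y in zip(xs, ys)]

# Hypothetical toy data: two positive samples and one negative sample.
xs = [(3.0, 3.0), (4.0, 3.0), (1.0, 1.0)]
ys = [1, 1, -1]

# Hypothetical candidate hyperplane 0.5*x1 + 0.5*x2 - 2 = 0.
w, b = (0.5, 0.5), -2.0

margins = functional_margins(w, b, xs, ys)
feasible = all(m >= 1.0 for m in margins)
print(margins, feasible)
```

Samples whose value equals exactly 1 lie on one of the two planes $w \cdot x + b = \pm 1$; samples with a larger value lie on the outer side.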
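For a linearly separable training set, however, infinitely many hyperplanes separate the data, so how should we choose these two planes?
We want the two planes to separate the sample points as well as possible. What does "best" mean here? Intuitively, the farther apart the two planes are, the more clearly the two classes are distinguished and the better the classification. The objective is therefore:
$$\max d \tag{2}$$
where $d$ is the distance between the two separating planes. Only one pair of planes attains this maximum.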
Let $x_1$ and $x_2$ be points on $w \cdot x + b = -1$ and $w \cdot x + b = 1$ respectively, chosen so that the vector $\overrightarrow{x_1 x_2}$ is perpendicular to both planes, i.e. $\|\overrightarrow{x_1 x_2}\| = d$.
Because $w$ is the normal vector of both planes, $\overrightarrow{x_1 x_2}$ is parallel to $w$:
$$\overrightarrow{x_1 x_2} = x_2 - x_1 = \lambda w \tag{3}$$
Substituting (3) into $w \cdot x_2 + b = 1$ gives:
$$w \cdot (x_1 + \lambda w) + b = 1 \tag{4}$$
Substituting $w \cdot x_1 + b = -1$ into (4) gives:
$$\lambda \|w\|^2 = 2 \tag{5}$$
Hence:
$$\max d = \max \|x_2 - x_1\| = \max \lambda \|w\| = \max \frac{2}{\|w\|^2}\|w\| = \max \frac{2}{\|w\|}$$
which is equivalent to $\min \frac{\|w\|^2}{2}$.
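The relation $d = 2 / \|w\|$ can be sanity-checked by following steps (3) to (5) with concrete numbers. This is a small sketch; the values of `w`, `b`, and the starting point `x1` are assumptions picked for illustration.

```python
import math

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical hyperplane parameters.
w, b = (0.5, 0.5), -2.0

# A point on the plane w . x + b = -1.
x1 = (1.0, 1.0)

# Equation (5): lambda * ||w||^2 = 2.
lam = 2.0 / dot(w, w)

# Equation (3): step from x1 along the normal direction w.
x2 = tuple(p + lam * c for p, c in zip(x1, w))

# Distance between the two planes, measured directly and by the formula.
d = math.sqrt(sum((q - p) ** 2 for p, q in zip(x1, x2)))
margin_formula = 2.0 / math.sqrt(dot(w, w))

print(x2, d, margin_formula)
```

The point `x2` reached this way lies on the plane $w \cdot x + b = 1$, and the measured distance agrees with $2 / \|w\|$ up to floating-point rounding.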
The original problem thus becomes the convex optimization problem:
$$\min_{w,b} \frac{\|w\|^2}{2} \tag{6}$$
$$\text{s.t.}\quad y_i (w \cdot x_i + b) \ge 1, \quad i = 1, \ldots, N$$
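To make problem (6) concrete, the sketch below solves it by brute force on a coarse grid for a two-dimensional toy dataset. A grid search is used here only as a transparent stand-in for a real quadratic-programming solver; the dataset, grid bounds, step size, and the function name `grid_search_primal` are all assumptions for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def is_feasible(w, b, xs, ys):
    """Check the constraints y_i * (w . x_i + b) >= 1 of problem (6)."""
    return all(y * (dot(w, x) + b) >= 1.0 for x, y in zip(xs, ys))

def grid_search_primal(xs, ys, step=0.25, w_range=2.0, b_range=4.0):
    """Scan a grid of (w1, w2, b) and keep the feasible point with the
    smallest objective ||w||^2 / 2. Works for 2-D inputs only."""
    n_w = int(w_range / step)
    n_b = int(b_range / step)
    best = None  # (objective, w, b)
    for i in range(-n_w, n_w + 1):
        for j in range(-n_w, n_w + 1):
            w = (i * step, j * step)
            objective = 0.5 * dot(w, w)
            # Skip w values that cannot improve on the best one found so far.
            if best is not None and objective >= best[0]:
                continue
            for k in range(-n_b, n_b + 1):
                b = k * step
                if is_feasible(w, b, xs, ys):
                    best = (objective, w, b)
                    break
    return best

# Hypothetical toy data: two positive samples and one negative sample.
xs = [(3.0, 3.0), (4.0, 3.0), (1.0, 1.0)]
ys = [1, 1, -1]

best = grid_search_primal(xs, ys)
print(best)
```

This only works because the optimum of the toy problem happens to lie on the grid; for real data a dedicated solver or the dual formulation is the practical route.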
2 Solving via the Lagrangian dual
Construct the Lagrangian:
$$L(w,b,\alpha) = \frac{\|w\|^2}{2} + \sum_{i=1}^N \alpha_i \left(1 - y_i (w \cdot x_i + b)\right) \tag{7}$$
where $\alpha_i \ge 0$ are the Lagrange multipliers.
By Lagrangian duality, the dual of the original problem is the max-min problem:
$$\max_\alpha \min_{w,b} L(w,b,\alpha) \tag{8}$$
First solve $\min_{w,b} L(w,b,\alpha)$ by taking the gradients with respect to $w$ and $b$ and setting them to zero:
$$\nabla_w L(w,b,\alpha) = w - \sum_{i=1}^N \alpha_i y_i x_i = 0$$
$$\nabla_b L(w,b,\alpha) = -\sum_{i=1}^N \alpha_i y_i = 0$$
which gives:
$$w = \sum_{i=1}^N \alpha_i y_i x_i \tag{9}$$
$$\sum_{i=1}^N \alpha_i y_i = 0 \tag{10}$$
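The stationarity conditions can be checked with finite differences: at $w = \sum_i \alpha_i y_i x_i$ the numerical gradient of the Lagrangian with respect to $w$ should vanish, and the derivative with respect to $b$ should equal $-\sum_i \alpha_i y_i$. The sketch below assumes a toy dataset and an arbitrary multiplier vector `alpha` chosen so that $\sum_i \alpha_i y_i = 0$; neither comes from the original text.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def lagrangian(w, b, alpha, xs, ys):
    """Equation (7): ||w||^2 / 2 + sum_i alpha_i * (1 - y_i * (w . x_i + b))."""
    penalty = sum(a * (1.0 - y * (dot(w, x) + b))
                  for a, x, y in zip(alpha, xs, ys))
    return 0.5 * dot(w, w) + penalty

def numeric_grad_w(w, b, alpha, xs, ys, h=1e-4):
    """Central-difference gradient of the Lagrangian with respect to w."""
    grad = []
    for d in range(len(w)):
        up, down = list(w), list(w)
        up[d] += h
        down[d] -= h
        diff = lagrangian(up, b, alpha, xs, ys) - lagrangian(down, b, alpha, xs, ys)
        grad.append(diff / (2.0 * h))
    return grad

def numeric_grad_b(w, b, alpha, xs, ys, h=1e-4):
    """Central-difference derivative of the Lagrangian with respect to b."""
    diff = lagrangian(w, b + h, alpha, xs, ys) - lagrangian(w, b - h, alpha, xs, ys)
    return diff / (2.0 * h)

# Hypothetical toy data and multipliers (0.5 + 0.25 - 0.75 = 0).
xs = [(3.0, 3.0), (4.0, 3.0), (1.0, 1.0)]
ys = [1, 1, -1]
alpha = (0.5, 0.25, 0.75)

# Equation (9): the stationary point in w.
w_stat = tuple(sum(a * y * x[d] for a, x, y in zip(alpha, xs, ys))
               for d in range(2))

grad_w = numeric_grad_w(w_stat, 0.0, alpha, xs, ys)
grad_b = numeric_grad_b(w_stat, 0.0, alpha, xs, ys)
print(w_stat, grad_w, grad_b)
```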
Substituting (9) and (10) into (7) gives:
$$\begin{aligned}
\min_{w,b} L(w,b,\alpha) &= \frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^N \alpha_i - \sum_{i=1}^N \alpha_i y_i \left(\Big(\sum_{j=1}^N \alpha_j y_j x_j\Big) \cdot x_i\right) - b\sum_{i=1}^N \alpha_i y_i \\
&= -\frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^N \alpha_i
\end{aligned}$$
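The substitution can be verified numerically: for any multipliers with $\sum_i \alpha_i y_i = 0$, evaluating the Lagrangian (7) at $w = \sum_i \alpha_i y_i x_i$ should give the simplified expression, whatever value $b$ takes. The dataset and the multiplier vector in this sketch are assumptions chosen for illustration.

```python
import math

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def lagrangian(w, b, alpha, xs, ys):
    """Equation (7)."""
    penalty = sum(a * (1.0 - y * (dot(w, x) + b))
                  for a, x, y in zip(alpha, xs, ys))
    return 0.5 * dot(w, w) + penalty

def dual_objective(alpha, xs, ys):
    """sum_i alpha_i - 1/2 * sum_ij alpha_i alpha_j y_i y_j (x_i . x_j)."""
    n = len(xs)
    quad = sum(alpha[i] * alpha[j] * ys[i] * ys[j] * dot(xs[i], xs[j])
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad

# Hypothetical toy data and multipliers (0.5 + 0.25 - 0.75 = 0).
xs = [(3.0, 3.0), (4.0, 3.0), (1.0, 1.0)]
ys = [1, 1, -1]
alpha = (0.5, 0.25, 0.75)

# Equation (9).
w = tuple(sum(a * y * x[d] for a, x, y in zip(alpha, xs, ys)) for d in range(2))

dual_value = dual_objective(alpha, xs, ys)
lagrangian_values = [lagrangian(w, b, alpha, xs, ys) for b in (-3.0, 0.0, 5.0)]
print(dual_value, lagrangian_values)
```

The term in $b$ drops out precisely because of condition (10), which is why the three values of `b` give the same result.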
Next, maximize $\min_{w,b} L(w,b,\alpha)$ over $\alpha$, which is the dual problem:
$$\max_\alpha \; -\frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^N \alpha_i \tag{11}$$
$$\text{s.t.}\quad \sum_{i=1}^N \alpha_i y_i = 0$$
$$\alpha_i \ge 0, \quad i = 1, \ldots, N$$
This is equivalent to:
$$\min_\alpha \; \frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^N \alpha_i \tag{12}$$
$$\text{s.t.}\quad \sum_{i=1}^N \alpha_i y_i = 0$$
$$\alpha_i \ge 0, \quad i = 1, \ldots, N$$
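Problem (12) can be solved on small data by repeatedly optimizing two multipliers at a time while keeping $\sum_i \alpha_i y_i = 0$. The sketch below is a simplified pairwise coordinate descent in the spirit of SMO, not the full SMO algorithm: it sweeps over all pairs in a fixed order instead of using SMO's pair-selection heuristics, and it assumes the hard-margin case with no upper bound on $\alpha_i$. The function name `solve_dual_pairwise` and the toy data are assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def solve_dual_pairwise(xs, ys, max_sweeps=100, tol=1e-12):
    """Minimize (12) by exact two-variable updates over all pairs (i, j)."""
    n = len(xs)
    K = [[dot(xs[i], xs[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n  # feasible start: all constraints of (12) hold
    for _ in range(max_sweeps):
        largest_change = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                eta = K[i][i] + K[j][j] - 2.0 * K[i][j]
                if eta <= 0.0:
                    continue  # identical points: no curvature along this pair
                # Prediction errors without b; b cancels in e_i - e_j.
                e_i = sum(alpha[m] * ys[m] * K[m][i] for m in range(n)) - ys[i]
                e_j = sum(alpha[m] * ys[m] * K[m][j] for m in range(n)) - ys[j]
                new_j = alpha[j] + ys[j] * (e_i - e_j) / eta
                # Clip so that both multipliers stay non-negative.
                if ys[i] != ys[j]:
                    low, high = max(0.0, alpha[j] - alpha[i]), float("inf")
                else:
                    low, high = 0.0, alpha[i] + alpha[j]
                new_j = min(max(new_j, low), high)
                # Move alpha_i so that sum_i alpha_i * y_i stays at zero.
                new_i = alpha[i] + ys[i] * ys[j] * (alpha[j] - new_j)
                largest_change = max(largest_change, abs(new_j - alpha[j]))
                alpha[i], alpha[j] = new_i, new_j
        if largest_change < tol:
            break
    return alpha

# Hypothetical toy data: two positive samples and one negative sample.
xs = [(3.0, 3.0), (4.0, 3.0), (1.0, 1.0)]
ys = [1, 1, -1]

alpha_star = solve_dual_pairwise(xs, ys)
print(alpha_star)
```

On this toy set only the first and third samples end up with nonzero multipliers; they are the support vectors lying on the two margin planes.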
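Problem (12) is the dual of problem (6).
Finally, the SMO (sequential minimal optimization) algorithm yields the dual solution $\alpha_i^*$. Equation (9) then gives $w^* = \sum_{i=1}^N \alpha_i^* y_i x_i$. Equation (10) does not determine $b^*$; instead, $b^*$ follows from any support vector $x_j$ with $\alpha_j^* > 0$: complementary slackness gives $y_j (w^* \cdot x_j + b^*) = 1$, so $b^* = y_j - \sum_{i=1}^N \alpha_i^* y_i (x_i \cdot x_j)$. This yields the optimal hyperplane $w^* \cdot x + b^* = 0$, i.e. $\sum_{i=1}^N \alpha_i^* y_i (x_i \cdot x) + b^* = 0$, and the classification decision function: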
$$f(x) = \operatorname{sign}\left(\sum_{i=1}^N \alpha_i^* y_i (x_i \cdot x) + b^*\right) \tag{13}$$
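Given a dual solution, the hyperplane parameters and the decision function (13) can be recovered in a few lines. In this sketch the multipliers `alpha_star` are assumed to be the dual solution for a hypothetical three-point dataset, $b^*$ is taken from the first sample with a positive multiplier, and a score of exactly zero is mapped to $+1$ by convention.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def recover_w_b(alpha, xs, ys, tol=1e-9):
    """w from equation (9); b from a support vector with alpha_j > 0."""
    dim = len(xs[0])
    w = tuple(sum(a * y * x[d] for a, x, y in zip(alpha, xs, ys))
              for d in range(dim))
    j = next(k for k, a in enumerate(alpha) if a > tol)
    b = ys[j] - dot(w, xs[j])
    return w, b

def decision(x, alpha, xs, ys, b):
    """Equation (13), written with inner products against training samples."""
    score = sum(a * y * dot(xi, x) for a, xi, y in zip(alpha, xs, ys)) + b
    return 1 if score >= 0 else -1  # assumption: sign(0) is treated as +1

# Hypothetical toy data and its assumed dual solution.
xs = [(3.0, 3.0), (4.0, 3.0), (1.0, 1.0)]
ys = [1, 1, -1]
alpha_star = (0.25, 0.0, 0.25)

w_star, b_star = recover_w_b(alpha_star, xs, ys)
train_predictions = [decision(x, alpha_star, xs, ys, b_star) for x in xs]
print(w_star, b_star, train_predictions)
print(decision((4.0, 4.0), alpha_star, xs, ys, b_star))
print(decision((0.0, 0.0), alpha_star, xs, ys, b_star))
```

Because only samples with $\alpha_i^* > 0$ contribute to the sum, the classifier depends on the support vectors alone.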