Linearly Separable Support Vector Machines and Hard-Margin Maximization
The Linearly Separable Support Vector Machine
- Definition: given a linearly separable training data set, the separating hyperplane obtained by maximizing the margin, or equivalently by solving the corresponding convex quadratic programming problem,
$$w^* \cdot x + b^* = 0$$
together with the corresponding classification decision function
$$f(x) = \mathrm{sign}(w^* \cdot x + b^*)$$
is called the linearly separable support vector machine.
Functional Margin and Geometric Margin
- Functional margin: for a given training set $T$ and hyperplane $(w,b)$, the functional margin of $(w,b)$ with respect to a sample point $(x_i,y_i)$ is defined as
$$\hat{\gamma}_i = y_i(w \cdot x_i + b)$$
and the functional margin of $(w,b)$ with respect to $T$ is the minimum over all sample points:
$$\hat{\gamma} = \min_{i=1,\dots,N} \hat{\gamma}_i$$
- Geometric margin: for a given training set $T$ and hyperplane $(w,b)$, the geometric margin of $(w,b)$ with respect to a sample point $(x_i,y_i)$ is defined as
$$\gamma_i = y_i\left(\frac{w}{\|w\|} \cdot x_i + \frac{b}{\|w\|}\right)$$
and the geometric margin of $(w,b)$ with respect to $T$ is the minimum over all sample points:
$$\gamma = \min_{i=1,\dots,N} \gamma_i$$
The two margins are therefore related by
$$\gamma_i = \frac{\hat{\gamma}_i}{\|w\|}, \qquad \gamma = \frac{\hat{\gamma}}{\|w\|}$$
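As a numerical sanity check, both margins can be computed in a few lines; the three sample points below (two positive, one negative) are an assumed toy set:

```python
import numpy as np

def functional_margin(w, b, X, y):
    # per-sample functional margin: y_i * (w . x_i + b)
    return y * (X @ w + b)

def geometric_margin(w, b, X, y):
    # per-sample geometric margin: functional margin scaled by 1/||w||
    return functional_margin(w, b, X, y) / np.linalg.norm(w)

w, b = np.array([1.0, 1.0]), -3.0
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
gamma = geometric_margin(w, b, X, y).min()   # margin of (w, b) w.r.t. T
```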
Margin Maximization
The maximum-margin hyperplane is the solution of
$$\max_{w,b}\ \gamma \qquad \text{s.t.}\quad y_i\left(\frac{w}{\|w\|} \cdot x_i + \frac{b}{\|w\|}\right) \ge \gamma,\quad i=1,2,\dots,N$$
which is equivalent to
$$\max_{w,b}\ \frac{\hat{\gamma}}{\|w\|} \qquad \text{s.t.}\quad y_i(w \cdot x_i + b) \ge \hat{\gamma},\quad i=1,2,\dots,N$$
Rescaling $(w,b)$ rescales $\hat{\gamma}$ proportionally without changing the problem, so we may fix $\hat{\gamma} = 1$. The problem is then equivalent to
$$\min_{w,b}\ \frac{1}{2}\|w\|^2 \qquad \text{s.t.}\quad y_i(w \cdot x_i + b) - 1 \ge 0,\quad i=1,2,\dots,N$$
This is a convex optimization problem.
Final Algorithm
Input: a linearly separable training set $T=\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$, where $x_i \in \mathbf{R}^n$, $y_i \in \{-1,+1\}$, $i=1,2,\dots,N$
Output: the maximum-margin separating hyperplane and classification decision function
(1) Construct and solve the optimization problem
$$\min_{w,b}\ \frac{1}{2}\|w\|^2 \qquad \text{s.t.}\quad y_i(w \cdot x_i + b) - 1 \ge 0,\quad i=1,2,\dots,N$$
obtaining the solution $(w^*, b^*)$.
(2) From it, the separating hyperplane is
$$w^* \cdot x + b^* = 0$$
and the classification decision function is
$$f(x) = \mathrm{sign}(w^* \cdot x + b^*)$$
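For a small data set, the primal QP in step (1) can be handed to a general-purpose solver. Below is a sketch using SciPy's SLSQP method; the three toy points are an assumption, and for them the optimum works out to $w^* = (\tfrac12, \tfrac12)$, $b^* = -2$:

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])   # assumed toy set
y = np.array([1.0, 1.0, -1.0])

def objective(v):                       # v = (w_1, w_2, b)
    return 0.5 * np.dot(v[:2], v[:2])   # (1/2) ||w||^2

# one inequality constraint y_i (w . x_i + b) - 1 >= 0 per sample
cons = [{"type": "ineq", "fun": lambda v, i=i: y[i] * (X[i] @ v[:2] + v[2]) - 1.0}
        for i in range(len(y))]

res = minimize(objective, x0=np.zeros(3), method="SLSQP", constraints=cons)
w, b = res.x[:2], res.x[2]
```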
- Proof that the maximum-margin separating hyperplane exists and is unique:
(1) Existence
Since the data are linearly separable, a feasible solution exists; since the objective function is bounded below, an optimal solution must exist, denoted $(w^*, b^*)$. Because the data contain both positive and negative examples, $w^* \ne 0$. This proves existence.
(2) Uniqueness
First we show that $w^*$ is unique. Suppose there are two optimal solutions $(w_1^*, b_1^*)$ and $(w_2^*, b_2^*)$; clearly $\|w_1^*\| = \|w_2^*\| = c$.
Let $w = \frac{w_1^* + w_2^*}{2}$, $b = \frac{b_1^* + b_2^*}{2}$. Then $(w,b)$ is feasible, so $c \le \|w\| \le \frac{1}{2}\|w_1^*\| + \frac{1}{2}\|w_2^*\| = c$, and hence $\|w\| = \frac{1}{2}\|w_1^*\| + \frac{1}{2}\|w_2^*\|$.
Equality in the triangle inequality forces $w_1^* = \lambda w_2^*$ with $|\lambda| = 1$. If $\lambda = -1$, then $\|w\| = 0$, a contradiction with feasibility; therefore $\lambda = 1$, i.e. $w_1^* = w_2^*$, so $w^*$ is unique.
Next we show that $b^*$ is unique.
Let $x_1', x_2'$ be points of the set $\{x_i \mid y_i = +1\}$ at which the constraint holds with equality for $(w^*, b_1^*)$ and $(w^*, b_2^*)$ respectively, and let $x_1'', x_2''$ be the analogous points of $\{x_i \mid y_i = -1\}$.
Then
$$b_1^* = -\frac{1}{2}(w^* \cdot x_1' + w^* \cdot x_1''), \qquad b_2^* = -\frac{1}{2}(w^* \cdot x_2' + w^* \cdot x_2'')$$
so
$$b_1^* - b_2^* = -\frac{1}{2}\left[w^* \cdot (x_1' - x_2') + w^* \cdot (x_1'' - x_2'')\right]$$
Moreover,
$$w^* \cdot x_2' + b_1^* \ge 1 = w^* \cdot x_1' + b_1^*$$
$$w^* \cdot x_1' + b_2^* \ge 1 = w^* \cdot x_2' + b_2^*$$
Combining the two gives $w^* \cdot (x_1' - x_2') = 0$, and similarly $w^* \cdot (x_1'' - x_2'') = 0$.
Hence $b_1^* = b_2^*$, which completes the proof.
- Support vectors and margin boundary
A sample point $x_i$ satisfying $y_i(w \cdot x_i + b) = 1$, i.e. $w \cdot x_i + b = y_i$, is called a support vector. The support vectors lie on the hyperplanes
$$H_1: w \cdot x + b = +1$$
$$H_2: w \cdot x + b = -1$$
$H_1$ and $H_2$ are called the margin boundaries, and the width between them is $\frac{2}{\|w\|}$.
The Dual Algorithm
To solve the optimization problem, first define the Lagrangian
$$L(w,b,a) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^N a_i y_i (w \cdot x_i + b) + \sum_{i=1}^N a_i, \qquad a_i \ge 0,\ i=1,2,\dots,N$$
where $a = (a_1, a_2, \dots, a_N)^T$ is the vector of Lagrange multipliers.
The primal problem is then equivalent to the dual problem
$$\max_a \min_{w,b} L(w,b,a)$$
(1) Compute $\min\limits_{w,b} L(w,b,a)$: set the partial derivatives with respect to $w$ and $b$ to zero,
$$\nabla_w L(w,b,a) = w - \sum_{i=1}^N a_i y_i x_i = 0$$
$$\nabla_b L(w,b,a) = -\sum_{i=1}^N a_i y_i = 0$$
giving
$$w = \sum_{i=1}^N a_i y_i x_i, \qquad \sum_{i=1}^N a_i y_i = 0$$
Substituting back,
$$L(w,b,a) = \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N a_i a_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^N a_i y_i \left(\Big(\sum_{j=1}^N a_j y_j x_j\Big) \cdot x_i + b\right) + \sum_{i=1}^N a_i$$
$$= -\frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N a_i a_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^N a_i$$
(2) Maximizing over $a$ and flipping the sign, the dual problem becomes
$$\min_a\ \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N a_i a_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^N a_i$$
$$\text{s.t.}\quad \sum_{i=1}^N a_i y_i = 0, \qquad a_i \ge 0,\ i=1,2,\dots,N$$
From a dual solution $a^*$,
$$w^* = \sum_{i=1}^N a_i^* y_i x_i$$
$$b^* = y_j - \sum_{i=1}^N a_i^* y_i (x_i \cdot x_j) \quad \text{for any } j \text{ with } a_j^* > 0$$
and the decision function is
$$f(x) = \mathrm{sign}\left(\sum_{i=1}^N a_i^* y_i (x \cdot x_i) + b^*\right)$$
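Recovering $w^*$, $b^*$, and $f$ from a dual solution is mechanical. In the sketch below, the three points and the dual solution $a^* = (\tfrac14, 0, \tfrac14)$ are stated assumptions (chosen so they are consistent with each other):

```python
import numpy as np

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
alpha = np.array([0.25, 0.0, 0.25])        # assumed dual solution a*

w = (alpha * y) @ X                        # w* = sum_i a_i* y_i x_i
j = int(np.argmax(alpha > 0))              # any index j with a_j* > 0
b = y[j] - np.sum(alpha * y * (X @ X[j]))  # b* = y_j - sum_i a_i* y_i (x_i . x_j)

def f(x):
    # decision function f(x) = sign(w* . x + b*)
    return np.sign(w @ x + b)
```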
Algorithm
Input: a linearly separable training set $T=\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$, where $x_i \in \mathbf{R}^n$, $y_i \in \{-1,+1\}$, $i=1,2,\dots,N$
Output: the separating hyperplane and classification decision function
(1) Construct and solve
$$\min_a\ \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N a_i a_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^N a_i$$
$$\text{s.t.}\quad \sum_{i=1}^N a_i y_i = 0, \qquad a_i \ge 0,\ i=1,2,\dots,N$$
obtaining the solution $a^*$.
(2) Compute
$$w^* = \sum_{i=1}^N a_i^* y_i x_i$$
and, choosing any $j$ with $a_j^* > 0$,
$$b^* = y_j - \sum_{i=1}^N a_i^* y_i (x_i \cdot x_j)$$
(3) Obtain the separating hyperplane $w^* \cdot x + b^* = 0$ and the classification decision function
$$f(x) = \mathrm{sign}\left(\sum_{i=1}^N a_i^* y_i (x \cdot x_i) + b^*\right)$$
Linear Support Vector Machines and Soft-Margin Maximization
The Linear Support Vector Machine
- Definition: given a training set that is not linearly separable, the separating hyperplane obtained by solving the convex quadratic programming problem below, i.e. by soft-margin maximization,
$$w^* \cdot x + b^* = 0$$
together with the classification decision function
$$f(x) = \mathrm{sign}(w^* \cdot x + b^*)$$
is called the linear support vector machine.
Concretely, introduce slack variables $\xi_i$ and relax the constraints to
$$y_i(w \cdot x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0$$
while changing the objective function to
$$\frac{1}{2}\|w\|^2 + C\sum_{i=1}^N \xi_i, \qquad C > 0$$
The primal problem is then
$$\min_{w,b,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^N \xi_i \qquad \text{s.t.}\quad y_i(w \cdot x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i=1,2,\dots,N$$
The Dual Algorithm
By Lagrangian duality,
$$L(w,b,\xi,a,\mu) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^N \xi_i - \sum_{i=1}^N a_i\big(y_i(w \cdot x_i + b) - 1 + \xi_i\big) - \sum_{i=1}^N \mu_i \xi_i, \qquad a_i \ge 0,\ \mu_i \ge 0$$
Setting the gradients to zero,
$$\nabla_w L(w,b,\xi,a,\mu) = w - \sum_{i=1}^N a_i y_i x_i = 0$$
$$\nabla_b L(w,b,\xi,a,\mu) = -\sum_{i=1}^N a_i y_i = 0$$
$$\nabla_{\xi_i} L(w,b,\xi,a,\mu) = C - a_i - \mu_i = 0$$
Substituting back gives the dual problem
$$\min_a\ \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N a_i a_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^N a_i$$
$$\text{s.t.}\quad \sum_{i=1}^N a_i y_i = 0, \qquad 0 \le a_i \le C,\ i=1,2,\dots,N$$
From a dual solution,
$$w^* = \sum_{i=1}^N a_i^* y_i x_i, \qquad b^* = y_j - \sum_{i=1}^N y_i a_i^* (x_i \cdot x_j) \quad \text{for any } j \text{ with } 0 < a_j^* < C$$
Algorithm:
Input: a training set $T=\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$, where $x_i \in \mathbf{R}^n$, $y_i \in \{-1,+1\}$, $i=1,2,\dots,N$
Output: the separating hyperplane and classification decision function
(1) Choose a penalty parameter $C > 0$, construct and solve
$$\min_a\ \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N a_i a_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^N a_i$$
$$\text{s.t.}\quad \sum_{i=1}^N a_i y_i = 0, \qquad 0 \le a_i \le C,\ i=1,2,\dots,N$$
obtaining the solution $a^*$.
(2) Compute
$$w^* = \sum_{i=1}^N a_i^* y_i x_i$$
and, choosing any $j$ with $0 < a_j^* < C$,
$$b^* = y_j - \sum_{i=1}^N a_i^* y_i (x_i \cdot x_j)$$
(3) Obtain the separating hyperplane $w^* \cdot x + b^* = 0$ and the classification decision function
$$f(x) = \mathrm{sign}\left(\sum_{i=1}^N a_i^* y_i (x \cdot x_i) + b^*\right)$$
Support Vectors
- If $0 < a_i^* < C$, then $\xi_i = 0$ and $x_i$ lies exactly on the margin boundary.
- If $a_i^* = C$ and $0 < \xi_i < 1$, then $x_i$ is classified correctly and lies between the margin boundary and the separating hyperplane.
- If $a_i^* = C$ and $\xi_i = 1$, then $x_i$ lies on the separating hyperplane.
- If $a_i^* = C$ and $\xi_i > 1$, then $x_i$ lies on the misclassified side of the separating hyperplane.
The Hinge Loss Function
Changing the objective function to
$$\sum_{i=1}^N \big[1 - y_i(w \cdot x_i + b)\big]_+ + \lambda\|w\|^2$$
is equivalent to the linear support vector machine. To see this, take
$$\xi_i = \big[1 - y_i(w \cdot x_i + b)\big]_+$$
so the problem becomes
$$\min_{w,b}\ \sum_{i=1}^N \xi_i + \lambda\|w\|^2$$
Taking $\lambda = \frac{1}{2C}$, this is
$$\min_{w,b}\ \frac{1}{C}\left(C\sum_{i=1}^N \xi_i + \frac{1}{2}\|w\|^2\right)$$
which is the soft-margin primal problem up to the constant factor $\frac{1}{C}$, establishing the equivalence.
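The hinge-loss form suggests minimizing the regularized empirical risk directly by subgradient descent. A minimal sketch follows; the synthetic blob data, the step-size schedule, and the iteration count are all arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# two well-separated Gaussian blobs (assumed synthetic data)
X = np.vstack([rng.normal(2.0, 0.6, (40, 2)), rng.normal(-2.0, 0.6, (40, 2))])
y = np.hstack([np.ones(40), -np.ones(40)])

lam = 0.01                                # regularization weight lambda
w, b = np.zeros(2), 0.0
for t in range(2000):
    active = y * (X @ w + b) < 1          # points with nonzero hinge loss
    # subgradient of sum_i [1 - y_i(w.x_i + b)]_+ + lam ||w||^2
    gw = -(y[active, None] * X[active]).sum(axis=0) + 2.0 * lam * w
    gb = -y[active].sum()
    lr = 0.01 / (1.0 + 0.01 * t)          # decaying step size
    w -= lr * gw / len(y)
    b -= lr * gb / len(y)

accuracy = np.mean(np.sign(X @ w + b) == y)
```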
Nonlinear Support Vector Machines and Kernel Functions
The Kernel Trick
For problems that are not linearly separable in the input space, we apply the kernel trick.
Let $\phi(x)$ be the mapping from the input $x$ to a feature space, and define
$$K(x,z) = \phi(x) \cdot \phi(z)$$
In the dual problem, every inner product $x_i \cdot x_j$ is replaced by $K(x_i, x_j)$.
Positive Definite Kernels
A necessary and sufficient condition for $K(x,z)$ to be a positive definite kernel is that, for any finite set of points $x_1,\dots,x_m$, its Gram matrix
$$K = \big[K(x_i,x_j)\big]_{m \times m}$$
is positive semidefinite.
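The criterion can be checked numerically on any finite sample: the Gram matrix of a valid kernel must have no negative eigenvalues (up to round-off). A sketch using a Gaussian kernel on assumed random points:

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    # K(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

rng = np.random.default_rng(1)
pts = rng.normal(size=(8, 3))              # assumed sample points
G = np.array([[gaussian_kernel(p, q) for q in pts] for p in pts])

eigvals = np.linalg.eigvalsh(G)            # G is symmetric, so eigenvalues are real
is_psd = bool(eigvals.min() >= -1e-10)
```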
Common Kernel Functions
- Polynomial kernel:
$$K(x,z) = (x \cdot z + 1)^p$$
- Gaussian kernel:
$$K(x,z) = \exp\left(-\frac{\|x-z\|^2}{2\sigma^2}\right)$$
- String kernel:
$$k_n(s,t) = \sum_{u \in \Sigma^n} [\phi_n(s)]_u [\phi_n(t)]_u = \sum_{u \in \Sigma^n}\ \sum_{(i,j):\, s(i)=t(j)=u} \lambda^{l(i)+l(j)}$$
where $0 < \lambda \le 1$, the outer sum runs over all strings $u$ of length $n$ over the alphabet $\Sigma$, the inner sum runs over all index tuples $i$ in $s$ and $j$ in $t$ whose subsequences equal $u$, and $l(i)$ is the span of the index tuple $i$:
$$l(i) = i_{|u|} - i_1 + 1, \qquad 1 \le i_1 < i_2 < \dots < i_{|u|} \le |s|$$
The Nonlinear Support Vector Machine
- Definition: from a nonlinear classification training set, the classification decision function learned via a kernel function and soft-margin maximization, or equivalently by solving a convex quadratic program,
$$f(x) = \mathrm{sign}\left(\sum_{i=1}^N a_i^* y_i K(x, x_i) + b^*\right)$$
is called the nonlinear support vector machine, where $K(x,z)$ is a positive definite kernel function.
Algorithm:
Input: a training set $T=\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$, where $x_i \in \mathbf{R}^n$, $y_i \in \{-1,+1\}$, $i=1,2,\dots,N$
Output: the classification decision function
(1) Choose a suitable kernel $K(x,z)$ and penalty parameter $C > 0$, construct and solve
$$\min_a\ \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N a_i a_j y_i y_j K(x_i, x_j) - \sum_{i=1}^N a_i$$
$$\text{s.t.}\quad \sum_{i=1}^N a_i y_i = 0, \qquad 0 \le a_i \le C,\ i=1,2,\dots,N$$
obtaining the solution $a^*$.
(2) Choose any $j$ with $0 < a_j^* < C$ and compute
$$b^* = y_j - \sum_{i=1}^N a_i^* y_i K(x_i, x_j)$$
(there is no need to form $w^*$ explicitly, since the decision function involves only kernel evaluations).
(3) Obtain the classification decision function
$$f(x) = \mathrm{sign}\left(\sum_{i=1}^N a_i^* y_i K(x_i, x) + b^*\right)$$
The Sequential Minimal Optimization (SMO) Algorithm
Select two variables that violate the KKT conditions and optimize over them while holding the rest fixed, repeating until a stopping criterion is met or all variables satisfy the KKT conditions; a point at which all KKT conditions hold is an optimal solution.
Solving the Two-Variable Quadratic Program
Suppose $a_1, a_2$ are selected. The subproblem is
$$\min_{a_1,a_2}\ W(a_1,a_2) = \frac{1}{2}K_{11}a_1^2 + \frac{1}{2}K_{22}a_2^2 + y_1y_2K_{12}a_1a_2 - (a_1 + a_2) + y_1a_1\sum_{i=3}^N y_ia_iK_{i1} + y_2a_2\sum_{i=3}^N y_ia_iK_{i2}$$
$$\text{s.t.}\quad a_1y_1 + a_2y_2 = -\sum_{i=3}^N y_ia_i = \xi \ \text{(a constant)}, \qquad 0 \le a_i \le C,\ i=1,2$$
where $K_{ij} = K(x_i, x_j)$.
The new value of $a_2$ must satisfy
$$L \le a_2^{new} \le H$$
where the bounds follow from intersecting the constraint line with the box $[0,C]^2$:
- if $y_1 \ne y_2$: $L = \max(0,\ a_2^{old} - a_1^{old})$, $H = \min(C,\ C + a_2^{old} - a_1^{old})$
- if $y_1 = y_2$: $L = \max(0,\ a_2^{old} + a_1^{old} - C)$, $H = \min(C,\ a_2^{old} + a_1^{old})$
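The two cases above can be packaged as a small helper; it is a direct transcription of the bounds, nothing more:

```python
def smo_bounds(a1_old, a2_old, y1, y2, C):
    # feasible interval [L, H] for a2_new on the line a1*y1 + a2*y2 = const,
    # intersected with the box 0 <= a1, a2 <= C
    if y1 != y2:
        return max(0.0, a2_old - a1_old), min(C, C + a2_old - a1_old)
    return max(0.0, a2_old + a1_old - C), min(C, a2_old + a1_old)
```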
Denote by $a_2^{new,unc}$ the unclipped solution, i.e. the minimizer along the constraint line before the box constraint is applied. Define
$$g(x) = \sum_{i=1}^N a_iy_iK(x_i,x) + b$$
$$E_i = g(x_i) - y_i = \left(\sum_{j=1}^N a_jy_jK(x_j,x_i) + b\right) - y_i, \qquad i = 1,2$$
Then
$$a_2^{new,unc} = a_2^{old} + \frac{y_2(E_1 - E_2)}{\eta}$$
where
$$\eta = K_{11} + K_{22} - 2K_{12}$$
Clipping to $[L, H]$ gives
$$a_2^{new} = \begin{cases} H & a_2^{new,unc} > H \\ a_2^{new,unc} & L \le a_2^{new,unc} \le H \\ L & a_2^{new,unc} < L \end{cases}$$
and, since $a_1y_1 + a_2y_2$ must stay constant,
$$a_1^{new} = a_1^{old} + y_1y_2(a_2^{old} - a_2^{new})$$
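The full two-variable update (unclipped step, clipping, then the $a_1$ update) can be sketched as:

```python
import numpy as np

def smo_update_pair(a1, a2, y1, y2, E1, E2, K11, K22, K12, L, H):
    # unclipped optimum of a2 along the equality-constraint line
    eta = K11 + K22 - 2.0 * K12
    a2_new = float(np.clip(a2 + y2 * (E1 - E2) / eta, L, H))
    # a1 moves so that a1*y1 + a2*y2 stays constant
    a1_new = a1 + y1 * y2 * (a2 - a2_new)
    return a1_new, a2_new
```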
Proof of the update formula above:
Write
$$v_i = \sum_{j=3}^N a_jy_jK(x_i,x_j) = g(x_i) - \sum_{j=1}^2 a_jy_jK(x_i,x_j) - b, \qquad i=1,2$$
Then the subproblem objective is
$$W(a_1,a_2) = \frac{1}{2}K_{11}a_1^2 + \frac{1}{2}K_{22}a_2^2 + y_1y_2K_{12}a_1a_2 - (a_1+a_2) + y_1v_1a_1 + y_2v_2a_2$$
By the equality constraint (and $y_1^2 = 1$),
$$a_1 = (\xi - y_2a_2)y_1$$
Substituting,
$$W(a_2) = \frac{1}{2}K_{11}(\xi - a_2y_2)^2 + \frac{1}{2}K_{22}a_2^2 + y_2K_{12}(\xi - a_2y_2)a_2 - (\xi - a_2y_2)y_1 - a_2 + v_1(\xi - a_2y_2) + y_2v_2a_2$$
Setting the derivative to zero,
$$\frac{\partial W}{\partial a_2} = K_{11}a_2 + K_{22}a_2 - 2K_{12}a_2 - K_{11}\xi y_2 + K_{12}\xi y_2 + y_1y_2 - 1 - v_1y_2 + y_2v_2 = 0$$
Substituting $\xi = a_1^{old}y_1 + a_2^{old}y_2$ together with the definitions of $E_1, E_2$ and
$$\eta = K_{11} + K_{22} - 2K_{12}$$
yields
$$a_2^{new,unc} = a_2^{old} + \frac{y_2(E_1 - E_2)}{\eta}$$
Choosing the Variables
- Choosing the first variable
The KKT conditions are
$$a_i = 0 \iff y_ig(x_i) \ge 1$$
$$0 < a_i < C \iff y_ig(x_i) = 1$$
$$a_i = C \iff y_ig(x_i) \le 1$$
First scan the points with $0 < a_i < C$ for violations of the second condition; if none are found, scan the entire training set for any violator.
- Choosing the second variable
Once $a_1$ is chosen, choose $a_2$ so that its update is as large as possible, i.e. so that $|E_1 - E_2|$ is maximized:
- if $E_1 > 0$, choose the smallest $E_2$;
- if $E_1 < 0$, choose the largest $E_2$.
Prefer points on the margin boundary; if none of them yields sufficient progress, scan the entire training set; if that also fails, discard $a_1$ and choose a new first variable.
- Computing $b$ and $E_i$
By the KKT conditions, if $0 < a_1^{new} < C$, then
$$\sum_{i=1}^N a_iy_iK_{i1} + b = y_1$$
hence
$$b_1^{new} = y_1 - \sum_{i=3}^N a_iy_iK_{i1} - a_1^{new}y_1K_{11} - a_2^{new}y_2K_{21}$$
Also
$$E_1 = \sum_{i=3}^N a_iy_iK_{i1} + a_1^{old}y_1K_{11} + a_2^{old}y_2K_{21} + b^{old} - y_1$$
Combining the two,
$$b_1^{new} = -E_1 - y_1K_{11}(a_1^{new} - a_1^{old}) - y_2K_{21}(a_2^{new} - a_2^{old}) + b^{old}$$
Similarly, if $0 < a_2^{new} < C$,
$$b_2^{new} = -E_2 - y_1K_{12}(a_1^{new} - a_1^{old}) - y_2K_{22}(a_2^{new} - a_2^{old}) + b^{old}$$
If $a_1^{new}$ and $a_2^{new}$ both lie strictly between $0$ and $C$, then $b_1^{new} = b_2^{new}$. If both are $0$ or $C$, take $b^{new} = \frac{b_1^{new} + b_2^{new}}{2}$.
Finally, update
$$E_i^{new} = \sum_{j \in S} y_ja_jK(x_i,x_j) + b^{new} - y_i$$
where $S$ is the set of support vectors.
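The three-way rule for $b^{new}$ is easy to get wrong in an implementation; here it is written out as a helper, a direct transcription of the formulas above:

```python
def smo_update_b(b_old, E1, E2, y1, y2,
                 a1_new, a1_old, a2_new, a2_old,
                 K11, K12, K21, K22, C):
    # candidate offsets from the two KKT equalities
    b1 = -E1 - y1 * K11 * (a1_new - a1_old) - y2 * K21 * (a2_new - a2_old) + b_old
    b2 = -E2 - y1 * K12 * (a1_new - a1_old) - y2 * K22 * (a2_new - a2_old) + b_old
    if 0.0 < a1_new < C:
        return b1
    if 0.0 < a2_new < C:
        return b2
    return 0.5 * (b1 + b2)   # both alphas at a bound: average the candidates
```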
The SMO Algorithm
Input: a training set $T=\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$, $x_i \in \mathbf{R}^n$, $y_i \in \{-1,+1\}$, and a tolerance $\epsilon$
Output: an approximate solution $\hat{a}$
(1) Take the initial value $a^{(0)} = 0$, $k = 0$.
(2) Select the two variables $a_1^{(k)}, a_2^{(k)}$ as described above and solve the two-variable subproblem analytically, obtaining $a_1^{(k+1)}, a_2^{(k+1)}$ and thus $a^{(k+1)}$.
(3) If the following conditions hold within tolerance $\epsilon$, stop:
$$\sum_{i=1}^N a_iy_i = 0, \qquad 0 \le a_i \le C,\ i=1,2,\dots,N$$
$$y_i \cdot g(x_i) \begin{cases} \ge 1 & \{x_i \mid a_i = 0\} \\ = 1 & \{x_i \mid 0 < a_i < C\} \\ \le 1 & \{x_i \mid a_i = C\} \end{cases}$$
where
$$g(x_i) = \sum_{j=1}^N a_jy_jK(x_j,x_i) + b$$
Otherwise set $k = k+1$ and go to (2).
(4) Output $\hat{a} = a^{(k+1)}$.
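Putting the pieces together, the whole procedure can be sketched as a simplified SMO trainer with a linear kernel. The second variable is chosen at random rather than by the $|E_1 - E_2|$ heuristic described above, a common simplification; the stopping rule (a fixed number of passes with no updates) plays the role of the tolerance $\epsilon$, and the toy data set is an assumption:

```python
import numpy as np

def smo_train(X, y, C=1.0, tol=1e-4, max_passes=20):
    # Simplified SMO for a soft-margin SVM with a linear kernel.
    N = len(y)
    K = X @ X.T                          # Gram matrix K_ij = x_i . x_j
    a, b = np.zeros(N), 0.0
    rng = np.random.default_rng(0)
    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(N):
            Ei = (a * y) @ K[:, i] + b - y[i]          # E_i = g(x_i) - y_i
            if (y[i] * Ei < -tol and a[i] < C) or (y[i] * Ei > tol and a[i] > 0):
                j = int(rng.integers(N - 1))
                j += j >= i                             # random j != i
                Ej = (a * y) @ K[:, j] + b - y[j]
                ai_old, aj_old = a[i], a[j]
                if y[i] != y[j]:
                    L, H = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    L, H = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                eta = K[i, i] + K[j, j] - 2.0 * K[i, j]
                if L == H or eta <= 0:
                    continue
                a[j] = np.clip(aj_old + y[j] * (Ei - Ej) / eta, L, H)
                if abs(a[j] - aj_old) < 1e-7:
                    continue
                a[i] = ai_old + y[i] * y[j] * (aj_old - a[j])
                # offset update, mirroring the three-way rule above
                b1 = b - Ei - y[i] * K[i, i] * (a[i] - ai_old) - y[j] * K[i, j] * (a[j] - aj_old)
                b2 = b - Ej - y[i] * K[i, j] * (a[i] - ai_old) - y[j] * K[j, j] * (a[j] - aj_old)
                if 0 < a[i] < C:
                    b = b1
                elif 0 < a[j] < C:
                    b = b2
                else:
                    b = 0.5 * (b1 + b2)
                changed += 1
        passes = 0 if changed else passes + 1
    w = (a * y) @ X                       # linear kernel: w = sum_i a_i y_i x_i
    return w, b, a

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])   # assumed toy set
y = np.array([1.0, 1.0, -1.0])
w, b, a = smo_train(X, y, C=10.0)
pred = np.sign(X @ w + b)
```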