Machine Learning: SVM (1. Primal-Form SVM)

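This article walks through how the hard-margin support vector machine (SVM) works and the mathematical model behind it, covering the solution procedure in both the linear and the nonlinear case, and shows how the optimal parameters can be obtained with quadratic programming.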


Algorithm:

Linear Hard-Margin SVM Algorithm
1. $Q = \begin{bmatrix} 0 & 0_d^T \\ 0_d & I_d \end{bmatrix};\quad p = 0_{d+1};\quad a_n^T = y_n \begin{bmatrix} 1 & x_n^T \end{bmatrix};\quad c_n = 1$

2. $\begin{bmatrix} b \\ \omega \end{bmatrix} \leftarrow QP(Q, p, A, c)$

3. Return $b$ and $\omega$ as the parameters of $g_{SVM}$.

Derivation:

For notational convenience, the zeroth component is split off and written separately, i.e.,
$$
b = \omega_0, \qquad
\omega = \begin{bmatrix} \omega_1 \\ \vdots \\ \omega_d \end{bmatrix}, \qquad
x = \begin{bmatrix} x_1 \\ \vdots \\ x_d \end{bmatrix}
$$

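Starting from the requirement that every sample is classified correctly, SVM looks for the best set of parameters $(b,\omega)$. "Best" is measured by the distance from the hyperplane to the training points closest to it (the support vectors).

SVM treats a larger distance as better: the larger the distance, the more noise the classifier can tolerate and the less prone it is to overfitting.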

Hard-Margin: all samples must be separated perfectly, i.e., for every $n$,
$$
y_n(\omega^T x_n + b) > 0
$$

The SVM problem can therefore be stated as:
$$
\begin{array}{ll}
\max\limits_{b,\omega} & \mathrm{margin}(b,\omega) \\
\text{s.t.} & y_n(\omega^T x_n + b) > 0 \ \text{ for every } n \\
 & \mathrm{margin}(b,\omega) = \min\limits_{n=1,2,\cdots,N} \mathrm{distance}(x_n, b, \omega)
\end{array}
$$

The distance from a point $x$ to the hyperplane $\omega^T x + b = 0$ is:
$$
\mathrm{distance}(x, b, \omega) = \frac{1}{\|\omega\|}\left|\omega^T x + b\right|
$$
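The distance formula and the margin built on it can be sketched in a few lines of plain Python. The function names `distance` and `margin`, the four sample points, and the hyperplane $x_1 - x_2 - 1 = 0$ are all assumptions chosen for illustration; they do not come from the original text.

```python
import math

def distance(x, b, w):
    """Euclidean distance from point x to the hyperplane w^T x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w

def margin(X, b, w):
    """Smallest distance from any sample in X to the hyperplane."""
    return min(distance(x, b, w) for x in X)

# Hypothetical toy data and hyperplane x1 - x2 - 1 = 0
X = [(0.0, 0.0), (2.0, 2.0), (2.0, 0.0), (3.0, 0.0)]
b, w = -1.0, (1.0, -1.0)

print([round(distance(x, b, w), 4) for x in X])  # [0.7071, 0.7071, 0.7071, 1.4142]
print(round(margin(X, b, w), 4))                 # 0.7071
```

The first three points sit at the same distance from the hyperplane and determine the margin; the fourth lies farther away and has no influence on it.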

Scaling $(b,\omega)$ by a positive constant does not change the hyperplane, so without loss of generality assume $\min\limits_{n=1,2,\cdots,N} y_n(\omega^T x_n + b) = 1$. For correctly classified points $\left|\omega^T x_n + b\right| = y_n(\omega^T x_n + b)$, hence
$$
\mathrm{margin}(b,\omega) = \frac{1}{\|\omega\|}
$$
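A small numerical check of this normalization, as a sketch under assumed toy data: the helper names `functional_margins` and `normalize` are hypothetical, and the starting parameters are a deliberately rescaled copy of the hyperplane $x_1 - x_2 - 1 = 0$.

```python
import math

def functional_margins(X, y, b, w):
    """y_n (w^T x_n + b) for every sample."""
    return [yn * (sum(wi * xi for wi, xi in zip(w, xn)) + b)
            for xn, yn in zip(X, y)]

def normalize(X, y, b, w):
    """Rescale (b, w) so that min_n y_n (w^T x_n + b) equals 1.
    Assumes (b, w) already separates the data (all values above are > 0)."""
    s = min(functional_margins(X, y, b, w))
    return b / s, [wi / s for wi in w]

X = [(0.0, 0.0), (2.0, 2.0), (2.0, 0.0), (3.0, 0.0)]
y = [-1, -1, 1, 1]
b, w = -3.0, [3.0, -3.0]   # x1 - x2 - 1 = 0 scaled by 3

b1, w1 = normalize(X, y, b, w)

# Geometric margin computed from the unnormalized parameters ...
geo_margin = min(functional_margins(X, y, b, w)) / math.sqrt(sum(wi * wi for wi in w))
# ... equals 1 / ||w|| once the parameters are normalized.
inv_norm = 1.0 / math.sqrt(sum(wi * wi for wi in w1))

print(b1, w1)                      # -1.0 [1.0, -1.0]
print(abs(geo_margin - inv_norm) < 1e-12)  # True
```

The hyperplane and its margin are the same before and after rescaling; only the representation $(b,\omega)$ changes.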

The equality constraint $\min_n y_n(\omega^T x_n + b) = 1$ is awkward to handle directly, so consider relaxing it to
$$
y_n(\omega^T x_n + b) \ge 1 \quad \text{for all } n
$$

The relaxation does not change the solution. Suppose the optimal $(b,\omega)$ of the relaxed problem satisfied $\min_n y_n(\omega^T x_n + b) = c$ with $c > 1$. Then $(\frac{b}{c},\frac{\omega}{c})$ would still satisfy $y_n(\omega^T x_n + b) \ge 1$ for all $n$ while having a smaller $\|\omega\|$, i.e. it would be "more optimal" than $(b,\omega)$. This contradicts the optimality of $(b,\omega)$, so the optimum of the relaxed problem always attains $\min_n y_n(\omega^T x_n + b) = 1$.

Take the reciprocal (turning the maximization into a minimization), drop the square root in $\|\omega\| = \sqrt{\omega^T\omega}$, and put a factor $\frac{1}{2}$ in front to arrive at the standard problem:
$$
\begin{array}{ll}
\min\limits_{b,\omega} & \frac{1}{2}\omega^T\omega \\
\text{s.t.} & y_n(\omega^T x_n + b) \ge 1 \quad \text{for all } n
\end{array}
$$

This is a standard quadratic programming (QP) problem, with a convex quadratic objective and linear constraints, so it can be solved with any QP method.
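Before handing the problem to a solver, it may help to see it solved by hand on a tiny case. The four 2-D points below are an assumed toy data set, not taken from the original text; the constraints (i) to (iv) are simply $y_n(\omega^T x_n + b) \ge 1$ written out for each point.

$$
\begin{aligned}
&x_1=(0,0),\ y_1=-1; \quad x_2=(2,2),\ y_2=-1; \quad x_3=(2,0),\ y_3=+1; \quad x_4=(3,0),\ y_4=+1 \\[4pt]
&\text{(i)}\ \ -b \ge 1 \qquad
 \text{(ii)}\ \ -2\omega_1 - 2\omega_2 - b \ge 1 \qquad
 \text{(iii)}\ \ 2\omega_1 + b \ge 1 \qquad
 \text{(iv)}\ \ 3\omega_1 + b \ge 1 \\[4pt]
&\text{(i)}+\text{(iii)} \Rightarrow \omega_1 \ge 1, \qquad
 \text{(ii)}+\text{(iii)} \Rightarrow \omega_2 \le -1 \\[4pt]
&\Rightarrow\ \tfrac{1}{2}\omega^T\omega \ge 1, \ \text{ attained at } (\omega_1,\omega_2,b) = (1,-1,-1)
\end{aligned}
$$

For this data the resulting classifier is $g_{SVM}(x) = \mathrm{sign}(x_1 - x_2 - 1)$ with margin $\frac{1}{\|\omega\|} = \frac{1}{\sqrt{2}}$. Constraints (i) to (iii) hold with equality, so $x_1, x_2, x_3$ are the support vectors, while (iv) is slack.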


Standard quadratic programming (QP) problem:
$$
\begin{array}{ll}
\text{optimal } u \leftarrow QP(Q,p,A,c) & \\
\min\limits_{u} & \frac{1}{2}u^T Q u + p^T u \\
\text{s.t.} & a_m^T u \ge c_m \quad \text{for } m = 1,2,\cdots,M
\end{array}
$$


Comparing the primal SVM with the QP form gives
$$
\begin{array}{l}
u = \begin{bmatrix} b \\ \omega \end{bmatrix}; \quad
Q = \begin{bmatrix} 0 & 0_d^T \\ 0_d & I_d \end{bmatrix}; \quad
p = 0_{d+1} \\[8pt]
a_n^T = y_n \begin{bmatrix} 1 & x_n^T \end{bmatrix}; \quad c_n = 1; \quad M = N
\end{array}
$$
Here the lower-right block of $Q$ is the $d \times d$ identity $I_d$, so only $\omega$ enters the objective and $b$ does not.
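The correspondence can be checked by substituting $u = \begin{bmatrix} b \\ \omega \end{bmatrix}$ into the QP objective and constraints. The short verification below takes $Q$ with the identity block $I_d$ in its lower-right corner, which is what the objective $\frac{1}{2}\omega^T\omega$ requires:

$$
\begin{aligned}
\frac{1}{2}u^T Q u + p^T u
&= \frac{1}{2}\begin{bmatrix} b & \omega^T \end{bmatrix}
   \begin{bmatrix} 0 & 0_d^T \\ 0_d & I_d \end{bmatrix}
   \begin{bmatrix} b \\ \omega \end{bmatrix} + 0_{d+1}^T u
 = \frac{1}{2}\omega^T\omega \\[6pt]
a_n^T u
&= y_n\begin{bmatrix} 1 & x_n^T \end{bmatrix}\begin{bmatrix} b \\ \omega \end{bmatrix}
 = y_n(\omega^T x_n + b) \ \ge\ 1 = c_n
\end{aligned}
$$

Both lines reproduce the primal SVM exactly, with one QP constraint per training sample ($M = N$).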

The primal-form SVM can therefore be solved with a QP method.
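The whole pipeline (assemble $Q, p, A, c$, then call a QP routine) can be sketched with the standard library alone. Everything named here is an assumption for illustration: `build_qp`, `solve_linear` and `qp_bruteforce` are hypothetical helpers, and `qp_bruteforce` is a brute-force stand-in for a real QP solver. It enumerates candidate sets of active constraints and solves the KKT equations exactly with `Fraction`, which is only sensible for tiny problems; in practice one would pass the same $(Q, p, A, c)$ to a QP library. The toy data are again assumed.

```python
from fractions import Fraction
from itertools import combinations

def build_qp(X, y):
    """Assemble (Q, p, A, c) for the primal hard-margin SVM with u = [b, w]."""
    d = len(X[0])
    # Q = [[0, 0^T], [0, I_d]]: only w is penalized, b is not.
    Q = [[1 if i == j and i > 0 else 0 for j in range(d + 1)] for i in range(d + 1)]
    p = [0] * (d + 1)
    A = [[yn] + [yn * xi for xi in xn] for xn, yn in zip(X, y)]  # a_n^T = y_n [1, x_n^T]
    c = [1] * len(X)
    return Q, p, A, c

def solve_linear(M, rhs):
    """Exact Gauss-Jordan elimination; returns None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def qp_bruteforce(Q, p, A, c):
    """min 1/2 u^T Q u + p^T u  s.t.  A u >= c, by enumerating active sets.
    For a convex QP any KKT point is a global optimum, so the first hit is returned."""
    k, M = len(p), len(A)
    for size in range(min(k, M) + 1):
        for S in combinations(range(M), size):
            # KKT system:  Q u - A_S^T lam = -p,   A_S u = c_S
            K = [list(Q[i]) + [-A[m][i] for m in S] for i in range(k)]
            K += [list(A[m]) + [0] * size for m in S]
            sol = solve_linear(K, [-pi for pi in p] + [c[m] for m in S])
            if sol is None:
                continue
            u, lam = sol[:k], sol[k:]
            feasible = all(sum(a * v for a, v in zip(A[m], u)) >= c[m] for m in range(M))
            if feasible and all(l >= 0 for l in lam):
                return u
    return None  # no KKT point found (e.g. data not linearly separable)

# Hypothetical, linearly separable toy data
X = [(0, 0), (2, 2), (2, 0), (3, 0)]
y = [-1, -1, 1, 1]
b, *w = qp_bruteforce(*build_qp(X, y))
print(float(b), [float(v) for v in w])  # -1.0 [1.0, -1.0]
```

The result $b = -1$, $\omega = (1,-1)$ corresponds to the hyperplane $x_1 - x_2 - 1 = 0$ with margin $1/\sqrt{2}$. Exact rational arithmetic is used so that the feasibility and sign checks need no tolerance.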


SVM after a nonlinear transform

The SVM model built above is linear, i.e. it is formulated entirely in terms of the original features $x_n$. Once a feature transform (FT) $z_n = \phi(x_n)$ is applied to the original features, the SVM becomes

$$
\begin{array}{ll}
\min\limits_{b,\omega} & \frac{1}{2}\omega^T\omega \\
\text{s.t.} & y_n(\omega^T \underbrace{z_n}_{\phi(x_n)} + b) \ge 1, \quad \text{for } n = 1,2,\cdots,N
\end{array}
$$

Non-Linear Hard-Margin SVM Algorithm
1. $Q = \begin{bmatrix} 0 & 0_{\tilde d}^T \\ 0_{\tilde d} & I_{\tilde d} \end{bmatrix};\quad p = 0_{\tilde d+1};\quad a_n^T = y_n \begin{bmatrix} 1 & z_n^T \end{bmatrix};\quad c_n = 1$

2. $\begin{bmatrix} b \\ \omega \end{bmatrix} \leftarrow QP(Q, p, A, c)$

3. Return $b \in \mathbb{R}$ and $\omega \in \mathbb{R}^{\tilde d}$ as the parameters of $g_{SVM}(x) = \mathrm{sign}(\omega^T\phi(x) + b)$.
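A minimal sketch of the nonlinear case, under assumptions: the transform `phi` (mapping a scalar $x$ to $z = (x, x^2)$), the four 1-D samples, and the hand-picked parameters are all hypothetical. The labels follow the pattern $+,-,-,+$ along the line, so no threshold on $x$ alone separates them, yet in $z$-space a hyperplane does.

```python
def phi(x):
    """Hypothetical second-order transform for a scalar x: z = (x, x^2)."""
    return (x, x * x)

def score(x, b, w):
    """w^T phi(x) + b, evaluated in the transformed space."""
    return sum(wi * zi for wi, zi in zip(w, phi(x))) + b

def g_svm(x, b, w):
    """sign(w^T phi(x) + b), returned as +1 or -1."""
    return 1 if score(x, b, w) > 0 else -1

# Hypothetical 1-D data: not linearly separable in x
X = [-2.0, -1.0, 1.0, 2.0]
y = [1, -1, -1, 1]

# Hand-picked parameters in z-space (hyperplane z2 = 2.5), not solver output
b, w = -5.0 / 3.0, (0.0, 2.0 / 3.0)

# Every constraint y_n (w^T z_n + b) >= 1 holds (up to rounding)
margins = [yn * score(xn, b, w) for xn, yn in zip(X, y)]
print(all(m >= 1 - 1e-12 for m in margins))  # True
print([g_svm(x, b, w) for x in X])           # [1, -1, -1, 1]
```

The decision boundary is linear in $z$ but corresponds to $x^2 = 2.5$ in the original space, i.e. two thresholds on $x$. To obtain $(b,\omega)$ from a solver instead of by hand, the same QP setup applies with $z_n$ in place of $x_n$.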