Support Vector Machines
Take binary classification as the running example: $y_i\in\{-1,1\}$, with $+1$ marking positive examples and $-1$ negative examples.
SVM for Linearly Separable Data
Margin
The distance from a sample $(\bm{x_i},y_i)$ to the hyperplane $f(x)=w^Tx+b$ is (assuming positive examples lie on the positive side and negative examples on the negative side)
$$d=\frac{y_i(w^T\bm{x_i}+b)}{\|w\|_2}.$$
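As a quick numeric illustration (a sketch with plain NumPy; the sample point and hyperplane below are made up for the example):

```python
import numpy as np

def margin_distance(x, y, w, b):
    """Signed distance of sample (x, y) to the hyperplane w^T x + b = 0,
    scaled by the label y so that correctly classified points get d > 0."""
    return y * (w @ x + b) / np.linalg.norm(w)

# Hyperplane x1 + x2 - 1 = 0 and a positive sample at (2, 2).
w = np.array([1.0, 1.0])
b = -1.0
print(margin_distance(np.array([2.0, 2.0]), 1, w, b))  # 3/sqrt(2)
```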
Hyperplane
Given a linearly separable training set, the separating hyperplane obtained by maximizing the margin is
$$y(x)=w^T\Phi(x)+b,$$
and the corresponding decision function is
$$f(x)=\operatorname{sign}\left(w^{T} \Phi(x)+b\right).$$
- This decision function is called the linearly separable support vector machine.
- $\Phi(x)$ is a fixed feature-space transformation that maps $x$ into a (higher-dimensional) space. The simplest choice is the identity, $\Phi(x)=x$.
- Solving for the separating hyperplane is equivalent to solving a convex quadratic programming problem.
- Correct classification of every sample is equivalent to
$$\left\{\begin{array}{l} y\left(x_{i}\right)>0 \Leftrightarrow y_{i}=+1 \\ y\left(x_{i}\right)<0 \Leftrightarrow y_{i}=-1 \end{array}\right. \Rightarrow y_{i} \cdot y\left(x_{i}\right)>0.$$
The objective function is therefore
$$\max_{w,b} \min_i \frac{y_{i} \cdot y\left(x_{i}\right)}{\|w\|}=\max_{w,b} \min_i \frac{y_{i} \cdot\left(w^{T} \Phi\left(x_{i}\right)+b\right)}{\|w\|}.$$
Objective Function
(Note: by elementary geometry, the distance from any point $x$ to the hyperplane $S:w^Tx+b=0$ is
$$\frac{|w^T\bm{x}+b|}{\|w\|_2}.$$
Suppose the support vectors of both classes lie at distance $d\,(>0)$ from this hyperplane. Since the support vectors are, within each class, the points closest to the hyperplane, every sample satisfies
$$\frac{|w^T\bm{x}+b|}{\|w\|_2}\ge d.$$
Dividing both sides by $d$ and substituting
$$\begin{cases} \alpha^T=\frac{w^T}{d\|w\|_2}\\ \beta=\frac{b}{d\|w\|_2} \end{cases}$$
gives the familiar form of the constraint:
$$|\alpha^Tx+\beta|\ge1.$$
This only rescales the function value from $w^T\bm{x}+b$ to $\alpha^Tx+\beta$; the distance from each point to the plane is unchanged.)
Equivalently, one can rescale $w$ and $b$ proportionally (this changes the value of $w^{T}\Phi(x_{i})+b$ but not the direction of the hyperplane) so that every sample satisfies $|y(x_i)|\ge1$, with the support vectors on both sides attaining the value exactly 1, i.e.
$$y_{i}\left(w^{T}\Phi(x_{i})+b\right)\ge1.$$
The margin between the support vectors and the hyperplane is then $\frac{1}{\|w\|}$.
The objective function thus becomes
$$\begin{aligned} &\max_{w,b} \min_i \frac{y_{i} \cdot\left(w^{T} \Phi\left(x_{i}\right)+b\right)}{\|w\|}\\ \Leftrightarrow\ &\max_{w,b}\frac{1}{\|w\|},\quad \text{s.t.}\quad y_{i}\left(w^{T} \Phi\left(x_{i}\right)+b\right)\ge1,\ i=1,\dots,n\\ \Leftrightarrow\ &\min_{w,b}\frac{1}{2}\|w\|^2,\quad \text{s.t.}\quad y_{i}\left(w^{T} \Phi\left(x_{i}\right)+b\right)\ge1,\ i=1,\dots,n \end{aligned}$$
By the method of Lagrange multipliers,
$$L(w, b, \alpha)=\frac{1}{2}\|w\|^{2}-\sum_{i=1}^{n} \alpha_{i}\left(y_{i}\left(w^{T} \Phi\left(x_{i}\right)+b\right)-1\right).\tag{1}$$
The primal problem is the min-max problem
$$\min _{w, b} \max _{\alpha} L(w, b, \alpha),$$
and its dual is the max-min problem
$$\max _{\alpha} \min _{w, b} L(w, b, \alpha).$$
Solving via Lagrange Multipliers
First solve the inner part of the dual, $\min _{w, b} L(w, b, \alpha)$: take the partial derivatives of the Lagrangian $L(w,b,\alpha)$ with respect to $w$ and $b$ and set them to 0:
$$\begin{aligned} \frac{\partial L}{\partial w}=0 &\Rightarrow w=\sum_{i=1}^{n} \alpha_{i} y_{i} \Phi\left(x_{i}\right) \\ \frac{\partial L}{\partial b}=0& \Rightarrow 0=\sum_{i=1}^{n} \alpha_{i} y_{i} \end{aligned}\tag{2}$$
Substituting (2) into (1) gives
$$\begin{aligned} L(w, b, \alpha)=&\frac{1}{2}\|w\|^{2}-\sum_{i=1}^{n} \alpha_{i}\left(y_{i}\left(w^{T} \Phi\left(x_{i}\right)+b\right)-1\right) \\ =& \frac{1}{2} w^{T} w-w^{T} \sum_{i=1}^{n} \alpha_{i} y_{i} \Phi\left(x_{i}\right)-b \sum_{i=1}^{n} \alpha_{i} y_{i}+\sum_{i=1}^{n} \alpha_{i} \\ =& \frac{1}{2} w^{T} \sum_{i=1}^{n} \alpha_{i} y_{i} \Phi\left(x_{i}\right)-w^{T} \sum_{i=1}^{n} \alpha_{i} y_{i} \Phi\left(x_{i}\right)-b \cdot 0+\sum_{i=1}^{n} \alpha_{i} \\ =& \sum_{i=1}^{n} \alpha_{i}-\frac{1}{2}\left(\sum_{i=1}^{n} \alpha_{i} y_{i} \Phi\left(x_{i}\right)\right)^{T} \sum_{j=1}^{n} \alpha_{j} y_{j} \Phi\left(x_{j}\right) \\ =&\sum_{i=1}^{n} \alpha_{i}-\frac{1}{2} \sum_{i, j=1}^{n} \alpha_{i} \alpha_{j} y_{i} y_{j} \Phi^{T}\left(x_{i}\right) \Phi\left(x_{j}\right) \end{aligned}$$
Next, maximize $\min _{w, b} L(w, b, \alpha)$ over $\alpha$:
$$\begin{aligned} &\max_{\alpha}\left(\sum_{i=1}^{n} \alpha_{i}-\frac{1}{2} \sum_{i, j=1}^{n} \alpha_{i} \alpha_{j} y_{i} y_{j} \Phi^{T} \left(x_{i}\right) \Phi\left(x_{j}\right)\right)\\ \text{s.t.}&\quad\sum_{i=1}^{n} \alpha_{i} y_{i}=0,\quad \alpha_{i}\ge0,\ i=1,\dots,n \end{aligned}$$
Let $\bm{\alpha^*}=(\alpha_1^*,\dots,\alpha_n^*)$ be the maximizer. Substituting it into (2) gives
$$w^{*} =\sum_{i=1}^{n} \alpha_{i}^{*} y_{i} \Phi\left(x_{i}\right).$$
Substituting $w^*$ into the support-vector condition $y_{j}\left(w^{*T} \Phi\left(x_{j}\right)+b\right)-1=0$ (for any $j$ with $\alpha_j^*>0$) gives
$$b^{*} =y_{j}-\sum_{i=1}^{n} \alpha_{i}^{*} y_{i}\left(\Phi\left(x_{i}\right) \cdot \Phi\left(x_{j}\right)\right).$$
This yields the separating hyperplane $w^{*}\cdot\Phi(x)+b^*=0$ and the decision function $f(x)=\operatorname{sign}(w^{*}\cdot\Phi(x)+b^*)$.
Note that only the sample points $(x_i,y_i)$ with $\alpha_i^*>0$ affect the decision boundary; these are called the support vectors.
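The closed-form recovery of $w^*$ and $b^*$ from the dual solution can be checked numerically. The two-point dataset and its dual optimum $\alpha^*=(0.5,0.5)$ below were worked out by hand, with $\Phi(x)=x$; this is an illustrative sketch, not a general solver:

```python
import numpy as np

# Hand-solved hard-margin example: x1 = (1, 0), y1 = +1; x2 = (-1, 0), y2 = -1.
# Both points are support vectors and the dual optimum is alpha* = (0.5, 0.5).
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])          # satisfies sum_i alpha_i y_i = 0

# w* = sum_i alpha_i* y_i Phi(x_i)
w = (alpha * y) @ X

# b* = y_j - sum_i alpha_i* y_i (Phi(x_i) . Phi(x_j)) for a support vector j
j = 0
b = y[j] - (alpha * y) @ (X @ X[j])

print(w, b)                            # [1. 0.] 0.0
print(np.sign(X @ w + b))              # recovers the labels [ 1. -1.]
```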
SVM for Linearly Non-separable Data
Objective Function
If the data are not linearly separable, introduce a slack variable $\xi_{i}\ge0$ for each sample, so that the functional margin plus the slack is at least 1. The constraints become
$$y_{i}\left(w \cdot \Phi(x_{i})+b\right) \geq 1-\xi_{i}.$$
The objective function becomes
$$\begin{aligned} &\min _{w, b} \frac{1}{2}\|w\|^{2}+C \sum_{i=1}^{n} \xi_{i}\\ \text{s.t.}&\quad y_{i}\left(w \cdot \Phi(x_{i})+b\right) \geq 1-\xi_{i},\ i=1,\dots,n\\ &\quad \xi_{i}\ge0,\ i=1,\dots,n \end{aligned}$$
This can be read as minimizing the sum of the training error (the slack terms) and the model complexity $\frac{1}{2}\|w\|^2$.
Clearly, the larger $C$ is, the more weight $\sum_{i=1}^{n} \xi_{i}$ receives: the slacks are squeezed toward 0 and the margin shrinks. Conversely, a smaller $C$ yields a larger margin.
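The soft-margin objective can also be optimized directly in its equivalent unconstrained hinge-loss form $\frac{1}{2}\|w\|^2 + C\sum_i \max(0,\,1-y_i(w\cdot x_i+b))$. The following is a minimal subgradient-descent sketch with $\Phi(x)=x$ on synthetic data; the learning rate and epoch count are arbitrary choices for the example:

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=2000):
    """Subgradient descent on the primal soft-margin objective
    1/2 ||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i + b))."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                    # points with nonzero hinge loss
        grad_w = w - C * (y[active, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Two well-separated synthetic clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2, 2], 0.5, (20, 2)),
               rng.normal([-2, -2], 0.5, (20, 2))])
y = np.hstack([np.ones(20), -np.ones(20)])

w, b = train_linear_svm(X, y, C=1.0)
preds = np.sign(X @ w + b)
print("training accuracy:", (preds == y).mean())
```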
Solving via Lagrange Multipliers
The Lagrangian is
$$L(w, b, \xi, \alpha, \mu) = \frac{1}{2}\|w\|^{2}+C \sum_{i=1}^{n} \xi_{i}-\sum_{i=1}^{n} \alpha_{i}\left(y_{i}\left(w \cdot\Phi(x_{i})+b\right)-1+\xi_{i}\right)-\sum_{i=1}^{n} \mu_{i} \xi_{i}.$$
The dual of the objective is again the max-min problem of this Lagrangian. First, take partial derivatives with respect to $w,b,\xi$:
$$\begin{aligned} \frac{\partial L}{\partial w}=0 &\Rightarrow w=\sum_{i=1}^{n} \alpha_{i} y_{i} \Phi\left(x_{i}\right) \\ \frac{\partial L}{\partial b}=0 &\Rightarrow 0=\sum_{i=1}^{n} \alpha_{i} y_{i} \\ \frac{\partial L}{\partial \xi_{i}}=0& \Rightarrow C-\alpha_{i}-\mu_{i}=0 \end{aligned}\tag{3}$$
Substituting these three identities into the Lagrangian gives
$$\min _{w, b, \xi} L(w, b, \xi, \alpha, \mu)=-\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(\Phi(x_{i}) \cdot \Phi(x_{j})\right)+\sum_{i=1}^{n} \alpha_{i}.$$
Next, maximize this over $\alpha$:
$$\begin{aligned} &\max _{\alpha}-\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(\Phi(x_{i}) \cdot \Phi(x_{j})\right)+\sum_{i=1}^{n} \alpha_{i} \\ \text {s.t. }& \sum_{i=1}^{n} \alpha_{i} y_{i}=0, \\ &C-\alpha_{i}-\mu_{i}=0, \\ &\alpha_{i} \geq 0,\ \mu_{i} \geq 0, \quad i=1,2, \ldots, n \end{aligned}$$
Eliminating $\mu_i$ simplifies this to
$$\begin{aligned} &\max _{\alpha}-\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(\Phi(x_{i})\cdot \Phi(x_{j})\right)+\sum_{i=1}^{n} \alpha_{i} \\ \text {s.t. }& \sum_{i=1}^{n} \alpha_{i} y_{i}=0,\quad 0\le \alpha_{i}\le C,\ i=1,\dots,n \end{aligned}$$
Let $\bm{\alpha^*}=(\alpha_1^*,\dots,\alpha_n^*)$ be the maximizer. Substituting it into (3) gives
$$\begin{aligned} w^{*}&=\sum_{i=1}^{n} \alpha_{i}^{*} y_{i} \Phi(x_{i})\\ b^{*}&=\frac{\max _{i: y_{i}=-1} w^{*} \cdot \Phi(x_{i})+\min _{i: y_{i}=1} w^{*} \cdot \Phi(x_{i})}{2} \end{aligned}$$
Notes:
- When computing $b^*$, use only support vectors satisfying $0<\alpha_{j}<C$.
- A soft-margin support vector $x_i$ lies on the margin boundary, between the margin boundary and the separating hyperplane, or on the misclassified side of the hyperplane. These cases follow from the KKT complementarity conditions (e.g. $\alpha_i^*<C$ forces $\mu_i>0$ and hence $\xi_i=0$):
  - if $\alpha_{i}^{*}<C$, then $\xi_{i}=0$ and the support vector $x_i$ lies exactly on the margin boundary;
  - if $\alpha_{i}^{*}=C$ and $0<\xi_{i}<1$, then $x_i$ is classified correctly and lies between the margin boundary and the separating hyperplane;
  - if $\alpha_{i}^{*}=C$ and $\xi_{i}=1$, then $x_i$ lies on the separating hyperplane;
  - if $\alpha_{i}^{*}=C$ and $\xi_{i}>1$, then $x_i$ lies on the misclassified side.
- In practice, $b^*$ is often taken as the average of the values computed from all such support vectors.

This yields the separating hyperplane $w^{*}\cdot \Phi(x)+b^{*}=0$ and the decision function $f(x)=\operatorname{sign}\left(w^{*}\cdot \Phi(x)+b^{*}\right)$.
Kernel Functions
A kernel function implicitly maps the original input space into a new feature space, so that samples which are linearly inseparable in the input space may become separable in the kernel space. Concretely, substitute $\kappa\left(x_{i}, x_{j}\right)$ for every inner product $\Phi^{T}(x_{i}) \Phi(x_{j})$ in the dual problem.
Polynomial kernel: $\kappa\left(x_{1}, x_{2}\right)=\left(x_{1} \cdot x_{2}+c\right)^{d}$

Gaussian (RBF) kernel: $\kappa\left(x_{1}, x_{2}\right)=\exp \left(-\gamma \cdot\left\|x_{1}-x_{2}\right\|^{2}\right)$

Sigmoid kernel: $\kappa\left(x_{1}, x_{2}\right)=\tanh \left(x_{1} \cdot x_{2}+c\right)$
Positive semidefiniteness of the Gram matrix $H=(\kappa\left(x_{i}, x_{j}\right))_{n\times n}$ is a necessary condition for $\kappa\left(x_{i}, x_{j}\right)$ to be a valid kernel.
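This condition can be spot-checked numerically: the RBF Gram matrix on any sample should be symmetric with no (significantly) negative eigenvalues. A small NumPy sketch on random data:

```python
import numpy as np

def rbf_gram(X, gamma=0.5):
    """Gram matrix H[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
H = rbf_gram(X)

print(np.allclose(H, H.T))                     # symmetric
print(np.linalg.eigvalsh(H).min() >= -1e-10)   # PSD up to rounding
```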
(Note: take the polynomial kernel with $d=2$ as an example:
$$\begin{aligned} \kappa(\vec{x}, \vec{y})&=(\vec{x} \cdot \vec{y}+c)^{2} \\ &=(\vec{x} \cdot \vec{y})^{2}+2 c\, \vec{x} \cdot \vec{y}+c^{2} \\ &=\sum_{i=1}^{n} \sum_{j=1}^{n}\left(x_{i} x_{j}\right)\left(y_{i} y_{j}\right)+\sum_{i=1}^{n}\left(\sqrt{2 c}\, x_{i} \cdot \sqrt{2 c}\, y_{i}\right)+c^{2} \end{aligned}$$
This amounts to mapping the input $x_{1\times n}$ into $\Phi(\vec{x})_{1\times(n^2+n+1)}$, where
$$\begin{aligned} \Phi(\vec{x})=&(x_{1} x_{1},\dots,x_{1} x_{n},\dots,x_{i} x_{1},\dots,x_{i} x_{n},\dots,x_{n} x_{1},\dots,x_{n} x_{n},\\ &\sqrt{2 c}\, x_{1},\dots,\sqrt{2 c}\, x_{i},\dots,\sqrt{2 c}\, x_{n},\ c), \end{aligned}$$
and then taking the inner product of $\Phi(\vec{x})$ and $\Phi(\vec{y})$.
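The identity $\kappa(\vec{x},\vec{y})=\Phi(\vec{x})\cdot\Phi(\vec{y})$ for $d=2$ can be confirmed numerically by building the feature map explicitly (random vectors; $c=1$ is an arbitrary choice for the example):

```python
import numpy as np

def poly_kernel(x, y, c=1.0):
    return (x @ y + c) ** 2

def phi(x, c=1.0):
    """Explicit feature map of the degree-2 polynomial kernel:
    all pairwise products x_i x_j, then sqrt(2c) x_i, then the constant c."""
    pairs = np.outer(x, x).ravel()            # n^2 entries
    linear = np.sqrt(2 * c) * x               # n entries
    return np.concatenate([pairs, linear, [c]])

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=3)

print(poly_kernel(x, y))
print(phi(x) @ phi(y))   # same value: the kernel computes this inner product
```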
Now take the (one-dimensional) Gaussian kernel as an example:
$$\begin{array}{l} \kappa\left(x_{1}, x_{2}\right)=e^{-\frac{\left\|x_{1}-x_{2}\right\|^{2}}{2 \sigma^{2}}}=e^{-\frac{\left(x_{1}-x_{2}\right)^{2}}{2 \sigma^{2}}}=e^{-\frac{x_{1}^{2}+x_{2}^{2}-2 x_{1} x_{2}}{2 \sigma^{2}}}=e^{-\frac{x_{1}^{2}+x_{2}^{2}}{2 \sigma^{2}}} \cdot e^{\frac{x_{1} x_{2}}{\sigma^{2}}} \\ =e^{-\frac{x_{1}^{2}+x_{2}^{2}}{2 \sigma^{2}}} \cdot\left(1+\frac{1}{\sigma^{2}} \cdot \frac{x_{1} x_{2}}{1 !}+\left(\frac{1}{\sigma^{2}}\right)^{2} \cdot \frac{\left(x_{1} x_{2}\right)^{2}}{2 !}+\left(\frac{1}{\sigma^{2}}\right)^{3} \cdot \frac{\left(x_{1} x_{2}\right)^{3}}{3 !}+\cdots+\left(\frac{1}{\sigma^{2}}\right)^{n} \cdot \frac{\left(x_{1} x_{2}\right)^{n}}{n !}+\cdots\right) \\ =e^{-\frac{x_{1}^{2}+x_{2}^{2}}{2 \sigma^{2}}} \cdot\left(1 \cdot 1+\frac{1}{1 !} \frac{x_{1}}{\sigma} \cdot \frac{x_{2}}{\sigma}+\frac{1}{2 !} \cdot \frac{x_{1}^{2}}{\sigma^{2}} \cdot \frac{x_{2}^{2}}{\sigma^{2}}+\frac{1}{3 !} \cdot \frac{x_{1}^{3}}{\sigma^{3}} \cdot \frac{x_{2}^{3}}{\sigma^{3}}+\cdots+\frac{1}{n !} \cdot \frac{x_{1}^{n}}{\sigma^{n}} \cdot \frac{x_{2}^{n}}{\sigma^{n}}+\cdots\right) \\ =\Phi\left(x_{1}\right)^{T} \cdot \Phi\left(x_{2}\right) \end{array}$$
This amounts to mapping the input $x$ into the infinite-dimensional space $\Phi(x)_{1\times\infty}$, where
$$\Phi(x)=e^{-\frac{x^{2}}{2 \sigma^{2}}} \cdot\left(1 ,\sqrt{\frac{1}{1 !}}\frac{x}{\sigma} , \sqrt{\frac{1}{2 !}} \frac{x^{2}}{\sigma^{2}} ,\dots,\sqrt{\frac{1}{n !}} \frac{x^{n}}{\sigma^{n}} ,\cdots\right),$$
and then taking the inner product of $\Phi(x_1)$ and $\Phi(x_2)$. In practice, the kernel and its hyperparameters still need to be chosen and tuned carefully for the method to work well.)