Machine Learning 06 - Support Vector Machine

I am working through Stanford's machine learning course taught by Andrew Ng, taking notes as I go to review and consolidate the material.
My knowledge is limited, so please bear with any errors or omissions, and feel free to point them out.

6.1 Large Margin Classification

6.1.1 Optimization objective

Here we introduce the last supervised learning algorithm of the course: the Support Vector Machine (SVM).

Hypothesis:

$$h_\theta(x) = \begin{cases} 1 & \text{if } \theta^T x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
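
For concreteness, here is a minimal NumPy sketch of this hypothesis; theta and the inputs are made-up values, with x[0] = 1 as the usual bias term:

```python
import numpy as np

# Minimal sketch of the SVM hypothesis: unlike logistic regression,
# the output is a hard 0/1 decision rather than a probability.
def h(theta, x):
    return 1 if theta @ x >= 0 else 0

theta = np.array([-1.0, 2.0])          # decision boundary: 2*x1 - 1 = 0
print(h(theta, np.array([1.0, 0.2])))  # theta^T x = -0.6 < 0  -> 0
print(h(theta, np.array([1.0, 0.8])))  # theta^T x =  0.6 >= 0 -> 1
```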

Cost function:

$$\min_\theta \; C \sum_{i=1}^{m} \left[ y^{(i)} \, \mathrm{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)}) \, \mathrm{cost}_0(\theta^T x^{(i)}) \right] + \frac{1}{2} \sum_{j=1}^{n} \theta_j^2$$

where $\mathrm{cost}_1$ is the cost when $y = 1$ and $\mathrm{cost}_0$ is the cost when $y = 0$. An intuitive illustration is below:

(Figure: cost functions for $y = 1$ and $y = 0$)
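
As a sketch of what the figure shows, the two costs are piecewise linear ("hinge-like"): $\mathrm{cost}_1(z)$ is zero for $z \ge 1$ and grows linearly below, and $\mathrm{cost}_0(z)$ mirrors it. The unit slope below is an assumption; only the kink points at $\pm 1$ matter:

```python
import numpy as np

# Piecewise-linear SVM costs (hinge-like), assuming unit slope:
# cost1 wants z = theta^T x >= 1 when y = 1,
# cost0 wants z <= -1 when y = 0.
def cost1(z):
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    return np.maximum(0.0, 1.0 + z)

z = np.linspace(-2.0, 2.0, 5)
print(cost1(z))  # [3. 2. 1. 0. 0.] -- zero once z >= 1
print(cost0(z))  # [0. 0. 1. 2. 3.] -- zero once z <= -1
```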

Decision boundary:

(Figure: decision boundary)

The SVM finds a decision boundary with the largest margin from the data. The effect of the regularization parameter $C$ is shown intuitively below:

(Figure: effect of the regularization parameter $C$)

6.1.2 Concept of kernels

In this part, in order to fit a non-linear decision boundary, we adapt the hypothesis to

$$h_\theta(x) = \begin{cases} 1 & \text{if } \theta_0 + \theta_1 f_1 + \theta_2 f_2 \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

(1) Polynomial

$$f_i = x_j^k \quad (i, j = 1, 2, \cdots)$$

It can fit the dataset very well, but we don't know in advance which features to add, and the expansion is computationally expensive.
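
To make that cost concrete, here is a sketch of explicit polynomial expansion using scikit-learn's PolynomialFeatures (the toy matrix is illustrative); the number of features grows combinatorially with the degree and the input dimension:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Explicit degree-3 polynomial expansion of two input features.
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
poly = PolynomialFeatures(degree=3, include_bias=False)
X_poly = poly.fit_transform(X)
print(X_poly.shape)  # (2, 9): x1, x2, x1^2, x1*x2, x2^2, x1^3, ...
```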

(2) Gaussian Kernel
First, choose some landmarks $l^{(i)} \; (i = 1, 2, \cdots)$.

Second, define $f_i \; (i = 1, 2, \cdots)$, such as the Gaussian kernel:

$$f_i = \exp\left( -\frac{\| x - l^{(i)} \|^2}{2\sigma^2} \right) = \mathrm{sim}(x, l^{(i)})$$

It measures the similarity of two points:
  • If $x \approx l^{(i)}$: $f_i \approx 1$;
  • If $x$ is far from $l^{(i)}$: $f_i \approx 0$.

And $\sigma$ acts like a scale for the distance between the two points:

(Figure: example of $f_i$ for different values of $\sigma$)
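
A minimal NumPy sketch of this similarity and the scaling role of $\sigma$ (the function name, landmark, and sample values are mine, for illustration):

```python
import numpy as np

# Gaussian similarity: 1 when x coincides with the landmark,
# decaying toward 0 as x moves away.
def gaussian_sim(x, l, sigma):
    return np.exp(-np.sum((x - l) ** 2) / (2.0 * sigma ** 2))

x = np.array([1.0, 1.0])
landmark = np.array([3.0, 2.0])
# Larger sigma stretches the distance scale, so the same point
# looks "closer" to the landmark.
for sigma in (0.5, 1.0, 3.0):
    print(sigma, gaussian_sim(x, landmark, sigma))
```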

Finally, an example of what it predicts:

(Figure: example prediction)

6.1.3 SVM with kernels

(1) Choose landmarks

Given $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \cdots, (x^{(m)}, y^{(m)})$, choose $l^{(1)} = x^{(1)}, \; l^{(2)} = x^{(2)}, \; \cdots, \; l^{(m)} = x^{(m)}$.

(2) Define kernels

We define $f$ using the Gaussian kernel:

$$f^{(i)} = \begin{bmatrix} f_0^{(i)} \\ f_1^{(i)} \\ \vdots \\ f_m^{(i)} \end{bmatrix} = \begin{bmatrix} 1 \\ \mathrm{sim}(x^{(i)}, l^{(1)}) \\ \vdots \\ \mathrm{sim}(x^{(i)}, l^{(m)}) \end{bmatrix}, \quad i = 1, 2, \cdots, m$$

with the convention $f_0^{(i)} = 1$ for the bias term.
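
A minimal NumPy sketch of this construction, with every training example used as a landmark (function and variable names are illustrative):

```python
import numpy as np

# Build the kernel feature matrix: F[i, j] = sim(x^(i), l^(j)),
# with a bias column f_0 = 1 prepended.
def kernel_features(X, landmarks, sigma):
    sq_dists = np.sum((X[:, None, :] - landmarks[None, :, :]) ** 2, axis=2)
    F = np.exp(-sq_dists / (2.0 * sigma ** 2))
    return np.hstack([np.ones((X.shape[0], 1)), F])

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
F = kernel_features(X, landmarks=X, sigma=1.0)  # l^(i) = x^(i)
print(F.shape)  # (3, 4): m examples, m + 1 features including f_0
```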

(3) Training

$$\min_\theta \; C \sum_{i=1}^{m} \left[ y^{(i)} \, \mathrm{cost}_1(\theta^T f^{(i)}) + (1 - y^{(i)}) \, \mathrm{cost}_0(\theta^T f^{(i)}) \right] + \frac{1}{2} \sum_{j=1}^{m} \theta_j^2$$

Use a minimization algorithm (in practice, an off-the-shelf SVM package) to solve it.
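
As a sketch of what "using a package" looks like, scikit-learn's SVC with kernel='rbf' trains a Gaussian-kernel SVM; its gamma parameter plays the role of $1/(2\sigma^2)$, and the data below is made up for illustration:

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated toy clusters.
X = np.array([[0, 0], [1, 1], [2, 2],
              [8, 8], [9, 9], [10, 10]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

sigma = 1.0
clf = SVC(C=1.0, kernel='rbf', gamma=1.0 / (2.0 * sigma ** 2))
clf.fit(X, y)
print(clf.predict([[1.5, 1.5], [9.5, 9.5]]))  # expected: [0 1]
```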

(4) Evaluation

  • Large $C$: lower bias, higher variance.
  • Small $C$: higher bias, lower variance.
  • Large $\sigma^2$: higher bias, lower variance ($f$ is more "smooth").
  • Small $\sigma^2$: lower bias, higher variance.
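
In practice this trade-off is usually navigated by choosing $C$ and $\sigma^2$ with cross-validation. A sketch using scikit-learn's GridSearchCV on a toy dataset (make_moons is a stand-in for real data, and gamma again stands for $1/(2\sigma^2)$):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy non-linear dataset.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Small gamma = large sigma^2 (smoother f); large C = less regularization.
param_grid = {'C': [0.1, 1, 10, 100],
              'gamma': [0.01, 0.1, 1, 10]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```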

(5) Notes

  • Perform feature scaling before using the Gaussian kernel (see the sketch after this list).
  • Not all similarity functions make valid kernels; they need to satisfy Mercer's theorem so that SVM packages run correctly.
  • Other kernels: polynomial kernel, string kernel, …
  • Multi-class classification: use the one-vs-all method.
  • If $n \gg m$, use logistic regression or an SVM without a kernel; if $n$ is small and $m$ is intermediate, use an SVM with a Gaussian kernel; if $m \gg n$, create more features and turn to the first case. A neural network is likely to work well in most of these settings.
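
On the feature-scaling note: without scaling, the feature with the largest range dominates $\| x - l^{(i)} \|^2$. A minimal sketch using a scikit-learn pipeline, so the scaling fitted on the training data is reapplied automatically at prediction time (toy data for illustration):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# x2 has a range ~1000x larger than x1 and would dominate the
# kernel's distance computation without scaling.
X = np.array([[1.0, 1000.0], [2.0, 2000.0],
              [8.0, 1100.0], [9.0, 2100.0]])
y = np.array([0, 0, 1, 1])

model = make_pipeline(StandardScaler(), SVC(C=1.0, kernel='rbf'))
model.fit(X, y)
print(model.predict([[1.5, 1500.0]]))  # close to the class-0 examples in x1
```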