##SVM##
Like logistic regression, an SVM (support vector machine) learns a decision boundary, but in some situations it is more effective than logistic regression.
###1. Motivation: logistic regression###
$$h_{\theta}(x) = \frac{1}{1+\exp(-\theta^T x)}$$
For this hypothesis:
- if $y = 1$, we want $h_{\theta}(x) \approx 1$, i.e. $\theta^T x \gg 0$
- if $y = 0$, we want $h_{\theta}(x) \approx 0$, i.e. $\theta^T x \ll 0$
The cost for a single example, with $z = \theta^T x$, is:
$$cost = -y\log\frac{1}{1+\exp(-z)}-(1-y)\log\left(1-\frac{1}{1+\exp(-z)}\right)$$
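
The behaviour of the hypothesis and of the per-example cost can be checked numerically. The following is a minimal sketch, assuming NumPy; the function names `sigmoid` and `logistic_cost` are chosen here for illustration and do not come from the original notes.

```python
import numpy as np

def sigmoid(z):
    """Logistic function h = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(z, y):
    """Per-example cost -y*log(h) - (1-y)*log(1-h), with h = sigmoid(z)."""
    h = sigmoid(z)
    return -y * np.log(h) - (1 - y) * np.log(1 - h)

# For y = 1 the cost shrinks as z grows; for y = 0 it shrinks as z decreases.
for z in (-5.0, 0.0, 5.0):
    print(z, sigmoid(z), logistic_cost(z, 1), logistic_cost(z, 0))
```

The cost for $y=1$ approaches zero only as $z$ becomes very large, which is why the text asks for $\theta^T x \gg 0$ rather than merely $\theta^T x > 0$.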
Minimizing this cost over the whole training set, with regularization:
$$\min_{\theta} \frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)}\left(-\log h_\theta (x^{(i)})\right)+(1-y^{(i)})\left(-\log (1-h_\theta (x^{(i)}))\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2$$
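
As a rough illustration, the regularized objective can be written in a few lines of NumPy. This sketch assumes that `X` carries a leading column of ones (so `theta[0]` is the intercept and is left out of the penalty, matching the sum over $j = 1..n$); the function name and the toy data are hypothetical.

```python
import numpy as np

def regularized_logistic_cost(theta, X, y, lam):
    """Regularized logistic regression cost.

    X: (m, n+1) array whose first column is all ones (assumption).
    theta: (n+1,) parameter vector; y: (m,) labels in {0, 1}.
    """
    m = X.shape[0]
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    data_term = np.sum(y * (-np.log(h)) + (1 - y) * (-np.log(1 - h))) / m
    reg_term = lam / (2 * m) * np.sum(theta[1:] ** 2)  # j = 1..n only
    return data_term + reg_term

# Hypothetical toy data: two examples, one feature plus the constant column.
X = np.array([[1.0, 2.0], [1.0, -1.0]])
y = np.array([1, 0])
print(regularized_logistic_cost(np.zeros(2), X, y, lam=2.0))
print(regularized_logistic_cost(np.array([0.0, 1.0]), X, y, lam=2.0))
```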
###2. SVM###
In the SVM, drop the $\frac{1}{m}$ factor (purely for convenience; scaling the objective by a positive constant does not change the minimizing $\theta$) and write the two per-example terms as:
$$cost_1(\theta^Tx^{(i)}) = -\log h_{\theta}(x^{(i)})$$
$$cost_0(\theta^Tx^{(i)}) = -\log (1-h_{\theta}(x^{(i)}))$$
The factors $y^{(i)}$ and $(1-y^{(i)})$ stay outside $cost_1$ and $cost_0$, so that each label selects exactly one of the two terms.
The optimization objective then becomes:
$$\min_{\theta} \sum_{i=1}^{m} \left[y^{(i)}cost_1(\theta^Tx^{(i)})+(1-y^{(i)})cost_0(\theta^Tx^{(i)})\right] + \frac{\lambda}{2}\sum_{j=1}^n \theta_j^2$$
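In logistic regression the objective has the form $A+\lambda B$, where $A$ is the data-fit term and $B$ is the regularization term. The larger $\lambda$ is, the more weight $B$ receives and the less $A$ matters by comparison, so adjusting $\lambda$ controls how strongly $B$ influences the result.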
The SVM uses the form $CA + B$ instead. The smaller $C$ is, the more weight $B$ receives, which is the same effect as a larger $\lambda$ in logistic regression, so $C$ can be set to $\frac{1}{\lambda}$. The objective can then be rewritten as:
$$\min_{\theta} C\sum_{i=1}^{m} \left[y^{(i)}cost_1(\theta^Tx^{(i)})+(1-y^{(i)})cost_0(\theta^Tx^{(i)})\right] + \frac{1}{2}\sum_{j=1}^n \theta_j^2$$
This is the SVM optimization objective.
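
The claim that $C = \frac{1}{\lambda}$ gives the same solution follows from $CA + B = \frac{1}{\lambda}(A + \lambda B)$: the two objectives differ only by a positive constant factor. A small numerical check, using a made-up scalar problem (the particular $A$ and $B$ below are hypothetical stand-ins, not the SVM terms themselves):

```python
import numpy as np

thetas = np.linspace(-3.0, 3.0, 601)   # candidate scalar parameters
A = (thetas - 2.0) ** 2                # hypothetical data-fit term
B = 0.5 * thetas ** 2                  # regularization term

lam = 4.0
C = 1.0 / lam

# Minimizer of A + lambda*B versus minimizer of C*A + B on the same grid.
best_lr = thetas[np.argmin(A + lam * B)]
best_svm = thetas[np.argmin(C * A + B)]
print(best_lr, best_svm)
```

Both searches pick the same grid point, because dividing an objective by a positive constant rescales its values without moving its minimum.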
Let $z=\theta^Tx^{(i)}$.
- If $y=1$, we want $cost_1(z)=0$ whenever $z \ge 1$.
- If $y=0$, we want $cost_0(z)=0$ whenever $z \le -1$.
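
One common pair of functions with exactly these zero regions is the hinge form $\max(0, 1-z)$ and $\max(0, 1+z)$. The notes only state where the costs should vanish, so the hinge shape used below is an assumption, as are the function names and the convention that `X` has a leading column of ones.

```python
import numpy as np

def cost_1(z):
    """Assumed hinge form for y = 1: zero when z >= 1, linear otherwise."""
    return np.maximum(0.0, 1.0 - z)

def cost_0(z):
    """Assumed hinge form for y = 0: zero when z <= -1, linear otherwise."""
    return np.maximum(0.0, 1.0 + z)

def svm_objective(theta, X, y, C):
    """C * sum_i [y*cost_1(z) + (1-y)*cost_0(z)] + (1/2) * sum_{j>=1} theta_j^2."""
    z = X @ theta
    data_term = np.sum(y * cost_1(z) + (1 - y) * cost_0(z))
    reg_term = 0.5 * np.sum(theta[1:] ** 2)  # theta[0] (intercept) not penalized
    return C * data_term + reg_term

# Hypothetical toy data: first column is the constant feature x_0 = 1.
X = np.array([[1.0, 2.0], [1.0, -2.0]])
y = np.array([1, 0])
print(svm_objective(np.array([0.0, 1.0]), X, y, C=100.0))
```

With this `theta` both examples satisfy their margin condition, so the data term vanishes and only the regularization term remains.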
If $C$ is very large, the optimizer looks for a $\theta$ that makes $y^{(i)}cost_1(\theta^Tx^{(i)})+(1-y^{(i)})cost_0(\theta^Tx^{(i)})$ zero for every training example, that is:
$$y^{(i)} = 1:\ \theta^Tx^{(i)} \ge 1$$
$$y^{(i)} = 0:\ \theta^Tx^{(i)} \le -1$$
The problem then reduces to:
$$\min_{\theta}\ C \cdot 0 + \frac{1}{2}\sum_{j=1}^{n}\theta_j^2$$
$$\text{s.t.}\quad \theta^Tx^{(i)} \ge 1 \ \text{if } y^{(i)}=1; \qquad \theta^Tx^{(i)} \le -1 \ \text{if } y^{(i)}=0$$
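
To make the constrained form concrete, the sketch below checks the two margin constraints for a few candidate parameter vectors and evaluates the remaining objective $\frac{1}{2}\sum_j \theta_j^2$. The helper names and the toy data are hypothetical, and `X` is again assumed to have a leading column of ones.

```python
import numpy as np

def satisfies_margin(theta, X, y):
    """True if z >= 1 for every y = 1 example and z <= -1 for every y = 0 example."""
    z = X @ theta
    return bool(np.all(np.where(y == 1, z >= 1.0, z <= -1.0)))

def reduced_objective(theta):
    """(1/2) * sum_{j>=1} theta_j^2; theta[0] is treated as the intercept."""
    return 0.5 * float(np.sum(theta[1:] ** 2))

# Hypothetical toy data: first column is the constant feature x_0 = 1.
X = np.array([[1.0, 2.0], [1.0, -2.0]])
y = np.array([1, 0])

for w in (0.25, 0.5, 1.0):
    theta = np.array([0.0, w])
    print(w, satisfies_margin(theta, X, y), reduced_objective(theta))
```

Among the candidates that satisfy the constraints, the one with the smallest weights has the lowest objective, which is what the constrained problem selects for.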
###3. Decision boundary###