Machine Learning Techniques - 5: Kernel Logistic Regression

Last class, we learned about the soft-margin SVM and its applications. Now a new idea comes to us: could we apply the kernel trick to our old friend, logistic regression?

First, let's review the four formulations of margin handling: the hard-margin primal, the hard-margin dual, the soft-margin primal, and the soft-margin dual.

As we can see, the difference between "hard" and "soft" shows up in the constant C, which plays a role quite similar to regularization.

Since we defined a new slack variable ξn for each example, we can use the max function to express the margin violation smoothly:
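\[
\xi_n = \max\big(1 - y_n(\mathbf{w}^T\mathbf{z}_n + b),\ 0\big)
\]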

Substituting this back in gives the unconstrained form of the soft-margin SVM:
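\[
\min_{b,\mathbf{w}}\ \frac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{n=1}^{N}\max\big(1 - y_n(\mathbf{w}^T\mathbf{z}_n + b),\ 0\big)
\]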

We can easily see that this objective has the same shape as L2 regularization: a wᵀw penalty plus a sum of errors.

However, this form is no longer a QP, and the max makes the objective non-differentiable at some points, so we cannot solve it with the tools we used before.

Thus we get the idea of viewing SVM as a regularization model:
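Side by side, L2 regularization minimizes

\[
\frac{\lambda}{N}\mathbf{w}^T\mathbf{w} + \frac{1}{N}\sum_{n=1}^{N}\mathrm{err}\big(y_n,\ \mathbf{w}^T\mathbf{z}_n + b\big),
\]

while the soft-margin SVM minimizes

\[
\frac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{n=1}^{N}\widehat{\mathrm{err}}\big(y_n,\ \mathbf{w}^T\mathbf{z}_n + b\big),
\]

so the two correspond with \(\lambda = \frac{1}{2C}\).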

For the regularization-form SVM, a larger C means a smaller λ, and therefore a smaller influence from the regularization term wᵀw.

Next, a comparison of the error measures: the SVM error agrees with the 0/1 error far from the boundary but differs in the middle, and because of its bent shape it is called the hinge error measure.
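In symbols, with score \(s = \mathbf{w}^T\mathbf{z} + b\),

\[
\mathrm{err}_{0/1}(s, y) = [\![\, ys \le 0 \,]\!], \qquad
\widehat{\mathrm{err}}_{\mathrm{svm}}(s, y) = \max(1 - ys,\ 0), \qquad
\mathrm{err}_{\mathrm{sce}}(s, y) = \log_2\big(1 + \exp(-ys)\big),
\]

where both the hinge error and the scaled cross-entropy error of logistic regression are upper bounds on the 0/1 error, and the two are close when \(|ys|\) is large.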

Now, for binary classification, could logistic regression and SVM be combined?

We know the advantage of SVM: the kernel trick keeps the computation cheap even in high-dimensional feature spaces. Logistic regression holds other benefits, such as a probabilistic output.

Here we apply Platt's scaling (https://en.wikipedia.org/wiki/Platt_scaling), which has been found to be a good way to turn the binary classifier into a probabilistic one. We first run the SVM to get w and b, and then use another optimizer to find the best A and B in g(x) = θ(A · (wᵀΦ(x) + b) + B), where θ is the logistic function.
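As a concrete illustration, here is a minimal sketch of Platt's scaling with scikit-learn; the dataset, the RBF kernel, and all parameter choices are illustrative assumptions, and sklearn's LogisticRegression uses its own mildly regularized fit rather than Platt's exact fitting procedure:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

# Toy binary problem (illustrative, not from the lecture).
X, y = make_classification(n_samples=400, n_features=5, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Step 1: kernel SVM gives the raw score w^T Phi(x) + b.
svm = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
scores = svm.decision_function(X_val).reshape(-1, 1)

# Step 2: fit theta(A * score + B) on held-out scores;
# the learned coefficient is A, the intercept is B.
platt = LogisticRegression().fit(scores, y_val)
A, B = platt.coef_[0, 0], platt.intercept_[0]
print("A =", A, "B =", B)

# Probabilistic output for new points.
new_scores = svm.decision_function(X_val[:3]).reshape(-1, 1)
print(platt.predict_proba(new_scores)[:, 1])
```

Fitting A and B on held-out data, as above, avoids the optimistic bias of scoring the same examples the SVM was trained on.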

In conclusion, the structure of what we need looks like this:

we want to use a KERNEL -> we need wᵀz (to package into the kernel) -> we need w to be a linear combination of the zn

 

The optimal w can be represented by the zn:
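\[
\mathbf{w}_\ast = \sum_{n=1}^{N} \beta_n \mathbf{z}_n
\]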

To see why, decompose w = w∥ + w⊥, where w∥ lies in the span of the zn and w⊥ is orthogonal to that span. The error term only touches w through wᵀzn = w∥ᵀzn, while wᵀw = w∥ᵀw∥ + w⊥ᵀw⊥ ≥ w∥ᵀw∥, so any optimal w must have w⊥ = 0, leaving w∥ as the only component of w.

By the same argument, any L2-regularized linear model can be kernelized.

So here we get a new representation called Kernel Logistic Regression (KLR):
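Substituting \(\mathbf{w} = \sum_m \beta_m \mathbf{z}_m\) into L2-regularized logistic regression gives

\[
\min_{\boldsymbol{\beta}}\ \frac{\lambda}{N}\sum_{n=1}^{N}\sum_{m=1}^{N}\beta_n\beta_m K(\mathbf{x}_n,\mathbf{x}_m)
+ \frac{1}{N}\sum_{n=1}^{N}\log\Big(1+\exp\Big(-y_n\sum_{m=1}^{N}\beta_m K(\mathbf{x}_m,\mathbf{x}_n)\Big)\Big).
\]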

There are some things we should pay attention to:

1. The number of variables in this problem is N, the number of samples: one βn per training example.

2. Each βn can be seen as describing the relationship between xn and the other points in the X space, through the kernel values K(xn, ·).

3. The βn are usually all non-zero (unlike the sparse αn of the SVM dual), which means a larger computing cost compared to finding a good w directly; see the sketch after this list.
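As a concrete illustration, here is a minimal gradient-descent sketch of KLR; the RBF kernel, the step size, the iteration count, and the toy data are all illustrative assumptions, not part of the lecture:

```python
import numpy as np
from scipy.special import expit  # numerically stable logistic function

def rbf_kernel(X1, X2, gamma=1.0):
    # K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def train_klr(X, y, lam=0.1, lr=0.5, iters=2000):
    """Gradient descent on
    (lam/N) * beta^T K beta + (1/N) * sum_n log(1 + exp(-y_n * (K beta)_n))."""
    N = len(y)
    K = rbf_kernel(X, X)
    beta = np.zeros(N)
    for _ in range(iters):
        s = K @ beta                      # s_n = sum_m beta_m K(x_m, x_n)
        sig = expit(-y * s)               # theta(-y_n * s_n)
        grad = (2 * lam / N) * (K @ beta) - (1 / N) * (K @ (y * sig))
        beta -= lr * grad
    return beta

def predict_klr(beta, X_train, X_new):
    return np.sign(rbf_kernel(X_new, X_train) @ beta)

# Usage on toy data with labels in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
beta = train_klr(X, y)
print("train accuracy:", (predict_klr(beta, X, X) == y).mean())
print("non-zero beta_n:", np.count_nonzero(beta), "of", len(beta))
```

The last line connects to point 3 above: the learned βn come out dense, so prediction touches every training example, unlike a kernel SVM whose support vectors are sparse.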

 

Reposted from: https://www.cnblogs.com/windniu/p/4759951.html
