SVM Kernel Functions

Kernel Functions

Below is a list of some kernel functions available from the existing literature. As was the case with previous articles, the LaTeX notation for each of the formulas below is readily available from its alternate-text HTML tag. I cannot guarantee that all of them are perfectly correct, so use them at your own risk. Most of them have links to articles where they were originally used or proposed.

1. Linear Kernel

The Linear kernel is the simplest kernel function. It is given by the inner product <x, y> plus an optional constant c. Kernel algorithms using a linear kernel are often equivalent to their non-kernel counterparts; for example, kernel PCA (KPCA) with a linear kernel is the same as standard PCA.

k(x, y) = x^T y + c
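As a quick illustration, here is a minimal NumPy sketch of this kernel and of building a Gram matrix with it; the function name, default value of c and toy data are purely illustrative:

```python
import numpy as np

def linear_kernel(x, y, c=0.0):
    """Linear kernel: the inner product <x, y> plus an optional constant c."""
    return np.dot(x, y) + c

# Toy example: Gram matrix over three 2-dimensional points
X = np.array([[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]])
K = np.array([[linear_kernel(a, b) for b in X] for a in X])
```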

2. Polynomial Kernel

The Polynomial kernel is a non-stationary kernel. Polynomial kernels are well suited for problems where all the training data is normalized.

k(x, y) = (\alpha x^T y + c)^d
Adjustable parameters are the slope alpha, the constant term c and the polynomial degree d.
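A minimal sketch of the kernel, with purely illustrative defaults for alpha, c and d:

```python
import numpy as np

def polynomial_kernel(x, y, alpha=1.0, c=1.0, d=3):
    """Polynomial kernel with slope alpha, constant term c and degree d."""
    return (alpha * np.dot(x, y) + c) ** d
```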

3. Gaussian Kernel

The Gaussian kernel is an example of a radial basis function kernel.

k(x, y) = \exp\left(-\frac{ \lVert x-y \rVert ^2}{2\sigma^2}\right)

Alternatively, it can also be implemented using

k(x, y) = \exp\left(- \gamma \lVert x-y \rVert ^2 \right)

where \gamma = \frac{1}{2\sigma^2}.

The adjustable parameter sigma plays a major role in the performance of the kernel and should be carefully tuned to the problem at hand. If overestimated, the exponential will behave almost linearly and the higher-dimensional projection will start to lose its non-linear power. On the other hand, if underestimated, the function will lack regularization and the decision boundary will be highly sensitive to noise in the training data.
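A minimal NumPy sketch of both parameterizations (the default values are illustrative):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel parameterized by the bandwidth sigma."""
    d2 = np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def gaussian_kernel_gamma(x, y, gamma=0.5):
    """Equivalent form using gamma = 1 / (2 * sigma^2)."""
    d2 = np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return np.exp(-gamma * d2)
```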

4. Exponential Kernel

The exponential kernel is closely related to the Gaussian kernel, with only the square of the norm left out. It is also a radial basis function kernel.

k(x, y) = \exp\left(-\frac{ \lVert x-y \rVert }{2\sigma^2}\right)

5. Laplacian Kernel

The Laplace kernel is completely equivalent to the exponential kernel, except for being less sensitive to changes in the sigma parameter. Being equivalent, it is also a radial basis function kernel.

k(x, y) = \exp\left(- \frac{\lVert x-y \rVert }{\sigma}\right)

It is important to note that the observations made about the sigma parameter for the Gaussian kernel also apply to the Exponential and Laplacian kernels.
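A minimal sketch of the two kernels, which differ only in how sigma enters the denominator (defaults are illustrative):

```python
import numpy as np

def exponential_kernel(x, y, sigma=1.0):
    """Exponential kernel: like the Gaussian, but with the norm not squared."""
    r = np.linalg.norm(np.asarray(x, float) - np.asarray(y, float))
    return np.exp(-r / (2.0 * sigma ** 2))

def laplacian_kernel(x, y, sigma=1.0):
    """Laplacian kernel: same form, with sigma (not 2*sigma^2) in the denominator."""
    r = np.linalg.norm(np.asarray(x, float) - np.asarray(y, float))
    return np.exp(-r / sigma)
```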

6. ANOVA Kernel

The ANOVA kernel is also a radial basis function kernel, just as the Gaussian and Laplacian kernels. It is said to perform well in multidimensional regression problems (Hofmann, 2008).

k(x, y) =  \sum_{k=1}^n  \exp (-\sigma (x^k - y^k)^2)^d
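Reading x^k and y^k as the k-th components of x and y, a minimal sketch might look like this (the defaults for sigma and d are illustrative):

```python
import numpy as np

def anova_kernel(x, y, sigma=1.0, d=2):
    """ANOVA kernel: sum over components of a Gaussian term raised to degree d."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.sum(np.exp(-sigma * (x - y) ** 2) ** d)
```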

7. Hyperbolic Tangent (Sigmoid) Kernel

The Hyperbolic Tangent Kernel is also known as the Sigmoid Kernel and as the Multilayer Perceptron (MLP) kernel. The Sigmoid Kernel comes from the Neural Networks field, where the bipolar sigmoid function is often used as an activation function for artificial neurons.

k(x, y) = \tanh (\alpha x^T y + c)

It is interesting to note that an SVM model using a sigmoid kernel function is equivalent to a two-layer perceptron neural network. This kernel was quite popular for support vector machines due to its origin in neural network theory. Also, despite being only conditionally positive definite, it has been found to perform well in practice.

There are two adjustable parameters in the sigmoid kernel, the slope alpha and the intercept constant c. A common value for alpha is 1/N, where N is the data dimension. A more detailed study on sigmoid kernels can be found in the works by Hsuan-Tien Lin and Chih-Jen Lin.
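A minimal sketch using the 1/N heuristic for alpha (the default value for c is illustrative):

```python
import numpy as np

def sigmoid_kernel(x, y, alpha=None, c=-1.0):
    """Hyperbolic tangent (sigmoid) kernel; alpha defaults to 1/N, N = data dimension."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    if alpha is None:
        alpha = 1.0 / x.shape[0]
    return np.tanh(alpha * np.dot(x, y) + c)
```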

8. Rational Quadratic Kernel

The Rational Quadratic kernel is less computationally intensive than the Gaussian kernel and can be used as an alternative when using the Gaussian becomes too expensive.

k(x, y) = 1 - \frac{\lVert x-y \rVert^2}{\lVert x-y \rVert^2 + c}

9. Multiquadric Kernel

The Multiquadric kernel can be used in the same situations as the Rational Quadratic kernel. As is the case with the Sigmoid kernel, it is also an example of a non-positive definite kernel.

k(x, y) = \sqrt{\lVert x-y \rVert^2 + c^2}

10. Inverse Multiquadric Kernel

The Inverse Multiquadric kernel, as with the Gaussian kernel, results in a kernel matrix with full rank (Micchelli, 1986) and thus forms an infinite-dimensional feature space.

k(x, y) = \frac{1}{\sqrt{\lVert x-y \rVert^2 + \theta^2}}
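A minimal sketch of the three distance-based kernels above (Rational Quadratic, Multiquadric and Inverse Multiquadric), with illustrative defaults for c and theta:

```python
import numpy as np

def _sq_dist(x, y):
    """Squared Euclidean distance ||x - y||^2."""
    return np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2)

def rational_quadratic_kernel(x, y, c=1.0):
    d2 = _sq_dist(x, y)
    return 1.0 - d2 / (d2 + c)

def multiquadric_kernel(x, y, c=1.0):
    return np.sqrt(_sq_dist(x, y) + c ** 2)

def inverse_multiquadric_kernel(x, y, theta=1.0):
    return 1.0 / np.sqrt(_sq_dist(x, y) + theta ** 2)
```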

11. Circular Kernel

The circular kernel comes from a statistics perspective. It is an example of an isotropic stationary kernel and is positive definite in R2.

k(x, y) = \frac{2}{\pi} \arccos \left( \frac{ \lVert x-y \rVert}{\sigma} \right) - \frac{2}{\pi} \frac{ \lVert x-y \rVert}{\sigma} \sqrt{1 - \left(\frac{ \lVert x-y \rVert}{\sigma} \right)^2}
\mbox{if}~ \lVert x-y \rVert < \sigma \mbox{, zero otherwise}

12. Spherical Kernel

The spherical kernel is similar to the circular kernel, but is positive definite in R3.

k(x, y) = 1 - \frac{3}{2} \frac{\lVert x-y \rVert}{\sigma} + \frac{1}{2} \left( \frac{ \lVert x-y \rVert}{\sigma} \right)^3

\mbox{if}~ \lVert x-y \rVert < \sigma \mbox{, zero otherwise}
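Since both the circular and spherical kernels are compactly supported, a sketch has to handle the "zero otherwise" branch explicitly (function names and defaults are illustrative):

```python
import numpy as np

def circular_kernel(x, y, sigma=1.0):
    """Circular kernel; zero once the distance reaches sigma."""
    r = np.linalg.norm(np.asarray(x, float) - np.asarray(y, float))
    if r >= sigma:
        return 0.0
    t = r / sigma
    return (2.0 / np.pi) * (np.arccos(t) - t * np.sqrt(1.0 - t ** 2))

def spherical_kernel(x, y, sigma=1.0):
    """Spherical kernel; also zero outside the radius sigma."""
    r = np.linalg.norm(np.asarray(x, float) - np.asarray(y, float))
    if r >= sigma:
        return 0.0
    t = r / sigma
    return 1.0 - 1.5 * t + 0.5 * t ** 3
```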

13. Wave Kernel

The Wave kernel is also symmetric positive semi-definite (Huang, 2008).

k(x, y) = \frac{\theta}{\lVert x-y \rVert} \sin \left( \frac{\lVert x-y \rVert }{\theta} \right)

14. Power Kernel

The Power kernel is also known as the (unrectified) triangular kernel. It is an example of a scale-invariant kernel (Sahbi and Fleuret, 2004) and is also only conditionally positive definite.

k(x,y) = - \lVert x-y \rVert ^d

15. Log Kernel

The Log kernel seems to be particularly interesting for images, but is only conditionally positive definite.

k(x,y) = - \log (\lVert x-y \rVert ^d + 1)

16. Spline Kernel

The Spline kernel is given as a piece-wise cubic polynomial, as derived in the works by Gunn (1998).

k(x, y) = 1 + xy + xy \min(x,y) - \frac{x+y}{2} \min(x,y)^2 + \frac{1}{3}\min(x,y)^3

However, what it actually means, for multidimensional inputs, is:

k(x,y) = \prod_{i=1}^d \left( 1 + x_i y_i + x_i y_i \min(x_i, y_i) - \frac{x_i + y_i}{2} \min(x_i,y_i)^2 + \frac{\min(x_i,y_i)^3}{3} \right)

with x, y \in R^d.
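A minimal sketch of the multidimensional product form (names are illustrative):

```python
import numpy as np

def spline_kernel(x, y):
    """Spline kernel: product over dimensions of the piecewise polynomial terms."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    m = np.minimum(x, y)
    terms = 1.0 + x * y + x * y * m - (x + y) / 2.0 * m ** 2 + m ** 3 / 3.0
    return np.prod(terms)
```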

17. B-Spline (Radial Basis Function) Kernel

The B-Spline kernel is defined on the interval [−1, 1]. It is given by the recursive formula:

k(x,y) = B_{2p+1}(x-y)

\mbox{where~} p \in N \mbox{~with~} B_{i+1} := B_i \otimes  B_0.

In the work by Bart Hamers it is given by:

k(x, y) = \prod_{p=1}^d B_{2n+1}(x_p - y_p)

Alternatively, B_n can be computed using the explicit expression (Fomel, 2000):

B_n(x) = \frac{1}{n!} \sum_{k=0}^{n+1} \binom{n+1}{k} (-1)^k (x + \frac{n+1}{2} - k)^n_+

where x_+^d is defined as the truncated power function:

x^d_+ = \begin{cases} x^d, & \mbox{if }x > 0 \\  0, & \mbox{otherwise} \end{cases}
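Putting the explicit B_n expression and the truncated power function together, a sketch of the kernel in Hamers' product form could look like this (the parameter n and the function names are illustrative):

```python
import numpy as np
from math import comb, factorial

def truncated_power(x, d):
    """x_+^d: x**d if x > 0, zero otherwise."""
    return x ** d if x > 0 else 0.0

def b_spline(n, x):
    """B_n(x) via the explicit expression (Fomel, 2000)."""
    return sum((-1) ** k * comb(n + 1, k) * truncated_power(x + (n + 1) / 2.0 - k, n)
               for k in range(n + 2)) / factorial(n)

def b_spline_kernel(x, y, n=1):
    """k(x, y) = prod_p B_{2n+1}(x_p - y_p)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.prod([b_spline(2 * n + 1, xi - yi) for xi, yi in zip(x, y)])
```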

18. Bessel Kernel

The Bessel kernel is well known in the theory of function spaces of fractional smoothness. It is given by:

k(x, y) = \frac{J_{v+1}( \sigma \lVert x-y \rVert)}{ \lVert x-y \rVert ^ {-n(v+1)} }

where J is the Bessel function of the first kind. However, in the Kernlab for R documentation, the Bessel kernel is said to be:

k(x,x') = - Bessel_{(nu+1)}^n (\sigma |x - x'|^2)

19. Cauchy Kernel

The Cauchy kernel comes from the Cauchy distribution (Basak, 2008). It is a long-tailed kernel and can be used to give long-range influence and sensitivity over the high-dimensional space.

k(x, y) = \frac{1}{1 + \frac{\lVert x-y \rVert^2}{\sigma} }

20. Chi-Square Kernel

The Chi-Square kernel comes from the Chi-Square distribution.

k(x,y) = 1 - \sum_{i=1}^n \frac{(x_i-y_i)^2}{\frac{1}{2}(x_i+y_i)}
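A minimal sketch; the small epsilon is an illustrative guard against components where x_i + y_i = 0:

```python
import numpy as np

def chi_square_kernel(x, y, eps=1e-12):
    """Chi-Square kernel over non-negative feature vectors (e.g. histograms)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return 1.0 - np.sum((x - y) ** 2 / (0.5 * (x + y) + eps))
```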

21. Histogram Intersection Kernel

The Histogram Intersection Kernel is also known as the Min Kernel and has been proven useful in image classification.

k(x,y) = \sum_{i=1}^n \min(x_i,y_i)

22. Generalized Histogram Intersection

The Generalized Histogram Intersection kernel is built based on the Histogram Intersection Kernel for image classification but applies in a much larger variety of contexts (Boughorbel, 2005). It is given by:

k(x,y) = \sum_{i=1}^m \min(|x_i|^\alpha,|y_i|^\beta)
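A minimal sketch of both the plain and the generalized Histogram Intersection kernels (the defaults for alpha and beta are illustrative):

```python
import numpy as np

def histogram_intersection_kernel(x, y):
    """Min kernel: sum of element-wise minima of two histograms."""
    return np.sum(np.minimum(x, y))

def generalized_histogram_intersection_kernel(x, y, alpha=1.0, beta=1.0):
    """Generalized variant with exponents alpha and beta (Boughorbel, 2005)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.sum(np.minimum(np.abs(x) ** alpha, np.abs(y) ** beta))
```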

23. Generalized T-Student Kernel

The Generalized T-Student Kernel has been proven to be a Mercer kernel, thus having a positive semi-definite kernel matrix (Boughorbel, 2004). It is given by:

k(x,y) = \frac{1}{1 + \lVert x-y \rVert ^d}

24. Bayesian Kernel

The Bayesian kernel could be given as:

k(x,y) = \prod_{l=1}^N \kappa_l (x_l,y_l)

where

\kappa_l(a,b) = \sum_{c \in \{0;1\}} P(Y=c \mid X_l=a) ~ P(Y=c \mid X_l=b)

However, it really depends on the problem being modeled. For more information, please see the work by Alashwal, Deris and Othman, in which they used an SVM with Bayesian kernels in the prediction of protein-protein interactions.

25. Wavelet Kernel

The Wavelet kernel (Zhang et al, 2004) comes from Wavelet theory and is given as:

k(x,y) = \prod_{i=1}^N h(\frac{x_i-c_i}{a}) \:  h(\frac{y_i-c_i}{a})

Where a and c are the wavelet dilation and translation coefficients, respectively (the form presented above is a simplification, please see the original paper for details). A translation-invariant version of this kernel can be given as:

k(x,y) = \prod_{i=1}^N h(\frac{x_i-y_i}{a})

where in both cases h(x) denotes a mother wavelet function. In the paper by Li Zhang, Weida Zhou, and Licheng Jiao, the authors suggest a possible h(x):

h(x) = \cos(1.75x) \exp\left(-\frac{x^2}{2}\right)

which they also prove to be an admissible kernel function.
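A minimal sketch of the translation-invariant form using that mother wavelet (the default dilation coefficient is illustrative):

```python
import numpy as np

def mother_wavelet(x):
    """h(x) = cos(1.75x) * exp(-x^2 / 2), as suggested by Zhang, Zhou and Jiao."""
    return np.cos(1.75 * x) * np.exp(-x ** 2 / 2.0)

def wavelet_kernel(x, y, a=1.0):
    """Translation-invariant wavelet kernel with dilation coefficient a."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.prod(mother_wavelet((x - y) / a))
```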


This article is reposted from: http://www.shamoxia.com/html/y2010/2292.html
