Matlab SVM

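A recent project needed an SVM and time was short, so I simply used the library functions that Matlab provides. The best-known alternative is libsvm, an open-source package developed by Professor Chih-Jen Lin (林智仁) in Taiwan: http://www.csie.ntu.edu.tw/~cjlin/libsvm/ . It is very well known in industry and comes with interfaces for many languages; the most recently updated one is the Python interface.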

 

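Matlab has two main SVM functions (these belong to older releases; newer releases replace them with fitcsvm and predict):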

%svmtrain:

svmStruct = svmtrain(training, groups); % read in the training samples and their labels; returns a struct, svmStruct

svmStruct = svmtrain(data(train,:), groups(train), 'kernel_function', 'rbf', 'rbf_sigma', 5, 'showplot', true);  % RBF kernel with width (sigma) 5, with a plot of the result

%svmclassify

classes = svmclassify(svmStruct, data(test,:), 'showplot', true); % classify the test samples

 

 

Normalizing the SVM feature vectors: normalization removes the differences in scale (units) between variables.

% normalize dataset and store the result in dataset_scale

ymin = min(min(dataset));

ymax = max(max(dataset));

dataset_scale = mapminmax(dataset, ymin, ymax); % scale each row to the range [ymin, ymax]

dataset_scale = mapminmax(dataset, 0, 1);   % scale each row to the range [0, 1]

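However, in my experiments the normalized data gave worse results than the unnormalized data. One possible reason: mapminmax scales each row, so if dataset stores one sample per row, the calls above rescale each sample rather than each feature.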
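
For reference, here is a minimal sketch of per-feature scaling. It assumes dataset stores one sample per row and one feature per column (the post does not say so explicitly), and it uses only base Matlab functions rather than mapminmax. The toy values and the variable names colMin, colMax and span are made up for illustration.

```matlab
% Hypothetical toy data: 4 samples (rows) x 2 features (columns).
dataset = [1 200; 2 400; 3 600; 4 800];

colMin = min(dataset, [], 1);      % per-feature minimum (1-by-2 row)
colMax = max(dataset, [], 1);      % per-feature maximum (1-by-2 row)
span   = colMax - colMin;
span(span == 0) = 1;               % avoid dividing by zero for constant features

% Scale every feature (column) to [0, 1].
dataset_scale = bsxfun(@rdivide, bsxfun(@minus, dataset, colMin), span);

disp(dataset_scale);
```

The same colMin and span should be reused for the test data, so that training and test samples go through an identical transformation.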

Data normalization in Matlab

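The purpose of normalization is to bring the statistical distribution of the samples onto a common scale. Scaling to the range 0 to 1 gives values that can be read as probabilities, while scaling to the range -1 to +1 gives values that behave like signed coordinates. Whether the goal is modeling or computation, the basic units of measurement must agree first. A neural network is trained, and makes predictions, from the statistical distribution of the samples, and the sigmoid function takes values between 0 and 1, as does the output of the last node of the network, so the sample outputs usually have to be normalized.

On the input side, when all input signals are positive, the weights feeding a neuron of the first hidden layer can only increase together or decrease together, which makes learning slow. Data sets also often contain outlier samples, which lengthen training time and may prevent the network from converging. To avoid this, to simplify later processing and to speed up learning, the input signals can be normalized so that their mean over all samples is close to 0, or small compared with their standard deviation.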

 

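There are three ways to normalize data in Matlab:

1) Write the code yourself in Matlab. The commonly used transformations are:

1. Linear transformation:

y=(x-MinValue)/(MaxValue-MinValue) (maps to the range 0 to 1)

y=0.1+(x-min)/(max-min)*(0.9-0.1) (maps to the range 0.1 to 0.9)

Here x and y are the values before and after the transformation, and MaxValue and MinValue are the maximum and minimum of the samples.

2. Logarithmic transformation:

y=log10(x)

This is the base-10 logarithm.

3. Arctangent transformation:

y=atan(x)*2/pi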
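
The three transformations can be tried side by side with a few lines of base Matlab. This is only a sketch; the sample vector x is made up, and it is kept positive because log10 needs x > 0.

```matlab
x = [1 10 100 1000];                        % hypothetical positive sample values

y_lin  = (x - min(x)) / (max(x) - min(x));  % linear, mapped to [0, 1]
y_lin2 = 0.1 + y_lin * (0.9 - 0.1);         % linear, mapped to [0.1, 0.9]
y_log  = log10(x);                          % base-10 logarithm, needs x > 0
y_atan = atan(x) * 2 / pi;                  % arctangent, maps x >= 0 into [0, 1)

disp([y_lin; y_lin2; y_log; y_atan]);
```

Note that the linear form depends on the sample minimum and maximum, while the logarithmic and arctangent forms do not need any statistics from the data.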

 

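2) premnmx, tramnmx, postmnmx, mapminmax

premnmx normalizes the input or output data of a network; the normalized data lie in the interval [-1,1].

The syntax of premnmx is: [Pn,minp,maxp,Tn,mint,maxt]=premnmx(P,T), where P and T are the original input and output data.

If a network was trained on normalized samples, any new data fed to it later must go through the same preprocessing as the training data. This is what tramnmx is for:

The syntax of tramnmx is: [PN]=tramnmx(P,minp,maxp)

where P and PN are the input data before and after the transformation, and maxp and minp are the maximum and minimum found by premnmx.

The network output has to be de-normalized to get back to the original scale. The usual function for this is postmnmx.

The syntax of postmnmx is: [P] = postmnmx(PN,minp,maxp)

where PN and P are the normalized data and the restored data, and maxp and minp are the maximum and minimum found by premnmx.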

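Another function is mapminmax, which scales each row of a matrix to [-1 1].

The syntax of mapminmax is: [y1,PS] = mapminmax(x1)

where x1 is the matrix to be normalized and y1 is the result.

To apply the same normalization to another data set, use:

y2 = mapminmax('apply',x2,PS)

To map normalized data back to the original scale, use:

x1_again = mapminmax('reverse',y1,PS)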
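
To make the forward, 'apply' and 'reverse' steps concrete, the following sketch reproduces the row-wise mapping to [-1, 1] with base Matlab only. It is not the mapminmax implementation; the struct ps and its fields are stand-ins for the settings that PS carries, and the data values are made up.

```matlab
x1 = [1 2 3 4; 10 30 20 40];     % hypothetical data, one variable per row
x2 = [2.5; 25];                  % new data to be scaled in the same way

% Stand-in for the settings structure (illustrative field names).
ps.xmin = min(x1, [], 2);        % per-row minimum
ps.xmax = max(x1, [], 2);        % per-row maximum
scale   = 2 ./ (ps.xmax - ps.xmin);   % assumes no row is constant

% Forward mapping of x1 to [-1, 1], row by row.
y1 = bsxfun(@times, bsxfun(@minus, x1, ps.xmin), scale) - 1;

% Same mapping on new data, reusing the stored minimum and maximum.
y2 = bsxfun(@times, bsxfun(@minus, x2, ps.xmin), scale) - 1;

% Inverse mapping back to the original scale.
x1_again = bsxfun(@plus, bsxfun(@rdivide, y1 + 1, scale), ps.xmin);
```

The point of keeping the settings is that x2 is scaled with the minimum and maximum of x1, not with its own.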

 

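3) prestd, poststd, trastd

prestd normalizes the data to zero mean and unit variance.

(For premnmx above, minp and maxp are the minimum and maximum of P, and mint and maxt are the minimum and maximum of T.)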
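
A minimal sketch of zero-mean, unit-variance scaling in base Matlab follows. It does not call prestd, trastd or poststd; it only mirrors the three steps (fit and transform, transform new data, invert) under the assumption that each row of P is one variable and no row is constant. The data and the names Pnew, Pnew_n and P_again are made up.

```matlab
P    = [1 2 3 4 5; 10 20 30 40 50];   % hypothetical data, one variable per row
Pnew = [3; 30];                       % new sample to be scaled the same way

meanp = mean(P, 2);                   % per-row mean
stdp  = std(P, 0, 2);                 % per-row standard deviation (N-1 form)

% Normalize the training data to zero mean and unit variance.
Pn = bsxfun(@rdivide, bsxfun(@minus, P, meanp), stdp);

% Apply the stored mean and standard deviation to new data.
Pnew_n = (Pnew - meanp) ./ stdp;

% Undo the normalization.
P_again = bsxfun(@plus, bsxfun(@times, Pn, stdp), meanp);
```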


Cross-validation in Matlab

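1). Hold-Out Method
Randomly split the original data into two groups, a training set and a validation set. Train the classifier on the training set, then evaluate the model on the validation set; the resulting classification accuracy is the classifier's performance measure under the Hold-Out Method. The advantage is simplicity: the data only has to be split randomly into two groups. Strictly speaking, the Hold-Out Method is not cross-validation at all, because nothing is crossed over. And because the split is random, the validation accuracy depends heavily on how the data happened to be divided, so the result is not very convincing.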
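
A minimal hold-out sketch in base Matlab is shown below. The toy data, the 70/30 ratio and the nearest-centroid classifier are all assumptions made for illustration; the classifier is only a stand-in so the example runs without any toolbox, and an SVM would be trained and applied at the same two places.

```matlab
% Hypothetical toy data: two well-separated classes, 20 samples each.
N = 40;
data   = [0.5*randn(20,2); 0.5*randn(20,2) + 5];
groups = [ones(20,1); 2*ones(20,1)];

% Random 70/30 split into training and validation indices.
idx    = randperm(N);
nTrain = round(0.7 * N);
train  = idx(1:nTrain);
test   = idx(nTrain+1:end);

% Stand-in classifier: nearest class centroid (not an SVM).
c1 = mean(data(train(groups(train) == 1), :), 1);
c2 = mean(data(train(groups(train) == 2), :), 1);
d1 = sum(bsxfun(@minus, data(test,:), c1).^2, 2);   % squared distance to centroid 1
d2 = sum(bsxfun(@minus, data(test,:), c2).^2, 2);   % squared distance to centroid 2
pred = 1 + (d2 < d1);                               % label 2 where centroid 2 is closer

accuracy = mean(pred == groups(test));
disp(accuracy);
```

Running it several times gives a different split each time, which is exactly why a single hold-out accuracy depends on how the data was divided.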

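2). K-fold Cross Validation (K-CV)
Split the original data into K groups (usually of equal size). Each subset serves once as the validation set while the remaining K-1 subsets form the training set, which yields K models. The average validation accuracy of these K models is the classifier's performance measure under K-CV. K must be at least 2; in practice one usually starts from 3, and tries 2 only when the data set is small. K-CV helps guard against both over-fitting and under-fitting, and its result is more convincing.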
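
The K-CV procedure can be sketched in base Matlab as follows. As before, the toy data and the nearest-centroid classifier are assumptions used only to keep the example self-contained; foldId and acc are illustrative names.

```matlab
% Hypothetical toy data: two well-separated classes, 20 samples each.
N = 40;  K = 5;
data   = [0.5*randn(20,2); 0.5*randn(20,2) + 5];
groups = [ones(20,1); 2*ones(20,1)];

% Assign every sample to one of K folds, in random order, round-robin.
perm = randperm(N);
foldId = zeros(N,1);
foldId(perm) = mod(0:N-1, K) + 1;

acc = zeros(K,1);
for k = 1:K
    test  = (foldId == k);          % fold k is the validation set
    train = ~test;                  % the other K-1 folds are the training set

    % Stand-in classifier: nearest class centroid (not an SVM).
    c1 = mean(data(train & groups == 1, :), 1);
    c2 = mean(data(train & groups == 2, :), 1);
    d1 = sum(bsxfun(@minus, data(test,:), c1).^2, 2);
    d2 = sum(bsxfun(@minus, data(test,:), c2).^2, 2);
    pred = 1 + (d2 < d1);

    acc(k) = mean(pred == groups(test));
end

cvAccuracy = mean(acc);             % average validation accuracy over the K folds
disp(cvAccuracy);
```

Setting K equal to N in this sketch turns it into leave-one-out cross-validation, with one sample per fold.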

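3). Leave-One-Out Cross Validation (LOO-CV)
If the original data has N samples, LOO-CV is simply N-CV: each sample serves alone as the validation set and the other N-1 samples form the training set, so LOO-CV yields N models. The average validation accuracy of these N models is the classifier's performance measure under LOO-CV. Compared with K-CV, LOO-CV has two clear advantages:
a. In every round almost all samples are used to train the model, so the training set is as close as possible to the distribution of the original data, and the resulting estimate is more reliable.
b. No random factor affects the experiment, so the procedure is reproducible.

The drawback of LOO-CV is its computational cost: as many models must be built as there are samples. When the data set is large, LOO-CV becomes hard to carry out and is close to impractical, unless each model trains very quickly or the computation can be parallelized.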


Reposted from: http://blog.youkuaiyun.com/angelahhj/article/details/41849717

need to conduct installation. If you have modified the sources and would like to re-build the package, type 'mex -setup' in MATLAB to choose a compiler for mex first. Then type 'make' to start the installation.

Starting from MATLAB 7.1 (R14SP3), the default MEX file extension is changed from .dll to .mexw32 or .mexw64 (depends on 32-bit or 64-bit Windows). If your MATLAB is older than 7.1, you have to build these files yourself.

Example:

        matlab> mex -setup

        (ps: MATLAB will show the following messages to setup default compiler.)
        Please choose your compiler for building external interface (MEX) files:
        Would you like mex to locate installed compilers [y]/n? y
        Select a compiler:
        [1] Microsoft Visual C/C++ version 7.1 in C:\Program Files\Microsoft Visual Studio
        [0] None
        Compiler: 1
        Please verify your choices:
        Compiler: Microsoft Visual C/C++ 7.1
        Location: C:\Program Files\Microsoft Visual Studio
        Are these correct?([y]/n): y

        matlab> make

Under 64-bit Windows, Visual Studio 2005 user will need "X64 Compiler and Tools". The package won't be installed by default, but you can find it in customized installation options.

For list of supported/compatible compilers for MATLAB, please check the following page:

http://www.mathworks.com/support/compilers/current_release/

Usage
=====

matlab> model = svmtrain(training_label_vector, training_instance_matrix [, 'libsvm_options']);

        -training_label_vector:
            An m by 1 vector of training labels (type must be double).
        -training_instance_matrix:
            An m by n matrix of m training instances with n features. It can be dense or sparse (type must be double).
        -libsvm_options:
            A string of training options in the same format as that of LIBSVM.

matlab> [predicted_label, accuracy, decision_values/prob_estimates] = svmpredict(testing_label_vector, testing_instance_matrix, model [, 'libsvm_options']);

        -testing_label_vector:
            An m by 1 vector of prediction labels. If labels of test data are unknown, simply use any random values.
            (type must be double)
        -testing_instance_matrix:
            An m by n matrix of m testing instances with n features. It can be dense or sparse. (type must be double)
        -model:
            The output of svmtrain.
        -libsvm_options:
            A string of testing options in the same format as that of LIBSVM.

Returned Model Structure
========================

The 'svmtrain' function returns a model which can be used for future prediction. It is a structure and is organized as [Parameters, nr_class, totalSV, rho, Label, ProbA, ProbB, nSV, sv_coef, SVs]:

        -Parameters: parameters
        -nr_class: number of classes; = 2 for regression/one-class svm
        -totalSV: total #SV
        -rho: -b of the decision function(s) wx+b
        -Label: label of each class; empty for regression/one-class SVM
        -ProbA: pairwise probability information; empty if -b 0 or in one-class SVM
        -ProbB: pairwise probability information; empty if -b 0 or in one-class SVM
        -nSV: number of SVs for each class; empty for regression/one-class SVM
        -sv_coef: coefficients for SVs in decision functions
        -SVs: support vectors

If you do not use the option '-b 1', ProbA and ProbB are empty matrices. If the '-v' option is specified, cross validation is conducted and the returned model is just a scalar: cross-validation accuracy for classification and mean-squared error for regression.

More details about this model can be found in LIBSVM FAQ (http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html) and LIBSVM implementation document (http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf).

Result of Prediction
====================

The function 'svmpredict' has three outputs. The first one, predicted_label, is a vector of predicted labels. The second output, accuracy, is a vector including accuracy (for classification), mean squared error, and squared correlation coefficient (for regression). The third is a matrix containing decision values or probability estimates (if '-b 1' is specified).
If k is the number of classes, for decision values, each row includes results of predicting k(k-1)/2 binary-class SVMs. For probabilities, each row contains k values indicating the probability that the testing instance is in each class. Note that the order of classes here is the same as 'Label' field in the model structure.

Examples
========

Train and test on the provided data heart_scale:

matlab> load heart_scale.mat
matlab> model = svmtrain(heart_scale_label, heart_scale_inst, '-c 1 -g 0.07');
matlab> [predict_label, accuracy, dec_values] = svmpredict(heart_scale_label, heart_scale_inst, model); % test the training data

For probability estimates, you need '-b 1' for training and testing:

matlab> load heart_scale.mat
matlab> model = svmtrain(heart_scale_label, heart_scale_inst, '-c 1 -g 0.07 -b 1');
matlab> load heart_scale.mat
matlab> [predict_label, accuracy, prob_estimates] = svmpredict(heart_scale_label, heart_scale_inst, model, '-b 1');

To use precomputed kernel, you must include sample serial number as the first column of the training and testing data (assume your kernel matrix is K, # of instances is n):

matlab> K1 = [(1:n)', K]; % include sample serial number as first column
matlab> model = svmtrain(label_vector, K1, '-t 4');
matlab> [predict_label, accuracy, dec_values] = svmpredict(label_vector, K1, model); % test the training data

We give the following detailed example by splitting heart_scale into 150 training and 120 testing data. Constructing a linear kernel matrix and then using the precomputed kernel gives exactly the same testing error as using the LIBSVM built-in linear kernel.
matlab> load heart_scale.mat
matlab>
matlab> % Split Data
matlab> train_data = heart_scale_inst(1:150,:);
matlab> train_label = heart_scale_label(1:150,:);
matlab> test_data = heart_scale_inst(151:270,:);
matlab> test_label = heart_scale_label(151:270,:);
matlab>
matlab> % Linear Kernel
matlab> model_linear = svmtrain(train_label, train_data, '-t 0');
matlab> [predict_label_L, accuracy_L, dec_values_L] = svmpredict(test_label, test_data, model_linear);
matlab>
matlab> % Precomputed Kernel
matlab> model_precomputed = svmtrain(train_label, [(1:150)', train_data*train_data'], '-t 4');
matlab> [predict_label_P, accuracy_P, dec_values_P] = svmpredict(test_label, [(1:120)', test_data*train_data'], model_precomputed);
matlab>
matlab> accuracy_L % Display the accuracy using linear kernel
matlab> accuracy_P % Display the accuracy using precomputed kernel

Note that for testing, you can put anything in the testing_label_vector. For more details of precomputed kernels, please read the section ``Precomputed Kernels'' in the README of the LIBSVM package.

Other Utilities
===============

A matlab function libsvmread reads files in LIBSVM format:

[label_vector, instance_matrix] = libsvmread('data.txt');

Two outputs are labels and instances, which can then be used as inputs of svmtrain or svmpredict.

A matlab function libsvmwrite writes Matlab matrix to a file in LIBSVM format:

libsvmwrite('data.txt', label_vector, instance_matrix)

The instance_matrix must be a sparse matrix. (type must be double)

These codes are prepared by Rong-En Fan and Kai-Wei Chang from National Taiwan University.

Additional Information
======================

This interface was initially written by Jun-Cheng Chen, Kuan-Jen Peng, Chih-Yuan Yang and Chih-Huai Cheng from Department of Computer Science, National Taiwan University. The current version was prepared by Rong-En Fan and Ting-Fan Wu.
If you find this tool useful, please cite LIBSVM as follows

Chih-Chung Chang and Chih-Jen Lin, LIBSVM : a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm

For any question, please contact Chih-Jen Lin, or check the FAQ page:

http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#/Q9:_MATLAB_interface