Hyperparameter Selection for SVM with Grid Search and Cross-Validation

This post walks through parameter tuning for an SVM (support vector machine) model using Python and the scikit-learn library. Two sets of normally distributed data are generated as the positive and negative classes, combined into one dataset, and GridSearchCV with cross-validation is used to find the best parameter combination. The resulting model shows excellent classification performance on the test set.


import numpy as np
#Generate 100 samples from a 2-D standard normal distribution (mean (0, 0), σ = 1)
p = np.random.randn(100, 2)
#Shift the cluster center to (3.5, 3.5); these points form the positive class
for i in range(100):
    p[i][0] += 3.5
    p[i][1] += 3.5

#Generate another 100 samples centered at (0, 0) with σ = 1; these form the negative class
f = np.random.randn(100, 2)
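As an aside, the element-wise loop above can be replaced by NumPy broadcasting, which builds the same two clusters in a single step (an equivalent sketch, not part of the original run):

#Equivalent vectorized construction: broadcasting adds 3.5 to every
#coordinate, shifting the cluster center from (0, 0) to (3.5, 3.5)
p = np.random.randn(100, 2) + 3.5   #positive class
f = np.random.randn(100, 2)         #negative class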

import pandas as pd 

#Convert the NumPy array to a DataFrame
df_p = pd.DataFrame(p, columns=['x', 'y'])
#Add label column z; the positive class is labeled 1
df_p['z'] = 1

#Convert the NumPy array to a DataFrame
df_f = pd.DataFrame(f, columns=['x', 'y'])
#Add label column z; the negative class is labeled 0
df_f['z'] = 0

#Concatenate the positive and negative classes into a single DataFrame
res = pd.concat([df_p, df_f], axis=0)
res.head(10)
          x         y  z
0  4.198390  2.360253  1
1  4.802608  4.237265  1
2  3.571716  4.246163  1
3  3.389474  2.777169  1
4  3.407080  2.969104  1
5  3.962996  4.118016  1
6  2.679486  1.483121  1
7  3.547707  3.448452  1
8  3.181810  4.744277  1
9  3.412752  3.591759  1
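Note that pd.concat keeps each source frame's row index, so res carries the indices 0-99 twice. The duplication is harmless here, but if a unique index matters downstream it can be reset (an optional step, not in the original run):

#Optional: replace the duplicated 0..99 indices with a fresh 0..199 range
res = res.reset_index(drop=True)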
import matplotlib.pyplot as plt

#Scatter plot of the dataset, colored by class label z
plt.scatter(res['x'], res['y'], c=res['z'],cmap=plt.cm.Paired)
plt.xlabel('x')
plt.ylabel('y')
plt.title('random data')
plt.show()

[Figure: scatter plot "random data" of the two classes, x vs. y]

from sklearn.model_selection import train_test_split

#Split the data into training and test sets (70% / 30%);
#passing res['z'] as a Series avoids the column-vector warning from fit()
train_x, test_x, train_y, test_y = train_test_split(
    res[['x', 'y']], res['z'], test_size=0.3, random_state=0)
len(train_x)
140
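With test_size=0.3, 140 of the 200 samples go to training. The classes here are perfectly balanced, so a plain random split is fine; for skewed data the class ratio can be preserved explicitly with the stratify argument (an optional variant of the call above):

#Optional: a stratified split keeps the class ratio identical in both halves
train_x, test_x, train_y, test_y = train_test_split(
    res[['x', 'y']], res['z'],
    test_size=0.3, random_state=0, stratify=res['z'])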
from sklearn.model_selection import GridSearchCV
from sklearn import svm
#Build the parameter grid
parameters =[{'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]

clf=GridSearchCV(estimator=svm.SVC(),param_grid=parameters,cv=5,n_jobs=-1,scoring='precision')
clf.fit(train_x,train_y)
GridSearchCV(cv=5, error_score='raise',
       estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False),
       fit_params=None, iid=True, n_jobs=-1,
       param_grid=[{'kernel': ['linear'], 'C': [1, 10, 100, 1000]}],
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring='precision', verbose=0)
clf.best_params_
{'C': 1, 'kernel': 'linear'}
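Besides best_params_, the fitted search object exposes the best cross-validated score and the model refit on the full training set (standard GridSearchCV attributes):

print(clf.best_score_)            #mean cross-validated precision of the best setting
best_model = clf.best_estimator_  #the SVC refit on all training data with those params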
print("Best parameters set found on development set:")
print()
print(clf.best_params_)
print()
print("Grid scores on development set:")
print()
means = clf.cv_results_['mean_test_score']
stds = clf.cv_results_['std_test_score']
for mean, std, params in zip(means, stds, clf.cv_results_['params']):
    print("%0.3f (+/-%0.03f) for %r"
          % (mean, std * 2, params))
print()
Best parameters set found on development set:

{'C': 1, 'kernel': 'linear'}

Grid scores on development set:

0.987 (+/-0.053) for {'C': 1, 'kernel': 'linear'}
0.987 (+/-0.053) for {'C': 10, 'kernel': 'linear'}
0.987 (+/-0.053) for {'C': 100, 'kernel': 'linear'}
0.987 (+/-0.053) for {'C': 1000, 'kernel': 'linear'}
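All four values of C tie here because the two clusters are linearly separable, so the decision boundary barely changes. For data that is not linearly separable, the same search can cover a nonlinear kernel as well; the sketch below adds an RBF grid, with illustrative (not tuned) gamma values:

#Extended grid: search the linear and RBF kernels in one pass
parameters = [
    {'kernel': ['linear'], 'C': [1, 10, 100, 1000]},
    {'kernel': ['rbf'], 'C': [1, 10, 100, 1000],
     'gamma': [0.001, 0.01, 0.1, 1]},
]
clf = GridSearchCV(estimator=svm.SVC(), param_grid=parameters,
                   cv=5, n_jobs=-1, scoring='precision')
clf.fit(train_x, train_y)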

from sklearn.metrics import classification_report
print("Detailed classification report:")
print()
print("The model is trained on the full development set.")
print("The scores are computed on the full evaluation set.")
print()
print(classification_report(test_y, clf.predict(test_x)))
print()
Detailed classification report:

The model is trained on the full development set.
The scores are computed on the full evaluation set.

             precision    recall  f1-score   support

          0       1.00      1.00      1.00        31
          1       1.00      1.00      1.00        29

avg / total       1.00      1.00      1.00        60
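As a final check, an overall accuracy can be computed alongside the per-class report. Note that clf.score would report precision here, since the search was built with scoring='precision', so accuracy_score is called explicitly:

from sklearn.metrics import accuracy_score

#Overall accuracy of the refit best estimator on the held-out test set
print(accuracy_score(test_y, clf.predict(test_x)))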

