Andrew Ng Machine Learning - Programming Exercise - ex6.2

onesmile5137

Article link: https://blog.youkuaiyun.com/onesmile5137/article/details/97651482

This part of the exercise applies an SVM with a Gaussian kernel to fit a non-linear decision boundary.

First, load the dataset and plot it.

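#----------------------------part4---------------------------#
import pandas as pd
import matplotlib.pyplot as plt
from scipy.io import loadmat

# Load the dataset and arrange it into a DataFrame
# (raw string so the backslashes in the Windows path are taken literally)
path = r'C:\Users\Huanuo\PycharmProjects\ml\ex6_svm\ex6\ex6data2.mat'
m = loadmat(path)
df1 = pd.DataFrame(m['X'])
df2 = pd.DataFrame(m['y'])
df3 = pd.concat([df1, df2], axis=1)
df3.columns = [1, 2, 3]  # columns 1 and 2: features, column 3: label
# Visualize the data: positive examples as red stars, negative as yellow plus signs
X1 = df3.loc[df3[3] == 1, 1]
Y1 = df3.loc[df3[3] == 1, 2]
plt.scatter(X1, Y1, marker='*', color='r')
X2 = df3.loc[df3[3] == 0, 1]
Y2 = df3.loc[df3[3] == 0, 2]
plt.scatter(X2, Y2, marker='+', color='y')
plt.show()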

[Figure: scatter plot of the two classes in the dataset]
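Next, build the model with an SVM. I fell into a lot of pitfalls along the way, some of them big. Here is the code that finally worked: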

import numpy as np
from sklearn.svm import SVC
from scipy.io import loadmat
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV


def load_data():
    # Raw string so the backslashes in the Windows path are taken literally
    path = r'C:\Users\Huanuo\PycharmProjects\ml\ex6_svm\ex6\ex6data2.mat'
    m = loadmat(path)
    x, y = m['X'], m['y']
    y = np.ravel(y)  # flatten the labels to shape (n_samples,)
    scaler = StandardScaler()
    x_std = scaler.fit_transform(x)  # standardize the features
    x_train, y_train = x_std, y
    return x_train, y_train


def svm_c(x_train, y_train):
    # RBF kernel, with class weights balanced by class frequency
    svc = SVC(kernel='rbf', class_weight='balanced')
    c_range = np.logspace(-5, 15, 11, base=2)
    gamma_range = np.logspace(-9, 3, 13, base=2)
    # Parameter ranges for the grid search; cv=3 means 3-fold cross-validation
    param_grid = [{'kernel': ['rbf'], 'C': c_range, 'gamma': gamma_range}]
    grid = GridSearchCV(svc, param_grid, cv=3, n_jobs=-1)
    # Train the model (fit returns the fitted GridSearchCV object)
    clf = grid.fit(x_train, y_train)
    # Compute the accuracy on a test set (no test set is available here)
    # score = grid.score(x_test, y_test)
    plotsvm(x_train, clf, y_train)
    # print('Accuracy: %s' % score)


def plotsvm(x_train, clf, y_train):
    # Visualization
    # step size in the mesh
    h = .02
    # create a mesh to plot in
    x_min, x_max = x_train[:, 0].min() - 0.02, x_train[:, 0].max() + 0.02
    y_min, y_max = x_train[:, 1].min() - 0.02, x_train[:, 1].max() + 0.02
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)
    plt.axis('off')
    # Plot also the training points
    color_map = {0: (0, 0, .9), 1: (1, 0, 0)}
    colors = [color_map[y] for y in y_train]
    plt.scatter(x_train[:, 0], x_train[:, 1], c=colors, edgecolors='black')
    plt.show()


if __name__ == '__main__':
    svm_c(*load_data())

[Figure: predicted regions and training points produced by the code above]
This code clearly works. More importantly, it follows a general workflow for building a model:

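1. Convert the raw data into a format that the SVM software or package can read;
2. Standardize the data (so that features with very different numeric ranges do not hurt classifier performance);
3. If you do not know which kernel to use, consider RBF;
4. Use cross-validated grid search to find the best parameters (C, γ) (cross-validation guards against overfitting; grid search looks for the best parameters within a specified range);
5. Train the model with the best parameters;
6. Test.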
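The six steps can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the code from this post: it uses the synthetic `make_moons` dataset as a stand-in for `ex6data2.mat`, a coarser grid than the one above to keep it fast, and a held-out test split so that step 6 can actually run. Putting `StandardScaler` inside a pipeline means the scaler is refit on each training fold during cross-validation.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Step 1: data as NumPy arrays (synthetic stand-in, NOT ex6data2.mat)
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Steps 2 and 3: standardize, then an RBF-kernel SVM
pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf'))

# Step 4: cross-validated grid search over (C, gamma)
# (assumed coarse grid; the step name 'svc' is generated by make_pipeline)
param_grid = {
    'svc__C': np.logspace(-3, 7, 6, base=2),
    'svc__gamma': np.logspace(-5, 3, 5, base=2),
}
grid = GridSearchCV(pipe, param_grid, cv=3)

# Step 5: fit; with the default refit=True the best combination is
# retrained on the whole training split
grid.fit(X_train, y_train)

# Step 6: evaluate on data the search never saw
test_score = grid.score(X_test, y_test)
print(grid.best_params_, test_score)
```

After fitting, `grid.best_params_` holds the selected (C, γ) and `grid.predict` uses the refitted best model.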

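In other words, the parameters are tuned by grid search. Note that grid search is an exhaustive (brute-force) search: it evaluates every (C, γ) combination in the specified grid, rather than making locally optimal choices the way a greedy algorithm does.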
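To make the cost of this search concrete, the candidate combinations in the grid from the code above can be listed with scikit-learn's `ParameterGrid`. This is a small sketch; the names `candidates`, `n_candidates` and `n_fits` are mine and do not appear in the original code.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Same ranges as in svm_c above
c_range = np.logspace(-5, 15, 11, base=2)      # 2^-5, 2^-3, ..., 2^15
gamma_range = np.logspace(-9, 3, 13, base=2)   # 2^-9, 2^-8, ..., 2^3
param_grid = [{'kernel': ['rbf'], 'C': c_range, 'gamma': gamma_range}]

# Every (C, gamma) pair is a candidate; none is skipped
candidates = list(ParameterGrid(param_grid))
n_candidates = len(candidates)   # 11 * 13 = 143
n_fits = n_candidates * 3        # one fit per candidate per fold when cv=3
print(n_candidates, n_fits)      # 143 429
```

With the default `refit=True`, `GridSearchCV` performs one additional fit on the full training data using the best combination.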

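One more thing worth recording: when plotting the decision boundary, I do not yet know how to draw the boundary line directly, so for now I predict the class of every point on a dense mesh grid and plot the predictions as filled regions.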
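One way to draw the boundary itself is to evaluate `decision_function` on the mesh instead of `predict`, then ask matplotlib for the single contour at level 0, since a binary `SVC` switches class where the decision value changes sign. The sketch below assumes a synthetic `make_moons` dataset and arbitrary values for `C` and `gamma`; it is not the post's data or tuned model.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Synthetic stand-in data and assumed hyperparameters
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
clf = SVC(kernel='rbf', C=1.0, gamma=2.0).fit(X, y)

# Mesh covering the data, as in plotsvm above
h = 0.02
xx, yy = np.meshgrid(
    np.arange(X[:, 0].min() - 0.5, X[:, 0].max() + 0.5, h),
    np.arange(X[:, 1].min() - 0.5, X[:, 1].max() + 0.5, h))

# Signed score for every mesh point: positive on one side, negative on the other
scores = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
scores = scores.reshape(xx.shape)

fig, ax = plt.subplots()
ax.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired, edgecolors='black')
# levels=[0] draws only the curve where the score is zero: the boundary
ax.contour(xx, yy, scores, levels=[0], colors='k')
fig.savefig("decision_boundary.png")
```

Passing `levels=[-1, 0, 1]` instead would also draw the two margin curves.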

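There is also a question I still have not resolved: why does a basic SVM (without grid-search tuning) give such a poor fit, as shown in the figure below?
[Figure: fit obtained with the basic SVM, without grid-search tuning]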
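The untuned code is not shown, so the cause cannot be determined from this post. One common pitfall worth ruling out is the kernel width. The course writes the Gaussian kernel as exp(-||x - z||² / (2σ²)), while scikit-learn writes the RBF kernel as exp(-γ ||x - z||²), so γ = 1 / (2σ²). A small σ therefore corresponds to a large γ, and the library default may be far too smooth for data with fine structure. The sketch below illustrates the effect on an assumed synthetic checkerboard dataset, not on `ex6data2.mat`; the value σ = 0.1 is also just an example.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic 4x4 checkerboard on the unit square (an assumed toy dataset)
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(800, 2))
y = ((np.floor(4 * X[:, 0]) + np.floor(4 * X[:, 1])) % 2).astype(int)

# Convert a Gaussian-kernel width sigma to scikit-learn's gamma
sigma = 0.1
gamma_from_sigma = 1.0 / (2.0 * sigma ** 2)  # about 50

# Untuned model: library defaults (gamma='scale' in scikit-learn >= 0.22)
default_clf = SVC(kernel='rbf', C=1.0).fit(X, y)
# Same C, but with the narrow kernel derived from sigma
narrow_clf = SVC(kernel='rbf', C=1.0, gamma=gamma_from_sigma).fit(X, y)

acc_default = default_clf.score(X, y)  # training accuracy
acc_narrow = narrow_clf.score(X, y)
print(acc_default, acc_narrow)
```

If the untuned model in this exercise used default parameters, comparing its `gamma` with the value chosen by the grid search would show whether this explanation applies.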