[Spatial Metabolism] In Depth: Regression with the SVM Model in scikit-learn

Article link: https://blog.youkuaiyun.com/linlinzhengkarry/article/details/145671050

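In scikit-learn, the SVR (Support Vector Regression) class implements regression with support vector machines. This article walks through the steps for running a regression task with scikit-learn's SVM model and gives concrete code examples.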

Step 1: Import the necessary libraries

First, import scikit-learn along with the helper libraries used here, numpy and matplotlib.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

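Step 2: Prepare the dataset

To demonstrate SVM regression, we use a simple synthetic dataset: 80 points sampled from a sine curve on [0, 5) with Gaussian noise added. In a real application, you would use your own dataset instead.

# Generate some sample data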
np.random.seed(42)
X = np.sort(5 * np.random.rand(80, 1), axis=0)
y = np.sin(X).ravel() + np.random.randn(80) * 0.1

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

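Step 3: Create and train the SVM regression model

Create the SVM regression model with the SVR class and fit it to the training data. SVR has several important parameters to set, including the kernel function (kernel), the penalty (regularization) parameter (C), and the kernel coefficient (gamma).

# Create the SVM regression model
# Use the radial basis function (RBF) kernel with penalty parameter C=100 and kernel coefficient gamma=0.1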
svr = SVR(kernel='rbf', C=100, gamma=0.1)

# Train the model
svr.fit(X_train, y_train)
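
As a rough illustration of how such parameters shape the fitted model, the sketch below refits the same synthetic data with two values of epsilon and counts the support vectors each fit keeps. epsilon is not covered in the text above; it is SVR's tube-width parameter (default 0.1), and the helper count_support_vectors and the two values tried here are assumptions made only for this illustration.

```python
import numpy as np
from sklearn.svm import SVR

# Same synthetic sine data as in Step 2
np.random.seed(42)
X = np.sort(5 * np.random.rand(80, 1), axis=0)
y = np.sin(X).ravel() + np.random.randn(80) * 0.1


def count_support_vectors(epsilon):
    """Fit an RBF SVR with the given epsilon and return its number of support vectors.

    Hypothetical helper for this illustration, not part of scikit-learn.
    """
    model = SVR(kernel='rbf', C=100, gamma=0.1, epsilon=epsilon)
    model.fit(X, y)
    # support_ holds the indices of the training points used as support vectors
    return len(model.support_)


# Points inside the epsilon tube incur no loss and do not become support vectors,
# so a wider tube should leave fewer of them.
n_narrow = count_support_vectors(0.01)
n_wide = count_support_vectors(0.5)
print(f"Support vectors with epsilon=0.01: {n_narrow}")
print(f"Support vectors with epsilon=0.5: {n_wide}")
```

A model with fewer support vectors is cheaper to evaluate but follows the data less closely, so epsilon is usually tuned together with C and gamma.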

Step 4: Make predictions

Use the trained model to predict on the test set.

# Predict on the test set
y_pred = svr.predict(X_test)

Step 5: Evaluate model performance

Evaluate the model with standard regression metrics. Common choices include mean squared error (MSE) and the coefficient of determination ($R^2$).

# Compute the mean squared error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean squared error: {mse}")

# Compute the coefficient of determination
r2 = r2_score(y_test, y_pred)
print(f"Coefficient of determination: {r2}")
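
To make the two metrics concrete, here is a small sketch that computes them by hand and compares the results with scikit-learn. It uses the standard definitions, MSE as the mean of the squared residuals and $R^2 = 1 - SS_{res} / SS_{tot}$. The four-point arrays y_true and y_hat are made-up values for illustration, not output from the model above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up targets and predictions, for illustration only
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

# MSE: average of the squared residuals
mse_manual = np.mean((y_true - y_hat) ** 2)

# R^2: 1 minus the ratio of residual sum of squares to total sum of squares
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mse_manual)  # 0.375
print(round(r2_manual, 4))  # 0.9486

# Both values agree with scikit-learn's implementations
print(np.isclose(mse_manual, mean_squared_error(y_true, y_hat)))  # True
print(np.isclose(r2_manual, r2_score(y_true, y_hat)))  # True
```

A lower MSE is better, while $R^2$ is at most 1 and can be negative when the model does worse than always predicting the mean of the targets.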

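Step 6: Visualize the results (optional)

To see more directly how well the model predicts, use matplotlib to plot the training data, the true test values, and the predicted values.

# Generate points for drawing the curve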
X_plot = np.linspace(0, 5, 100).reshape(-1, 1)
y_plot = svr.predict(X_plot)

# Plot the training data
plt.scatter(X_train, y_train, color='darkorange', label='Training data')
# Plot the true and predicted values for the test data
plt.scatter(X_test, y_test, color='navy', label='True values')
plt.scatter(X_test, y_pred, color='c', label='Predicted values')
# Plot the prediction curve
plt.plot(X_plot, y_plot, color='cornflowerblue', linewidth=2, label='Regression curve')
plt.xlabel('Data')
plt.ylabel('Target')
plt.title('SVM Regression')
plt.legend()
plt.show()

Complete code example

import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Generate some sample data
np.random.seed(42)
X = np.sort(5 * np.random.rand(80, 1), axis=0)
y = np.sin(X).ravel() + np.random.randn(80) * 0.1

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the SVM regression model
svr = SVR(kernel='rbf', C=100, gamma=0.1)

# Train the model
svr.fit(X_train, y_train)

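# Predict on the test set
y_pred = svr.predict(X_test)

# Compute the mean squared error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean squared error: {mse}")

# Compute the coefficient of determination
r2 = r2_score(y_test, y_pred)
print(f"Coefficient of determination: {r2}")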

# Generate points for drawing the curve
X_plot = np.linspace(0, 5, 100).reshape(-1, 1)
y_plot = svr.predict(X_plot)

# Plot the training data
plt.scatter(X_train, y_train, color='darkorange', label='Training data')
# Plot the true and predicted values for the test data
plt.scatter(X_test, y_test, color='navy', label='True values')
plt.scatter(X_test, y_pred, color='c', label='Predicted values')
# Plot the prediction curve
plt.plot(X_plot, y_plot, color='cornflowerblue', linewidth=2, label='Regression curve')
plt.xlabel('Data')
plt.ylabel('Target')
plt.title('SVM Regression')
plt.legend()
plt.show()

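Notes

Parameter tuning: the performance of an SVR model depends heavily on the choice of parameters such as the kernel function, C, and gamma. Grid search (GridSearchCV) or randomized search (RandomizedSearchCV) can be used to look for a good parameter combination.
Data preprocessing: in real applications, standardizing the data (for example with StandardScaler) can improve training results and stability, because SVR is sensitive to the scale of the input features.
Kernel selection: different kernels suit different kinds of data. The linear kernel suits data with a linear relationship, while the radial basis function (RBF) kernel suits data with a nonlinear relationship.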
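
The three notes above can be combined into one minimal sketch: a pipeline that standardizes the features and then fits an SVR, with GridSearchCV searching over the kernel, C, and gamma. The grid values, the 5-fold cross-validation, and the scoring choice are assumptions picked for this illustration, not recommendations from the article. The step name svr in the parameter keys is the name make_pipeline derives from the class name.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Same synthetic sine data and split as in the complete example
np.random.seed(42)
X = np.sort(5 * np.random.rand(80, 1), axis=0)
y = np.sin(X).ravel() + np.random.randn(80) * 0.1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scaling lives inside the pipeline so it is refit on each training fold only
pipeline = make_pipeline(StandardScaler(), SVR())

# Illustrative grid (assumed values); gamma is ignored by the linear kernel
param_grid = {
    'svr__kernel': ['linear', 'rbf'],
    'svr__C': [1, 10, 100],
    'svr__gamma': [0.01, 0.1, 1],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring='neg_mean_squared_error')
search.fit(X_train, y_train)

# best_estimator_ is the pipeline refit on all training data with the best parameters;
# its score method returns R^2 for regressors
test_r2 = search.best_estimator_.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print(f"Test R^2: {test_r2}")
```

Keeping the scaler inside the pipeline avoids leaking statistics from the validation folds into training, which would happen if the whole dataset were standardized before the search.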