The fix for the pandas.errors.IndexingError raised while running an SVR model

This post describes an IndexingError hit while preparing data with pandas. The cause was applying a two-dimensional indexer to a Series; converting the Series to a DataFrame fixed it, because a DataFrame has the row and column axes that such indexing expects.
This is the traceback that appeared:
Traceback (most recent call last):
  File "E:\ANACODNA\Lib\site-packages\IPython\core\interactiveshell.py", line 3526, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "C:\Users\psq\AppData\Local\Temp\ipykernel_28648\654163054.py", line 85, in <module>
    Ytrain = Ytrain.iloc[:,0].ravel()
             ~~~~~~~~~~~^^^^^
  File "E:\ANACODNA\Lib\site-packages\pandas\core\indexing.py", line 1097, in __getitem__
    # we should be able to match up the dimensionality here
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\ANACODNA\Lib\site-packages\pandas\core\indexing.py", line 1594, in _getitem_tuple
    elif isinstance(key, tuple):
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\ANACODNA\Lib\site-packages\pandas\core\indexing.py", line 900, in _validate_tuple_indexer
  File "E:\ANACODNA\Lib\site-packages\pandas\core\indexing.py", line 939, in _validate_key_length
    If a tuple key includes an Ellipsis, replace it with an appropriate
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pandas.errors.IndexingError: Too many indexers

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\ANACODNA\Lib\site-packages\IPython\core\interactiveshell.py", line 2120, in showtraceback
    stb = self.InteractiveTB.structured_traceback(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\ANACODNA\Lib\site-packages\IPython\core\ultratb.py", line 1435, in structured_traceback
    return FormattedTB.structured_traceback(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\ANACODNA\Lib\site-packages\IPython\core\ultratb.py", line 1326, in structured_traceback
    return VerboseTB.structured_traceback(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\ANACODNA\Lib\site-packages\IPython\core\ultratb.py", line 1173, in structured_traceback
    formatted_exception = self.format_exception_as_a_whole(etype, evalue, etb, number_of_lines_of_context,
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\ANACODNA\Lib\site-packages\IPython\core\ultratb.py", line 1088, in format_exception_as_a_whole
    frames.append(self.format_record(record))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\ANACODNA\Lib\site-packages\IPython\core\ultratb.py", line 970, in format_record
    frame_info.lines, Colors, self.has_colors, lvals
    ^^^^^^^^^^^^^^^^
  File "E:\ANACODNA\Lib\site-packages\IPython\core\ultratb.py", line 792, in lines
    return self._sd.lines
           ^^^^^^^^^^^^^^
  File "E:\ANACODNA\Lib\site-packages\stack_data\utils.py", line 145, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
                                               ^^^^^^^^^^^^^^
  File "E:\ANACODNA\Lib\site-packages\stack_data\core.py", line 698, in lines
    pieces = self.included_pieces
             ^^^^^^^^^^^^^^^^^^^^
  File "E:\ANACODNA\Lib\site-packages\stack_data\utils.py", line 145, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
                                               ^^^^^^^^^^^^^^
  File "E:\ANACODNA\Lib\site-packages\stack_data\core.py", line 649, in included_pieces
    pos = scope_pieces.index(self.executing_piece)
                             ^^^^^^^^^^^^^^^^^^^^
  File "E:\ANACODNA\Lib\site-packages\stack_data\utils.py", line 145, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
                                               ^^^^^^^^^^^^^^
  File "E:\ANACODNA\Lib\site-packages\stack_data\core.py", line 628, in executing_piece
    return only(
           ^^^^^
  File "E:\ANACODNA\Lib\site-packages\executing\executing.py", line 164, in only
    raise NotOneValueFound('Expected one value, found 0')
executing.executing.NotOneValueFound: Expected one value, found 0

The key line in all of this is pandas.errors.IndexingError: Too many indexers. (The second traceback, ending in executing.executing.NotOneValueFound, is raised inside IPython's own traceback formatter while it tries to display the first error, so it can be ignored.)

This is the code that triggered it:

Ytrain = Ytrain.iloc[:,0].ravel()  # fails: Ytrain is a Series, and [:, 0] is a 2-D indexer
Ytest = Ytest.iloc[:,0].ravel()

Checking with print(type(Ytrain)), I found that Ytrain was a Series (Ytest likewise). A Series is one-dimensional, so iloc only accepts a single (row) indexer; passing the two-part indexer [:, 0] is exactly what raises "Too many indexers". The object has to be a DataFrame for that indexing to work.
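A minimal sketch (with made-up values) that reproduces the difference:

import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], name="target")  # 1-D: a Series has a single axis
df = s.to_frame()                              # 2-D: a single-column DataFrame

print(df.iloc[:, 0])  # works: one indexer per axis
print(s.iloc[:, 0])   # raises pandas.errors.IndexingError: Too many indexers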

So I solved the problem by adding the following two lines before the indexing above:

Ytrain = Ytrain.to_frame()  # Series -> single-column DataFrame, so iloc[:,0] works again
Ytest = Ytest.to_frame()
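For reference, a shorter equivalent is possible: a Series is already a single column, so you can skip the round trip through a DataFrame and go straight to a 1-D NumPy array. A sketch (the Series here is a stand-in for the real Ytrain):

import pandas as pd

Ytrain = pd.Series([10.0, 12.0, 9.0], name="target")  # stand-in for the real Ytrain

a = Ytrain.to_frame().iloc[:, 0].ravel()  # the route used above
b = Ytrain.to_numpy()                     # direct: the Series is already 1-D

assert (a == b).all()  # both give the same 1-D array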

In fact, back when the dataset was split, Xtrain and Xtest came out as DataFrames while the Y side came out as Series, most likely because X was selected as a list of columns and y as a single column.
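train_test_split preserves the type of whatever it is given, so the asymmetry comes from how the columns were selected beforehand. A small sketch with hypothetical column names:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"f1": range(8), "f2": range(8, 16), "target": range(16, 24)})

X = df[["f1", "f2"]]   # list of columns -> DataFrame
y1 = df["target"]      # single label    -> Series
y2 = df[["target"]]    # one-item list   -> DataFrame

Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, y1, test_size=0.25, random_state=42)
print(type(Xtrain).__name__, type(Ytrain).__name__)  # DataFrame Series

_, _, Ytrain2, _ = train_test_split(X, y2, test_size=0.25, random_state=42)
print(type(Ytrain2).__name__)  # DataFrame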
