The problem you describe is a fairly complete and complex **multi-model ensemble + optimization + residual correction** forecasting pipeline, involving:
- Data loading and normalization
- Several machine-learning models (LSTM, Kriging, SVR)
- Model evaluation and persistence
- Ensemble construction + weight optimization (PSO)
- Residual modeling with a GA-BP neural network
- Final model composition and inverse normalization
- Prediction output
We will **implement the whole pipeline step by step** using mainstream Python libraries.
---
## ✅ Required dependencies (install via pip or conda)
```bash
pip install pandas openpyxl scikit-learn tensorflow numpy scipy deap pyswarm joblib
```
---
## ✅ Complete implementation
```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score
import tensorflow as tf
from tensorflow.keras.models import Sequential, save_model, load_model
from tensorflow.keras.layers import LSTM, Dense
import pickle
import pyswarm as ps  # PSO optimizer
from deap import base, creator, tools, algorithms  # GA for the BP neural network
import os
# ----------------------------
# Step 1: Load the Excel data
# ----------------------------
file_path = 'data.xlsx'  # make sure the file is named data.xlsx and contains columns A, B, C, D, E, F, G
df = pd.read_excel(file_path)
input_cols = ['A', 'B', 'C', 'D']
output_cols = ['E', 'F', 'G']
X = df[input_cols].values   # inputs  (N, 4)
Y = df[output_cols].values  # outputs (N, 3)
# ----------------------------
# Step 2: Normalization
# ----------------------------
scaler_X = MinMaxScaler()
scaler_Y = MinMaxScaler()
X_scaled = scaler_X.fit_transform(X)  # (N, 4)
Y_scaled = scaler_Y.fit_transform(Y)  # (N, 3)
# All samples are used for training (no train/test split)
X_train, Y_train = X_scaled, Y_scaled
# Reshape for LSTM: (samples, timesteps, features)
X_train_lstm = X_train.reshape((X_train.shape[0], 1, X_train.shape[1]))  # (N, 1, 4)
# Create output directories
os.makedirs('models', exist_ok=True)
os.makedirs('results', exist_ok=True)
# Containers for the individual models and their predictions
models = {}
predictions = {}
# ----------------------------
# Step 3: Train 9 individual models: (LSTM, Kriging, SVR) x 3 outputs
# ----------------------------
print("Training the 9 base models...")
for i, target in enumerate(output_cols):
    y_train = Y_train[:, i]  # current target column E/F/G

    # --- LSTM model ---
    print(f"Training LSTM.{target}.model")
    model_lstm = Sequential([
        LSTM(32, input_shape=(1, 4), return_sequences=False),
        Dense(16, activation='relu'),
        Dense(1, activation='linear')
    ])
    model_lstm.compile(optimizer='adam', loss='mse')
    model_lstm.fit(X_train_lstm, y_train, epochs=50, batch_size=16, verbose=0)
    # Save architecture and weights
    save_model(model_lstm, f'models/LSTM.{target}.model.h5')
    models[f'LSTM.{target}'] = model_lstm
    # Predict
    pred_lstm = model_lstm.predict(X_train_lstm, verbose=0).flatten()
    predictions[f'LSTM.{target}'] = pred_lstm

    # --- Kriging model (Gaussian process regression with an RBF kernel) ---
    print(f"Training Kriging.{target}.model")
    try:
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF as GPR_RBF, ConstantKernel as C
        kernel = C(1.0, (1e-4, 1e4)) * GPR_RBF(1.0, (1e-4, 1e4))
        model_kriging = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10, alpha=1e-2)
        model_kriging.fit(X_train, y_train)
        models[f'Kriging.{target}'] = model_kriging
        pred_kriging = model_kriging.predict(X_train)
        predictions[f'Kriging.{target}'] = pred_kriging
    except Exception as e:
        print(f"Kriging.{target} training failed: {e}")
        model_kriging = None  # avoid pickling a stale model from a previous iteration
        pred_kriging = np.zeros_like(y_train)
        predictions[f'Kriging.{target}'] = pred_kriging

    # --- SVR model ---
    print(f"Training SVR.{target}.model")
    model_svr = SVR(kernel='rbf', C=1.0, gamma='scale')
    model_svr.fit(X_train, y_train)
    models[f'SVR.{target}'] = model_svr
    pred_svr = model_svr.predict(X_train)
    predictions[f'SVR.{target}'] = pred_svr

    # --- Persist each model as .pkl (the .h5 file remains the canonical copy of the
    #     LSTM; pickling Keras models only works reliably on recent TensorFlow) ---
    with open(f'models/LSTM.{target}.model.pkl', 'wb') as f:
        pickle.dump(model_lstm, f)
    with open(f'models/Kriging.{target}.model.pkl', 'wb') as f:
        pickle.dump(model_kriging, f)
    with open(f'models/SVR.{target}.model.pkl', 'wb') as f:
        pickle.dump(model_svr, f)

    # --- Model evaluation ---
    results = []
    for name in [f'LSTM.{target}', f'Kriging.{target}', f'SVR.{target}']:
        y_pred = predictions[name]
        results.append({
            'Model': name,
            'MSE': mean_squared_error(y_train, y_pred),
            'R²': r2_score(y_train, y_pred)
        })
    # Save the evaluation results
    result_df = pd.DataFrame(results)
    result_df.to_csv(f'results/{target}_evaluation.csv', index=False)
    print(result_df)
# ----------------------------
# Step 4: Notes on inverse normalization for the 9 models
# ----------------------------
# The models themselves do not need to be "inverse-normalized"; only predictions do.
# We keep the models in normalized space here and apply the inverse transform
# uniformly at the ensemble stage.
print("The 9 base models and their evaluation results have been saved.")
# ----------------------------
# Step 5: Build the ensemble models Ensemble.E/F/G.model and optimize the
#         weights W.X1, W.X2, W.X3 with PSO
# ----------------------------
ensemble_weights = {}

def create_ensemble_prediction(weights, preds):
    w1, w2, w3 = weights
    return w1 * preds[0] + w2 * preds[1] + w3 * preds[2]

def objective_pso(weights, y_true, preds):
    pred = create_ensemble_prediction(weights, preds)
    return -r2_score(y_true, pred)  # minimize -R² → maximize R²

bounds = ([0, 0, 0], [1, 1, 1])  # each weight in [0, 1]
print("Optimizing ensemble weights with PSO...")
for target in output_cols:
    y_true = Y_train[:, output_cols.index(target)]
    preds = [
        predictions[f'LSTM.{target}'],
        predictions[f'Kriging.{target}'],
        predictions[f'SVR.{target}']
    ]

    def obj_wrapper(weights):
        # Penalize weight vectors whose sum deviates from 1
        if abs(sum(weights) - 1.0) > 1e-3:
            penalty = 1e6 * abs(sum(weights) - 1.0)
            return -r2_score(y_true, create_ensemble_prediction(weights, preds)) + penalty
        return objective_pso(weights, y_true, preds)

    print(f"Optimizing weights for Ensemble.{target}.model ...")
    best_weights, _ = ps.pso(obj_wrapper, lb=bounds[0], ub=bounds[1], swarmsize=30, maxiter=100)
    best_weights = best_weights / np.sum(best_weights)  # renormalize so the weights sum to 1
    ensemble_weights[f'Ensemble.{target}'] = best_weights

    # Final weighted prediction
    final_pred = create_ensemble_prediction(best_weights, preds)
    r2_final = r2_score(y_true, final_pred)
    print(f"Ensemble.{target}.model | weights: {best_weights}, R²: {r2_final:.4f}")

    # Save the ensemble model (weights + member names)
    ensemble_model = {
        'members': [f'LSTM.{target}', f'Kriging.{target}', f'SVR.{target}'],
        'weights': best_weights,
        'scaler_Y_col': scaler_Y,  # for inverse normalization
        'target_index': output_cols.index(target)
    }
    with open(f'models/Ensemble.{target}.model.pkl', 'wb') as f:
        pickle.dump(ensemble_model, f)
# ----------------------------
# Step 6: Compute the residual terms Res.E, Res.F, Res.G
# ----------------------------
residuals = {}
for target in output_cols:
    y_true = Y_train[:, output_cols.index(target)]
    preds = [
        predictions[f'LSTM.{target}'],
        predictions[f'Kriging.{target}'],
        predictions[f'SVR.{target}']
    ]
    weights = ensemble_weights[f'Ensemble.{target}']
    ensemble_pred = create_ensemble_prediction(weights, preds)
    res = y_true - ensemble_pred
    residuals[f'Res.{target}'] = res
    # Save the residuals
    np.save(f'results/Res.{target}.npy', res)
print("Residual terms computed.")
# ----------------------------
# Step 7: Fit residual models P_Res.E, P_Res.F, P_Res.G with a GA-optimized BPNN
# ----------------------------
def create_bpnn(individual):
    """
    Build a BP neural network from a GA individual:
    individual = [n_neurons_layer1, n_neurons_layer2, log10_learning_rate]
    """
    n1 = max(1, int(round(individual[0])))  # clamp in case mutation pushes it below 1
    n2 = max(0, int(round(individual[1])))
    lr = 10 ** individual[2]  # learning rate searched in log space
    model = tf.keras.Sequential()
    model.add(Dense(n1, input_dim=4, activation='relu'))
    if n2 > 0:
        model.add(Dense(n2, activation='relu'))
    model.add(Dense(1, activation='linear'))
    optimizer = tf.keras.optimizers.Adam(learning_rate=lr)
    model.compile(optimizer=optimizer, loss='mse', metrics=['mae'])
    return model

# GA setup
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

def evaluate_ga_bp(individual, X_train, residual):
    try:
        model = create_bpnn(individual)
        # Short training run to gauge performance
        model.fit(X_train, residual, epochs=50, batch_size=8, verbose=0)
        pred = model.predict(X_train, verbose=0).flatten()
        # Simplified fitness: training R² only (no overfitting penalty)
        return (r2_score(residual, pred),)
    except Exception:
        return (-1e6,)

print("Optimizing BPNN structure with GA to fit the residuals...")
for target in output_cols:
    residual = residuals[f'Res.{target}']
    toolbox = base.Toolbox()
    toolbox.register("n1", np.random.randint, 5, 64)
    toolbox.register("n2", np.random.randint, 0, 32)
    toolbox.register("lr", np.random.uniform, -5, -1)  # log10(lr) ∈ [-5, -1] → lr ∈ [1e-5, 0.1]
    toolbox.register("individual", tools.initCycle, creator.Individual,
                     (toolbox.n1, toolbox.n2, toolbox.lr), n=1)
    toolbox.register("population", tools.initRepeat, list, toolbox.individual)
    toolbox.register("mate", tools.cxBlend, alpha=0.5)
    toolbox.register("mutate", tools.mutGaussian, mu=0, sigma=1, indpb=0.2)
    toolbox.register("select", tools.selTournament, tournsize=3)
    toolbox.register("evaluate", evaluate_ga_bp, X_train=X_train, residual=residual)

    pop = toolbox.population(n=20)
    hof = tools.HallOfFame(1)
    stats = tools.Statistics(lambda ind: ind.fitness.values)
    stats.register("avg", np.mean)
    stats.register("max", np.max)

    print(f"GA optimization for P_Res.{target} ...")
    pop, logbook = algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2, ngen=30,
                                       stats=stats, halloffame=hof, verbose=True)
    best_ind = hof[0]
    print(f"P_Res.{target} best parameters: neurons=[{int(best_ind[0])},{int(best_ind[1])}], "
          f"lr={10**best_ind[2]:.2e}")

    # Retrain the best configuration
    best_model = create_bpnn(best_ind)
    best_model.fit(X_train, residual, epochs=100, batch_size=8, verbose=0)
    # Save the model
    with open(f'models/P_Res.{target}.model.pkl', 'wb') as f:
        pickle.dump({'model': best_model, 'params': best_ind}, f)
    # Predict the residuals
    pred_res = best_model.predict(X_train, verbose=0).flatten()
    print(f"P_Res.{target}.model R² on residual: {r2_score(residual, pred_res):.4f}")
# ----------------------------
# Step 8: Compose the final models Final.E/F/G.model = Ensemble + P_Res
# ----------------------------
print("Building the final models Final.E/F/G.model ...")
final_models = {}
for target in output_cols:
    # Load the ensemble and residual models
    with open(f'models/Ensemble.{target}.model.pkl', 'rb') as f:
        ens_model = pickle.load(f)
    with open(f'models/P_Res.{target}.model.pkl', 'rb') as f:
        pres_model = pickle.load(f)['model']
    final_model = {
        'ensemble': ens_model,
        'pres': pres_model,
        'scaler_Y': scaler_Y,
        'target_idx': output_cols.index(target)
    }
    final_models[f'Final.{target}'] = final_model
    # Save the final model
    with open(f'models/Final.{target}.model.pkl', 'wb') as f:
        pickle.dump(final_model, f)
# ----------------------------
# Step 9: Inverse-normalization wrappers Original.E/F/G.model
# ----------------------------
print("Applying inverse normalization, generating Original.E/F/G.model ...")

class OriginalModel:
    """Wrapper that predicts in normalized space and inverse-transforms the result."""
    def __init__(self, final_model, target, scaler_X):
        self.final_model = final_model
        # Capture the column name explicitly: referencing the module-level `target`
        # from inside predict() would silently resolve to the last loop value.
        self.target = target
        self.scaler_X = scaler_X  # to normalize raw inputs before prediction

    def predict(self, X_input):
        # X_input shape: (N, 4), in the ORIGINAL (unnormalized) scale
        X_scaled_in = self.scaler_X.transform(X_input)
        X_lstm = X_scaled_in.reshape((X_scaled_in.shape[0], 1, 4))
        target_idx = self.final_model['target_idx']
        # Ensemble prediction (still in normalized space)
        w = self.final_model['ensemble']['weights']
        lstm_m = load_model(f'models/LSTM.{self.target}.model.h5')
        krg_m = pickle.load(open(f'models/Kriging.{self.target}.model.pkl', 'rb'))
        svr_m = pickle.load(open(f'models/SVR.{self.target}.model.pkl', 'rb'))
        p1 = lstm_m.predict(X_lstm, verbose=0).flatten()
        p2 = krg_m.predict(X_scaled_in)
        p3 = svr_m.predict(X_scaled_in)
        ensemble_pred_scaled = w[0]*p1 + w[1]*p2 + w[2]*p3
        # Residual correction
        pres_pred_scaled = self.final_model['pres'].predict(X_scaled_in, verbose=0).flatten()
        # Sum in normalized space
        final_scaled = ensemble_pred_scaled + pres_pred_scaled
        # Inverse transform: place the column into a dummy (N, 3) matrix so the
        # 3-column scaler can be applied, then read the column back out
        dummy = np.zeros((len(final_scaled), 3))
        dummy[:, target_idx] = final_scaled
        final_original = self.final_model['scaler_Y'].inverse_transform(dummy)
        return final_original[:, target_idx]

original_models = {}
for target in output_cols:
    with open(f'models/Final.{target}.model.pkl', 'rb') as f:
        final_model = pickle.load(f)
    wrapped_model = OriginalModel(final_model, target, scaler_X)
    original_models[f'Original.{target}'] = wrapped_model
    # Save the wrapper
    with open(f'models/Original.{target}.model.pkl', 'wb') as f:
        pickle.dump(wrapped_model, f)
# ----------------------------
# Step 10: Predict with Original.E/F/G.model and write the results
# ----------------------------
print("Predicting from the original ABCD inputs...")
# Input data in the original (unnormalized) scale
X_original = df[input_cols].values
results_final = {
    'A': X_original[:, 0],
    'B': X_original[:, 1],
    'C': X_original[:, 2],
    'D': X_original[:, 3],
}
for target in output_cols:
    with open(f'models/Original.{target}.model.pkl', 'rb') as f:
        model = pickle.load(f)
    pred = model.predict(X_original)
    results_final[f'Predicted_{target}'] = pred

# Convert to a DataFrame and save
result_df = pd.DataFrame(results_final)
print(result_df.head())
result_df.to_csv('results/final_predictions.csv', index=False)
print("✅ Pipeline complete! Results saved to results/final_predictions.csv")
```
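The core composition in Steps 5–8 — a weighted sum of base predictions plus a learned residual correction — can be sketched on synthetic data with NumPy alone. The arrays and the partial residual correction below are illustrative stand-ins, not the trained models from the script:

```python
import numpy as np

def r2_score(y_true, y_pred):
    # Coefficient of determination, as used to rank the models above
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic target and three imperfect base predictions (stand-ins for LSTM/Kriging/SVR)
y_true = np.linspace(0.0, 1.0, 50)
preds = [y_true + 0.05, y_true - 0.03, y_true + 0.10 * np.sin(10 * y_true)]

# Ensemble: weighted sum, with weights normalized to sum to 1 (as after PSO)
w = np.array([0.5, 0.3, 0.2])
w = w / w.sum()
ensemble = w[0] * preds[0] + w[1] * preds[1] + w[2] * preds[2]

# Residual correction: here a hand-made partial correction for illustration;
# in the script this is the GA-BP network's prediction of (y_true - ensemble)
residual_pred = 0.8 * (y_true - ensemble)
final = ensemble + residual_pred

print(f"ensemble R² = {r2_score(y_true, ensemble):.4f}, "
      f"final R² = {r2_score(y_true, final):.4f}")
```

Because the correction removes most of the ensemble error, the corrected R² is strictly higher — the same mechanism the GA-BP residual model relies on.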
---
## 🔍 Code walkthrough
| Step | Content |
|------|------|
| **1. Data loading** | Read the seven columns A–G from the Excel file with `pandas.read_excel` |
| **2. Normalization** | Scale inputs and outputs separately to [0, 1] with `MinMaxScaler` |
| **3. Model training** | Train LSTM (Keras), Kriging (GPR), and SVR (sklearn) models, one per output — 9 models in total |
| **4. Evaluation and persistence** | Evaluate with MSE and R², save models as `.pkl` |
| **5. Ensemble + PSO weights** | PSO via `pyswarm` maximizes R² subject to `w1+w2+w3=1`, `wi>0` |
| **6. Residual modeling** | Treat the ensemble error as a new prediction target |
| **7. GA-BP residual prediction** | GA via DEAP optimizes the BPNN structure (layers, neurons, learning rate) to fit the residuals |
| **8. Final model composition** | `Final = Ensemble + P_Res` |
| **9. Inverse-normalization wrapper** | An `OriginalModel` class restores predictions to the original scale |
| **10. Prediction output** | Predict on the original inputs and write a CSV table |
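Step 9's dummy-matrix trick deserves a closer look: a `MinMaxScaler` fitted on three columns can only inverse-transform three-column arrays, so a single predicted column is embedded into a zero matrix at its target index before the inverse transform. A minimal, self-contained sketch on synthetic data:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Fit the scaler on a synthetic 3-column target matrix (stand-in for E, F, G)
Y = np.array([[10., 100., 1000.],
              [20., 200., 2000.],
              [30., 300., 3000.]])
scaler_Y = MinMaxScaler()
Y_scaled = scaler_Y.fit_transform(Y)

# Suppose we predicted only column 1 ("F") in normalized space
target_idx = 1
pred_scaled = Y_scaled[:, target_idx]

# Dummy-matrix trick: embed the column, inverse-transform, read it back out
dummy = np.zeros((len(pred_scaled), 3))
dummy[:, target_idx] = pred_scaled
pred_original = scaler_Y.inverse_transform(dummy)[:, target_idx]

print(pred_original)  # recovers [100., 200., 300.]
```

The zero entries in the other columns inverse-transform to those columns' minima, but they are discarded, so only the target column's scale matters.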
---
## ✅ Output summary
- ✅ 9 base model `.pkl` files (LSTM/E/F/G, Kriging/E/F/G, SVR/E/F/G)
- ✅ 3 ensemble models `.pkl` (Ensemble.E/F/G)
- ✅ 3 residual models `.pkl` (P_Res.E/F/G)
- ✅ 3 final models `.pkl` (Final.E/F/G)
- ✅ 3 inverse-normalization wrappers `.pkl` (Original.E/F/G)
- ✅ Evaluation metrics per model (CSV)
- ✅ Final prediction table (original inputs plus the three predicted outputs)
---
## ⚠️ Notes
1. **The Excel file must be named `data.xlsx`** and sit in the current directory.
2. **The column names must be A, B, C, D, E, F, G.**
3. With small datasets, Kriging (GPR) can be unstable; a plain RBF interpolator is a workable substitute.
4. The LSTM uses a single timestep; extend it if you have genuine time series.
5. The BPNN inside GA-BP is a shallow network, suitable for small datasets.
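On note 3: if `GaussianProcessRegressor` struggles on a small dataset, `scipy.interpolate.Rbf` offers a drop-in interpolating fallback. A rough sketch on synthetic 4-input data (the sample size and kernel choice here are illustrative, not tuned):

```python
import numpy as np
from scipy.interpolate import Rbf

rng = np.random.default_rng(0)
X = rng.random((30, 4))                   # 30 samples, 4 normalized inputs (A–D)
y = X @ np.array([0.4, 0.3, 0.2, 0.1])    # synthetic normalized target

# Rbf takes one coordinate array per input dimension, followed by the values
rbf = Rbf(X[:, 0], X[:, 1], X[:, 2], X[:, 3], y, function='multiquadric')

# With the default smooth=0 the interpolant passes through the training points
y_fit = rbf(X[:, 0], X[:, 1], X[:, 2], X[:, 3])
print(np.max(np.abs(y_fit - y)))  # ≈ 0 at the training samples
```

Unlike GPR, `Rbf` interpolates exactly at the training points and has no hyperparameters to restart, at the cost of no uncertainty estimate.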
---