11. How can Machines Learn Better? - Overfitting and Solutions

This article explains the concepts of overfitting and underfitting in machine learning, analyzes why overfitting occurs, and presents five ways to avoid it: starting from a simple model, data cleaning/pruning, data hinting, regularization, and validation.


1. What is Overfitting?

At the end of the previous post we mentioned that if a linear model's complexity is too high, it may cause overfitting. Obviously there is also the opposite situation, underfitting.

So what exactly are overfitting and underfitting?

Start with the name: overfitting means going too far while fitting. Using a linear model to classify or regress on data is a fitting process, so overfitting means the fitting has been pushed too far. Likewise, underfitting means the fitting has not gone far enough.

So when has the fitting gone too far, and when has it not gone far enough?

- Overfitting: using a relatively complex model on a relatively simple problem.
- Underfitting: using a relatively simple model on a relatively complex problem.

We illustrate with the examples below.
1. The first example is shown in Figure 1 (the figure is taken from Andrew Ng, since Prof. Lin's slides do not have a single figure that shows underfit, good fit, and overfit side by side). The left plot is underfitting, the middle one is a good fit, and the right one is overfitting. Looking purely at the fitting results, the left and middle plots clearly have a larger Ein than the right one; but in terms of generalization, the left and middle plots are clearly better than the right one.
If we regard the points that the good fit misclassifies as noise, then the overfit model has been severely disturbed by that noise.

Figure 1: Underfit, Good Fit and Overfit [1]

2. Next, let us look back at the VC Dimension curve summarized earlier, shown in Figure 2.

Figure 2: Learning Curve [1]

The figure shows that as the VC Dimension grows, Ein decreases while Eout first decreases and then increases. The key difference between overfitting and underfitting lies in how Eout behaves:
- Overfitting occurs when the VC Dimension is large: Ein is very small but Eout is very large. The model fits the training samples extremely well, but it has gone too far, so generalization deteriorates and Eout blows up.
- Underfitting occurs when the VC Dimension is small: Ein is large and Eout is large as well. The model does not fit the training samples well enough; the generalization gap is small (Eout stays close to Ein), but both errors remain too large.

How to deal with underfitting was discussed before: keep raising the polynomial degree, which raises the VC dimension, until the model fits adequately.
Overfitting is the trickier problem, and we explore it in more depth below.
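The sketch below (my own illustration rather than code from the lecture; the sine target, the noise level, and the sample sizes are assumptions) reproduces the behaviour of Figure 2 numerically: as the polynomial degree, standing in for the VC Dimension, grows, Ein keeps shrinking while Eout eventually blows up.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    return np.sin(2 * np.pi * x)  # assumed "true" function for this demo

# a small, noisy training set and a large test set from the same distribution
x_train = rng.uniform(0, 1, 15)
y_train = target(x_train) + rng.normal(0, 0.2, 15)
x_test = rng.uniform(0, 1, 1000)
y_test = target(x_test) + rng.normal(0, 0.2, 1000)

for degree in (1, 3, 9, 12):
    coeffs = np.polyfit(x_train, y_train, degree)              # least-squares polynomial fit
    e_in = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    e_out = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: E_in = {e_in:.4f}, E_out = {e_out:.4f}")
```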

3. Figure 3 below gives a more direct view of the problems caused by overfitting.

Figure 3: Cases of Overfitting [2]

The figure shows that the overfit models all achieve a lower Ein than the good fit, yet their Eout is much higher (poor generalization).

In summary, three factors lead to overfitting (the sketch after this list illustrates their effect):
- the data size N is too small
- there is too much noise
- the VC Dimension is too large
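To make the three factors concrete, here is another minimal sketch (same caveats as above; the degree-10 model and the noise levels are assumptions): keeping a fairly complex model fixed, Eout shrinks dramatically once N grows or the noise level drops.

```python
import numpy as np

rng = np.random.default_rng(1)

def e_out_of_polyfit(n_train, noise_std, degree=10):
    """Fit a degree-10 polynomial on n_train noisy points and measure E_out
    against the clean target on a large test set."""
    x_tr = rng.uniform(0, 1, n_train)
    y_tr = np.sin(2 * np.pi * x_tr) + rng.normal(0, noise_std, n_train)
    x_te = rng.uniform(0, 1, 2000)
    coeffs = np.polyfit(x_tr, y_tr, degree)
    return np.mean((np.polyval(coeffs, x_te) - np.sin(2 * np.pi * x_te)) ** 2)

for n, noise in [(15, 0.3), (200, 0.3), (15, 0.05)]:
    print(f"N={n:3d}, noise={noise:.2f}: E_out = {e_out_of_polyfit(n, noise):.4f}")
```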



2. Dealing with Overfitting

In the previous section we introduced overfitting and identified the three factors that cause it. Based on those factors, there are five methods that help us avoid overfitting.

  • Start from a simple model and increase model complexity step by step - prevents the VC Dimension from getting too large
  • Data cleaning/pruning - prevents too much noise
  • Data hinting - counters a data size N that is too small
  • Regularization - prevents the VC Dimension from getting too large
  • Validation - hold out part of the data as a test set to estimate the model's generalization ability in advance

Below we introduce these five methods. The first three are relatively simple and are not discussed in depth here; regularization and validation are more involved and receive most of the attention.

1) Start from Simple Model

As mentioned in the previous chapter, when the VC Dimension is too large, Ein decreases while Eout increases. So we can start debugging with a model of degree d=1; if Ein does not meet our requirement, we increase d to 2 and debug again, and so on, until Ein is acceptable. At that point the VC Dimension is still not very large, and the model we obtain generalizes relatively well. A sketch of this loop follows.
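A minimal sketch of the "start from a simple model" loop, assuming a one-dimensional polynomial regression setting (the error target and maximum degree are illustrative, not from the course):

```python
import numpy as np

def smallest_adequate_degree(x, y, e_in_target, max_degree=10):
    """Raise the polynomial degree one step at a time and stop at the first
    degree whose in-sample squared error meets the target."""
    for degree in range(1, max_degree + 1):
        coeffs = np.polyfit(x, y, degree)
        e_in = np.mean((np.polyval(coeffs, x) - y) ** 2)
        if e_in <= e_in_target:
            return degree, coeffs, e_in      # good enough: keep the simplest model
    return max_degree, coeffs, e_in          # fall back to the largest degree tried
```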

2) Data Cleaning/Pruning

Data cleaning/pruning means dealing with training samples whose labels are clearly wrong. The key difficulty is accurately identifying the mislabelled or noisy points. Once they are found, there are two treatments:
- correct the label, i.e. data cleaning;
- delete the faulty sample, i.e. data pruning.

The treatment itself is simple; the hard part is detecting which samples are noise or outliers. One possible detection heuristic is sketched below.
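One common way to flag suspicious labels is to compare each point's label with those of its nearest neighbours; this k-NN disagreement rule is my own illustration, not a method given in the lecture (it assumes integer class labels and a scikit-learn installation).

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def flag_suspicious(X, y, k=5):
    """Return indices of points whose label disagrees with the majority label
    of their k nearest neighbours (y: integer class labels >= 0)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)                       # idx[:, 0] is the point itself
    neighbour_labels = y[idx[:, 1:]]                # labels of the k true neighbours
    votes = np.array([np.bincount(row).argmax() for row in neighbour_labels])
    return np.flatnonzero(votes != y)

def prune(X, y, k=5):
    """Data pruning: drop the flagged samples (data cleaning would relabel them instead)."""
    keep = np.setdiff1d(np.arange(len(y)), flag_suspicious(X, y, k))
    return X[keep], y[keep]
```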

3) Data Hinting

Data hinting targets the case where N is not large enough: apply simple transformations to the known samples to generate more of them. For example, in a digit classification problem, the known digit images can be shifted or rotated slightly to produce more data and enlarge the training set. Data obtained this way is called virtual examples.

Note that the newly generated virtual examples may no longer be i.i.d. samples from the original distribution, so they should be constructed to be as reasonable and as close to i.i.d. as possible.
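A minimal sketch of generating virtual examples for the digit case described above; np.roll is used for simplicity, so pixels wrap around at the image border, which is an assumption of this sketch rather than how the course constructs hints.

```python
import numpy as np

def shift_virtual_examples(images, labels, shifts=((0, 1), (0, -1), (1, 0), (-1, 0))):
    """images: array of shape (N, H, W). Returns the originals plus one slightly
    shifted copy per shift; a shifted digit keeps its original label."""
    new_images, new_labels = [images], [labels]
    for dy, dx in shifts:
        new_images.append(np.roll(images, shift=(dy, dx), axis=(1, 2)))
        new_labels.append(labels)
    return np.concatenate(new_images), np.concatenate(new_labels)
```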

4) Regularization

Regularization is a kind of penalized method: a regularizer term is added to the original objective to penalize it, so that an overly complex model is pulled back toward a simpler one.
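As a minimal sketch of the idea, the classic L2 regularizer (ridge regression) adds lam * ||w||^2 to the squared error, so larger lam pulls the weights, and hence the effective complexity, down. The closed-form solution below is standard, but using it as the example here is my choice rather than the course's exact formulation.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize ||Xw - y||^2 + lam * ||w||^2 (regularized least squares)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def ols_fit(X, y):
    """Plain least squares, no penalty -- prone to overfitting with complex features."""
    return np.linalg.lstsq(X, y, rcond=None)[0]
```

How large lam should be is itself a model-selection question, which is exactly what validation (method 5 below) helps answer.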

The discussion of Regularization can be found here:
12. Machine Learning Foundations - How can Machines Learn Better? - Regularization

5) Validation

This is one of the most commonly used methods. A portion of the data is held out in advance as a test/validation set. Because this set is drawn at random, the data encountered in actual deployment should not differ much from it, so the error measured on it gives an advance estimate of Eout and serves as one criterion for deciding whether the model is acceptable.
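A minimal sketch of this procedure for the polynomial setting used earlier (the candidate degrees and split fraction are assumptions): hold out a random part of the data, train each candidate on the rest, and keep the one with the lowest held-out error.

```python
import numpy as np

def pick_by_validation(x, y, degrees=(1, 2, 3, 5, 8), val_fraction=0.25, seed=0):
    """Split off a validation set, fit each candidate degree on the training part,
    and return the degree with the lowest validation error (a proxy for E_out)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_val = int(len(x) * val_fraction)
    val, train = idx[:n_val], idx[n_val:]
    best_degree, best_err = None, np.inf
    for d in degrees:
        coeffs = np.polyfit(x[train], y[train], d)
        err = np.mean((np.polyval(coeffs, x[val]) - y[val]) ** 2)
        if err < best_err:
            best_degree, best_err = d, err
    return best_degree, best_err
```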

The discussion of Validation can be found here:

13. Machine Learning Foundations - How can Machines Learn Better? - Validation



Summary

1. We first introduced the concepts of overfitting and underfitting.

2. We then focused on overfitting and summarized the causes behind it:

  • the data size N is too small

  • there is too much noise

  • the VC Dimension is too large

3. Finally, we looked at how to avoid overfitting as much as possible, listing five solutions: starting from a simple model, data cleaning/pruning, data hinting, regularization, and validation.



References

[1] Machine Learning Foundations (National Taiwan University, Hsuan-Tien Lin), Lecture 13-1: What is Overfitting? (10:45)

[2] Machine Learning Foundations (National Taiwan University, Hsuan-Tien Lin), Lecture 13-2: The Role of Noise and Data Size (13:36)


