Dropout zeroes out part of a layer's outputs. During backpropagation, the gradient flowing through each zeroed output is also zero, so the weights feeding that neuron receive no update for this step.
Although dropout is usually described as "dropping neurons", what is actually zeroed is the neuron's output. Its gradient in backpropagation then becomes zero, which blocks the update of the weights feeding that neuron for the current step and improves the model's generalization. In addition, PyTorch multiplies the outputs that pass through dropout by 1/(1-p) (inverted dropout), so that the expected value of the layer's output stays the same with or without dropout. At inference time, i.e. after calling model.eval(), the nn.Dropout layer automatically becomes a no-op.
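To make the 1/(1-p) scaling concrete, here is a minimal sketch of the arithmetic that inverted dropout performs in training mode. The function name manual_dropout and the explicit Bernoulli mask are illustrative assumptions, not PyTorch's actual implementation:

import torch

def manual_dropout(x: torch.Tensor, p: float = 0.5, training: bool = True) -> torch.Tensor:
    # Eval mode (or p == 0): identity, no masking and no scaling
    if not training or p == 0.0:
        return x
    keep_prob = 1.0 - p
    # Each element is kept with probability 1 - p (mask entry 1) or dropped (mask entry 0)
    mask = torch.bernoulli(torch.full_like(x, keep_prob))
    # Scaling survivors by 1/(1-p) keeps the expected output unchanged:
    # E[mask * x / keep_prob] = keep_prob * x / keep_prob = x
    return x * mask / keep_prob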
Demo code:
import torch
import torch.nn as nn

# Set a random seed so the results are reproducible
torch.manual_seed(42)

# A simple fully connected network
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(4, 8)        # input layer (4) -> hidden layer (8)
        self.dropout = nn.Dropout(p=0.5)  # dropout probability 50%
        self.fc2 = nn.Linear(8, 3)        # hidden layer (8) -> output layer (3)

    def forward(self, x):
        x = torch.relu(self.fc1(x))  # ReLU activation
        print("Output of fc1 (after ReLU): ", x.detach().numpy())
        x = self.dropout(x)          # apply Dropout
        print("Output of the dropout layer: ", x.detach().numpy())
        x = self.fc2(x)              # output layer
        return x

# Create the model
model = SimpleNN()

# Define the input data
input_data = torch.tensor([[5.0, 6.0, 7.0, 8.0]], dtype=torch.float32)
print("\nInput data:\n", input_data.detach().numpy())

# Behavior in training mode
model.train()
output_train = model(input_data)
print("\nOutput in training mode (with Dropout):\n", output_train.detach().numpy())

# Behavior in evaluation mode
model.eval()
output_eval = model(input_data)
print("\nOutput in evaluation mode (without Dropout):\n", output_eval.detach().numpy())
Result:

Input data:
 [[5. 6. 7. 8.]]
Output of fc1 (after ReLU):  [[ 0.          0.          3.0867398   3.0755918   0.         11.119696
   0.          0.30237788]]
Output of the dropout layer:  [[0.         0.         6.1734796  6.1511836  0.         0.
  0.         0.60475576]]

Output in training mode (with Dropout):
 [[-2.809981   -0.24135023 -1.3747451 ]]
Output of fc1 (after ReLU):  [[ 0.          0.          3.0867398   3.0755918   0.         11.119696
   0.          0.30237788]]
Output of the dropout layer:  [[ 0.          0.          3.0867398   3.0755918   0.         11.119696
   0.          0.30237788]]

Output in evaluation mode (without Dropout):
 [[-4.7822514   0.7116643   0.10444203]]