PyTorch GPU error: RuntimeError: CUDA out of memory. Tried to allocate 1.50 GiB

After updating PyTorch to 1.1.0, I hit a CUDA out-of-memory error with batch_size=64; shrinking it to 2 still failed. The problem appeared with the relatively complex GoogLeNet model. Modifying the code as suggested in a reference produced a new error, which was finally resolved by setting requires_grad=True on x_train.

RuntimeError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 10.92 GiB total capacity; 9.79 GiB already allocated; 539.44 MiB free; 10.28 MiB cached)

My PyTorch version is 1.1.0. After the update, training with batch_size=64 ran out of GPU memory, and the OOM error persisted even after I reduced it to batch_size=2. The model used here is the relatively complex GoogLeNet (a simpler VGG trained without errors). My source code:

import torch
import torch.nn as nn
from torch.optim import Adam

# Check whether a GPU is available
if torch.cuda.is_available():
    device = 'cuda'
else:
    device = 'cpu'
# device = 'cpu'
device = torch.device(device)

cnn = GoogLeNet().to(device)
optimizer = Adam(cnn.parameters(), lr=0.001, betas=(0.9, 0.999))  # use the Adam optimizer
loss_fn = nn.SmoothL1Loss()  # define the loss function

# Train and evaluate the model

data = Dataset(epochs=args.EPOCHS, batch=args.BATCH, val_batch=args.BATCH)
model = Model(data)

lowest_loss = 1e5
for i in range(data.get_step()):
    cnn.train()
    x_train, y_train = data.next_train_batch()
    x_test, y_test = data.next_validation_batch()

    x_train = torch.from_numpy(x_train)
    y_train = torch.from_numpy(y_train)
    x_train = x_train.float().to(device)
    y_train = y_train.float().to(device)

    outputs = cnn(x_train)  # the forward pass that triggers the OOM error

    optimizer.zero_grad()
    #print(x_train.shape,outputs.shape,y_train.shape)
    loss = loss_fn(outputs, y_train)
    loss.backward()
    optimizer.step()
    print(loss)
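Before reaching for a fix, it helps to confirm what the CUDA caching allocator is actually holding; the "already allocated" and "cached" numbers in the error message come from these counters. A minimal diagnostic sketch (torch.cuda.memory_cached() is the PyTorch 1.1.0 name; newer releases rename it to memory_reserved()):

import torch

# Print allocator statistics, e.g. right before the forward pass.
allocated_mib = torch.cuda.memory_allocated() / 1024 ** 2
cached_mib = torch.cuda.memory_cached() / 1024 ** 2  # memory_reserved() in newer PyTorch
print(f'{allocated_mib:.1f} MiB allocated, {cached_mib:.1f} MiB cached')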

Solution:

The out-of-memory fix came from a reference post (reference link).

Change outputs = cnn(x_train) to:

    with torch.no_grad():
        outputs = cnn(x_train)
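Wrapping the forward pass in torch.no_grad() tells autograd not to record the computation graph, so the intermediate activations that backward() would otherwise need are freed immediately instead of being held in GPU memory; that is where the savings come from.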

After this change, however, a new error appeared:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
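This follows directly from the previous change: under torch.no_grad() the outputs tensor is created without a grad_fn, and y_train is a plain tensor, so the resulting loss has nothing for backward() to differentiate.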
I then consulted another blog post (reference link).

Setting requires_grad=True on x_train and y_train resolves it.

The code after the fix:

import torch
import torch.nn as nn
from torch.optim import Adam
from torch.autograd import Variable

# Check whether a GPU is available
if torch.cuda.is_available():
    device = 'cuda'
else:
    device = 'cpu'
# device = 'cpu'
device = torch.device(device)

# cnn = Net().to(device)
cnn = GoogLeNet().to(device)
optimizer = Adam(cnn.parameters(), lr=0.001, betas=(0.9, 0.999))  # use the Adam optimizer
loss_fn = nn.SmoothL1Loss()  # define the loss function

# Train and evaluate the model

data = Dataset(epochs=args.EPOCHS, batch=args.BATCH, val_batch=args.BATCH)
model = Model(data)

lowest_loss = 1e5
for i in range(data.get_step()):
    cnn.train()
    x_train, y_train = data.next_train_batch()
    x_test, y_test = data.next_validation_batch()

    x_train = torch.from_numpy(x_train)
    y_train = torch.from_numpy(y_train)
    x_train = x_train.float().to(device)
    y_train = y_train.float().to(device)

    # Variable is deprecated in PyTorch 0.4+; requires_grad_(True) is the modern equivalent
    x_train = Variable(x_train, requires_grad=True)
    y_train = Variable(y_train, requires_grad=True)
    with torch.no_grad():  # run the forward pass without building the autograd graph
        outputs = cnn(x_train)

    optimizer.zero_grad()
    #print(x_train.shape,outputs.shape,y_train.shape)
    loss = loss_fn(outputs, y_train)
    loss.backward()
    optimizer.step()
    print(loss)
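Two caveats on the code above. First, Variable has been deprecated since PyTorch 0.4; calling requires_grad_(True) directly on the tensor is the modern equivalent. Second, because the forward pass still runs under torch.no_grad(), loss.backward() can only propagate through y_train, so the network's parameters receive no gradients and optimizer.step() changes nothing; the errors disappear, but be aware of this trade-off. If memory permits, the standard training step looks like this (a minimal sketch assuming the same cnn, loss_fn, and optimizer as above):

x_train.requires_grad_(True)      # modern replacement for the Variable wrapper;
                                  # only needed if you want gradients w.r.t. the input

outputs = cnn(x_train)            # forward pass with the autograd graph intact

optimizer.zero_grad()
loss = loss_fn(outputs, y_train)
loss.backward()                   # gradients now reach cnn's parameters
optimizer.step()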

This error is caused by running out of CUDA memory. According to references [1] and [2], the GPUs in those cases had total capacities of 4.00 GiB and 10.76 GiB, with 2.34 GiB and 1.82 GiB already allocated respectively, leaving too little free memory for a further 14.00 MiB allocation. This usually means the model or the data is too large for the card. You can try reducing the batch size or using a smaller model. You can also set the max_split_size_mb option to reduce memory fragmentation; see the PyTorch documentation on memory management and PYTORCH_CUDA_ALLOC_CONF for details. Finally, as reference [3] describes, you can free memory by killing processes that are holding the GPU, e.g. kill -9 31272 with the offending PID. In short: reduce the batch size, use a smaller model, set max_split_size_mb, or kill the processes occupying GPU memory.

References:

1. [Solved] yolov5 error "RuntimeError: CUDA out of memory. Tried to allocate 14.00 MiB" (https://blog.csdn.net/Code_and516/article/details/129798540)
2., 3. Fixing "RuntimeError: CUDA out of memory. Tried to allocate 14.00 MiB" (https://blog.csdn.net/qq_43733107/article/details/126876755)
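A quick sketch of those last two suggestions. Note that max_split_size_mb is only honored by newer PyTorch releases (well after the 1.1.0 used in this post), the 128 MB threshold below is just an illustrative value, and the environment variable must be set before CUDA is first initialized:

import os

# Cap the size of allocator block splits to reduce fragmentation
# (128 is an example value; tune it for your workload).
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:128'

import torch  # import after setting the variable so the allocator picks it up

x = torch.randn(64, 3, 224, 224, device='cuda')  # allocations now honor the setting

# To find and kill a process that is holding GPU memory, run in a shell:
#   nvidia-smi        # lists the PIDs using each GPU
#   kill -9 <PID>     # force-kill the offending process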