The transforms.ToTensor problem with mask labels in FCN image segmentation

This post describes an anomaly with the CrossEntropyLoss loss function encountered while implementing a fully convolutional network (FCN) in PyTorch, and how to fix it. ToTensor() silently converted the label values to 0, so the training loss suddenly dropped to 0 and never updated again. The solution is to convert with torch.as_tensor() instead and adjust the dimensions with unsqueeze().


Recently, while writing an FCN in PyTorch, I switched the loss function to CrossEntropyLoss. The problem: after the first backward pass, the loss dropped to 0, and no matter how much longer I trained, it stayed at 0. The very first loss value looked like a normal float, and the network obviously could not have found the optimal solution within a single epoch.
After careful debugging, I found that the labels coming out of the dataloader were all 0, when they should have contained integer class labels in [0, class_num). The culprit: the mask files, after passing through transforms.ToTensor(), had all become 0. The method's own documentation warns:

because the input image is scaled to [0.0, 1.0], this transformation should not be used when transforming target image masks.

In other words, ToTensor() divides the uint8 pixel values by 255, mapping everything into [0, 1]; since I then cast the result to long (to match the requirements of CrossEntropyLoss), the small fractional class values all truncated to 0.
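
To make the failure concrete, here is a minimal sketch (with made-up class IDs) showing how ToTensor() scales a uint8 mask and how the subsequent long cast wipes out every label:

```python
import numpy as np
import torch
from PIL import Image
from torchvision import transforms

# Hypothetical mask holding class IDs 0, 1, 2, 20 (PASCAL VOC-style indices).
mask = Image.fromarray(np.array([[0, 1], [2, 20]], dtype=np.uint8))

t = transforms.ToTensor()(mask)  # uint8 values divided by 255
print(t.unique())                # tensor([0.0000, 0.0039, 0.0078, 0.0784])
print(t.long().unique())         # tensor([0]) -> every pixel truncates to class 0
```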
The fix is therefore to convert the mask image to a tensor with torch.as_tensor() (which preserves the raw integer values) and then add a dimension with torch.unsqueeze(img, 0), giving a drop-in replacement for the ToTensor() call above.
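
A minimal sketch of such a replacement (the function name is my own, and I go through NumPy explicitly, which is the reliably supported way to hand a PIL image to torch.as_tensor):

```python
import numpy as np
import torch
from PIL import Image

def mask_to_tensor(mask_img: Image.Image) -> torch.Tensor:
    # Keep the raw integer class IDs instead of scaling them to [0, 1].
    t = torch.as_tensor(np.array(mask_img), dtype=torch.long)
    # Match ToTensor's [1, H, W] layout so the rest of the pipeline is unchanged.
    return torch.unsqueeze(t, 0)
```

One usage note: CrossEntropyLoss expects targets of shape [N, H, W], so if the dataloader batches these [1, H, W] masks into [N, 1, H, W], squeeze the channel dimension back out before computing the loss, e.g. loss = criterion(logits, labels.squeeze(1)).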
