Problems Encountered When Training DeepLabV3+ on a Custom Dataset (Part 2)

Preface

See the previous post: Problems Encountered When Training DeepLabV3+ on a Custom Dataset (Part 1)

1 The new problems

1.1 Problem overview

My data-processing code was copied from dataloaders/datasets/pascal.py, so most of it is a modified version of that file. Our dataset, however, is a ceramic-tile crack dataset: each mask image contains only two colors, background RGB (0, 0, 0) and the annotated class RGB (255, 255, 255).

Images annotated in VOC style are opened and saved with PIL.Image in mode 'P' (palette mode; look it up if you are not familiar with it). In short, each pixel stores an index into a color palette rather than an RGB value, for example:

(0, 0, 0) ----- index 1

(255, 255, 255) ----- index 2

Each color corresponds to one palette index.
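
As a quick illustration (a toy example, not from the dataset), a 'P'-mode image stores one palette index per pixel, and the palette maps each index to an RGB color:

from PIL import Image
import numpy as np

# a tiny 2x1 'P'-mode image: pixel values are palette indices, not RGB values
im = Image.new('P', (2, 1))                       # every pixel starts at index 0
im.putpalette([0, 0, 0,                           # index 0 -> (0, 0, 0)
               255, 255, 255] + [0] * (254 * 3))  # index 1 -> (255, 255, 255), rest unused
im.putpixel((1, 0), 1)                            # the second pixel now stores index 1

print(np.array(im))                  # [[0 1]]  <- the stored indices
print(np.array(im.convert('RGB')))   # the RGB colors after the palette lookup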

1.2 The palette in the VOC dataset

The palette used by the VOC dataset looks roughly like this:

	array([[  0,  0,  0],
 [128,  0,  0],
 [  0, 128,  0],
 [128, 128,  0],
 [  0,  0, 128],
 [128,  0, 128],
 [  0, 128, 128],
 [128, 128, 128],
 [ 64,  0,  0],
 [192,  0,  0],
 [ 64, 128,  0],
 [192, 128,  0],
 [ 64,  0, 128],
 [192,  0, 128],
 [ 64, 128, 128],
 [192, 128, 128],
 [  0,  64,  0],
 [128,  64,  0],
 [  0, 192,  0],
 [128, 192,  0],
 [  0,  64, 128],
 [128,  64, 128],
 [  0, 192, 128],
 [128, 192, 128],
 [ 64,  64,  0],
 [192,  64,  0],
 [ 64, 192,  0],
 [192, 192,  0],
 [ 64,  64, 128],
 [192,  64, 128],
 [ 64, 192, 128],
 [192, 192, 128],
 [  0,  0,  64],
 [128,  0,  64],
 [  0, 128,  64],
 [128, 128,  64],
 [  0,  0, 192],
 [128,  0, 192],
 [  0, 128, 192],
 [128, 128, 192],
 [ 64,  0,  64],
 [192,  0,  64],
 [ 64, 128,  64],
 [192, 128,  64],
 [ 64,  0, 192],
 [192,  0, 192],
 [ 64, 128, 192],
 [192, 128, 192],
 [  0,  64,  64],
 [128,  64,  64],
 [  0, 192,  64],
 [128, 192,  64],
 [  0,  64, 192],
 [128,  64, 192],
 [  0, 192, 192],
 [128, 192, 192],
 [ 64,  64,  64],
 [192,  64,  64],
 [ 64, 192,  64],
 [192, 192,  64],
 [ 64,  64, 192],
 [192,  64, 192],
 [ 64, 192, 192],
 [192, 192, 192],
 [ 32,  0,  0],
 [160,  0,  0],
 [ 32, 128,  0],
 [160, 128,  0],
 [ 32,  0, 128],
 [160,  0, 128],
 [ 32, 128, 128],
 [160, 128, 128],
 [ 96,  0,  0],
 [224,  0,  0],
 [ 96, 128,  0],
 [224, 128,  0],
 [ 96,  0, 128],
 [224,  0, 128],
 [ 96, 128, 128],
 [224, 128, 128],
 [ 32,  64,  0],
 [160,  64,  0],
 [ 32, 192,  0],
 [160, 192,  0],
 [ 32,  64, 128],
 [160,  64, 128],
 [ 32, 192, 128],
 [160, 192, 128],
 [ 96,  64,  0],
 [224,  64,  0],
 [ 96, 192,  0],
 [224, 192,  0],
 [ 96,  64, 128],
 [224,  64, 128],
 [ 96, 192, 128],
 [224, 192, 128],
 [ 32,  0,  64],
 [160,  0,  64],
 [ 32, 128,  64],
 [160, 128,  64],
 [ 32,  0, 192],
 [160,  0, 192],
 [ 32, 128, 192],
 [160, 128, 192],
 [ 96,  0,  64],
 [224,  0,  64],
 [ 96, 128,  64],
 [224, 128,  64],
 [ 96,  0, 192],
 [224,  0, 192],
 [ 96, 128, 192],
 [224, 128, 192],
 [ 32,  64,  64],
 [160,  64,  64],
 [ 32, 192,  64],
 [160, 192,  64],
 [ 32,  64, 192],
 [160,  64, 192],
 [ 32, 192, 192],
 [160, 192, 192],
 [ 96,  64,  64],
 [224,  64,  64],
 [ 96, 192,  64],
 [224, 192,  64],
 [ 96,  64, 192],
 [224,  64, 192],
 [ 96, 192, 192],
 [224, 192, 192],
 [  0,  32,  0],
 [128,  32,  0],
 [  0, 160,  0],
 [128, 160,  0],
 [  0,  32, 128],
 [128,  32, 128],
 [  0, 160, 128],
 [128, 160, 128],
 [ 64,  32,  0],
 [192,  32,  0],
 [ 64, 160,  0],
 [192, 160,  0],
 [ 64,  32, 128],
 [192,  32, 128],
 [ 64, 160, 128],
 [192, 160, 128],
 [  0,  96,  0],
 [128,  96,  0],
 [  0, 224,  0],
 [128, 224,  0],
 [  0,  96, 128],
 [128,  96, 128],
 [  0, 224, 128],
 [128, 224, 128],
 [ 64,  96,  0],
 [192,  96,  0],
 [ 64, 224,  0],
 [192, 224,  0],
 [ 64,  96, 128],
 [192,  96, 128],
 [ 64, 224, 128],
 [192, 224, 128],
 [  0,  32,  64],
 [128,  32,  64],
 [  0, 160,  64],
 [128, 160,  64],
 [  0,  32, 192],
 [128,  32, 192],
 [  0, 160, 192],
 [128, 160, 192],
 [ 64,  32,  64],
 [192,  32,  64],
 [ 64, 160,  64],
 [192, 160,  64],
 [ 64,  32, 192],
 [192,  32, 192],
 [ 64, 160, 192],
 [192, 160, 192],
 [  0,  96,  64],
 [128,  96,  64],
 [  0, 224,  64],
 [128, 224,  64],
 [  0,  96, 192],
 [128,  96, 192],
 [  0, 224, 192],
 [128, 224, 192],
 [ 64,  96,  64],
 [192,  96,  64],
 [ 64, 224,  64],
 [192, 224,  64],
 [ 64,  96, 192],
 [192,  96, 192],
 [ 64, 224, 192],
 [192, 224, 192],
 [ 32,  32,  0],
 [160,  32,  0],
 [ 32, 160,  0],
 [160, 160,  0],
 [ 32,  32, 128],
 [160,  32, 128],
 [ 32, 160, 128],
 [160, 160, 128],
 [ 96,  32,  0],
 [224,  32,  0],
 [ 96, 160,  0],
 [224, 160,  0],
 [ 96,  32, 128],
 [224,  32, 128],
 [ 96, 160, 128],
 [224, 160, 128],
 [ 32,  96,  0],
 [160,  96,  0],
 [ 32, 224,  0],
 [160, 224,  0],
 [ 32,  96, 128],
 [160,  96, 128],
 [ 32, 224, 128],
 [160, 224, 128],
 [ 96,  96,  0],
 [224,  96,  0],
 [ 96, 224,  0],
 [224, 224,  0],
 [ 96,  96, 128],
 [224,  96, 128],
 [ 96, 224, 128],
 [224, 224, 128],
 [ 32,  32,  64],
 [160,  32,  64],
 [ 32, 160,  64],
 [160, 160,  64],
 [ 32,  32, 192],
 [160,  32, 192],
 [ 32, 160, 192],
 [160, 160, 192],
 [ 96,  32,  64],
 [224,  32,  64],
 [ 96, 160,  64],
 [224, 160,  64],
 [ 96,  32, 192],
 [224,  32, 192],
 [ 96, 160, 192],
 [224, 160, 192],
 [ 32,  96,  64],
 [160,  96,  64],
 [ 32, 224,  64],
 [160, 224,  64],
 [ 32,  96, 192],
 [160,  96, 192],
 [ 32, 224, 192],
 [160, 224, 192],
 [ 96,  96,  64],
 [224,  96,  64],
 [ 96, 224,  64],
 [224, 224,  64],
 [ 96,  96, 192],
 [224,  96, 192],
 [ 96, 224, 192],
 [224, 224, 192]], dtype=uint8)
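
For reference, this palette is not arbitrary: it is the standard VOC colormap, where each entry is built from the bits of the class index. A minimal Python version of that routine looks like this:

import numpy as np

def voc_colormap(n=256):
    # build the standard Pascal VOC palette: n rows of (R, G, B)
    def bitget(value, idx):
        return (value >> idx) & 1

    cmap = np.zeros((n, 3), dtype=np.uint8)
    for i in range(n):
        r = g = b = 0
        c = i
        for j in range(8):
            r |= bitget(c, 0) << (7 - j)
            g |= bitget(c, 1) << (7 - j)
            b |= bitget(c, 2) << (7 - j)
            c >>= 3
        cmap[i] = (r, g, b)
    return cmap

print(voc_colormap()[1])    # [128   0   0], the first foreground class
print(voc_colormap()[255])  # [224 224 192], the 'ignore' color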

1.3 The palette I wanted

My original idea, starting from the above, was to make each annotated class map to its own palette index (the screenshot illustrating this failed to upload, so it is omitted here). The desired result:

[[  0   0   0]      <--- background color                              ---- 0
 [128   0   0]      <--- 1st annotated class, palette index 1          ---- 1
 [  2   2   2]      <--- 2nd annotated class, palette index 2          ---- 2
 [  3   3   3]      <--- 3rd annotated class, palette index 3          ---- 3
 [  4   4   4]      <--- 4th annotated class, palette index 4          ---- 4
 [  5   5   5]      <--- 5th annotated class, palette index 5          ---- 5
 [  6   6   6]
 ...
 ...
 [253 253 253]
 [254 254 254]
 [224 224 192]]     <--- ignore color                                  ---- 255

But no matter what I tried, I just could not make it behave the way I imagined!

The change I tried:

  1. Modify the palette of the original mask image, as in the snippet below:
import copy
import numpy as np
from PIL import Image

# open the mask image ('P' mode)
im = Image.open('a.png')

# get the palette as a (256, 3) array
palette = np.array(im.getpalette(), dtype=np.uint8).reshape((256, 3))

# deep-copy the color currently stored at index 2
temp = copy.deepcopy(palette[2])

# change palette[2] from [224 224 192] to [2 2 2]
palette[2] = (2, 2, 2)

# change palette[255] from [255 255 255] to [224 224 192]
palette[255] = temp

# flatten back to the flat list format that putpalette expects
aa = palette.flatten().tolist()

# put the modified palette back into the image
im.putpalette(aa)

The result was:

[[  0   0   0]      <--- background color       ---- 0
 [128   0   0]      <--- first annotated class  ---- 1
 [  2   2   2]      <--- the ignore color       ---- 2   it became [2 2 2]  # but how do I make [224 224 192] map back to palette index 255? I still have no answer
 [  3   3   3]
 [  4   4   4]
 [  5   5   5]
 [  6   6   6]
 ...
 ...
 [253 253 253]
 [254 254 254]
 [224 224 192]]

In the end I had to compromise: there was no time to keep fighting with this, so I dropped the ignore color entirely and kept only the colors I actually need, with every class annotated as index 1 in its own mask. Turning the data into the dataset I originally imagined would mean re-annotating everything with the labeling tool, which is far too time-consuming.
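
Note that swapping palette entries only changes which color each index is displayed as; it does not change which index a pixel actually stores. For reference, a rough, untested sketch of remapping the stored index values themselves (rather than the palette) might look like this:

import numpy as np
from PIL import Image

im = Image.open('a.png')                  # the 'P'-mode mask
idx = np.array(im)                        # per-pixel palette indices

# hypothetical remap: move every pixel currently stored as index 2 to index 255
remapped = np.where(idx == 2, 255, idx).astype(np.uint8)

new_im = Image.fromarray(remapped, mode='P')
new_im.putpalette(im.getpalette())        # keep (or further edit) the original palette
new_im.save('a_remapped.png')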

1.4 The final palette

The final result is:

# palette for the first class
[[  0   0   0]      <--- background color   ---- 0
 [128   0   0]      <--- annotated class    ---- 1  # red
 [  2   2   2]      
 ...
 ...
 [255 255 255]]
 
 # palette for the second class
[[  0   0   0]      <--- background color   ---- 0
 [  0  128  0]      <--- annotated class    ---- 1  # green
 [  2   2   2]      
 ...
 ...
 [255 255 255]]

2 Code changes

2.1 Copying pascal.py to mydata.py

One thing here really annoyed me. Sometimes when I loaded images for a quick test, the mask came out completely black even though the original image clearly had mask annotations, and I had no idea why. After several days of digging, the cause turned out to be simple: the image/mask preprocessing has a random-crop step, and the crop was cutting the mask region out entirely... just cutting it out... I nearly lost it.

That is why there is a custom_transforms_myda.py. Quite a lot was changed; for example, every transform now also carries the class id through the sample dict:

class FixedResize(object):
    def __init__(self, size):
        self.size = (size, size)  # size: (h, w)

    def __call__(self, sample):
        img = sample['image']
        mask = sample['label']
        idclass = sample['classid']
        assert img.size == mask.size

        img = img.resize(self.size, Image.BILINEAR)
        mask = mask.resize(self.size, Image.NEAREST)

        return {'image': img,
                'label': mask,
                'classid': idclass}    #@ <-- every transform must also return the class id, used to tell which class palette index 1 belongs to

The full custom_transforms_myda.py is here:

import torch
import random
import numpy as np

from PIL import Image, ImageOps, ImageFilter
import cv2
import matplotlib.pyplot as plt

class Normalize(object):
    """Normalize a tensor image with mean and standard deviation.
    Args:
        mean (tuple): means for each channel.
        std (tuple): standard deviations for each channel.
    """
    def __init__(self, mean=(0., 0., 0.), std=(1., 1., 1.)):
        self.mean = mean
        self.std = std

    def __call__(self, sample):
        img = sample['image']
        maska = sample['label']
        idclass = sample['classid']
        img = np.array(img).astype(np.float32)
        mask = np.array(maska).astype(np.float32)
        img /= 255.0
        img -= self.mean
        img /= self.std

        return {'image': img,
                'label': mask,
                'classid': idclass}


class ToTensor(object):
    """Convert ndarrays in sample to Tensors."""

    def __call__(self, sample):
        # swap color axis because
        # numpy image: H x W x C
        # torch image: C X H X W
        img = sample['image']
        mask = sample['label']
        idclass = sample['classid']
        img = np.array(img).astype(np.float32).transpose((2, 0, 1))
        mask = np.array(mask).astype(np.float32)

        img = torch.from_numpy(img).float()
        mask = torch.from_numpy(mask).float()

        return {'image': img,
                'label': mask,
                'classid': idclass}


class RandomHorizontalFlip(object):
    def __call__(self, sample):
        img = sample['image']
        mask = sample['label']

        idclass = sample['classid']
        if random.random() < 0.5:
            img = img.transpose(Image.FLIP_LEFT_RIGHT)
            mask = mask.transpose(Image.FLIP_LEFT_RIGHT)

        return {'image': img,
                'label': mask,
                'classid': idclass}


class RandomRotate(object):
    def __init__(self, degree):
        self.degree = degree

    def __call__(self, sample):
        img = sample['image']
        mask = sample['label']
        idclass = sample['classid']

        rotate_degree = random.uniform(-1*self.degree, self.degree)
        img = img.rotate(rotate_degree, Image.BILINEAR)
        mask = mask.rotate(rotate_degree, Image.NEAREST)


        return {'image': img,
                'label': mask,
             'classid': idclass}

class RandomGaussianBlur(object):
    def __call__(self, sample):
        img = sample['image']
        mask = sample['label']
        idclass = sample['classid']
        if random.random() < 0.5:
            img = img.filter(ImageFilter.GaussianBlur(
                radius=random.random()))

        return {'image': img,
                'label': mask,
                'classid': idclass}


class RandomScaleCrop(object):
    def __init__(self, base_size, crop_size, fill=0):
        self.base_size = base_size
        self.crop_size = crop_size
        self.fill = fill

    def __call__(self, sample):
        img = sample['image']
        mask = sample['label']
        idclass = sample['classid']
        # random scale (short edge)
        short_size = random.randint(int(self.base_size * 0.5), int(self.base_size * 2.0))
        w, h = img.size
        if h > w:
            ow = short_size
            oh = int(1.0 * h * ow / w)
        else:
            oh = short_size
            ow = int(1.0 * w * oh / h)
        img = img.resize((ow, oh), Image.BILINEAR)
        mask = mask.resize((ow, oh), Image.NEAREST)
        # pad crop
        if short_size < self.crop_size:
            padh = self.crop_size - oh if oh < self.crop_size else 0
            padw = self.crop_size - ow if ow < self.crop_size else 0
            img = ImageOps.expand(img, border=(0, 0, padw, padh), fill=0)
            mask = ImageOps.expand(mask, border=(0, 0, padw, padh), fill=self.fill)
        # random crop crop_size
        w, h = img.size
        x1 = random.randint(0, w - self.crop_size)
        y1 = random.randint(0, h - self.crop_size)
        img = img.crop((x1, y1, x1 + self.crop_size, y1 + self.crop_size))
        mask = mask.crop((x1, y1, x1 + self.crop_size, y1 + self.crop_size))

        return {'image': img,
                'label': mask,
                'classid': idclass}


class FixScaleCrop(object):
    def __init__(self, crop_size):
        self.crop_size = crop_size
        self.classed = [(128,0,0), (0,128,0), (128,128,0), (0,0,128),(128,0,128),(0,0,0)]
    def letterbox(self, img, new_shape=(513, 513), color=(0, 0, 0),id =0, isNeedToConvert = False,auto=True, scaleFill=False, scaleup=True):
        # Resize image to a 32-pixel-multiple rectangle https://github.com/ultralytics/yolov3/issues/232
        # w, h = img.size

        img = cv2.cvtColor(np.asarray(img), cv2.COLOR_RGB2BGR)

        # shape = (w,h)
        shape = img.shape[:2]  # current shape [height, width]
        if isinstance(new_shape, int):
            new_shape = (new_shape, new_shape)

        # Scale ratio (new / old)
        r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
        if not scaleup:  # only scale down, do not scale up (for better test mAP)
            r = min(r, 1.0)

        # Compute padding
        ratio = r, r  # width, height ratios
        new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
        dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
        if auto:  # minimum rectangle
            dw, dh = np.mod(dw, 32), np.mod(dh, 32)  # wh padding
        elif scaleFill:  # stretch
            dw, dh = 0.0, 0.0
            new_unpad = (new_shape[1], new_shape[0])
            ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios

        dw /= 2  # divide padding into 2 sides
        dh /= 2
        #
        if shape != new_unpad:  # resize
            img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
        top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
        left, right = int(round(dw - 0.1)), int(round(dw + 0.1))

        img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
        # print(img.shape)
        if isNeedToConvert:
            # for masks: rebuild a 'P'-mode image in which every pixel labeled 1
            # is painted with the color assigned to this sample's class id
            new_img = Image.new('P', new_shape, (0, 0, 0))
            for h in range(0, new_shape[0]):
                for j in range(0, new_shape[1]):
                    (b, g, r) = img[h, j]
                    if (b, g, r) == (1, 1, 1):
                        new_img.putpixel((j, h), self.classed[id])
                        #img[h, j] = self.classed[id]


            return new_img, ratio, (dw, dh)
        else:
            image = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB), mode='RGB')


        return image, ratio, (dw, dh)



    def __call__(self, sample):
        img = sample['image']
        mask = sample['label']
        idclass = sample['classid']

        mask, ratio, pad = self.letterbox(mask,new_shape=513, id=idclass,isNeedToConvert=True, auto=False)

        img, ratio, pad = self.letterbox(img, auto=False)


        #     oh = self.crop_size
        #     ow = int(1.0 * w * oh / h)
        # else:
        #     ow = self.crop_size
        #     oh = int(1.0 * h * ow / w)
        # img = img.resize((ow, oh), Image.BILINEAR)
        # mask = mask.resize((ow, oh), Image.NEAREST)
        # # center crop
        # w, h = img.size
        # x1 = int(round((w - self.crop_size) / 2.))
        # y1 = int(round((h - self.crop_size) / 2.))
        # img = img.crop((x1, y1, x1 + self.crop_size, y1 + self.crop_size))
        # mask = mask.crop((x1, y1, x1 + self.crop_size, y1 + self.crop_size))

        return {'image': img,
                'label': mask,
                'classid': idclass}

class FixScaleCrop_val(object):
    def __init__(self, crop_size):
        self.crop_size = crop_size
        self.classed = [(128,0,0), (0,128,0), (128,128,0), (0,0,128),(128,0,128),(0,0,0)]  # letterbox() below still needs this when isNeedToConvert=True
    def letterbox(self, img, new_shape=(513, 513), color=(0, 0, 0),id =0, isNeedToConvert = False,auto=True, scaleFill=False, scaleup=True):
        # Resize image to a 32-pixel-multiple rectangle https://github.com/ultralytics/yolov3/issues/232
        # w, h = img.size

        img = cv2.cvtColor(np.asarray(img), cv2.COLOR_RGB2BGR)

        # shape = (w,h)
        shape = img.shape[:2]  # current shape [height, width]
        if isinstance(new_shape, int):
            new_shape = (new_shape, new_shape)

        # Scale ratio (new / old)
        r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
        if not scaleup:  # only scale down, do not scale up (for better test mAP)
            r = min(r, 1.0)

        # Compute padding
        ratio = r, r  # width, height ratios
        new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
        dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
        if auto:  # minimum rectangle
            dw, dh = np.mod(dw, 32), np.mod(dh, 32)  # wh padding
        elif scaleFill:  # stretch
            dw, dh = 0.0, 0.0
            new_unpad = (new_shape[1], new_shape[0])
            ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios

        dw /= 2  # divide padding into 2 sides
        dh /= 2
        #
        if shape != new_unpad:  # resize
            img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
        top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
        left, right = int(round(dw - 0.1)), int(round(dw + 0.1))

        img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
        # print(img.shape)
        if isNeedToConvert:
            new_img = Image.new('P', new_shape, (0, 0, 0))
            for h in range(0, new_shape[0]):
                for j in range(0, new_shape[1]):
                    (b, g, r) = img[h, j]
                    if (b, g, r) == (1, 1, 1):
                        new_img.putpixel((j, h), self.classed[id])
                        #img[h, j] = self.classed[id]


            return new_img, ratio, (dw, dh)
        else:
            image = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB), mode='RGB')


        return image, ratio, (dw, dh)



    def __call__(self, sample):
        img = sample['image']
        mask = sample['label']
        idclass = sample['classid']

        mask, ratio, pad = self.letterbox(mask,new_shape=513, id=idclass,isNeedToConvert=True, auto=False)

        img, ratio, pad = self.letterbox(img, auto=False)


        #     oh = self.crop_size
        #     ow = int(1.0 * w * oh / h)
        # else:
        #     ow = self.crop_size
        #     oh = int(1.0 * h * ow / w)
        # img = img.resize((ow, oh), Image.BILINEAR)
        # mask = mask.resize((ow, oh), Image.NEAREST)
        # # center crop
        # w, h = img.size
        # x1 = int(round((w - self.crop_size) / 2.))
        # y1 = int(round((h - self.crop_size) / 2.))
        # img = img.crop((x1, y1, x1 + self.crop_size, y1 + self.crop_size))
        # mask = mask.crop((x1, y1, x1 + self.crop_size, y1 + self.crop_size))

        return {'image': img,
                'label': mask,
                'classid': idclass}



class FixedResize(object):
    def __init__(self, size):
        self.size = (size, size)  # size: (h, w)

    def __call__(self, sample):
        img = sample['image']
        mask = sample['label']
        idclass = sample['classid']
        assert img.size == mask.size

        img = img.resize(self.size, Image.BILINEAR)
        mask = mask.resize(self.size, Image.NEAREST)

        return {'image': img,
                'label': mask,
                'classid': idclass}

mydata.py

from __future__ import print_function, division
import os
from PIL import Image
import numpy as np
from torch.utils.data import Dataset
from mypath import Path
from torchvision import transforms
from dataloaders import custom_transforms_myda as tr

class VOCSegmentation(Dataset):
    """
    PascalVoc dataset
    """
    NUM_CLASSES = 5+1

    def __init__(self,
                 args,
                 base_dir=Path.db_root_dir('mydata'),
                 split='train',
                 ):
        """
        :param base_dir: path to VOC dataset directory
        :param split: train/val
        :param transform: transform to apply
        """
        super().__init__()
        self._base_dir = base_dir
        self._image_dir = os.path.join(self._base_dir, 'JPEGImages')
        self._cat_dir = os.path.join(self._base_dir, 'SegmentationClass')

        if isinstance(split, str):
            self.split = [split]
        else:
            split.sort()
            self.split = split

        self.args = args

        _splits_dir = os.path.join(self._base_dir, 'ImageSets')

        self.im_ids = []
        self.images = []
        self.categories = []

        for splt in self.split:
            with open(os.path.join(os.path.join(_splits_dir, splt + '.txt')), "r") as f:
                lines = f.read().splitlines()

            for ii, line in enumerate(lines):
                _image = os.path.join(self._image_dir, line)
                fpath,fname = os.path.split(_image)
                fnewname = fname.replace('.jpg','.png')
                _cat = os.path.join(self._cat_dir, fnewname)
                #print(fnewname)
                assert os.path.isfile(_image)
                assert os.path.isfile(_cat)
                self.im_ids.append(line)
                self.images.append(_image)
                self.categories.append(_cat)

        assert (len(self.images) == len(self.categories))

        # Display stats
        print('Number of images in {}: {:d}'.format(split, len(self.images)))

    def __len__(self):
        return len(self.images)


    def __getitem__(self, index):
        _img, _target, _clasid = self._make_img_gt_point_pair(index)
        sample = {'image': _img, 'label': _target, 'classid': _clasid}

        for split in self.split:
            if split == "train":
                # return sample
                return self.transform_tr(sample)
            elif split == 'val':
                return self.transform_val(sample)


    def _make_img_gt_point_pair(self, index):
        _img = Image.open(self.images[index]).convert('RGB')
        #_img = Image.open(a).convert('RGB')
       # print(self.categories[index])
        #_target = Image.open(b)
        _target = Image.open(self.categories[index])
        _, fname = os.path.split(self.categories[index])
        clasid = int(fname.split('-')[0])   # mask filenames are expected to start with the class id, e.g. '1-xxxx.png'


        return _img, _target, clasid

    def transform_tr(self, sample):
        composed_transforms = transforms.Compose([
            #tr.RandomHorizontalFlip(),
            tr.FixScaleCrop(crop_size=self.args.crop_size),
            tr.RandomGaussianBlur(),
            tr.Normalize(mean=(0.24127093, 0.2287277,  0.24580745), std=(0.07584574, 0.05697405, 0.07654408)),
            tr.ToTensor()])

        return composed_transforms(sample)

    def transform_val(self, sample):

        composed_transforms = transforms.Compose([
            tr.FixScaleCrop(crop_size=self.args.crop_size),
            # tr.Normalize(),
            #tr.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
            tr.Normalize(mean=(0.24127093, 0.2287277, 0.24580745), std=(0.07584574, 0.05697405, 0.07654408)),
            tr.ToTensor()])

        return composed_transforms(sample)

    def __str__(self):
        return 'mydata(split=' + str(self.split) + ')'


if __name__ == '__main__':
    from dataloaders.utils import decode_segmap_mydata
    from torch.utils.data import DataLoader
    import matplotlib.pyplot as plt
    import argparse


    def getNonRepeatList2(data):
        new_data = []
        data = data.flatten()
        for i in range(len(data)):
            if data[i] not in new_data:
                new_data.append(data[i])
        return new_data

    def getNonRepeatList3( data):
        return [i for n, i in enumerate(data) if i not in data[:n]]
    parser = argparse.ArgumentParser()
    args = parser.parse_args()
    args.base_size = 513
    args.crop_size = 513

    voc_train = VOCSegmentation(args, split='train')

    dataloader = DataLoader(voc_train, batch_size=5, shuffle=True, num_workers=0,drop_last=True)

    for ii, sample in enumerate(dataloader):
        for jj in range(sample["image"].size()[0]):
            img = sample['image'].numpy()
            gt = sample['label'].numpy()
            #posdd = getNonRepeatList3(ttt)
            #aset =list( set(gt.tolist()))
            classid = sample['classid'].numpy()

            temp = np.array(gt[jj])
            temp_max = np.max(temp)

            temp_t = np.array(gt[jj]).astype(np.uint8)
            temp_t_max = np.max(temp_t)
            tmp = np.array(gt[jj]).astype(np.uint8)

            segmap = decode_segmap_mydata(tmp, classid[jj], dataset='mydata')
            img_tmp = np.transpose(img[jj], axes=[1, 2, 0])
            img_tmp *= (0.07584574, 0.05697405, 0.07654408)
            img_tmp += (0.24127093, 0.2287277,  0.24580745)
            img_tmp *= 255.0
            img_tmp = img_tmp.astype(np.uint8)
            plt.figure()
            plt.title('display')
            plt.subplot(211)
            plt.imshow(img_tmp)
            plt.subplot(212)
            plt.imshow(segmap)

        if ii == 1:
            break

    plt.show(block=True)
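
decode_segmap_mydata is imported from dataloaders.utils but its code is not shown in this post. A minimal sketch of what such a function might look like, assuming label 0 is background, label 1 is the annotated class, and classid selects the display color (the color list mirrors self.classed in FixScaleCrop), is:

import numpy as np

# hypothetical sketch; the real function lives in dataloaders/utils.py
MYDATA_COLORS = [(128, 0, 0), (0, 128, 0), (128, 128, 0), (0, 0, 128), (128, 0, 128), (0, 0, 0)]

def decode_segmap_mydata(label_mask, classid, dataset='mydata'):
    # turn an (H, W) mask of {0, 1} into an RGB float image colored by classid
    color = MYDATA_COLORS[int(classid)]
    rgb = np.zeros(label_mask.shape + (3,), dtype=np.float32)
    for c in range(3):
        rgb[..., c] = np.where(label_mask == 1, color[c] / 255.0, 0.0)
    return rgb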



Command-line arguments:

--backbone
resnet
--dataset
mydata

train.py

   def training(self, epoch):
        train_loss = 0.0
        self.model.train()
        tbar = tqdm(self.train_loader)
        num_img_tr = len(self.train_loader)
        for i, sample in enumerate(tbar):
            # mydata
            image, target, id = sample['image'], sample['label'], sample['classid']  # changed here to fit my dataset
            #image, target = sample['image'], sample['label'] # the original line
            if self.args.cuda:
                image, target = image.cuda(), target.cuda()
            self.scheduler(self.optimizer, i, epoch, self.best_pred)
            self.optimizer.zero_grad()
            output = self.model(image)
            loss = self.criterion(output, target)
            loss.backward()
            self.optimizer.step()
            train_loss += loss.item()
            tbar.set_description('Train loss: %.3f' % (train_loss / (i + 1)))
            self.writer.add_scalar('train/total_loss_iter', loss.item(), i + num_img_tr * epoch)

            # Show 10 * 3 inference results each epoch
            if i % (num_img_tr // 10) == 0:
                global_step = i + num_img_tr * epoch
                #self.summary.visualize_image(self.writer, self.args.dataset, image, target, output, global_step)  # TensorBoard image logging is commented out: it would need the class id passed in and too many changes, so I just skip it

        self.writer.add_scalar('train/total_loss_epoch', train_loss, epoch)
        print('[Epoch: %d, numImages: %5d]' % (epoch, i * self.args.batch_size + image.data.shape[0]))
        print('Loss: %.3f' % train_loss)

        if self.args.no_val:
            # save checkpoint every epoch
            is_best = False
            self.saver.save_checkpoint({
                'epoch': epoch + 1,
                'state_dict': self.model.module.state_dict(),
                'optimizer': self.optimizer.state_dict(),
                'best_pred': self.best_pred,
            }, is_best)

The inference code to run after training:

demotest.py

import argparse
import os
import numpy as np
import time

from modeling.deeplab import *
from dataloaders import custom_transforms_myda as tr
from PIL import Image
from torchvision import transforms
from dataloaders.utils import *
from torchvision.utils import make_grid,save_image


def main():
    argparser = argparse.ArgumentParser(description='Pytorch DeeplabV3Plus Training')
    argparser.add_argument('--in_path', type=str, required=True, help='image to set')
    argparser.add_argument('--out_path', type=str, required=True, help='image to save result')
    argparser.add_argument('--backbone', type=str, default='resnet', choices=['resnet','xception','drn','mobilenet'], help='backbone name (default: resnet)')
    argparser.add_argument('--ckpt', type=str, default='deeplab-resnet.pth', help='saved model')
    argparser.add_argument('--out_stride', type=int, default=16, help='network output stride (default: 16)')
    argparser.add_argument('--no_cuda',  action='store_true', default=False,
                           help='disables CUDA training')
    argparser.add_argument('--gpu_ids', type=str, default='0',
                           help='use which gpu to train, must be a comma-separated list of integers only (default: 0)')
    argparser.add_argument('--dataset', type=str, default='mydata', choices=['pascal','coco','cityscapes','mydata'],
                           help='dataset name (default: mydata)')
    argparser.add_argument('--crop_size', type=int, default=513,
                           help='crop image size (default: 513)')
    argparser.add_argument('--num_classes', type=int, default=21,
                        help='how many classes to segment')
    argparser.add_argument('--sync_bn', type=bool, default=None,
                           help='whether to use sync bn (default: auto)')

    args = argparser.parse_args()
    print(args)
    a = torch.cuda.is_available()
    b = not args.no_cuda
    args.cuda = not args.no_cuda and torch.cuda.is_available()


    if args.cuda:
        try:
            args.gpu_ids = [int(s) for s in args.gpu_ids.split(',')]
        except ValueError:
            raise ValueError('Argument --gpu_ids must be a comma-separated list of integers only')
    if args.sync_bn is None:
        if args.cuda and len(args.gpu_ids) > 1:
            args.sync_bn = True
        else:
            args.sync_bn = False
    model_s_time = time.time()

    model = DeepLab(num_classes=args.num_classes,
                    backbone=args.backbone,
                    output_stride=args.out_stride,
                    sync_bn=args.sync_bn,
                    )

    ckpt = torch.load(args.ckpt, map_location='cpu')
    model.load_state_dict(ckpt['state_dict'])
    model = model.cuda()
    model_u_time = time.time()
    model_load_time = model_u_time - model_s_time
    print('[INFO] model load time is {}'.format(model_load_time))

    composed_transforms = transforms.Compose([
        # tr.RandomHorizontalFlip(),
        tr.FixScaleCrop(crop_size=513),
        tr.RandomGaussianBlur(),
        tr.Normalize(mean=(0.24127093, 0.2287277, 0.24580745), std=(0.07584574, 0.05697405, 0.07654408)),
        tr.ToTensor()])

    for name in os.listdir(args.in_path):
        s_time = time.time()
        image = Image.open(args.in_path + '/' + name).convert('RGB')

        # image =
        target = Image.open(args.in_path + '/' + name).convert('P')

        clasid = int(name.split('-')[0])
        sample = {'image': image, 'label': target, 'classid': clasid}
        # sample = {'image': image,'label': target}
        tensor_in = composed_transforms(sample)['image'].unsqueeze(0)
        tensor_in.to()
        model.eval()
        if args.cuda:
            tensor_in = tensor_in.cuda()
        with torch.no_grad():
            output = model(tensor_in)

        grid_image = make_grid(decode_seg_map_sequence(torch.max(output[:3], 1)[1].detach().cpu().numpy()))
        save_image(grid_image, args.in_path + '/' + '{}_mask.png'.format(name[0:-4]))
        u_time = time.time()
        img_time = u_time - s_time
        print('image:{} time:{}'.format(name, img_time))

    print('image save in ' + args.in_path)

# def make_data(args, **kwargs):
#     if args.dataset == 'coco':
#         train_set = coco.COCOSegmentation(args, split='train')
#         val_set = coco.COCOSegmentation(args, split='val')
#         num_class = train_set.NUM_CLASSES
#         train_loader = DataLoader(train_set, batch_size=args.batch_size, shuffle=True, **kwargs)
#         val_loader = DataLoader(val_set, batch_size=args.batch_size, shuffle=False, **kwargs)
#         test_loader = None
#         return train_loader, val_loader, test_loader, num_class

if __name__ == '__main__':
    main()
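
A typical invocation, with hypothetical example paths, might be:

python demotest.py --in_path ./test_images --out_path ./results --ckpt ./run/model_best.pth.tar --backbone resnet --dataset mydata --num_classes 6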


Results

(Figures omitted: the original input image, the prediction produced after training, and the original annotation.)

3 Summary

Overall the results look pretty good. The lesson for me is to follow the official annotation and data format from the start; the project gets done much faster that way.
