Problems Encountered When Training DeepLabV3+ on a Custom Dataset (Part 2)

Preface

See the previous post: Problems Encountered When Training DeepLabV3+ on a Custom Dataset (Part 1)

1 The new problems

1.1 Problem overview

My data-processing code was copied from dataloaders/datasets/pascal.py, so most of it is a modified version of that file. Our dataset, however, is a ceramic-tile crack dataset: each mask image contains only two colors, background RGB (0, 0, 0) and the annotated class RGB (255, 255, 255).

Images annotated in VOC style are opened and saved with PIL.Image in mode 'P' (palette mode; look it up if you are not familiar with it). In short, each pixel stores an index into a color palette rather than an RGB value, for example:

(0, 0, 0) ----- index 1

(255, 255, 255) ----- index 2

Each color corresponds to one palette index.
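
As a quick illustration (a toy example, not from the dataset), a 'P'-mode image stores one palette index per pixel, and the palette maps each index to an RGB color:

from PIL import Image
import numpy as np

# a tiny 2x1 'P'-mode image: pixel values are palette indices, not RGB values
im = Image.new('P', (2, 1))                       # every pixel starts at index 0
im.putpalette([0, 0, 0,                           # index 0 -> (0, 0, 0)
               255, 255, 255] + [0] * (254 * 3))  # index 1 -> (255, 255, 255), rest unused
im.putpixel((1, 0), 1)                            # the second pixel now stores index 1

print(np.array(im))                  # [[0 1]]  <- the stored indices
print(np.array(im.convert('RGB')))   # the RGB colors after the palette lookup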

1.2 The palette in the VOC dataset

The palette used by the VOC dataset looks roughly like this:

	array([[  0,  0,  0],
 [128,  0,  0],
 [  0, 128,  0],
 [128, 128,  0],
 [  0,  0, 128],
 [128,  0, 128],
 [  0, 128, 128],
 [128, 128, 128],
 [ 64,  0,  0],
 [192,  0,  0],
 [ 64, 128,  0],
 [192, 128,  0],
 [ 64,  0, 128],
 [192,  0, 128],
 [ 64, 128, 128],
 [192, 128, 128],
 [  0,  64,  0],
 [128,  64,  0],
 [  0, 192,  0],
 [128, 192,  0],
 [  0,  64, 128],
 [128,  64, 128],
 [  0, 192, 128],
 [128, 192, 128],
 [ 64,  64,  0],
 [192,  64,  0],
 [ 64, 192,  0],
 [192, 192,  0],
 [ 64,  64, 128],
 [192,  64, 128],
 [ 64, 192, 128],
 [192, 192, 128],
 [  0,  0,  64],
 [128,  0,  64],
 [  0, 128,  64],
 [128, 128,  64],
 [  0,  0, 192],
 [128,  0, 192],
 [  0, 128, 192],
 [128, 128, 192],
 [ 64,  0,  64],
 [192,  0,  64],
 [ 64, 128,  64],
 [192, 128,  64],
 [ 64,  0, 192],
 [192,  0, 192],
 [ 64, 128, 192],
 [192, 128, 192],
 [  0,  64,  64],
 [128,  64,  64],
 [  0, 192,  64],
 [128, 192,  64],
 [  0,  64, 192],
 [128,  64, 192],
 [  0, 192, 192],
 [128, 192, 192],
 [ 64,  64,  64],
 [192,  64,  64],
 [ 64, 192,  64],
 [192, 192,  64],
 [ 64,  64, 192],
 [192,  64, 192],
 [ 64, 192, 192],
 [192, 192, 192],
 [ 32,  0,  0],
 [160,  0,  0],
 [ 32, 128,  0],
 [160, 128,  0],
 [ 32,  0, 128],
 [160,  0, 128],
 [ 32, 128, 128],
 [160, 128, 128],
 [ 96,  0,  0],
 [224,  0,  0],
 [ 96, 128,  0],
 [224, 128,  0],
 [ 96,  0, 128],
 [224,  0, 128],
 [ 96, 128, 128],
 [224, 128, 128],
 [ 32,  64,  0],
 [160,  64,  0],
 [ 32, 192,  0],
 [160, 192,  0],
 [ 32,  64, 128],
 [160,  64, 128],
 [ 32, 192, 128],
 [160, 192, 128],
 [ 96,  64,  0],
 [224,  64,  0],
 [ 96, 192,  0],
 [224, 192,  0],
 [ 96,  64, 128],
 [224,  64, 128],
 [ 96, 192, 128],
 [224, 192, 128],
 [ 32,  0,  64],
 [160,  0,  64],
 [ 32, 128,  64],
 [160, 128,  64],
 [ 32,  0, 192],
 [160,  0, 192],
 [ 32, 128, 192],
 [160, 128, 192],
 [ 96,  0,  64],
 [224,  0,  64],
 [ 96, 128,  64],
 [224, 128,  64],
 [ 96,  0, 192],
 [224,  0, 192],
 [ 96, 128, 192],
 [224, 128, 192],
 [ 32,  64,  64],
 [160,  64,  64],
 [ 32, 192,  64],
 [160, 192,  64],
 [ 32,  64, 192],
 [160,  64, 192],
 [ 32, 192, 192],
 [160, 192, 192],
 [ 96,  64,  64],
 [224,  64,  64],
 [ 96, 192,  64],
 [224, 192,  64],
 [ 96,  64, 192],
 [224,  64, 192],
 [ 96, 192, 192],
 [224, 192, 192],
 [  0,  32,  0],
 [128,  32,  0],
 [  0, 160,  0],
 [128, 160,  0],
 [  0,  32, 128],
 [128,  32, 128],
 [  0, 160, 128],
 [128, 160, 128],
 [ 64,  32,  0],
 [192,  32,  0],
 [ 64, 160,  0],
 [192, 160,  0],
 [ 64,  32, 128],
 [192,  32, 128],
 [ 64, 160, 128],
 [192, 160, 128],
 [  0,  96,  0],
 [128,  96,  0],
 [  0, 224,  0],
 [128, 224,  0],
 [  0,  96, 128],
 [128,  96, 128],
 [  0, 224, 128],
 [128, 224, 128],
 [ 64,  96,  0],
 [192,  96,  0],
 [ 64, 224,  0],
 [192, 224,  0],
 [ 64,  96, 128],
 [192,  96, 128],
 [ 64, 224, 128],
 [192, 224, 128],
 [  0,  32,  64],
 [128,  32,  64],
 [  0, 160,  64],
 [128, 160,  64],
 [  0,  32, 192],
 [128,  32, 192],
 [  0, 160, 192],
 [128, 160, 192],
 [ 64,  32,  64],
 [192,  32,  64],
 [ 64, 160,  64],
 [192, 160,  64],
 [ 64,  32, 192],
 [192,  32, 192],
 [ 64, 160, 192],
 [192, 160, 192],
 [  0,  96,  64],
 [128,  96,  64],
 [  0, 224,  64],
 [128, 224,  64],
 [  0,  96, 192],
 [128,  96, 192],
 [  0, 224, 192],
 [128, 224, 192],
 [ 64,  96,  64],
 [192,  96,  64],
 [ 64, 224,  64],
 [192, 224,  64],
 [ 64,  96, 192],
 [192,  96, 192],
 [ 64, 224, 192],
 [192, 224, 192],
 [ 32,  32,  0],
 [160,  32,  0],
 [ 32, 160,  0],
 [160, 160,  0],
 [ 32,  32, 128],
 [160,  32, 128],
 [ 32, 160, 128],
 [160, 160, 128],
 [ 96,  32,  0],
 [224,  32,  0],
 [ 96, 160,  0],
 [224, 160,  0],
 [ 96,  32, 128],
 [224,  32, 128],
 [ 96, 160, 128],
 [224, 160, 128],
 [ 32,  96,  0],
 [160,  96,  0],
 [ 32, 224,  0],
 [160, 224,  0],
 [ 32,  96, 128],
 [160,  96, 128],
 [ 32, 224, 128],
 [160, 224, 128],
 [ 96,  96,  0],
 [224,  96,  0],
 [ 96, 224,  0],
 [224, 224,  0],
 [ 96,  96, 128],
 [224,  96, 128],
 [ 96, 224, 128],
 [224, 224, 128],
 [ 32,  32,  64],
 [160,  32,  64],
 [ 32, 160,  64],
 [160, 160,  64],
 [ 32,  32, 192],
 [160,  32, 192],
 [ 32, 160, 192],
 [160, 160, 192],
 [ 96,  32,  64],
 [224,  32,  64],
 [ 96, 160,  64],
 [224, 160,  64],
 [ 96,  32, 192],
 [224,  32, 192],
 [ 96, 160, 192],
 [224, 160, 192],
 [ 32,  96,  64],
 [160,  96,  64],
 [ 32, 224,  64],
 [160, 224,  64],
 [ 32,  96, 192],
 [160,  96, 192],
 [ 32, 224, 192],
 [160, 224, 192],
 [ 96,  96,  64],
 [224,  96,  64],
 [ 96, 224,  64],
 [224, 224,  64],
 [ 96,  96, 192],
 [224,  96, 192],
 [ 96, 224, 192],
 [224, 224, 192]], dtype=uint8)
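
For reference, this palette is not arbitrary: it is the standard VOC colormap, where each entry is built from the bits of the class index. A minimal Python version of that routine looks like this:

import numpy as np

def voc_colormap(n=256):
    # build the standard Pascal VOC palette: n rows of (R, G, B)
    def bitget(value, idx):
        return (value >> idx) & 1

    cmap = np.zeros((n, 3), dtype=np.uint8)
    for i in range(n):
        r = g = b = 0
        c = i
        for j in range(8):
            r |= bitget(c, 0) << (7 - j)
            g |= bitget(c, 1) << (7 - j)
            b |= bitget(c, 2) << (7 - j)
            c >>= 3
        cmap[i] = (r, g, b)
    return cmap

print(voc_colormap()[1])    # [128   0   0], the first foreground class
print(voc_colormap()[255])  # [224 224 192], the 'ignore' color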

1.3 The palette I wanted

My original idea, starting from the above, was to make each annotated class map to its own palette index (the screenshot illustrating this failed to upload, so it is omitted here). The desired result:

[[  0   0   0]      <--- background color                              ---- 0
 [128   0   0]      <--- 1st annotated class, palette index 1          ---- 1
 [  2   2   2]      <--- 2nd annotated class, palette index 2          ---- 2
 [  3   3   3]      <--- 3rd annotated class, palette index 3          ---- 3
 [  4   4   4]      <--- 4th annotated class, palette index 4          ---- 4
 [  5   5   5]      <--- 5th annotated class, palette index 5          ---- 5
 [  6   6   6]
 ...
 ...
 [253 253 253]
 [254 254 254]
 [224 224 192]]     <--- ignore color                                  ---- 255

But no matter what I tried, I just could not make it behave the way I imagined!

The change I tried:

  1. Modify the palette of the original mask image, as in the snippet below:
import copy
import numpy as np
from PIL import Image

# open the mask image ('P' mode)
im = Image.open('a.png')

# get the palette as a (256, 3) array
palette = np.array(im.getpalette(), dtype=np.uint8).reshape((256, 3))

# deep-copy the color currently stored at index 2
temp = copy.deepcopy(palette[2])

# change palette[2] from [224 224 192] to [2 2 2]
palette[2] = (2, 2, 2)

# change palette[255] from [255 255 255] to [224 224 192]
palette[255] = temp

# flatten back to the flat list format that putpalette expects
aa = palette.flatten().tolist()

# put the modified palette back into the image
im.putpalette(aa)

The result was:

[[  0   0   0]      <--- background color       ---- 0
 [128   0   0]      <--- first annotated class  ---- 1
 [  2   2   2]      <--- the ignore color       ---- 2   it became [2 2 2]  # but how do I make [224 224 192] map back to palette index 255? I still have no answer
 [  3   3   3]
 [  4   4   4]
 [  5   5   5]
 [  6   6   6]
 ...
 ...
 [253 253 253]
 [254 254 254]
 [224 224 192]]

In the end I had to compromise: there was no time to keep fighting with this, so I dropped the ignore color entirely and kept only the colors I actually need, with every class annotated as index 1 in its own mask. Turning the data into the dataset I originally imagined would mean re-annotating everything with the labeling tool, which is far too time-consuming.
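
Note that swapping palette entries only changes which color each index is displayed as; it does not change which index a pixel actually stores. For reference, a rough, untested sketch of remapping the stored index values themselves (rather than the palette) might look like this:

import numpy as np
from PIL import Image

im = Image.open('a.png')                  # the 'P'-mode mask
idx = np.array(im)                        # per-pixel palette indices

# hypothetical remap: move every pixel currently stored as index 2 to index 255
remapped = np.where(idx == 2, 255, idx).astype(np.uint8)

new_im = Image.fromarray(remapped, mode='P')
new_im.putpalette(im.getpalette())        # keep (or further edit) the original palette
new_im.save('a_remapped.png')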

1.4 The final palette

The final result is:

# palette for the first class
[[  0   0   0]      <--- background color   ---- 0
 [128   0   0]      <--- annotated class    ---- 1  # red
 [  2   2   2]      
 ...
 ...
 [255 255 255]]
 
 # palette for the second class
[[  0   0   0]      <--- background color   ---- 0
 [  0  128  0]      <--- annotated class    ---- 1  # green
 [  2   2   2]      
 ...
 ...
 [255 255 255]]

2 Code changes

2.1 Copying pascal.py to mydata.py

One thing here really annoyed me. Sometimes when I loaded images for a quick test, the mask came out completely black even though the original image clearly had mask annotations, and I had no idea why. After several days of digging, the cause turned out to be simple: the image/mask preprocessing has a random-crop step, and the crop was cutting the mask region out entirely... just cutting it out... I nearly lost it.

That is why there is a custom_transforms_myda.py. Quite a lot was changed; for example, every transform now also carries the class id through the sample dict:

class FixedResize(object):
    def __init__(self, size):
        self.size = (size, size)  # size: (h, w)

    def __call__(self, sample):
        img = sample['image']
        mask = sample['label']
        idclass = sample['classid']
        assert img.size == mask.size

        img = img.resize(self.size, Image.BILINEAR)
        mask = mask.resize(self.size, Image.NEAREST)

        return {'image': img,
                'label': mask,
                'classid': idclass}    #@ <-- every transform must also return the class id, used to tell which class palette index 1 belongs to

The full custom_transforms_myda.py is here:

import torch
import random
import numpy as np

from PIL import Image, ImageOps, ImageFilter
import cv2
import matplotlib.pyplot as plt

class Normalize(object):
    """Normalize a tensor image with mean and standard deviation.
    Args:
        mean (tuple): means for each channel.
        std (tuple): standard deviations for each channel.
    """
    def __init__(self, mean=(0., 0., 0.), std=(1., 1., 1.)):
        self.mean = mean
        self.std = std

    def __call__(self, sample):
        img = sample['image']
        maska = sample['label']
        idclass = sample['classid']
        img = np.array(img).astype(np.float32)
        mask = np.array(maska).astype(np.float32)
        img /= 255.0
        img -= self.mean
        img /= self.std

        return {'image': img,
                'label': mask,
                'classid': idclass}


class ToTensor(object):
    """Convert ndarrays in sample to Tensors."""

    def __call__(self, sample):
        # swap color axis because
        # numpy image: H x W x C
        # torch image: C X H X W
        img = sample['image']
        mask = sample['label']
        idclass = sample['classid']
        img = np.array(img).astype(np.float32).transpose((2, 0, 1))
        mask = np.array(mask).astype(np.float32)

        img = torch.from_numpy(img).float()
        mask = torch.from_numpy(mask).float()

        return {'image': img,
                'label': mask,
                'classid': idclass}


class RandomHorizontalFlip(object):
    def __call__(self, sample):
        img = sample['image']
        mask = sample['label']

        idclass = sample['classid']
        if random.random() < 0.5:
            img = img.transpose(Image.FLIP_LEFT_RIGHT)
            mask = mask.transpose(Image.FLIP_LEFT_RIGHT)

        return {'image': img,
                'label': mask,
                'classid': idclass}


class RandomRotate(object):
    def __init__(self, degree):
        self.degree = degree

    def __call__(self, sample):
        img = sample['image']
        mask = sample['label']
        idclass = sample['classid']

        rotate_degree = random.uniform(-1*self.degree, self.degree)
        img = img.rotate(rotate_degree, Image.BILINEAR)
        mask = mask.rotate(rotate_degree, Image.NEAREST)


        return {'image': img,
                'label': mask,
             'classid': idclass}

class RandomGaussianBlur(object):
    def __call__(self, sample):
        img = sample['image']
        mask = sample['label']
        idclass = sample['classid']
        if random.random() < 0.5:
            img = img.filter(ImageFilter.GaussianBlur(
                radius=random.random()))

        return {'image': img,
                'label': mask,
                'classid': idclass}


class RandomScaleCrop(object):
    def __init__(self, base_size, crop_size, fill=0):
        self.base_size = base_size
        self.crop_size = crop_size
        self.fill = fill

    def __call__(self, sample):
        img = sample['image']
        mask = sample['label']
        idclass = sample['classid']
        # random scale (short edge)
        short_size = random.randint(int(self.base_size * 0.5), int(self.base_size * 2.0))
        w, h = img.size
        if h > w:
            ow = short_size
            oh = int(1.0 * h * ow / w)
        else:
            oh = short_size
            ow = int(1.0 * w * oh / h)
        img = img.resize((ow, oh), Image.BILINEAR)
        mask = mask.resize((ow, oh), Image.NEAREST)
        # pad crop
        if short_size < self.crop_size:
            padh = self.crop_size - oh if oh < self.crop_size else 0
            padw = self.crop_size - ow if ow < self.crop_size else 0
            img = ImageOps.expand(img, border=(0, 0, padw, padh), fill=0)
            mask = ImageOps.expand(mask, border=(0, 0, padw, padh), fill=self.fill)
        # random crop crop_size
        w, h = img.size
        x1 = random.randint(0, w - self.crop_size)
        y1 = random.randint(0, h - self.crop_size)
        img = img.crop((x1, y1, x1 + self.crop_size, y1 + self.crop_size))
        mask = mask.crop((x1, y1, x1 + self.crop_size, y1 + self.crop_size))

        return {'image': img,
                'label': mask,
                'classid': idclass}


class FixScaleCrop(object):
    def __init__(self, crop_size):
        self.crop_size = crop_size
        self.classed = [(128,0,0), (0,128,0), (128,128,0), (0,0,128),(128,0,128),(0,0,0)]
    def letterbox(self, img, new_shape=(513, 513), color=(0, 0, 0),id =0, isNeedToConvert = False,auto=True, scaleFill=False, scaleup=True):
        # Resize image to a 32-pixel-multiple rectangle https://github.com/ultralytics/yolov3/issues/232
        # w, h = img.size

        img = cv2.cvtColor(np.asarray(img), cv2.COLOR_RGB2BGR)

        # shape = (w,h)
        shape = img.shape[:2]  # current shape [height, width]
        if isinstance(new_shape, int):
            new_shape = (new_shape, new_shape)

        # Scale ratio (new / old)
        r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
        if not scaleup:  # only scale down, do not scale up (for better test mAP)
            r = min(r, 1.0)

        # Compute padding
        ratio = r, r  # width, height ratios
        new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
        dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
        if auto:  # minimum rectangle
            dw, dh = np.mod(dw, 32), np.mod(dh, 32)  # wh padding
        elif scaleFill:  # stretch
            dw, dh = 0.0, 0.0
            new_unpad = (new_shape[1], new_shape[0])
            ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios

        dw /= 2  # divide padding into 2 sides
        dh /= 2
        #
        if shape != new_unpad:  # resize
            img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
        top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
        left, right = int(round(dw - 0.1)), int(round(dw + 0.1))

        img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
        # print(img.shape)
        if isNeedToConvert:
            # for masks: rebuild a 'P'-mode image in which every pixel labeled 1
            # is painted with the color assigned to this sample's class id
            new_img = Image.new('P', new_shape, (0, 0, 0))
            for h in range(0, new_shape[0]):
                for j in range(0, new_shape[1]):
                    (b, g, r) = img[h, j]
                    if (b, g, r) == (1, 1, 1):
                        new_img.putpixel((j, h), self.classed[id])
                        #img[h, j] = self.classed[id]


            return new_img, ratio, (dw, dh)
        else:
            image = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB), mode='RGB')


        return image, ratio, (dw, dh)



    def __call__(self, sample):
        img = sample['image']
        mask = sample['label']
        idclass = sample['classid']

        mask, ratio, pad = self.letterbox(mask,new_shape=513, id=idclass,isNeedToConvert=True, auto=False)

        img, ratio, pad = self.letterbox(img, auto=False)


        #     oh = self.crop_size
        #     ow = int(1.0 * w * oh / h)
        # else:
        #     ow = self.crop_size
        #     oh = int(1.0 * h * ow / w)
        # img = img.resize((ow, oh), Image.BILINEAR)
        # mask = mask.resize((ow, oh), Image.NEAREST)
        # # center crop
        # w, h = img.size
        # x1 = int(round((w - self.crop_size) / 2.))
        # y1 = int(round((h - self.crop_size) / 2.))
        # img = img.crop((x1, y1, x1 + self.crop_size, y1 + self.crop_size))
        # mask = mask.crop((x1, y1, x1 + self.crop_size, y1 + self.crop_size))

        return {'image': img,
                'label': mask,
                'classid': idclass}

class FixScaleCrop_val(object):
    def __init__(self, crop_size):
        self.crop_size = crop_size
        self.classed = [(128,0,0), (0,128,0), (128,128,0), (0,0,128),(128,0,128),(0,0,0)]  # letterbox() below still needs this when isNeedToConvert=True
    def letterbox(self, img, new_shape=(513, 513), color=(0, 0, 0),id =0, isNeedToConvert = False,auto=True, scaleFill=False, scaleup=True):
        # Resize image to a 32-pixel-multiple rectangle https://github.com/ultralytics/yolov3/issues/232
        # w, h = img.size

        img = cv2.cvtColor(np.asarray(img), cv2.COLOR_RGB2BGR)

        # shape = (w,h)
        shape = img.shape[:2]  # current shape [height, width]
        if isinstance(new_shape, int):
            new_shape = (new_shape, new_shape)

        # Scale ratio (new / old)
        r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
        if not scaleup:  # only scale down, do not scale up (for better test mAP)
            r = min(r, 1.0)

        # Compute padding
        ratio = r, r  # width, height ratios
        new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
        dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
        if auto:  # minimum rectangle
            dw, dh = np.mod(dw, 32), np.mod(dh, 32)  # wh padding
        elif scaleFill:  # stretch
            dw, dh = 0.0, 0.0
            new_unpad = (new_shape[1], new_shape[0])
            ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios

        dw /= 2  # divide padding into 2 sides
        dh /= 2
        #
        if shape != new_unpad:  # resize
            img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
        top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
        left, right = int(round(dw - 0.1)), int(round(dw + 0.1))

        img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
        # print(img.shape)
        if isNeedToConvert:
            new_img = Image.new('P', new_shape, (0, 0, 0))
            for h in range(0, new_shape[0]):
                for j in range(0, new_shape[1]):
                    (b, g, r) = img[h, j]
                    if (b, g, r) == (1, 1, 1):
                        new_img.putpixel((j, h), self.classed[id])
                        #img[h, j] = self.classed[id]


            return new_img, ratio, (dw, dh)
        else:
            image = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB), mode='RGB')


        return image, ratio, (dw, dh)



    def __call__(self, sample):
        img = sample['image']
        mask = sample['label']
        idclass = sample['classid']

        mask, ratio, pad = self.letterbox(mask,new_shape=513, id=idclass,isNeedToConvert=True, auto=False)

        img, ratio, pad = self.letterbox(img, auto=False)


        #     oh = self.crop_size
        #     ow = int(1.0 * w * oh / h)
        # else:
        #     ow = self.crop_size
        #     oh = int(1.0 * h * ow / w)
        # img = img.resize((ow, oh), Image.BILINEAR)
        # mask = mask.resize((ow, oh), Image.NEAREST)
        # # center crop
        # w, h = img.size
        # x1 = int(round((w - self.crop_size) / 2.))
        # y1 = int(round((h - self.crop_size) / 2.))
        # img = img.crop((x1, y1, x1 + self.crop_size, y1 + self.crop_size))
        # mask = mask.crop((x1, y1, x1 + self.crop_size, y1 + self.crop_size))

        return {'image': img,
                'label': mask,
                'classid': idclass}



class FixedResize(object):
    def __init__(self, size):
        self.size = (size, size)  # size: (h, w)

    def __call__(self, sample):
        img = sample['image']
        mask = sample['label']
        idclass = sample['classid']
        assert img.size == mask.size

        img = img.resize(self.size, Image.BILINEAR)
        mask = mask.resize(self.size, Image.NEAREST)

        return {'image': img,
                'label': mask,
                'classid': idclass}

mydata.py

from __future__ import print_function, division
import os
from PIL import Image
import numpy as np
from torch.utils.data import Dataset
from mypath import Path
from torchvision import transforms
from dataloaders import custom_transforms_myda as tr

class VOCSegmentation(Dataset):
    """
    PascalVoc dataset
    """
    NUM_CLASSES = 5+1

    def __init__(self,
                 args,
                 base_dir=Path.db_root_dir('mydata'),
                 split='train',
                 ):
        """
        :param base_dir: path to VOC dataset directory
        :param split: train/val
        :param transform: transform to apply
        """
        super().__init__()
        self._base_dir = base_dir
        self._image_dir = os.path.join(self._base_dir, 'JPEGImages')
        self._cat_dir = os.path.join(self._base_dir, 'SegmentationClass')

        if isinstance(split, str):
            self.split = [split]
        else:
            split.sort()
            self.split = split

        self.args = args

        _splits_dir = os.path.join(self._base_dir, 'ImageSets')

        self.im_ids = []
        self.images = []
        self.categories = []

        for splt in self.split:
            with open(os.path.join(os.path.join(_splits_dir, splt + '.txt')), "r") as f:
                lines = f.read().splitlines()

            for ii, line in enumerate(lines):
                _image = os.path.join(self._image_dir, line)
                fpath,fname = os.path.split(_image)
                fnewname = fname.replace('.jpg','.png')
                _cat = os.path.join(self._cat_dir, fnewname)
                #print(fnewname)
                assert os.path.isfile(_image)
                assert os.path.isfile(_cat)
                self.im_ids.append(line)
                self.images.append(_image)
                self.categories.append(_cat)

        assert (len(self.images) == len(self.categories))

        # Display stats
        print('Number of images in {}: {:d}'.format(split, len(self.images)))

    def __len__(self):
        return len(self.images)


    def __getitem__(self, index):
        _img, _target, _clasid = self._make_img_gt_point_pair(index)
        sample = {'image': _img, 'label': _target, 'classid': _clasid}

        for split in self.split:
            if split == "train":
                # return sample
                return self.transform_tr(sample)
            elif split == 'val':
                return self.transform_val(sample)


    def _make_img_gt_point_pair(self, index):
        _img = Image.open(self.images[index]).convert('RGB')
        #_img = Image.open(a).convert('RGB')
       # print(self.categories[index])
        #_target = Image.open(b)
        _target = Image.open(self.categories[index])
        _, fname = os.path.split(self.categories[index])
        clasid = int(fname.split('-')[0])   # mask filenames are expected to start with the class id, e.g. '1-xxxx.png'


        return _img, _target, clasid

    def transform_tr(self, sample):
        composed_transforms = transforms.Compose([
            #tr.RandomHorizontalFlip(),
            tr.FixScaleCrop(crop_size=self.args.crop_size),
            tr.RandomGaussianBlur(),
            tr.Normalize(mean=(0.24127093, 0.2287277,  0.24580745), std=(0.07584574, 0.05697405, 0.07654408)),
            tr.ToTensor()])

        return composed_transforms(sample)

    def transform_val(self, sample):

        composed_transforms = transforms.Compose([
            tr.FixScaleCrop(crop_size=self.args.crop_size),
            # tr.Normalize(),
            #tr.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
            tr.Normalize(mean=(0.24127093, 0.2287277, 0.24580745), std=(0.07584574, 0.05697405, 0.07654408)),
            tr.ToTensor()])

        return composed_transforms(sample)

    def __str__(self):
        return 'mydata(split=' + str(self.split) + ')'


if __name__ == '__main__':
    from dataloaders.utils import decode_segmap_mydata
    from torch.utils.data import DataLoader
    import matplotlib.pyplot as plt
    import argparse


    def getNonRepeatList2(data):
        new_data = []
        data = data.flatten()
        for i in range(len(data)):
            if data[i] not in new_data:
                new_data.append(data[i])
        return new_data

    def getNonRepeatList3( data):
        return [i for n, i in enumerate(data) if i not in data[:n]]
    parser = argparse.ArgumentParser()
    args = parser.parse_args()
    args.base_size = 513
    args.crop_size = 513

    voc_train = VOCSegmentation(args, split='train')

    dataloader = DataLoader(voc_train, batch_size=5, shuffle=True, num_workers=0,drop_last=True)

    for ii, sample in enumerate(dataloader):
        for jj in range(sample["image"].size()[0]):
            img = sample['image'].numpy()
            gt = sample['label'].numpy()
            #posdd = getNonRepeatList3(ttt)
            #aset =list( set(gt.tolist()))
            classid = sample['classid'].numpy()

            temp = np.array(gt[jj])
            temp_max = np.max(temp)

            temp_t = np.array(gt[jj]).astype(np.uint8)
            temp_t_max = np.max(temp_t)
            tmp = np.array(gt[jj]).astype(np.uint8)

            segmap = decode_segmap_mydata(tmp, classid[jj], dataset='mydata')
            img_tmp = np.transpose(img[jj], axes=[1, 2, 0])
            img_tmp *= (0.07584574, 0.05697405, 0.07654408)
            img_tmp += (0.24127093, 0.2287277,  0.24580745)
            img_tmp *= 255.0
            img_tmp = img_tmp.astype(np.uint8)
            plt.figure()
            plt.title('display')
            plt.subplot(211)
            plt.imshow(img_tmp)
            plt.subplot(212)
            plt.imshow(segmap)

        if ii == 1:
            break

    plt.show(block=True)
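
decode_segmap_mydata is imported from dataloaders.utils but its code is not shown in this post. A minimal sketch of what such a function might look like, assuming label 0 is background, label 1 is the annotated class, and classid selects the display color (the color list mirrors self.classed in FixScaleCrop), is:

import numpy as np

# hypothetical sketch; the real function lives in dataloaders/utils.py
MYDATA_COLORS = [(128, 0, 0), (0, 128, 0), (128, 128, 0), (0, 0, 128), (128, 0, 128), (0, 0, 0)]

def decode_segmap_mydata(label_mask, classid, dataset='mydata'):
    # turn an (H, W) mask of {0, 1} into an RGB float image colored by classid
    color = MYDATA_COLORS[int(classid)]
    rgb = np.zeros(label_mask.shape + (3,), dtype=np.float32)
    for c in range(3):
        rgb[..., c] = np.where(label_mask == 1, color[c] / 255.0, 0.0)
    return rgb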



Command-line arguments:

--backbone
resnet
--dataset
mydata

train.py

   def training(self, epoch):
        train_loss = 0.0
        self.model.train()
        tbar = tqdm(self.train_loader)
        num_img_tr = len(self.train_loader)
        for i, sample in enumerate(tbar):
            # mydata
            image, target, id = sample['image'], sample['label'], sample['classid']  # changed here to fit my dataset
            #image, target = sample['image'], sample['label'] # the original line
            if self.args.cuda:
                image, target = image.cuda(), target.cuda()
            self.scheduler(self.optimizer, i, epoch, self.best_pred)
            self.optimizer.zero_grad()
            output = self.model(image)
            loss = self.criterion(output, target)
            loss.backward()
            self.optimizer.step()
            train_loss += loss.item()
            tbar.set_description('Train loss: %.3f' % (train_loss / (i + 1)))
            self.writer.add_scalar('train/total_loss_iter', loss.item(), i + num_img_tr * epoch)

            # Show 10 * 3 inference results each epoch
            if i % (num_img_tr // 10) == 0:
                global_step = i + num_img_tr * epoch
                #self.summary.visualize_image(self.writer, self.args.dataset, image, target, output, global_step)  # TensorBoard image logging is commented out: it would need the class id passed in and too many changes, so I just skip it

        self.writer.add_scalar('train/total_loss_epoch', train_loss, epoch)
        print('[Epoch: %d, numImages: %5d]' % (epoch, i * self.args.batch_size + image.data.shape[0]))
        print('Loss: %.3f' % train_loss)

        if self.args.no_val:
            # save checkpoint every epoch
            is_best = False
            self.saver.save_checkpoint({
                'epoch': epoch + 1,
                'state_dict': self.model.module.state_dict(),
                'optimizer': self.optimizer.state_dict(),
                'best_pred': self.best_pred,
            }, is_best)

The inference code to run after training:

demotest.py

import argparse
import os
import numpy as np
import time

from modeling.deeplab import *
from dataloaders import custom_transforms_myda as tr
from PIL import Image
from torchvision import transforms
from dataloaders.utils import *
from torchvision.utils import make_grid,save_image


def main():
    argparser = argparse.ArgumentParser(description='Pytorch DeeplabV3Plus Training')
    argparser.add_argument('--in_path', type=str, required=True, help='image to set')
    argparser.add_argument('--out_path', type=str, required=True, help='image to save result')
    argparser.add_argument('--backbone', type=str, default='resnet', choices=['resnet','xception','drn','mobilenet'], help='backbone name (default: resnet)')
    argparser.add_argument('--ckpt', type=str, default='deeplab-resnet.pth', help='saved model')
    argparser.add_argument('--out_stride', type=int, default=16, help='network output stride (default: 16)')
    argparser.add_argument('--no_cuda',  action='store_true', default=False,
                           help='disables CUDA training')
    argparser.add_argument('--gpu_ids', type=str, default='0',
                           help='use which gpu to train, must be a comma-separated list of integers only (default: 0)')
    argparser.add_argument('--dataset', type=str, default='mydata', choices=['pascal','coco','cityscapes','mydata'],
                           help='dataset name (default: mydata)')
    argparser.add_argument('--crop_size', type=int, default=513,
                           help='crop image size (default: 513)')
    argparser.add_argument('--num_classes', type=int, default=21,
                        help='how many classes to segment')
    argparser.add_argument('--sync_bn', type=bool, default=None,
                           help='whether to use sync bn (default: auto)')

    args = argparser.parse_args()
    print(args)
    a = torch.cuda.is_available()
    b = not args.no_cuda
    args.cuda = not args.no_cuda and torch.cuda.is_available()


    if args.cuda:
        try:
            args.gpu_ids = [int(s) for s in args.gpu_ids.split(',')]
        except ValueError:
            raise ValueError('Argument --gpu_ids must be a comma-separated list of integers only')
    if args.sync_bn is None:
        if args.cuda and len(args.gpu_ids) > 1:
            args.sync_bn = True
        else:
            args.sync_bn = False
    model_s_time = time.time()

    model = DeepLab(num_classes=args.num_classes,
                    backbone=args.backbone,
                    output_stride=args.out_stride,
                    sync_bn=args.sync_bn,
                    )

    ckpt = torch.load(args.ckpt, map_location='cpu')
    model.load_state_dict(ckpt['state_dict'])
    model = model.cuda()
    model_u_time = time.time()
    model_load_time = model_u_time - model_s_time
    print('[INFO] model load time is {}'.format(model_load_time))

    composed_transforms = transforms.Compose([
        # tr.RandomHorizontalFlip(),
        tr.FixScaleCrop(crop_size=513),
        tr.RandomGaussianBlur(),
        tr.Normalize(mean=(0.24127093, 0.2287277, 0.24580745), std=(0.07584574, 0.05697405, 0.07654408)),
        tr.ToTensor()])

    for name in os.listdir(args.in_path):
        s_time = time.time()
        image = Image.open(args.in_path + '/' + name).convert('RGB')

        # image =
        target = Image.open(args.in_path + '/' + name).convert('P')

        clasid = int(name.split('-')[0])
        sample = {'image': image, 'label': target, 'classid': clasid}
        # sample = {'image': image,'label': target}
        tensor_in = composed_transforms(sample)['image'].unsqueeze(0)
        tensor_in.to()
        model.eval()
        if args.cuda:
            tensor_in = tensor_in.cuda()
        with torch.no_grad():
            output = model(tensor_in)

        grid_image = make_grid(decode_seg_map_sequence(torch.max(output[:3], 1)[1].detach().cpu().numpy()))
        save_image(grid_image, args.in_path + '/' + '{}_mask.png'.format(name[0:-4]))
        u_time = time.time()
        img_time = u_time - s_time
        print('image:{} time:{}'.format(name, img_time))

    print('image save in ' + args.in_path)

# def make_data(args, **kwargs):
#     if args.dataset == 'coco':
#         train_set = coco.COCOSegmentation(args, split='train')
#         val_set = coco.COCOSegmentation(args, split='val')
#         num_class = train_set.NUM_CLASSES
#         train_loader = DataLoader(train_set, batch_size=args.batch_size, shuffle=True, **kwargs)
#         val_loader = DataLoader(val_set, batch_size=args.batch_size, shuffle=False, **kwargs)
#         test_loader = None
#         return train_loader, val_loader, test_loader, num_class

if __name__ == '__main__':
    main()
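
A typical invocation, with hypothetical example paths, might be:

python demotest.py --in_path ./test_images --out_path ./results --ckpt ./run/model_best.pth.tar --backbone resnet --dataset mydata --num_classes 6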


Results

(Figures omitted: the original input image, the prediction produced after training, and the original annotation.)

3 Summary

Overall the results look pretty good. The lesson for me is to follow the official annotation and data format from the start; the project gets done much faster that way.
