Issues when training DeepLabv3+ on a custom dataset, Part 2
Preface
See: Issues when training DeepLabv3+ on a custom dataset, Part 1
1 The new problem
1.1 Problem summary
First of all, my data-loading code was copied from dataloaders/datasets/pascal.py, so most of the changes below evolved from that file. Our own dataset is a ceramic-tile crack dataset whose mask images contain only two colors: the background is RGB (0, 0, 0) and the object to be recognized is RGB (255, 255, 255).
VOC-style annotation masks are opened with the PIL.Image class and saved in mode 'P' (palette/indexed mode; look it up if you are not familiar with it). Simply put, each color corresponds to one entry in a palette, for example:
(0, 0, 0) ----- 1
(255, 255, 255) ----- 2
Each color maps to one slot in the palette.
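To make this concrete, here is a small sketch (the file name is a placeholder) showing how to inspect a 'P'-mode mask with PIL: the pixel values are palette indices, and getpalette() returns the table that maps each index to an RGB color.
import numpy as np
from PIL import Image

mask = Image.open('example_mask.png')      # placeholder path to one annotated mask
print(mask.mode)                           # 'P' for a palette (indexed) image

# getpalette() returns a flat list [R0, G0, B0, R1, G1, B1, ...]; reshape to N x 3
palette = np.array(mask.getpalette(), dtype=np.uint8).reshape(-1, 3)
print(palette[:3])                         # RGB colors stored at indices 0, 1, 2

# the pixel data itself stores indices into that palette, not RGB values
indices = np.array(mask)
print(np.unique(indices))                  # e.g. [0 1] for background plus one class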
1.2 The palette of the VOC dataset
The palette used by the VOC dataset looks roughly like this:
array([[ 0, 0, 0],
[128, 0, 0],
[ 0, 128, 0],
[128, 128, 0],
[ 0, 0, 128],
[128, 0, 128],
[ 0, 128, 128],
[128, 128, 128],
[ 64, 0, 0],
[192, 0, 0],
[ 64, 128, 0],
[192, 128, 0],
[ 64, 0, 128],
[192, 0, 128],
[ 64, 128, 128],
[192, 128, 128],
[ 0, 64, 0],
[128, 64, 0],
[ 0, 192, 0],
[128, 192, 0],
[ 0, 64, 128],
[128, 64, 128],
[ 0, 192, 128],
[128, 192, 128],
[ 64, 64, 0],
[192, 64, 0],
[ 64, 192, 0],
[192, 192, 0],
[ 64, 64, 128],
[192, 64, 128],
[ 64, 192, 128],
[192, 192, 128],
[ 0, 0, 64],
[128, 0, 64],
[ 0, 128, 64],
[128, 128, 64],
[ 0, 0, 192],
[128, 0, 192],
[ 0, 128, 192],
[128, 128, 192],
[ 64, 0, 64],
[192, 0, 64],
[ 64, 128, 64],
[192, 128, 64],
[ 64, 0, 192],
[192, 0, 192],
[ 64, 128, 192],
[192, 128, 192],
[ 0, 64, 64],
[128, 64, 64],
[ 0, 192, 64],
[128, 192, 64],
[ 0, 64, 192],
[128, 64, 192],
[ 0, 192, 192],
[128, 192, 192],
[ 64, 64, 64],
[192, 64, 64],
[ 64, 192, 64],
[192, 192, 64],
[ 64, 64, 192],
[192, 64, 192],
[ 64, 192, 192],
[192, 192, 192],
[ 32, 0, 0],
[160, 0, 0],
[ 32, 128, 0],
[160, 128, 0],
[ 32, 0, 128],
[160, 0, 128],
[ 32, 128, 128],
[160, 128, 128],
[ 96, 0, 0],
[224, 0, 0],
[ 96, 128, 0],
[224, 128, 0],
[ 96, 0, 128],
[224, 0, 128],
[ 96, 128, 128],
[224, 128, 128],
[ 32, 64, 0],
[160, 64, 0],
[ 32, 192, 0],
[160, 192, 0],
[ 32, 64, 128],
[160, 64, 128],
[ 32, 192, 128],
[160, 192, 128],
[ 96, 64, 0],
[224, 64, 0],
[ 96, 192, 0],
[224, 192, 0],
[ 96, 64, 128],
[224, 64, 128],
[ 96, 192, 128],
[224, 192, 128],
[ 32, 0, 64],
[160, 0, 64],
[ 32, 128, 64],
[160, 128, 64],
[ 32, 0, 192],
[160, 0, 192],
[ 32, 128, 192],
[160, 128, 192],
[ 96, 0, 64],
[224, 0, 64],
[ 96, 128, 64],
[224, 128, 64],
[ 96, 0, 192],
[224, 0, 192],
[ 96, 128, 192],
[224, 128, 192],
[ 32, 64, 64],
[160, 64, 64],
[ 32, 192, 64],
[160, 192, 64],
[ 32, 64, 192],
[160, 64, 192],
[ 32, 192, 192],
[160, 192, 192],
[ 96, 64, 64],
[224, 64, 64],
[ 96, 192, 64],
[224, 192, 64],
[ 96, 64, 192],
[224, 64, 192],
[ 96, 192, 192],
[224, 192, 192],
[ 0, 32, 0],
[128, 32, 0],
[ 0, 160, 0],
[128, 160, 0],
[ 0, 32, 128],
[128, 32, 128],
[ 0, 160, 128],
[128, 160, 128],
[ 64, 32, 0],
[192, 32, 0],
[ 64, 160, 0],
[192, 160, 0],
[ 64, 32, 128],
[192, 32, 128],
[ 64, 160, 128],
[192, 160, 128],
[ 0, 96, 0],
[128, 96, 0],
[ 0, 224, 0],
[128, 224, 0],
[ 0, 96, 128],
[128, 96, 128],
[ 0, 224, 128],
[128, 224, 128],
[ 64, 96, 0],
[192, 96, 0],
[ 64, 224, 0],
[192, 224, 0],
[ 64, 96, 128],
[192, 96, 128],
[ 64, 224, 128],
[192, 224, 128],
[ 0, 32, 64],
[128, 32, 64],
[ 0, 160, 64],
[128, 160, 64],
[ 0, 32, 192],
[128, 32, 192],
[ 0, 160, 192],
[128, 160, 192],
[ 64, 32, 64],
[192, 32, 64],
[ 64, 160, 64],
[192, 160, 64],
[ 64, 32, 192],
[192, 32, 192],
[ 64, 160, 192],
[192, 160, 192],
[ 0, 96, 64],
[128, 96, 64],
[ 0, 224, 64],
[128, 224, 64],
[ 0, 96, 192],
[128, 96, 192],
[ 0, 224, 192],
[128, 224, 192],
[ 64, 96, 64],
[192, 96, 64],
[ 64, 224, 64],
[192, 224, 64],
[ 64, 96, 192],
[192, 96, 192],
[ 64, 224, 192],
[192, 224, 192],
[ 32, 32, 0],
[160, 32, 0],
[ 32, 160, 0],
[160, 160, 0],
[ 32, 32, 128],
[160, 32, 128],
[ 32, 160, 128],
[160, 160, 128],
[ 96, 32, 0],
[224, 32, 0],
[ 96, 160, 0],
[224, 160, 0],
[ 96, 32, 128],
[224, 32, 128],
[ 96, 160, 128],
[224, 160, 128],
[ 32, 96, 0],
[160, 96, 0],
[ 32, 224, 0],
[160, 224, 0],
[ 32, 96, 128],
[160, 96, 128],
[ 32, 224, 128],
[160, 224, 128],
[ 96, 96, 0],
[224, 96, 0],
[ 96, 224, 0],
[224, 224, 0],
[ 96, 96, 128],
[224, 96, 128],
[ 96, 224, 128],
[224, 224, 128],
[ 32, 32, 64],
[160, 32, 64],
[ 32, 160, 64],
[160, 160, 64],
[ 32, 32, 192],
[160, 32, 192],
[ 32, 160, 192],
[160, 160, 192],
[ 96, 32, 64],
[224, 32, 64],
[ 96, 160, 64],
[224, 160, 64],
[ 96, 32, 192],
[224, 32, 192],
[ 96, 160, 192],
[224, 160, 192],
[ 32, 96, 64],
[160, 96, 64],
[ 32, 224, 64],
[160, 224, 64],
[ 32, 96, 192],
[160, 96, 192],
[ 32, 224, 192],
[160, 224, 192],
[ 96, 96, 64],
[224, 96, 64],
[ 96, 224, 64],
[224, 224, 64],
[ 96, 96, 192],
[224, 96, 192],
[ 96, 224, 192],
[224, 224, 192]], dtype=uint8)
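For reference, this 256-entry table is not arbitrary: the standard VOC colormap is normally generated with a small bit-interleaving routine. A sketch (the helper name is mine) that reproduces the array above, including [128 0 0] at index 1 and the ignore color [224 224 192] at index 255:
import numpy as np

def voc_colormap(n=256):
    """Generate the standard Pascal VOC palette as an (n, 3) uint8 array."""
    def bitget(value, idx):
        return (value >> idx) & 1
    cmap = np.zeros((n, 3), dtype=np.uint8)
    for i in range(n):
        r = g = b = 0
        c = i
        for j in range(8):
            # spread the three lowest bits of c over the high bits of r, g, b
            r |= bitget(c, 0) << (7 - j)
            g |= bitget(c, 1) << (7 - j)
            b |= bitget(c, 2) << (7 - j)
            c >>= 3
        cmap[i] = [r, g, b]
    return cmap

print(voc_colormap()[1])    # [128   0   0]
print(voc_colormap()[255])  # [224 224 192]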
1.3 The palette I wanted
Based on this, my original idea was to rearrange the palette so that each annotated class maps to its own index. The result would look like this:
[[ 0 0 0] <--- background color ---- 0
[128 0 0] <--- first annotated class, palette index 1 ----- 1
[ 2 2 2] <--- second annotated class, palette index 2 ----- 2
[ 3 3 3] <--- third annotated class, palette index 3 ----- 3
[ 4 4 4] <--- fourth annotated class, palette index 4 ----- 4
[ 5 5 5] <--- fifth annotated class, palette index 5 ----- 5
[ 6 6 6]
...
...
[253 253 253]
[254 254 254]
[224 224 192]] <--- ignore color ----- 255
But no matter what I tried, I simply could not get the palette into the layout I imagined!
The change I made:
- Modify the mask image directly
import copy
import numpy as np
from PIL import Image
# open the mask image ('P' mode)
im = Image.open('a.png')
# get the palette as a 256 x 3 array
palette = np.array(im.getpalette(), dtype=np.uint8).reshape((256, 3))
# deep-copy the color currently stored at index 2
temp = copy.deepcopy(palette[2])
# change palette[2] from [224 224 192] to [2 2 2]
palette[2] = (2, 2, 2)
# change palette[255] from [255 255 255] to [224 224 192]
palette[255] = temp
# flatten back into the flat list format that putpalette expects
aa = palette.flatten().tolist()
# write the modified palette back into the image
im.putpalette(aa)
The result was:
[[ 0 0 0] <--- background color ---- 0
[128 0 0] <--- first annotated class ----- 1
[ 2 2 2] <--- this was the ignore color ----- 2, now displayed as [2 2 2] # but then how do I make [224 224 192] correspond to palette index 255? I have no answer so far (see the note after the next paragraph)
[ 3 3 3]
[ 4 4 4]
[ 5 5 5]
[ 6 6 6]
...
...
[253 253 253]
[254 254 254]
[224 224 192]]
In the end I had to compromise: I did not have time to keep fighting with this, so there is no ignore color at all and every palette entry is one I need, with each annotated class simply stored at index 1. Building the dataset exactly the way I imagined would probably require re-annotating everything with the labeling tool, which is far too time-consuming.
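For the record, the likely reason putpalette alone never produced the wanted layout is that it only changes which color each index displays; the pixel values, which are the indices themselves, are left untouched. A possible approach, sketched below purely for reference, would be to remap the pixel indices with numpy and then install the desired palette. OLD_IGNORE is an assumed value for the index whose palette color was [224 224 192]; it would have to be checked against the real masks.
import numpy as np
from PIL import Image

OLD_IGNORE = 2            # assumption: the index whose palette color was [224 224 192]
NEW_IGNORE = 255          # the index the ignore pixels should end up at

im = Image.open('a.png')
idx = np.array(im)                       # P-mode pixel values are palette indices
idx[idx == OLD_IGNORE] = NEW_IGNORE      # move the ignore pixels themselves to index 255

# build the palette described in section 1.3: a grey ramp with indices 1 and 255 overridden
palette = np.stack([np.arange(256, dtype=np.uint8)] * 3, axis=1)
palette[1] = (128, 0, 0)                 # first annotated class
palette[255] = (224, 224, 192)           # ignore color

out = Image.fromarray(idx.astype(np.uint8), mode='P')
out.putpalette(palette.flatten().tolist())
out.save('a_remapped.png')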
1.4 The final palette
The final result is:
# palette for a mask of the first class
[[ 0 0 0] <--- background color ---- 0
[128 0 0] <--- the annotated class ----- 1 # red
[ 2 2 2]
...
...
[255 255 255]]
# palette for a mask of the second class
[[ 0 0 0] <--- background color ---- 0
[ 0 128 0] <--- the annotated class ----- 1 # green
[ 2 2 2]
...
...
[255 255 255]]
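Under this final scheme, converting a black-and-white mask into the required single-class P-mode mask is fairly mechanical. A sketch of the conversion; the file names, the color table, and the "class id as filename prefix" convention are assumptions modeled on the code further below (self.classed in custom_transforms_myda.py and fname.split('-')[0] in mydata.py), so the real mapping should be checked against those.
import numpy as np
from PIL import Image

# hypothetical color shown at index 1 for each class id
CLASS_COLORS = {1: (128, 0, 0), 2: (0, 128, 0), 3: (128, 128, 0),
                4: (0, 0, 128), 5: (128, 0, 128)}
class_id = 1                                           # class of this particular mask

rgb = np.array(Image.open('crack_mask.png').convert('RGB'))   # placeholder input mask
indices = (rgb.sum(axis=2) > 0).astype(np.uint8)       # white annotation -> 1, black background -> 0

mask = Image.fromarray(indices, mode='P')
palette = np.stack([np.arange(256, dtype=np.uint8)] * 3, axis=1)   # grey ramp (i, i, i)
palette[1] = CLASS_COLORS[class_id]                    # index 1 displays this class's color
mask.putpalette(palette.flatten().tolist())
mask.save('{}-crack_mask.png'.format(class_id))        # class id encoded as a filename prefix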
2 Code changes
2.1 pascal.py copy->mydata.py
One thing here really annoyed me: sometimes when I tested loading images, the mask came out completely black even though the original image clearly had mask annotations, and I had no idea why. After several days of searching, the cause turned out to be simple: the image/mask preprocessing includes a random crop, and it sometimes crops the annotated region out of the mask entirely... just crops it away. Painful. (A foreground-aware crop, sketched right below, would be one possible alternative fix.)
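As an aside, instead of dropping the random crop altogether, one could make it retry a few times whenever the cropped mask ends up empty. This is only a sketch of that idea and is not part of the original repository:
import random
import numpy as np

class RandomCropKeepMask(object):
    """Random crop that retries when the cropped mask contains no annotation."""
    def __init__(self, crop_size, max_tries=10):
        self.crop_size = crop_size
        self.max_tries = max_tries

    def __call__(self, sample):
        img, mask, idclass = sample['image'], sample['label'], sample['classid']
        w, h = img.size
        # center crop as the fallback if every random attempt misses the annotation
        x1 = max((w - self.crop_size) // 2, 0)
        y1 = max((h - self.crop_size) // 2, 0)
        box = (x1, y1, x1 + self.crop_size, y1 + self.crop_size)
        for _ in range(self.max_tries):
            rx = random.randint(0, max(w - self.crop_size, 0))
            ry = random.randint(0, max(h - self.crop_size, 0))
            candidate = (rx, ry, rx + self.crop_size, ry + self.crop_size)
            # keep the random crop only if the annotation survives it
            if np.asarray(mask.crop(candidate)).max() > 0:
                box = candidate
                break
        return {'image': img.crop(box), 'label': mask.crop(box), 'classid': idclass}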
That is why there is now a custom_transforms_myda.py; quite a lot of it was changed.
class FixedResize(object):
def __init__(self, size):
self.size = (size, size) # size: (h, w)
def __call__(self, sample):
img = sample['image']
mask = sample['label']
idclass = sample['classid']
assert img.size == mask.size
img = img.resize(self.size, Image.BILINEAR)
mask = mask.resize(self.size, Image.NEAREST)
return {'image': img,
'label': mask,
'classid': idclass} # <-- every transform also returns the class id, so we know which class palette index 1 belongs to
The full custom_transforms_myda.py is here:
import torch
import random
import numpy as np
from PIL import Image, ImageOps, ImageFilter
import cv2
import matplotlib.pyplot as plt
class Normalize(object):
"""Normalize a tensor image with mean and standard deviation.
Args:
mean (tuple): means for each channel.
std (tuple): standard deviations for each channel.
"""
def __init__(self, mean=(0., 0., 0.), std=(1., 1., 1.)):
self.mean = mean
self.std = std
def __call__(self, sample):
img = sample['image']
maska = sample['label']
idclass = sample['classid']
img = np.array(img).astype(np.float32)
mask = np.array(maska).astype(np.float32)
img /= 255.0
img -= self.mean
img /= self.std
return {'image': img,
'label': mask,
'classid': idclass}
class ToTensor(object):
"""Convert ndarrays in sample to Tensors."""
def __call__(self, sample):
# swap color axis because
# numpy image: H x W x C
# torch image: C X H X W
img = sample['image']
mask = sample['label']
idclass = sample['classid']
img = np.array(img).astype(np.float32).transpose((2, 0, 1))
mask = np.array(mask).astype(np.float32)
img = torch.from_numpy(img).float()
mask = torch.from_numpy(mask).float()
return {'image': img,
'label': mask,
'classid': idclass}
class RandomHorizontalFlip(object):
def __call__(self, sample):
img = sample['image']
mask = sample['label']
idclass = sample['classid']
if random.random() < 0.5:
img = img.transpose(Image.FLIP_LEFT_RIGHT)
mask = mask.transpose(Image.FLIP_LEFT_RIGHT)
return {'image': img,
'label': mask,
'classid': idclass}
class RandomRotate(object):
def __init__(self, degree):
self.degree = degree
def __call__(self, sample):
img = sample['image']
mask = sample['label']
idclass = sample['classid']
rotate_degree = random.uniform(-1*self.degree, self.degree)
img = img.rotate(rotate_degree, Image.BILINEAR)
mask = mask.rotate(rotate_degree, Image.NEAREST)
return {'image': img,
'label': mask,
'classid': idclass}
class RandomGaussianBlur(object):
def __call__(self, sample):
img = sample['image']
mask = sample['label']
idclass = sample['classid']
if random.random() < 0.5:
img = img.filter(ImageFilter.GaussianBlur(
radius=random.random()))
return {'image': img,
'label': mask,
'classid': idclass}
class RandomScaleCrop(object):
def __init__(self, base_size, crop_size, fill=0):
self.base_size = base_size
self.crop_size = crop_size
self.fill = fill
def __call__(self, sample):
img = sample['image']
mask = sample['label']
idclass = sample['classid']
# random scale (short edge)
short_size = random.randint(int(self.base_size * 0.5), int(self.base_size * 2.0))
w, h = img.size
if h > w:
ow = short_size
oh = int(1.0 * h * ow / w)
else:
oh = short_size
ow = int(1.0 * w * oh / h)
img = img.resize((ow, oh), Image.BILINEAR)
mask = mask.resize((ow, oh), Image.NEAREST)
# pad crop
if short_size < self.crop_size:
padh = self.crop_size - oh if oh < self.crop_size else 0
padw = self.crop_size - ow if ow < self.crop_size else 0
img = ImageOps.expand(img, border=(0, 0, padw, padh), fill=0)
mask = ImageOps.expand(mask, border=(0, 0, padw, padh), fill=self.fill)
# random crop crop_size
w, h = img.size
x1 = random.randint(0, w - self.crop_size)
y1 = random.randint(0, h - self.crop_size)
img = img.crop((x1, y1, x1 + self.crop_size, y1 + self.crop_size))
mask = mask.crop((x1, y1, x1 + self.crop_size, y1 + self.crop_size))
return {'image': img,
'label': mask,
'classid': idclass}
class FixScaleCrop(object):
def __init__(self, crop_size):
self.crop_size = crop_size
self.classed = [(128,0,0), (0,128,0), (128,128,0), (0,0,128),(128,0,128),(0,0,0)]
def letterbox(self, img, new_shape=(513, 513), color=(0, 0, 0),id =0, isNeedToConvert = False,auto=True, scaleFill=False, scaleup=True):
# Resize image to a 32-pixel-multiple rectangle https://github.com/ultralytics/yolov3/issues/232
# w, h = img.size
img = cv2.cvtColor(np.asarray(img), cv2.COLOR_RGB2BGR)
# shape = (w,h)
shape = img.shape[:2] # current shape [height, width]
if isinstance(new_shape, int):
new_shape = (new_shape, new_shape)
# Scale ratio (new / old)
r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
if not scaleup: # only scale down, do not scale up (for better test mAP)
r = min(r, 1.0)
# Compute padding
ratio = r, r # width, height ratios
new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1] # wh padding
if auto: # minimum rectangle
dw, dh = np.mod(dw, 32), np.mod(dh, 32) # wh padding
elif scaleFill: # stretch
dw, dh = 0.0, 0.0
new_unpad = (new_shape[1], new_shape[0])
ratio = new_shape[1] / shape[1], new_shape[0] / shape[0] # width, height ratios
dw /= 2 # divide padding into 2 sides
dh /= 2
#
if shape != new_unpad: # resize
img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color) # add border
# print(img.shape)
if isNeedToConvert:
new_img = Image.new('P', new_shape, (0, 0, 0))
for h in range(0, new_shape[0]):
for j in range(0, new_shape[1]):
(b, g, r) = img[h, j]
if (b, g, r) == (1, 1, 1):
new_img.putpixel((j, h), self.classed[id])
#img[h, j] = self.classed[id]
return new_img, ratio, (dw, dh)
else:
image = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB), mode='RGB')
return image, ratio, (dw, dh)
def __call__(self, sample):
img = sample['image']
mask = sample['label']
idclass = sample['classid']
mask, ratio, pad = self.letterbox(mask,new_shape=513, id=idclass,isNeedToConvert=True, auto=False)
img, ratio, pad = self.letterbox(img, auto=False)
# oh = self.crop_size
# ow = int(1.0 * w * oh / h)
# else:
# ow = self.crop_size
# oh = int(1.0 * h * ow / w)
# img = img.resize((ow, oh), Image.BILINEAR)
# mask = mask.resize((ow, oh), Image.NEAREST)
# # center crop
# w, h = img.size
# x1 = int(round((w - self.crop_size) / 2.))
# y1 = int(round((h - self.crop_size) / 2.))
# img = img.crop((x1, y1, x1 + self.crop_size, y1 + self.crop_size))
# mask = mask.crop((x1, y1, x1 + self.crop_size, y1 + self.crop_size))
return {'image': img,
'label': mask,
'classid': idclass}
class FixScaleCrop_val(object):
def __init__(self, crop_size):
self.crop_size = crop_size
self.classed = [(128,0,0), (0,128,0), (128,128,0), (0,0,128),(128,0,128),(0,0,0)]  # must stay defined: letterbox() below uses self.classed when isNeedToConvert=True
def letterbox(self, img, new_shape=(513, 513), color=(0, 0, 0),id =0, isNeedToConvert = False,auto=True, scaleFill=False, scaleup=True):
# Resize image to a 32-pixel-multiple rectangle https://github.com/ultralytics/yolov3/issues/232
# w, h = img.size
img = cv2.cvtColor(np.asarray(img), cv2.COLOR_RGB2BGR)
# shape = (w,h)
shape = img.shape[:2] # current shape [height, width]
if isinstance(new_shape, int):
new_shape = (new_shape, new_shape)
# Scale ratio (new / old)
r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
if not scaleup: # only scale down, do not scale up (for better test mAP)
r = min(r, 1.0)
# Compute padding
ratio = r, r # width, height ratios
new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1] # wh padding
if auto: # minimum rectangle
dw, dh = np.mod(dw, 32), np.mod(dh, 32) # wh padding
elif scaleFill: # stretch
dw, dh = 0.0, 0.0
new_unpad = (new_shape[1], new_shape[0])
ratio = new_shape[1] / shape[1], new_shape[0] / shape[0] # width, height ratios
dw /= 2 # divide padding into 2 sides
dh /= 2
#
if shape != new_unpad: # resize
img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color) # add border
# print(img.shape)
if isNeedToConvert:
new_img = Image.new('P', new_shape, (0, 0, 0))
for h in range(0, new_shape[0]):
for j in range(0, new_shape[1]):
(b, g, r) = img[h, j]
if (b, g, r) == (1, 1, 1):
new_img.putpixel((j, h), self.classed[id])
#img[h, j] = self.classed[id]
return new_img, ratio, (dw, dh)
else:
image = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB), mode='RGB')
return image, ratio, (dw, dh)
def __call__(self, sample):
img = sample['image']
mask = sample['label']
idclass = sample['classid']
mask, ratio, pad = self.letterbox(mask,new_shape=513, id=idclass,isNeedToConvert=True, auto=False)
img, ratio, pad = self.letterbox(img, auto=False)
# oh = self.crop_size
# ow = int(1.0 * w * oh / h)
# else:
# ow = self.crop_size
# oh = int(1.0 * h * ow / w)
# img = img.resize((ow, oh), Image.BILINEAR)
# mask = mask.resize((ow, oh), Image.NEAREST)
# # center crop
# w, h = img.size
# x1 = int(round((w - self.crop_size) / 2.))
# y1 = int(round((h - self.crop_size) / 2.))
# img = img.crop((x1, y1, x1 + self.crop_size, y1 + self.crop_size))
# mask = mask.crop((x1, y1, x1 + self.crop_size, y1 + self.crop_size))
return {'image': img,
'label': mask,
'classid': idclass}  # also return classid so later transforms (Normalize, ToTensor) keep working
class FixedResize(object):
def __init__(self, size):
self.size = (size, size) # size: (h, w)
def __call__(self, sample):
img = sample['image']
mask = sample['label']
idclass = sample['classid']
assert img.size == mask.size
img = img.resize(self.size, Image.BILINEAR)
mask = mask.resize(self.size, Image.NEAREST)
return {'image': img,
'label': mask,
'classid': idclass}
mydata.py
from __future__ import print_function, division
import os
from PIL import Image
import numpy as np
from torch.utils.data import Dataset
from mypath import Path
from torchvision import transforms
from dataloaders import custom_transforms_myda as tr
class VOCSegmentation(Dataset):
"""
PascalVoc dataset
"""
NUM_CLASSES = 5+1
def __init__(self,
args,
base_dir=Path.db_root_dir('mydata'),
split='train',
):
"""
:param base_dir: path to VOC dataset directory
:param split: train/val
:param transform: transform to apply
"""
super().__init__()
self._base_dir = base_dir
self._image_dir = os.path.join(self._base_dir, 'JPEGImages')
self._cat_dir = os.path.join(self._base_dir, 'SegmentationClass')
if isinstance(split, str):
self.split = [split]
else:
split.sort()
self.split = split
self.args = args
_splits_dir = os.path.join(self._base_dir, 'ImageSets')
self.im_ids = []
self.images = []
self.categories = []
for splt in self.split:
with open(os.path.join(os.path.join(_splits_dir, splt + '.txt')), "r") as f:
lines = f.read().splitlines()
for ii, line in enumerate(lines):
_image = os.path.join(self._image_dir, line)
fpath,fname = os.path.split(_image)
fnewname = fname.replace('.jpg','.png')
_cat = os.path.join(self._cat_dir, fnewname)
#print(fnewname)
assert os.path.isfile(_image)
assert os.path.isfile(_cat)
self.im_ids.append(line)
self.images.append(_image)
self.categories.append(_cat)
assert (len(self.images) == len(self.categories))
# Display stats
print('Number of images in {}: {:d}'.format(split, len(self.images)))
def __len__(self):
return len(self.images)
def __getitem__(self, index):
_img, _target, _clasid = self._make_img_gt_point_pair(index)
sample = {'image': _img, 'label': _target, 'classid': _clasid}
for split in self.split:
if split == "train":
# return sample
return self.transform_tr(sample)
elif split == 'val':
return self.transform_val(sample)
def _make_img_gt_point_pair(self, index):
_img = Image.open(self.images[index]).convert('RGB')
#_img = Image.open(a).convert('RGB')
# print(self.categories[index])
#_target = Image.open(b)
_target = Image.open(self.categories[index])
_,fname = os.path.split(self.categories[index])
clasid = int(fname.split('-')[0])
return _img, _target, clasid
def transform_tr(self, sample):
composed_transforms = transforms.Compose([
#tr.RandomHorizontalFlip(),
tr.FixScaleCrop(crop_size=self.args.crop_size),
tr.RandomGaussianBlur(),
tr.Normalize(mean=(0.24127093, 0.2287277, 0.24580745), std=(0.07584574, 0.05697405, 0.07654408)),
tr.ToTensor()])
return composed_transforms(sample)
def transform_val(self, sample):
composed_transforms = transforms.Compose([
tr.FixScaleCrop(crop_size=self.args.crop_size),
# tr.Normalize(),
#tr.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
tr.Normalize(mean=(0.24127093, 0.2287277, 0.24580745), std=(0.07584574, 0.05697405, 0.07654408)),
tr.ToTensor()])
return composed_transforms(sample)
def __str__(self):
return 'mydata(split=' + str(self.split) + ')'
if __name__ == '__main__':
from dataloaders.utils import decode_segmap_mydata
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import argparse
def getNonRepeatList2(data):
new_data = []
data = data.flatten()
for i in range(len(data)):
if data[i] not in new_data:
new_data.append(data[i])
return new_data
def getNonRepeatList3( data):
return [i for n, i in enumerate(data) if i not in data[:n]]
parser = argparse.ArgumentParser()
args = parser.parse_args()
args.base_size = 513
args.crop_size = 513
voc_train = VOCSegmentation(args, split='train')
dataloader = DataLoader(voc_train, batch_size=5, shuffle=True, num_workers=0,drop_last=True)
for ii, sample in enumerate(dataloader):
for jj in range(sample["image"].size()[0]):
img = sample['image'].numpy()
gt = sample['label'].numpy()
#posdd = getNonRepeatList3(ttt)
#aset =list( set(gt.tolist()))
classid = sample['classid'].numpy()
temp = np.array(gt[jj])
temp_max = np.max(temp)
temp_t = np.array(gt[jj]).astype(np.uint8)
temp_t_max = np.max(temp_t)
tmp = np.array(gt[jj]).astype(np.uint8)
segmap = decode_segmap_mydata(tmp, classid[jj], dataset='mydata')
img_tmp = np.transpose(img[jj], axes=[1, 2, 0])
img_tmp *= (0.07584574, 0.05697405, 0.07654408)
img_tmp += (0.24127093, 0.2287277, 0.24580745)
img_tmp *= 255.0
img_tmp = img_tmp.astype(np.uint8)
plt.figure()
plt.title('display')
plt.subplot(211)
plt.imshow(img_tmp)
plt.subplot(212)
plt.imshow(segmap)
if ii == 1:
break
plt.show(block=True)
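The Normalize calls above use dataset-specific mean/std values (roughly 0.241/0.229/0.246 and 0.076/0.057/0.077) instead of the ImageNet defaults. A sketch of how such statistics might be computed over the training images; the directory path is a placeholder and this helper is not part of the repository:
import os
import numpy as np
from PIL import Image

def compute_mean_std(image_dir):
    """Per-channel mean/std over all images, on the same 0-1 scale that Normalize uses."""
    total = np.zeros(3)
    total_sq = np.zeros(3)
    n_pixels = 0
    for name in os.listdir(image_dir):
        img = np.asarray(Image.open(os.path.join(image_dir, name)).convert('RGB'),
                         dtype=np.float64) / 255.0
        flat = img.reshape(-1, 3)
        total += flat.sum(axis=0)
        total_sq += (flat ** 2).sum(axis=0)
        n_pixels += flat.shape[0]
    mean = total / n_pixels
    std = np.sqrt(total_sq / n_pixels - mean ** 2)
    return mean, std

# usage (path is an example):
# print(compute_mean_std('/path/to/mydata/JPEGImages'))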
Command-line arguments:
--backbone
resnet
--dataset
mydata
train.py
def training(self, epoch):
train_loss = 0.0
self.model.train()
tbar = tqdm(self.train_loader)
num_img_tr = len(self.train_loader)
for i, sample in enumerate(tbar):
# mydata
image, target, id = sample['image'], sample['label'], sample['classid'] # changed to fit my dataset: the classid is loaded as well
#image, target = sample['image'], sample['label'] # the original line
if self.args.cuda:
image, target = image.cuda(), target.cuda()
self.scheduler(self.optimizer, i, epoch, self.best_pred)
self.optimizer.zero_grad()
output = self.model(image)
loss = self.criterion(output, target)
loss.backward()
self.optimizer.step()
train_loss += loss.item()
tbar.set_description('Train loss: %.3f' % (train_loss / (i + 1)))
self.writer.add_scalar('train/total_loss_iter', loss.item(), i + num_img_tr * epoch)
# Show 10 * 3 inference results each epoch
if i % (num_img_tr // 10) == 0:
global_step = i + num_img_tr * epoch
#self.summary.visualize_image(self.writer, self.args.dataset, image, target, output, global_step) # commented out: the tensorboard visualization would also need the classid information, and changing that touches a lot of code, so I skipped it (a possible way to restore it is sketched after this code block)
self.writer.add_scalar('train/total_loss_epoch', train_loss, epoch)
print('[Epoch: %d, numImages: %5d]' % (epoch, i * self.args.batch_size + image.data.shape[0]))
print('Loss: %.3f' % train_loss)
if self.args.no_val:
# save checkpoint every epoch
is_best = False
self.saver.save_checkpoint({
'epoch': epoch + 1,
'state_dict': self.model.module.state_dict(),
'optimizer': self.optimizer.state_dict(),
'best_pred': self.best_pred,
}, is_best)
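For reference, the commented-out visualization above could probably be restored by passing the class id through to decode_segmap_mydata. This is only a sketch: it assumes decode_segmap_mydata returns an H x W x 3 image (as its use in the mydata.py test block suggests) and that the SummaryWriter in use supports the dataformats argument (recent tensorboardX / torch.utils.tensorboard do).
import torch
import numpy as np
from dataloaders.utils import decode_segmap_mydata

def visualize_with_classid(writer, target, output, classid, global_step):
    """Write the first ground-truth mask and prediction of a batch to tensorboard."""
    pred = torch.max(output[:1], 1)[1].detach().cpu().numpy()[0].astype(np.uint8)
    gt = target[0].detach().cpu().numpy().astype(np.uint8)
    cid = int(classid[0])
    pred_rgb = decode_segmap_mydata(pred, cid, dataset='mydata')   # assumed H x W x 3 output
    gt_rgb = decode_segmap_mydata(gt, cid, dataset='mydata')
    writer.add_image('train/prediction', pred_rgb, global_step, dataformats='HWC')
    writer.add_image('train/groundtruth', gt_rgb, global_step, dataformats='HWC')

# e.g. inside the training loop above:
# visualize_with_classid(self.writer, target, output, id, global_step)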
The inference script to run after training:
demotest.py
import argparse
import os
import numpy as np
import time
from modeling.deeplab import *
from dataloaders import custom_transforms_myda as tr
from PIL import Image
from torchvision import transforms
from dataloaders.utils import *
from torchvision.utils import make_grid,save_image
def main():
argparser = argparse.ArgumentParser(description='Pytorch DeeplabV3Plus Training')
argparser.add_argument('--in_path', type=str, required=True, help='image to set')
argparser.add_argument('--out_path', type=str, required=True, help='image to save result')
argparser.add_argument('--backbone', type=str, default='resnet', choices=['resnet','xception','drn','mobilenet'], help='backbone name (default: resnet)')
argparser.add_argument('--ckpt', type=str, default='deeplab-resnet.pth', help='saved model')
argparser.add_argument('--out_stride', type=int, default=16, help='network output stride (default: 16)')
argparser.add_argument('--no_cuda', action='store_true', default=False,
help='disables CUDA training')
argparser.add_argument('--gpu_ids', type=str, default='0',
help='use which gpu to train, must be a comma-separated list of integers only (default: 0)')
argparser.add_argument('--dataset', type=str, default='mydata', choices=['pascal','coco','cityscapes','mydata'],
help='dataset name (default: mydata)')
argparser.add_argument('--crop_size', type=int, default=513,
help='crop image size (default: 513)')
argparser.add_argument('--num_classes', type=int, default=21,
help='how many classes to segment')
argparser.add_argument('--sync_bn', type=bool, default=None,
help='whether to use sync bn (default: auto)')
args = argparser.parse_args()
print(args)
a = torch.cuda.is_available()
b = not args.no_cuda
args.cuda = not args.no_cuda and torch.cuda.is_available()
if args.cuda:
try:
args.gpu_ids = [int(s) for s in args.gpu_ids.split(',')]
except ValueError:
raise ValueError('Argument --gpu_ids must be a comma-separated list of integers only')
if args.sync_bn is None:
if args.cuda and len(args.gpu_ids) > 1:
args.sync_bn = True
else:
args.sync_bn = False
model_s_time = time.time()
model = DeepLab(num_classes=args.num_classes,
backbone=args.backbone,
output_stride=args.out_stride,
sync_bn=args.sync_bn,
)
ckpt = torch.load(args.ckpt, map_location='cpu')
model.load_state_dict(ckpt['state_dict'])
model = model.cuda()
model_u_time = time.time()
model_load_time = model_u_time - model_s_time
print('[INFO] model load time is {}'.format(model_load_time))
composed_transforms = transforms.Compose([
# tr.RandomHorizontalFlip(),
tr.FixScaleCrop(crop_size=513),
tr.RandomGaussianBlur(),
tr.Normalize(mean=(0.24127093, 0.2287277, 0.24580745), std=(0.07584574, 0.05697405, 0.07654408)),
tr.ToTensor()])
for name in os.listdir(args.in_path):
s_time = time.time()
image = Image.open(args.in_path + '/' + name).convert('RGB')
# image =
target = Image.open(args.in_path + '/' + name).convert('P')
clasid = int(name.split('-')[0])
sample = {'image': image, 'label': target, 'classid': clasid}
# sample = {'image': image,'label': target}
tensor_in = composed_transforms(sample)['image'].unsqueeze(0)
tensor_in.to()
model.eval()
if args.cuda:
tensor_in = tensor_in.cuda()
with torch.no_grad():
output = model(tensor_in)
grid_image = make_grid(decode_seg_map_sequence(torch.max(output[:3], 1)[1].detach().cpu().numpy()))
save_image(grid_image, args.in_path + '/' + '{}_mask.png'.format(name[0:-4]))
u_time = time.time()
img_time = u_time - s_time
print('image:{} time:{}'.format(name, img_time))
print('image save in ' + args.in_path)
# def make_data(args, **kwargs):
# if args.dataset == 'coco':
# train_set = coco.COCOSegmentation(args, split='train')
# val_set = coco.COCOSegmentation(args, split='val')
# num_class = train_set.NUM_CLASSES
# train_loader = DataLoader(train_set, batch_size=args.batch_size, shuffle=True, **kwargs)
# val_loader = DataLoader(val_set, batch_size=args.batch_size, shuffle=False, **kwargs)
# test_loader = None
# return train_loader, val_loader, test_loader, num_class
if __name__ == '__main__':
main()
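An example invocation (the paths and checkpoint location are placeholders): note that --num_classes must match the 5+1 classes used for training, and that the input images must follow the same '<classid>-name.jpg' naming convention, since demotest.py parses the class id from the filename prefix.
python demotest.py --in_path ./test_images --out_path ./test_images --backbone resnet --dataset mydata --num_classes 6 --ckpt path/to/checkpoint.pth.tar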
Results
Below: the original input image.
Below: the prediction produced by the trained model.
Below: the original ground-truth annotation.
3 Summary
Overall the results look quite good. The lesson I take away is to follow the official annotation format from the start; it makes finishing the project much faster.