Mask R-CNN (Part 2): Augmenting Images and Labels Together for Instance Segmentation

血狼傲骨

Article link: https://blog.youkuaiyun.com/sjjsbsbbs/article/details/119969754

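This article explains why and how to augment data for instance segmentation: to prevent overfitting, to improve model robustness and generalization, and to raise recognition accuracy, using operations such as translation, scaling, and rotation. For the implementation, data annotated with labelme is rotated, the labels are rotated to match, and the resulting images and JSON files are saved.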

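I. Why augment

1. Preventing overfitting

1.1 Definition of overfitting

Overfitting means making a hypothesis overly strict in order to keep it consistent with the training data.
In other words, the model performs well on the training set but poorly on the test set and on new data.
The table below contrasts underfitting, overfitting, and a good fit.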

Training-set performance	Test-set performance	Conclusion
Poor	Poor	Underfitting
Good	Poor	Overfitting
Good	Good	Good fit

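1.2 Causes of overfitting

① The model is too complex and has too many parameters.
② There is too little training data.
③ The training set and the test set follow different feature distributions.
④ Noisy samples interfere too much, so the model memorizes noise features and ignores the true input-output relationship.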

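1.3 Remedies

① Reduce model complexity.
② Enlarge the dataset or apply data augmentation.
③ Apply regularization.
④ Stop training early.
⑤ Clean out anomalous data.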

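2. Improving the robustness and generalization of the trained model

The following is my own understanding. The training data may contain errors, and the better a model tolerates them, the more robust it is, and vice versa. Generalization is the ability to predict on unseen data: for example, if a model trained on dataset A still performs well on dataset B, the model or algorithm generalizes well.

2.1 Robustness

The model's tolerance of anomalous data.

2.2 Generalization

The model's ability to predict on unseen data.

3. Improving recognition accuracy

Accuracy can be understood as how good the model is: predicting A as A and B as B. The more predictions are correct, the higher the accuracy.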

II. Common data augmentation methods

1. Translation

Shift the image in any direction.
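
For segmentation data the polygon points have to move with the pixels. The following is a minimal NumPy-only sketch of that idea, not the article's own code; the function name `translate_image_and_points` and the integer offsets `dx`, `dy` are assumptions made for illustration.

```python
import numpy as np

def translate_image_and_points(img, points, dx, dy, fill=0):
    """Shift an image by integer (dx, dy) pixels and move polygon points with it."""
    h, w = img.shape[:2]
    out = np.full_like(img, fill)  # canvas of the same size, filled with `fill`
    # Part of the source image that is still visible after the shift
    src_x0, src_x1 = max(0, -dx), min(w, w - dx)
    src_y0, src_y1 = max(0, -dy), min(h, h - dy)
    if src_x0 < src_x1 and src_y0 < src_y1:
        out[src_y0 + dy:src_y1 + dy, src_x0 + dx:src_x1 + dx] = \
            img[src_y0:src_y1, src_x0:src_x1]
    # Labels receive exactly the same offset as the pixels
    moved = [[x + dx, y + dy] for x, y in points]
    return out, moved
```

Points that end up outside the canvas would still need to be clipped or dropped, which this sketch leaves to the caller.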

2. Scaling

Shrink or enlarge the image.
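
When an image is scaled, every label coordinate is multiplied by the same factor. The sketch below shows this with a nearest-neighbour resize written in plain NumPy so that the arithmetic is visible; in practice an interpolating resize such as `cv2.resize` would normally be used. The function name `scale_image_and_points` is hypothetical.

```python
import numpy as np

def scale_image_and_points(img, points, factor):
    """Resize an image by `factor` (nearest neighbour) and scale polygon points."""
    h, w = img.shape[:2]
    new_h = max(1, int(round(h * factor)))
    new_w = max(1, int(round(w * factor)))
    # For each output row/column, look up the source row/column it came from
    rows = np.minimum((np.arange(new_h) / factor).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / factor).astype(int), w - 1)
    scaled = img[rows[:, None], cols]
    # Label coordinates scale by the same factor as the image
    scaled_points = [[x * factor, y * factor] for x, y in points]
    return scaled, scaled_points
```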

3. Rotation

Rotate the image by a given angle.

4. Random cropping

Crop a random region of the image.
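
A random crop can be sketched as picking a random top-left corner and slicing. The helper below is an illustrative assumption (the name `random_crop` and the use of `np.random.default_rng` are not from the article); it also returns the offset, because the labels must later be shifted by the same amount.

```python
import numpy as np

def random_crop(img, crop_h, crop_w, rng):
    """Cut a random crop_h x crop_w window out of img; return it with its (left, top) offset."""
    h, w = img.shape[:2]
    if crop_h > h or crop_w > w:
        raise ValueError("crop size must not exceed the image size")
    top = int(rng.integers(0, h - crop_h + 1))    # upper bound is exclusive
    left = int(rng.integers(0, w - crop_w + 1))
    return img[top:top + crop_h, left:left + crop_w], (left, top)

# Example usage with a seeded generator for reproducibility
rng = np.random.default_rng(0)
patch, (left, top) = random_crop(np.zeros((100, 120, 3), dtype=np.uint8), 64, 64, rng)
```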

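5. Color jitter

Randomly vary the image's exposure, saturation, and hue to produce pictures under different lighting and color conditions. This augments the data so that the model can cope with different lighting conditions as far as possible, which improves its generalization.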
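
Color jitter changes only pixel values, so the polygon labels stay as they are. The following rough sketch jitters exposure and saturation in NumPy; hue is deliberately left out, and the channel mean is used as a crude stand-in for luminance. The function name and the default ranges are assumptions, not values from the article.

```python
import numpy as np

def jitter_exposure_saturation(img, rng, max_exposure=0.3, max_saturation=0.3):
    """Randomly scale brightness and saturation of an H x W x 3 uint8 image."""
    img_f = img.astype(np.float32)
    exposure = 1.0 + float(rng.uniform(-max_exposure, max_exposure))
    saturation = 1.0 + float(rng.uniform(-max_saturation, max_saturation))
    # Grey version of the image: per-pixel channel mean (rough luminance proxy)
    gray = img_f.mean(axis=2, keepdims=True)
    # Move each pixel towards or away from grey, then scale overall brightness
    out = (gray + (img_f - gray) * saturation) * exposure
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```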

6. Random occlusion

Occlude small regions of the image.
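
One simple way to sketch occlusion is to overwrite a small random rectangle with a constant value. The helper name `random_erase`, the `max_frac` limit, and the fill value below are illustrative assumptions.

```python
import numpy as np

def random_erase(img, rng, max_frac=0.2, fill=0):
    """Blank out one random rectangle; each side is at most max_frac of the image side."""
    h, w = img.shape[:2]
    eh = int(rng.integers(1, max(2, int(h * max_frac) + 1)))  # rectangle height >= 1
    ew = int(rng.integers(1, max(2, int(w * max_frac) + 1)))  # rectangle width >= 1
    top = int(rng.integers(0, h - eh + 1))
    left = int(rng.integers(0, w - ew + 1))
    out = img.copy()  # leave the input image untouched
    out[top:top + eh, left:left + ew] = fill
    return out, (left, top, ew, eh)
```

The labels are normally kept unchanged here, since the goal is for the model to recognize partly hidden objects.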

7. Noise perturbation

Randomly perturb the RGB values of each pixel. The most common noise models are salt-and-pepper noise and Gaussian noise.
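
Both noise models can be sketched in a few lines of NumPy. The function names, `sigma`, and `amount` below are assumptions chosen for illustration; the labels are not affected by this kind of augmentation.

```python
import numpy as np

def add_gaussian_noise(img, rng, sigma=10.0):
    """Add zero-mean Gaussian noise with standard deviation sigma to a uint8 image."""
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, img.shape)
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)

def add_salt_pepper_noise(img, rng, amount=0.02):
    """Set roughly `amount` of the pixels to pure black (pepper) or white (salt)."""
    out = img.copy()
    mask = rng.random(img.shape[:2])       # one random value per pixel
    out[mask < amount / 2] = 0             # pepper
    out[mask > 1 - amount / 2] = 255       # salt
    return out
```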

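III. Implementing data augmentation for instance segmentation

Instance segmentation datasets are very laborious to annotate, so data augmentation is indispensable. Training on the hand-labelled data plus random horizontal flips gave only mediocre results, so I decided to augment the data further. Because the data was annotated with labelme, the original image and its labels have to be transformed together, so I decided to give it a try.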
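
The horizontal flip mentioned above is the simplest case of transforming an image and its labels together. The sketch below is an illustration rather than the code used for training; it assumes points are stored as `[x, y]` pairs and that the pixel column `x` maps to `width - 1 - x` (a pixel-centre convention, which is an assumption).

```python
import numpy as np

def hflip_image_and_points(img, points):
    """Mirror an image left-to-right and mirror its polygon points to match."""
    w = img.shape[1]
    flipped = img[:, ::-1].copy()
    # Column x moves to column (w - 1) - x; the y coordinate is unchanged
    flipped_points = [[(w - 1) - x, y] for x, y in points]
    return flipped, flipped_points
```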

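1. Rotation

Rotate the image by a given angle.
After training on the original data plus its horizontally flipped copies, the results were not very good. Looking at the output images, I found that most objects in the original data are neatly aligned, whereas a tilted object in the test data may not be recognized at all, as in the image below.
[Image: example recognition result]
When the objects are neatly aligned, they are recognized quite well.

So I decided to start by rotating the images.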

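1.1 A first look at the JSON file

{
    "version": "4.5.9",               // labelme version
    "flags": {},                      // flags
    "shapes": [                       // polygons whose edges are defined by points
      {
        "line_color": null,           // line color
        "fill_color": null,           // fill color
        "label": "2",                 // label
        "points": [                   // the points; only one is listed here
          [
            1365.8296754954001,
            1075.6984533100115
          ]
        ],
        "group_id": 1,                // instance id: which instance of the same class this is
        "shape_type": "polygon",      // shape type
        "flags": {}                   // flags
      }
    ],
    "imagePath": "1.jpg"              // path of the image

The rest of the file is long runs of data that can be ignored. The labels are modified according to the structure of this JSON file.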
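
Note that the `//` comments in the listing are annotations only; standard JSON has no comment syntax, so the real file contains none. As a quick way to inspect such a file, the sketch below parses a small inline sample and summarizes each shape. The sample string, its extra point values, and the helper name `summarize_shapes` are made up for illustration.

```python
import json

# Illustrative sample with the same structure as a labelme file (values are made up)
SAMPLE_JSON = """
{
  "version": "4.5.9",
  "flags": {},
  "shapes": [
    {
      "label": "2",
      "points": [[1365.83, 1075.70], [1400.0, 1100.0], [1350.0, 1120.0]],
      "group_id": 1,
      "shape_type": "polygon",
      "flags": {}
    }
  ],
  "imagePath": "1.jpg"
}
"""

def summarize_shapes(data):
    """Return (label, group_id, number of points) for every shape in a labelme dict."""
    return [(s["label"], s.get("group_id"), len(s["points"])) for s in data["shapes"]]

print(summarize_shapes(json.loads(SAMPLE_JSON)))
```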

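1.2 Importing the required libraries

import cv2        # computer vision library
import json       # parses JSON files
import numpy as np  # scientific computing package
import sys  # Python runtime configuration and resources
import os  # interaction with the operating system
import random  # pseudo-random number generation for various distributions
import time  # standard library for handling time
import base64  # encoding/decoding library
from math import cos, sin, pi, fabs, radians  # built-in math functions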

1.3 Loading the images

There are too many images to process one at a time, so every image in a folder is processed in the same way.

import cv2
import os
images_path = 'C:/Users/Administrator/Desktop/PersonCode/data/'  # root directory of the images
save_path = "C:/Users/Administrator/Desktop/PersonCode/sava_data/"  # folder for the saved images
file_list = os.listdir(images_path)
for img_name in file_list:
    print(img_name)

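1.4 Reading the JSON files

The JSON files must be read in batch as well. Take care to match each image with its own JSON file, otherwise the processing will be wrong. The code for reading a JSON file is given below; the complete code is at the end.

# Read a JSON file
def ReadJson(jsonfile):
    with open(jsonfile, encoding='utf-8') as f:
        jsonData = json.load(f)
    return jsonData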
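
Matching each image to its JSON file can be made explicit by comparing file names without their extensions. The sketch below is one possible way to do this, assuming the two share a base name (for example `1.jpg` and `1.json`); the helper name and the list of image extensions are assumptions.

```python
import os

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed set of image extensions

def pair_images_with_json(image_names, json_names):
    """Pair image file names with JSON file names that share the same base name."""
    json_by_stem = {os.path.splitext(n)[0]: n
                    for n in json_names if n.lower().endswith(".json")}
    pairs, missing = [], []
    for name in sorted(image_names):
        stem, ext = os.path.splitext(name)
        if ext.lower() not in IMAGE_EXTS:
            continue  # skip files that are not images
        if stem in json_by_stem:
            pairs.append((name, json_by_stem[stem]))
        else:
            missing.append(name)  # image without an annotation file
    return pairs, missing
```

The two name lists would typically come from `os.listdir` on the image folder and the JSON folder; images listed in `missing` can then be skipped instead of causing a crash halfway through a batch.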

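1.5 Rotating the image

The image rotation function is as follows.

def RotateImage(img, degree):
    height, width = img.shape[:2]    # height and width of the image
    # Size of the canvas that fully contains the rotated image
    heightNew = int(width * fabs(sin(radians(degree))) + height * fabs(cos(radians(degree))))
    widthNew = int(height * fabs(sin(radians(degree))) + width * fabs(cos(radians(degree))))
    # Rotate about the image centre; a positive angle rotates counter-clockwise
    matRotation = cv2.getRotationMatrix2D((width // 2, height // 2), degree, 1)
    # Shift the result so that it is centred in the enlarged canvas
    matRotation[0, 2] += (widthNew - width) // 2
    matRotation[1, 2] += (heightNew - height) // 2
    print(width // 2, height // 2)   # debug output: rotation centre
    # Fill the area not covered by the image with white
    imgRotation = cv2.warpAffine(img, matRotation, (widthNew, heightNew), borderValue=(255, 255, 255))
    return imgRotation, matRotation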

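[Image: rotated image with a white border]
The background can be changed through the cv2 parameters. In image_rotate = cv2.warpAffine(image, M, (nW, nH), borderValue=(255, 255, 255)), remove borderValue and add borderMode=1 (cv2.BORDER_REPLICATE) so that the border is filled with the edge colors instead, i.e. imgRotation = cv2.warpAffine(img, matRotation, (widthNew, heightNew), borderMode=1).
It looks a little odd, but it is fine as long as the model can train on it.
[Image: rotated image with an edge-replicated border]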
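
To see what the rotation matrix does, the sketch below rebuilds it in NumPy using the formula given in the OpenCV documentation for `cv2.getRotationMatrix2D` (with scale 1), then adds the same canvas shift as the rotation function. It is a check of the arithmetic rather than a replacement for the OpenCV call; the function name `rotation_matrix_with_canvas` is hypothetical.

```python
import numpy as np

def rotation_matrix_with_canvas(width, height, degree):
    """2x3 affine matrix for rotating about the image centre into an enlarged canvas."""
    theta = np.radians(degree)
    a, b = np.cos(theta), np.sin(theta)
    cx, cy = width // 2, height // 2
    # Same layout as the matrix documented for cv2.getRotationMatrix2D (scale = 1)
    M = np.array([[a, b, (1 - a) * cx - b * cy],
                  [-b, a, b * cx + (1 - a) * cy]])
    # Canvas large enough to hold the whole rotated image
    new_w = int(height * abs(b) + width * abs(a))
    new_h = int(width * abs(b) + height * abs(a))
    # Shift so the rotated image is centred in the new canvas
    M[0, 2] += (new_w - width) // 2
    M[1, 2] += (new_h - height) // 2
    return M, (new_w, new_h)

M, (new_w, new_h) = rotation_matrix_with_canvas(400, 300, 30)
centre = M @ np.array([200, 150, 1.0])  # where the old image centre lands
print(new_w, new_h, centre)
```

For a 400 x 300 image rotated by 30 degrees, the old centre lands within a pixel of the new canvas centre, which is the property the label rotation relies on.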

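1.6 Cropping the image

The rotated image turned out rather tall, because much of it is padding, while the objects in my dataset mostly sit in the middle, so the four sides can be cropped.
I used the following:

imgRotation = imgRotation[imgRotation.shape[0] // 9:(imgRotation.shape[0] // 9) * 8, imgRotation.shape[1] // 9:(imgRotation.shape[1] // 9) * 8]

This crops from about one ninth to eight ninths of the height and width. The first slice (based on imgRotation.shape[0]) handles the y (vertical) direction and the second (based on imgRotation.shape[1]) handles the x (horizontal) direction. After cropping, the labels have to be changed as well; I am still working that out.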
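
One possible way to update the labels after such a crop is to subtract the crop offset from every point and clamp the result to the new image size. This is a sketch under the assumption that objects sit well inside the crop, as described above; clamping will distort any polygon that crosses the crop border. The helper names are hypothetical.

```python
def crop_bounds(height, width, parts=9):
    """Crop window used above: from 1/parts to (parts - 1)/parts of each side."""
    top, left = height // parts, width // parts
    bottom, right = top * (parts - 1), left * (parts - 1)
    return top, bottom, left, right

def shift_points_after_crop(points, top, bottom, left, right):
    """Move polygon points into the cropped image's coordinate system."""
    new_w, new_h = right - left, bottom - top
    shifted = []
    for x, y in points:
        nx = min(max(x - left, 0.0), float(new_w))  # clamp to the cropped width
        ny = min(max(y - top, 0.0), float(new_h))   # clamp to the cropped height
        shifted.append([nx, ny])
    return shifted

top, bottom, left, right = crop_bounds(900, 1800)
print(shift_points_after_crop([[250, 150]], top, bottom, left, right))
```

The imageHeight and imageWidth fields of the JSON would also have to be set to the cropped size.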

1.7 Rotating the labels

# Rotate the label coordinates
def rotatePoint(Srcimg_rotate, jsonTemp, M, imagePath):
    json_dict = {}
    # Copy the top-level fields, updating those that describe the image
    for key, value in jsonTemp.items():
        if key == 'imageHeight':
            json_dict[key] = Srcimg_rotate.shape[0]
            print('height', json_dict[key])
        elif key == 'imageWidth':
            json_dict[key] = Srcimg_rotate.shape[1]
            print('width', json_dict[key])
        elif key == 'imageData':
            json_dict[key] = image_to_base64(Srcimg_rotate)
        elif key == 'imagePath':
            json_dict[key] = imagePath
        else:
            json_dict[key] = value
    # Apply the 2x3 affine matrix M to every polygon point (homogeneous coordinates)
    for item in json_dict['shapes']:
        for key, value in item.items():
            if key == 'points':
                for item2 in range(len(value)):
                    pt1 = np.dot(M, np.array([[value[item2][0]], [value[item2][1]], [1]]))
                    value[item2][0], value[item2][1] = pt1[0][0], pt1[1][0]
    return json_dict
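
The per-point multiplication above can also be written as one matrix operation over all points of a polygon. The sketch below is an equivalent NumPy formulation offered as an illustration; the function name `transform_points` is an assumption, and `M` is any 2x3 affine matrix such as the one returned by the rotation function.

```python
import numpy as np

def transform_points(points, M):
    """Apply a 2x3 affine matrix M to a list of [x, y] points in one step."""
    pts = np.asarray(points, dtype=np.float64).reshape(-1, 2)
    M = np.asarray(M, dtype=np.float64)
    # [x', y'] = [x, y] @ A^T + t, where M = [A | t]
    out = pts @ M[:, :2].T + M[:, 2]
    return out.tolist()  # plain Python floats, ready for json.dumps

# Example: a pure translation by (+5, -2)
M_shift = np.array([[1.0, 0.0, 5.0], [0.0, 1.0, -2.0]])
print(transform_points([[1, 2], [3, 4]], M_shift))
```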

1.8 Saving the image and JSON

cv2.imwrite("./result/" + img_name[:-4] + "_30" + '.jpg', img_rotate)
WriteJson('./result/' + img_name[:-4] + "_30" + '.json', json_rotate)

1.9 Complete code

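import cv2        # computer vision library
import json       # parses JSON files
import numpy as np  # scientific computing package
import sys  # Python runtime configuration and resources
import os  # interaction with the operating system
import random  # pseudo-random number generation for various distributions
import time  # standard library for handling time
import base64  # encoding/decoding library
from math import cos, sin, pi, fabs, radians  # built-in math functions
images_path = 'C:/Users/Administrator/Desktop/PersonCode/data/'  # root directory of the images
json_path = 'C:/Users/Administrator/Desktop/PersonCode/json/'  # root directory of the JSON files
save_path = "C:/Users/Administrator/Desktop/PersonCode/sava_data/"  # folder for the saved images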
# Read a JSON file
def ReadJson(jsonfile):
    with open(jsonfile, encoding='utf-8') as f:
        jsonData = json.load(f)
    return jsonData
# Save a JSON file
def WriteJson(filePath, data):
    with open(filePath, 'w', encoding='utf-8') as write_json:
        write_json.write(json.dumps(data, indent=2))
def rotate_bound(image, angle):
    h, w,_ = image.shape
    (cX, cY) = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D((cX, cY), -angle, 1.0)
    cos = np.abs(M[0, 0])
    sin = np.abs(M[0, 1])
    nW = int((h * sin) + (w * cos))
    nH = int((h * cos) + (w * sin))
    M[0, 2] += (nW / 2) - cX
    M[1, 2] += (nH / 2) - cY
    image_rotate = cv2.warpAffine(image, M, (nW, nH),borderMode=1)
    return image_rotate,cX,cY,angle
def RotateImage(img, degree):
    height, width = img.shape[:2]    # height and width of the image
    # Size of the canvas that fully contains the rotated image
    heightNew = int(width * fabs(sin(radians(degree))) + height * fabs(cos(radians(degree))))
    widthNew = int(height * fabs(sin(radians(degree))) + width * fabs(cos(radians(degree))))
    matRotation = cv2.getRotationMatrix2D((width // 2, height // 2), degree, 1)
    # Shift the result so that it is centred in the enlarged canvas
    matRotation[0, 2] += (widthNew - width) // 2
    matRotation[1, 2] += (heightNew - height) // 2
    print(width // 2, height // 2)   # debug output: rotation centre
    # borderMode=1 is cv2.BORDER_REPLICATE: fill the border with the edge colors
    imgRotation = cv2.warpAffine(img, matRotation, (widthNew, heightNew), borderMode=1)
    return imgRotation, matRotation
def rotate_xy(x, y, angle, cx, cy):
    # print(cx,cy)
    angle = angle * pi / 180
    x_new = (x - cx) * cos(angle) - (y - cy) * sin(angle) + cx
    y_new = (x - cx) * sin(angle) + (y - cy) * cos(angle) + cy
    return x_new, y_new
# Convert an image to a base64 string
def image_to_base64(image_np):
    image = cv2.imencode('.jpg', image_np)[1]
    image_code = base64.b64encode(image).decode('utf-8')
    return image_code
# Rotate the label coordinates
def rotatePoint(Srcimg_rotate, jsonTemp, M, imagePath):
    json_dict = {}
    # Copy the top-level fields, updating those that describe the image
    for key, value in jsonTemp.items():
        if key == 'imageHeight':
            json_dict[key] = Srcimg_rotate.shape[0]
            print('height', json_dict[key])
        elif key == 'imageWidth':
            json_dict[key] = Srcimg_rotate.shape[1]
            print('width', json_dict[key])
        elif key == 'imageData':
            json_dict[key] = image_to_base64(Srcimg_rotate)
        elif key == 'imagePath':
            json_dict[key] = imagePath
        else:
            json_dict[key] = value
    # Apply the 2x3 affine matrix M to every polygon point (homogeneous coordinates)
    for item in json_dict['shapes']:
        for key, value in item.items():
            if key == 'points':
                for item2 in range(len(value)):
                    pt1 = np.dot(M, np.array([[value[item2][0]], [value[item2][1]], [1]]))
                    value[item2][0], value[item2][1] = pt1[0][0], pt1[1][0]
    return json_dict

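if __name__ == '__main__':
    os.makedirs('./result/', exist_ok=True)    # cv2.imwrite does not create missing folders
    file_list = os.listdir(images_path)
    i = 0
    for img_name in file_list:
        i = i + 1
        if i == 211:    # stop after the first 210 files
            break
        SrcImg = cv2.imread(images_path + img_name)            # read the image
        JsonData = ReadJson(json_path + img_name[:-3] + 'json')    # read the matching JSON file
        img_rotate, mat_rotate = RotateImage(SrcImg, 30)    # rotate the image by 30 degrees
        # Store the rotated image's file name in the JSON so the saved pair stays matched
        json_rotate = rotatePoint(img_rotate, JsonData, mat_rotate, img_name[:-4] + "_30" + '.jpg')
        cv2.imwrite("./result/" + img_name[:-4] + "_30" + '.jpg', img_rotate)
        WriteJson('./result/' + img_name[:-4] + "_30" + '.json', json_rotate)
        print(img_name, "is ok!")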