Mask R-CNN: Data Annotation, Training, and Testing

Article link: https://blog.youkuaiyun.com/HTTang9/article/details/103707492

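This article describes how to annotate data for Mask R-CNN and parse the resulting JSON files, then walks through environment configuration, data preprocessing, model training, and testing. The tools involved are Python, TensorFlow, CUDA, cuDNN, and Labelme. It also covers problems that can appear in the annotated images and how to resolve them.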

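Environment setup

1. Install Python, the required dependencies, and TensorFlow.

2. To accelerate training on a GPU, install CUDA and cuDNN.

3. Download labelme:
    Alternatively, once Python and pip are installed, run pip install labelme.
    With Anaconda, conda install labelme works the same way (if the package is not found in the default channel, try the conda-forge channel).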
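
Before moving on, it can help to confirm that the packages from the steps above are actually installed in the active environment. The following is a minimal sketch using only the standard library (Python 3.8+ for `importlib.metadata`). The package list and the helper name `report_packages` are assumptions made for illustration. It only reports installed versions and does not check whether CUDA and cuDNN are usable.

```python
from importlib import metadata

# Distribution names assumed for this walkthrough; adjust to your setup
# (for example, GPU builds of TensorFlow 1.x are published as "tensorflow-gpu").
REQUIRED = ["tensorflow", "labelme", "numpy"]


def report_packages(names):
    """Return {distribution name: installed version, or None if missing}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in report_packages(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```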

Data preparation

1. Data annotation

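2. Parsing the JSON files produced by annotation

Locate the JSON files generated by labelme. Each one must be converted into the form shown in the figure below, which contains five different files. I used the batch conversion script below. Change the paths to your own JSON directory and output directory.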

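#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Parse every labelme JSON file in a directory and generate the corresponding data."""
import os
import subprocess

import natsort  # third-party package: pip install natsort

# Path to the labelme_json_to_dataset executable (raw string, so backslashes stay literal)
labelme_json = r"G:\anaconda\lb\Scripts\labelme_json_to_dataset.exe"
# Directory that holds the JSON files to process
file_path = r"G:\Mask_RCNN-master\train_data\json"

# Natural sort, so that e.g. 2.json is processed before 10.json
dir_info = natsort.natsorted(os.listdir(file_path))

# Process each '.json' file in turn
for file_name in dir_info:
    if not file_name.lower().endswith(".json"):
        continue  # skip previously generated output folders and other files
    json_path = os.path.join(file_path, file_name)
    # By default the tool writes its output folder next to the JSON file,
    # so no change of working directory is needed.
    subprocess.run([labelme_json, json_path])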
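
After a batch run it is easy to miss a JSON file whose conversion failed. The sketch below lists the JSON files whose output folder is incomplete. It rests on two assumptions that the article does not state: that the tool writes a folder named `<name>_json` beside each JSON file, and that the five files are named as in older labelme releases. `EXPECTED_FILES` and `find_incomplete` are hypothetical names; edit the file list to match what your labelme version produces.

```python
from pathlib import Path

# Assumed output file names (older labelme releases); edit to match your version.
EXPECTED_FILES = ("img.png", "label.png", "label_viz.png",
                  "label_names.txt", "info.yaml")


def find_incomplete(json_dir, expected=EXPECTED_FILES):
    """Map each JSON file name to the expected output files that are missing."""
    json_dir = Path(json_dir)
    problems = {}
    for json_file in sorted(json_dir.glob("*.json")):
        # Assumed naming convention: "a.json" -> folder "a_json" next to it
        out_dir = json_dir / (json_file.stem + "_json")
        missing = [name for name in expected if not (out_dir / name).is_file()]
        if missing:
            problems[json_file.name] = missing
    return problems
```

Calling `find_incomplete(file_path)` with the JSON directory used in the script returns an empty dictionary when every conversion produced all expected files.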

[Figure: the five files generated from one JSON file]
Note: in some cases the generated label image looks completely black. This is normal.
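
A likely reason for the black appearance is that the label image stores one small integer per pixel (0 for background, 1, 2, ... for objects), and such values are nearly indistinguishable from black in an ordinary viewer. The sketch below, which assumes that storage scheme, counts the pixels per value and stretches the values over 0-255 so the regions become visible. `summarize_label` and `stretch_for_viewing` are hypothetical helpers. The toy array stands in for a label image read with whichever image library you use.

```python
import numpy as np


def summarize_label(label):
    """Return {pixel value: pixel count} for a 2-D label array."""
    values, counts = np.unique(label, return_counts=True)
    return {int(v): int(c) for v, c in zip(values, counts)}


def stretch_for_viewing(label):
    """Spread the label values over 0-255 so that the regions become visible."""
    label = np.asarray(label)
    top = int(label.max())
    if top == 0:
        return np.zeros(label.shape, dtype=np.uint8)  # background only
    return (label.astype(np.float64) * (255.0 / top)).round().astype(np.uint8)


# Toy 4x4 label: background 0 and two objects labelled 1 and 2
demo = np.zeros((4, 4), dtype=np.uint16)
demo[0, :2] = 1
demo[2:, 2:] = 2
print(summarize_label(demo))
print(stretch_for_viewing(demo).max())
```

The stretched copy is for inspection only. Training should keep using the original label values.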

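The mask label label.png generated by labelme is stored as 16-bit, while OpenCV reads images as 8-bit by default, so the 16-bit image must be converted to 8-bit. This can be done with a C++ program; for the code, see this blog post: http://blog.youkuaiyun.com/l297969586/article/details/79154150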
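
For readers who would rather stay in Python, the same 16-bit to 8-bit conversion can be sketched with NumPy and Pillow. This is not the code from the linked post. It assumes both libraries are installed and that every label value fits in 0-255, which holds when there are fewer than 256 labels. `label_to_uint8` and `convert_label_png` are hypothetical names.

```python
import numpy as np


def label_to_uint8(label):
    """Cast an integer label array to 8-bit, refusing values that would not fit."""
    label = np.asarray(label)
    if label.min() < 0 or label.max() > 255:
        raise ValueError("label values fall outside the 8-bit range 0-255")
    return label.astype(np.uint8)


def convert_label_png(src_path, dst_path):
    """Read a label PNG, cast it to 8-bit, and save the result."""
    from PIL import Image  # Pillow; imported here so label_to_uint8 needs only NumPy

    label = np.array(Image.open(src_path))
    Image.fromarray(label_to_uint8(label)).save(dst_path)
```

The cast leaves each label value unchanged, unlike a rescale to the full 0-255 range, which would alter the values that identify each object.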

3. The individual .py files

Modify config.py
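
The docstring of the base class listed below suggests creating a subclass and overriding only the properties that need to change, as an alternative to editing config.py in place. A minimal sketch of that approach follows. The `Config` class here is a stand-in that reproduces only a few attributes so the example runs on its own; in a real project you would import the class from the repository's config module instead. `MyDatasetConfig` and all of its values are hypothetical.

```python
class Config(object):
    """Stand-in for the base class; only a few attributes are reproduced here."""
    NAME = None
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    NUM_CLASSES = 1
    BACKBONE = "resnet50"
    IMAGE_MIN_DIM = 800
    IMAGE_MAX_DIM = 1024


class MyDatasetConfig(Config):
    """Hypothetical configuration for a dataset with a single object class."""
    NAME = "my_dataset"       # hypothetical experiment name
    NUM_CLASSES = 1 + 1       # background + one annotated class (assumed)
    IMAGE_MIN_DIM = 512       # example values only; pick sizes that suit your images
    IMAGE_MAX_DIM = 512


config = MyDatasetConfig()
# Attributes that are not overridden fall back to the base class values
print(config.NAME, config.NUM_CLASSES, config.BACKBONE)
```

Overriding in a subclass leaves config.py itself untouched, so its defaults stay available for other experiments.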

"""
Mask R-CNN
Base Configurations class.

Copyright (c) 2017 Matterport, Inc.
Licensed under the MIT License (see LICENSE for details)
Written by Waleed Abdulla
"""

import numpy as np


class Config(object):
    """Base configuration class. For custom configurations, create a
    sub-class that inherits from this one and override properties
    that need to be changed.
    """
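    # Name of the configuration, e.g. to tell apart which experiment is running.
    NAME = None  # Override in sub-classes
    GPU_COUNT = 1         # number of GPUs to use
    IMAGES_PER_GPU = 1    # number of images to train with on each GPU
    # Number of training steps per epoch. Validation stats are computed at the
    # end of each epoch, so a very small value spends a lot of time on them.
    STEPS_PER_EPOCH = 1000

    # Number of validation steps to run at the end of every training epoch.
    # A bigger number slows down the training.
    VALIDATION_STEPS = 50

    BACKBONE = "resnet50"              # resnet101 is used by default; if GPU memory runs out, change it to resnet50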

    COMPUTE_BACKBONE_SHAPE = None

    # The strides of each layer of the FPN Pyramid. These values
    # are based on a Resnet101 backbone.
    BACKBONE_STRIDES = [4, 8, 16, 32, 64]

    # Size of the fully-connected layers in the classification graph
    FPN_CLASSIF_FC_LAYERS_SIZE = 1024

    # Size of the top-down layers used to build the feature pyramid
    TOP_DOWN_PYRAMID_SIZE = 256

    # Number of classification classes (including background)
    NUM_CLASSES = 1  # Override in sub-classes

    # Length of square anchor side in pixels
    RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)

    # Ratios of anchors at each cell (width/height)
    # A value of 1 represents a square anchor, and 0.5 is a tall anchor
    # (its width is half its height)
    RPN_ANCHOR_RATIOS = [0.5, 1, 2]

    # Anchor stride. If 1, anchors are created for each cell in the backbone
    # feature map. If 2, then anchors are created for every other cell, and so on.
    RPN_ANCHOR_STRIDE = 1

    # Non-max suppression threshold to filter RPN proposals.
    # You can increase this during training to generate more proposals.
    RPN_NMS_THRESHOLD = 0.7

    # How many anchors per image to use for RPN training
    RPN_TRAIN_ANCHORS_PER_IMAGE = 256
    
    # ROIs kept after tf.nn.top_k and before non-maximum suppression
    PRE_NMS_LIMIT = 6000

    # ROIs kept after non-maximum suppression (training and inference)
    POST_NMS_ROIS_TRAINING = 2000
    POST_NMS_ROIS_INFERENCE = 1000

    # If enabled, resizes instance masks to a smaller size to reduce
    # memory load. Recommended when using high-resolution images.
    USE_MINI_MASK = True
    MINI_MASK_SHAPE = (56, 56)  # (height, width) of the mini-mask

    # Input image resizing
    IMAGE_RESIZE_MODE = "square"
    IMAGE_MIN_DIM = 800                    # minimum and maximum image dimensions; set them according to the image sizes in your own training set
    IMAGE_MAX_DIM = 1024
    # Minimum scaling ratio. Checked after IMAGE_MIN_DIM and can force further
    # upscaling.
    IMAGE_MIN_SCALE = 0
    # Number of color channels per image. RGB = 3, grayscale = 1, RGB-D = 4
    IMAGE_CHANNEL_COUNT = 3

    # Image mean (RGB)
    MEAN_PIXEL = np.array([123.7, 116.8, 103.9])

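    # Number of ROIs per image to feed to the classifier/mask heads. The number
    # of available proposals can be raised by adjusting the RPN NMS threshold.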
    TRAIN_ROIS_PER_IMAGE = 200

    # Percent of positive ROIs used to train classifier/mask heads
    ROI_POSITIVE_RATIO = 0.33

    # Pooled ROIs
    POOL_SIZE = 7
    MASK_POOL_SIZE = 14

    # Shape of output mask
    # To change this you also need to change the neural network mask branch
    MASK_SHAPE = [28, 28]

    # Maximum number of ground truth instances to use in one image
    MAX_GT_INSTANCES = 100

    # Bounding box refinement standard deviation for RPN and final detections.
    RPN_BBOX_STD_DEV = np.array([0.1, 0.1, 0.2, 0.2])
    BBOX_STD_DEV = np.array([0.1, 0.1, 0.2, 0.2])

    # Max number of final detections
    DETECTION_MAX_INSTANCES = 100

    # ROIs below this threshold are skipped
    DETECTION_MIN_CONFIDENCE = 0.7

    # Non-maximum suppression threshold for detection
    DETECTION_NMS_THRESHOLD = 0.3

    # Learning rate
    LEARNING_RATE =