Steel Defect Detection: Mask RCNN Version

Latest recommended article published on 2025-09-16 10:13:18

Original


Licensed under CC 4.0 BY-SA

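This article walks through the full process of steel defect detection with a TensorFlow implementation of Mask RCNN: data preprocessing, parameter adjustments during training (such as RPN_ANCHOR_SCALES), a custom optimizer, loss weighting, and dropout. The final model reaches 86% accuracy.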

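I. Main project framework code (TensorFlow version of Mask RCNN)

For an analysis of the Mask RCNN paper and code, see my other blog post: https://blog.youkuaiyun.com/qq_32172681/article/details/99761084

Keras implementation of Mask RCNN, from a well-known GitHub repository: https://github.com/matterport/Mask_RCNN

Environment: tensorflow_gpu 1.14.0, CUDA 10.0, cuDNN 7.4.1, Python 3.6, implemented with TensorFlow + Keras.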

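II. Training process

1. Data preprocessing / label preprocessing

Data preprocessing:

Mean subtraction

Data augmentation with imgaug

12,000+ images in total: 10,000 are used as the training set and the rest as the validation set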
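The preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration, not the article's actual code: the function names are hypothetical, the mean is assumed to be a per-channel mean computed from the images themselves, a plain NumPy horizontal flip stands in for the imgaug pipeline (whose exact augmenters the article does not list), and the shuffled split is an assumption since the article only gives the set sizes.

```python
import numpy as np


def subtract_mean(images):
    """Subtract the per-channel mean computed over a batch of N x H x W x C images."""
    images = images.astype(np.float32)
    mean_pixel = images.mean(axis=(0, 1, 2))  # one mean value per channel
    return images - mean_pixel, mean_pixel


def random_hflip(image, mask, rng, p=0.5):
    """Flip image and mask together so the label stays aligned with the pixels."""
    if rng.random() < p:
        return image[:, ::-1], mask[:, ::-1]
    return image, mask


def split_train_val(n_images, n_train, rng):
    """Shuffle image indices and split them into training and validation sets."""
    indices = rng.permutation(n_images)
    return indices[:n_train], indices[n_train:]
```

Whatever augmentation is used, the key point is that any geometric transform has to be applied to the image and its mask together; a flip applied only to the image would silently corrupt the labels.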

Label preprocessing:

(1) Label format

The training set labels are RLE-encoded. For details on RLE encoding, see my other blog post: https://blog.youkuaiyun.com/qq_32172681/article/details/100537042
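To make the encoding concrete, here is a minimal sketch of an RLE encoder. It assumes the convention implied by the decoder used in this article: pixels are numbered column by column (top to bottom, then left to right), starting from 1, and the string is a sequence of "start length" pairs. The function name rle_encode is chosen here for illustration.

```python
import numpy as np


def rle_encode(mask):
    """Encode a 2-D binary mask as a 'start length start length ...' string."""
    pixels = mask.T.flatten()  # transpose first so pixels run column by column
    padded = np.concatenate([[0], pixels, [0]])  # guarantee runs start and end inside the array
    # 1-indexed positions where the value changes: run starts and run ends alternate
    changes = np.where(padded[1:] != padded[:-1])[0] + 1
    changes[1::2] -= changes[::2]  # turn each run end into a run length
    return " ".join(str(x) for x in changes)
```

For a 3x4 mask whose first column has its top two pixels set and whose second column has its bottom pixel set, this gives "1 2 6 1": a run of 2 starting at pixel 1 and a run of 1 starting at pixel 6.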

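As shown in the figure below: ImageId_ClassId is image_name + "_" + class_id, and EncodedPixels is the RLE encoding of the image's mask. There are 4 defect classes, so every 4 rows of data form the label of one image.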
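A minimal sketch of how such a label file could be grouped per image, using only the standard library. The column names come from the description above; the sample rows, the helper name group_labels_by_image, and the assumption that an empty EncodedPixels field means "no defect of this class" are illustrative and not taken from the article.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows in the described format: 4 rows per image, one per class.
SAMPLE_CSV = """ImageId_ClassId,EncodedPixels
a.jpg_1,1 3 10 2
a.jpg_2,
a.jpg_3,
a.jpg_4,5 4
b.jpg_1,
b.jpg_2,
b.jpg_3,
b.jpg_4,
"""


def group_labels_by_image(csv_text):
    """Return {image_name: {class_id: rle_string}} for all non-empty labels."""
    labels = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Split on the last underscore, since image names may contain underscores
        image_name, class_id = row["ImageId_ClassId"].rsplit("_", 1)
        per_image = labels[image_name]  # also registers defect-free images
        rle = (row["EncodedPixels"] or "").strip()
        if rle:
            per_image[int(class_id)] = rle
    return dict(labels)
```

With the sample above, a.jpg maps to classes 1 and 4, while b.jpg maps to an empty dictionary, which keeps defect-free images available as negative examples.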

(2) Converting EncodedPixels to masks and bboxes

One EncodedPixels string yields one mask with the image size [256, 1600], where 1 marks mask pixels and 0 marks background
One mask yields multiple bboxes: (x1, y1, x2, y2)

This part of the code is adapted from a Kaggle public kernel: https://www.kaggle.com/applefish/get-bboxes-from-segmentation-labels

import numpy as np
from skimage.measure import label, regionprops  # connected-component labelling

# Convert an RLE string into a mask array (size=[256,1600]; 1 = mask, 0 = background)
def rle_decode(mask_rle, shape=(256, 1600)):
    """
    mask_rle: run-length as string formatted (start length)
    shape: (height, width) of array to return
    Returns numpy array, 1 - mask, 0 - background
    """
    s = mask_rle.split()
    starts, lengths = [np.asarray(x, dtype=int) for x in (s[0::2], s[1::2])]
    starts -= 1  # RLE start positions are 1-indexed
    ends = starts + lengths
    img = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for lo, hi in zip(starts, ends):
        img[lo:hi] = 1
    return img.reshape((shape[1], shape[0])).T  # Needed to align to RLE direction

# Merge a list of RLE strings into a single mask array of shape (256, 1600, 1)
def masks_as_image(in_mask_list, all_masks=None, shape=(256, 1600)):
    # Take the individual masks and create a single mask array
    if all_masks is None:
        all_masks = np.zeros(shape, dtype=np.int16)
    # if isinstance(in_mask_list, list):
    for mask in in_mask_list:
        if isinstance(mask, str):
            all_masks += rle_decode(mask, shape)
    return np.expand_dims(all_masks, -1)


# Get all bboxes from a single RLE-encoded mask
def get_bboxes_from_rle(encoded_pixels, return_mask=False):
    """Get all bboxes from a whole mask label."""
    # Convert the RLE string to a mask (size=[256,1600]; 1 = mask, 0 = background)
    mask = masks_as_image([encoded_pixels])
    lbl = label(mask)

    props = regionprops(lbl)

    # get bboxes by a for loop
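The loop itself is not shown in the listing above. As a minimal sketch of how boxes could be read from the region properties, assuming a 2-D binary mask (for example `mask[..., 0]` taken from the output of masks_as_image) and the (x1, y1, x2, y2) order described earlier, something like the following would work. The name bboxes_from_mask is hypothetical and this is not necessarily how the original kernel does it.

```python
import numpy as np
from skimage.measure import label, regionprops


def bboxes_from_mask(mask):
    """Return one (x1, y1, x2, y2) box per connected region of a 2-D binary mask."""
    lbl = label(mask > 0)  # give every connected region its own integer id
    boxes = []
    for prop in regionprops(lbl):
        # skimage reports (min_row, min_col, max_row, max_col); max values are exclusive
        min_row, min_col, max_row, max_col = prop.bbox
        boxes.append((int(min_col), int(min_row), int(max_col), int(max_row)))
    return boxes
```

Because one defect class can appear as several separate patches in the same image, labelling connected regions first is what turns a single class mask into multiple instance boxes.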
