Underwater Organism Object Detection Based on PP-YOLOE+

Original article · first published 2022-10-29 09:48:38 · last modified 2022-10-29 12:40:40

License: CC 4.0 BY-SA

Tags: #deep-learning #computer-vision #paddlepaddle


Underwater Organism Object Detection and Deployment Based on PP-YOLOE+

Project link: https://aistudio.baidu.com/aistudio/projectdetail/4647849?contributionType=1

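1 Project Background

Underwater object detection aims to localize and recognize objects in underwater scenes. It has drawn sustained attention because of its wide use in oceanography, underwater navigation, and related fields. Complex underwater environments and lighting conditions nevertheless keep it a difficult task.

Deep-learning object detectors perform well in many applications but still fall short underwater, mainly for three reasons: public underwater detection datasets are scarce; images from real underwater scenes are cluttered; and underwater targets are usually small, while current deep-learning detectors often miss small objects or detect them poorly. In addition, wavelength-dependent absorption and scattering substantially degrade underwater image quality, causing loss of visibility, weak contrast, and color shifts.

AI + underwater exploration is an emerging field. Few solutions are built specifically for underwater research, and high-quality datasets are especially valuable. This project uses PP-YOLOE+ to advance underwater object detection, so that equipment such as underwater robots can become more intelligent and tasks such as seabed resource exploration more efficient. Underwater robots can in turn collect higher-quality underwater detection datasets, which feeds back into the development of AI + underwater exploration.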

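2 Approach

2.1 Problems and Challenges

In deep learning, data largely determines the upper bound on performance, and algorithms only approach that bound. Although deep-learning methods achieve strong results on standard object detection benchmarks, underwater object detection still faces the following challenges:

(1) In real underwater applications, targets are usually small, and images contain many small objects;

(2) Images in underwater datasets and in real applications are usually blurry and contain heterogeneous noise.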
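To make "small" in challenge (1) concrete, the sketch below buckets a bounding box by pixel area using the COCO evaluation convention (small below 32², medium up to 96², large above). This is the same convention behind the `area= small / medium / large` rows of a COCO-style evaluation report. The helper names `coco_size_bucket` and `small_fraction` are illustrative and not part of PaddleDetection.

```python
def coco_size_bucket(width, height):
    """Classify a box by pixel area using the COCO area thresholds."""
    area = width * height
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


def small_fraction(boxes):
    """Fraction of (width, height) boxes that fall in the 'small' bucket."""
    if not boxes:
        return 0.0
    small = sum(1 for w, h in boxes if coco_size_bucket(w, h) == "small")
    return small / len(boxes)


# Toy boxes, not taken from the dataset
demo_boxes = [(20, 20), (50, 50), (100, 100), (10, 30)]
print(small_fraction(demo_boxes))
```

Running this over the `bbox` widths and heights of a real annotation file gives a quick estimate of how much of the dataset consists of small objects.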

2.2 Choice of Approach


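Given the background above and the challenges of underwater object detection, this project adopts PP-YOLOE+, an iteratively optimized upgrade of PP-YOLOE, PaddlePaddle's high-accuracy model for cloud and edge deployment, to address these problems.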

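2.3 Model Features

PP-YOLOE+ has the following features:

Strong detection performance
Faster training convergence
Markedly better generalization to downstream tasks
High-performance deployment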

3 Environment Setup

3.1 Requirements

PaddlePaddle >= 2.3.2
Python == 3.7

3.2 Installation

%cd /home/aistudio/work/

# Clone from Gitee (faster from within mainland China)
!git clone https://gitee.com/paddlepaddle/PaddleDetection.git -b develop

# Alternatively, clone from GitHub
# !git clone https://github.com/PaddlePaddle/PaddleDetection.git -b develop

# Install the dependencies
%cd /home/aistudio/work/

%cd PaddleDetection/
!pip install -r requirements.txt > /dev/null
/home/aistudio/work
/home/aistudio/work/PaddleDetection

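4 Dataset Preprocessing

4.1 Dataset Overview

The dataset comes from the Underwater Object Detection Algorithm Contest (data provided by Peng Cheng Laboratory). The training set consists of 5,543 underwater optical images in JPG format with their annotations. The main targets are four classes: sea cucumber (holothurian), sea urchin (echinus), scallop, and starfish. The annotations also include a fifth label, waterweeds (see 4.1.1), which is why the dataset configuration in 4.2.4 sets num_classes to 5.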

4.1.1 Label Categories

supercategory	id	name
component	1	echinus
component	2	holothurian
component	3	scallop
component	4	starfish
component	5	waterweeds

4.1.2 Image Resolution

Width (px)	Height (px)	Image count
704	576	38
1920	1080	596
3840	2160	1712
720	405	3153
586	Text	44
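A resolution table like the one above can be rebuilt from the `images` list of a COCO-style annotation file, where each record carries `width` and `height`. The sketch below is a minimal illustration on toy records; `resolution_histogram` is a hypothetical helper name, and the sample values are not taken from the dataset.

```python
from collections import Counter


def resolution_histogram(images):
    """Count images per (width, height) from COCO-style image records."""
    return Counter((img["width"], img["height"]) for img in images)


# Toy records for illustration only
sample_images = [
    {"id": 1, "width": 720, "height": 405},
    {"id": 2, "width": 720, "height": 405},
    {"id": 3, "width": 1920, "height": 1080},
]

hist = resolution_histogram(sample_images)
for (width, height), count in hist.most_common():
    print(f"{width}\t{height}\t{count}")
```

A check worth running on the real file is that the counts add up to the number of images; the five rows above sum to 5,543, which matches the stated size of the training set.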

4.2 Dataset Processing

4.2.1 Extracting the Dataset

# 1. Extract the dataset
!unzip data/data172711/fish.zip > /dev/null
!mv ./fish ./work/PaddleDetection/dataset

4.2.2 Converting VOC to COCO (voc2coco)

%cd work/PaddleDetection/
/home/aistudio/work/PaddleDetection

# 2. Convert the VOC XML annotations to COCO JSON (voc2coco)
import glob
import json
import os
import os.path as osp
import shutil
import xml.dom.minidom as xmldom

import cv2
import numpy as np
import PIL.Image
import PIL.ImageDraw

# Shared state, so that train and val use the same label -> category id mapping
label_to_num = {}
categories_list = []
labels_list = []


class MyEncoder(json.JSONEncoder):
    """JSON encoder that converts NumPy scalars and arrays to Python types."""

    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        elif isinstance(obj, np.floating):
            return float(obj)
        elif isinstance(obj, np.ndarray):
            return obj.tolist()
        else:
            return super(MyEncoder, self).default(obj)


def images_labelme(data, num):
    image = {}
    image['height'] = data['imageHeight']
    image['width'] = data['imageWidth']
    image['id'] = num + 1
    image['file_name'] = data['imagePath'].split('/')[-1]
    return image


def images_cityscape(num, img_file, w, h):
    image = {}
    image['height'] = h
    image['width'] = w
    image['id'] = num + 1
    image['file_name'] = img_file
    return image


def categories(label, labels_list):
    category = {}
    category['supercategory'] = 'component'
    category['id'] = len(labels_list) + 1
    category['name'] = label
    return category


def annotations_rectangle(points, label, image_num, object_num, label_to_num):
    annotation = {}
    seg_points = np.asarray(points).copy()
    annotation['segmentation'] = [list(seg_points.flatten())]
    annotation['iscrowd'] = 0
    annotation['image_id'] = image_num + 1
    annotation['bbox'] = list(
        map(float, [
            points[0][0], points[0][1], points[1][0] - points[0][0], points[1][
                1] - points[0][1]
        ]))
    annotation['area'] = annotation['bbox'][2] * annotation['bbox'][3]
    annotation['category_id'] = label_to_num[label]
    annotation['id'] = object_num + 1
    return annotation


def annotations_polygon(height, width, points, label, image_num, object_num,
                        label_to_num):
    annotation = {}
    annotation['segmentation'] = [list(np.asarray(points).flatten())]
    annotation['iscrowd'] = 0
    annotation['image_id'] = image_num + 1
    annotation['bbox'] = list(map(float, get_bbox(height, width, points)))
    annotation['area'] = annotation['bbox'][2] * annotation['bbox'][3]
    annotation['category_id'] = label_to_num[label]
    annotation['id'] = object_num + 1
    return annotation


def get_bbox(height, width, points):
    polygons = points
    mask = np.zeros([height, width], dtype=np.uint8)
    mask = PIL.Image.fromarray(mask)
    xy = list(map(tuple, polygons))
    PIL.ImageDraw.Draw(mask).polygon(xy=xy, outline=1, fill=1)
    mask = np.array(mask, dtype=bool)
    index = np.argwhere(mask == 1)
    rows = index[:, 0]
    clos = index[:, 1]
    left_top_r = np.min(rows)
    left_top_c = np.min(clos)
    right_bottom_r = np.max(rows)
    right_bottom_c = np.max(clos)
    return [
        left_top_c, left_top_r, right_bottom_c - left_top_c,
        right_bottom_r - left_top_r
    ]


def deal_json(output_dir, input_dir, img_list):
    """Build a COCO-style dict for the images listed in img_list.

    Images are read from input_dir; the matching VOC XML files are read from
    dataset/fish/Annotations. output_dir is unused and kept only so that the
    existing calls keep working.
    """
    images_list = []
    annotations_list = []
    object_num = -1

    for image_num, img_file in enumerate(img_list):
        img_path = osp.join(input_dir, img_file)
        img = cv2.imread(img_path)
        if img is None:
            raise FileNotFoundError('Cannot read image: ' + img_path)
        h, w = img.shape[:2]
        # splitext keeps file names that contain extra dots intact
        img_label = osp.splitext(img_file)[0]
        label_file = osp.join('dataset/fish/Annotations', img_label + '.xml')

        eles = xmldom.parse(label_file).documentElement
        images_list.append(images_cityscape(image_num, img_file, w, h))

        # One <object> element per annotated box
        for obj in eles.getElementsByTagName('object'):
            label = obj.getElementsByTagName('name')[0].firstChild.data
            object_num = object_num + 1
            if label not in labels_list:
                categories_list.append(categories(label, labels_list))
                labels_list.append(label)
                label_to_num[label] = len(labels_list)
            xmin, ymin, xmax, ymax = (
                int(obj.getElementsByTagName(tag)[0].firstChild.data)
                for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
            points = [[xmin, ymin], [xmax, ymax]]
            annotations_list.append(
                annotations_rectangle(points, label, image_num, object_num,
                                      label_to_num))

    return {
        'images': images_list,
        'categories': categories_list,
        'annotations': annotations_list,
    }


train_img = 'dataset/fish/Images'
train_box = 'dataset/fish/Annotations'
out_root = 'data/cocome'

# Split the images 80/20 into train and val. Sorting makes the split
# reproducible, since the order returned by the file system is not guaranteed.
img_names = sorted(
    osp.basename(p) for p in glob.glob(osp.join(train_img, '*.jpg')))
train_num = int(len(img_names) * 0.8)
train_list = img_names[:train_num]
val_list = img_names[train_num:]

# Copy each subset into its own folder
for subset, names in (('train', train_list), ('val', val_list)):
    subset_dir = osp.join(out_root, subset)
    os.makedirs(subset_dir, exist_ok=True)  # safe to re-run
    for img_name in names:
        shutil.copyfile(
            osp.join(train_img, img_name), osp.join(subset_dir, img_name))

anno_dir = osp.join(out_root, 'annotations')
os.makedirs(anno_dir, exist_ok=True)

# Write the COCO annotation files for both subsets
train_data_coco = deal_json(osp.join(out_root, 'train'), train_img, train_list)
with open(osp.join(anno_dir, 'instance_train.json'), 'w') as f:
    json.dump(train_data_coco, f, indent=4, cls=MyEncoder)

val_data_coco = deal_json(osp.join(out_root, 'val'), train_img, val_list)
with open(osp.join(anno_dir, 'instance_val.json'), 'w') as f:
    json.dump(val_data_coco, f, indent=4, cls=MyEncoder)
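Before training on the converted files, it can help to sanity-check them. The sketch below is one possible check, not part of PaddleDetection: it verifies that every annotation points at a known image and category and that each box has positive size and lies inside its image. `check_coco` is a hypothetical helper, and the two toy dicts stand in for the JSON produced above.

```python
def check_coco(data):
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = []
    images = {img["id"]: img for img in data["images"]}
    category_ids = {cat["id"] for cat in data["categories"]}
    for ann in data["annotations"]:
        img = images.get(ann["image_id"])
        if img is None:
            problems.append(f"annotation {ann['id']}: unknown image_id")
            continue
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id")
        x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        if w <= 0 or h <= 0:
            problems.append(f"annotation {ann['id']}: non-positive box size")
        elif x < 0 or y < 0 or x + w > img["width"] or y + h > img["height"]:
            problems.append(f"annotation {ann['id']}: box outside image")
    return problems


# Toy data: one valid annotation, then the same data with a broken box added
good = {
    "images": [{"id": 1, "width": 720, "height": 405, "file_name": "a.jpg"}],
    "categories": [{"id": 1, "name": "echinus", "supercategory": "component"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 30, 40]},
    ],
}
bad = dict(good, annotations=good["annotations"] + [
    {"id": 2, "image_id": 1, "category_id": 1, "bbox": [700, 10, 50, 20]},
])

print(check_coco(good))
print(check_coco(bad))
```

On the real data, the input would be the result of `json.load` on `instance_train.json` or `instance_val.json`.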

4.2.3 Moving the Converted Dataset to dataset/fish

# 3. Move the converted dataset into dataset/fish
!mv data/cocome/* dataset/fish/

4.2.4 Updating the Dataset Paths and Configuration

metric: COCO
num_classes: 5

TrainDataset:
  !COCODataSet
    image_dir: train
    anno_path: annotations/instance_train.json
    dataset_dir: dataset/fish
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']

EvalDataset:
  !COCODataSet
    image_dir: val
    anno_path: annotations/instance_val.json
    dataset_dir: dataset/fish

TestDataset:
  !ImageFolder
    anno_path: label_list.txt # also support txt (like VOC's label_list.txt)
    dataset_dir: dataset/fish # if set, anno_path will be 'dataset_dir/anno_path'
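The TestDataset entry above refers to a label_list.txt that none of the earlier steps create. The sketch below shows one way to generate it from the converted annotations, assuming the text format is one class name per line in category-id order (as in VOC-style label lists). `write_label_list` is a hypothetical helper, and the demonstration runs on a temporary toy file instead of the project data.

```python
import json
import os
import tempfile


def write_label_list(coco_json_path, out_path):
    """Write one category name per line, ordered by COCO category id."""
    with open(coco_json_path) as f:
        categories = json.load(f)["categories"]
    names = [c["name"] for c in sorted(categories, key=lambda c: c["id"])]
    with open(out_path, "w") as f:
        f.write("\n".join(names) + "\n")
    return names


# Toy demonstration; in the project the input would be
# dataset/fish/annotations/instance_train.json.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "instance_train.json")
    dst = os.path.join(tmp, "label_list.txt")
    with open(src, "w") as f:
        json.dump({"categories": [
            {"id": 2, "name": "holothurian"},
            {"id": 1, "name": "echinus"},
        ]}, f)
    written = write_label_list(src, dst)
    with open(dst) as f:
        label_lines = f.read().splitlines()

print(label_lines)
```

Ordering by category id keeps the class names aligned with the ids assigned during the voc2coco conversion.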

5 Model Training

Train with PP-YOLOE+:

The file ./configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml provides a PP-YOLOE+ training configuration for this scenario. The training command is:

!python tools/train.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml -o LearningRate.base_lr=0.000125 -o snapshot_epoch=1 -o worker_num=1 --eval
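The `LearningRate.base_lr=0.000125` override is consistent with the linear scaling rule, under which the learning rate is scaled in proportion to the total batch size. The sketch below works through that arithmetic. The reference values (base_lr 0.001 at 8 GPUs with 8 images each) and the single-GPU batch size of 8 are assumptions not stated in this article, so check them against the config files in your own checkout.

```python
def scale_learning_rate(reference_lr, reference_batch, actual_batch):
    """Linear scaling rule: the learning rate grows with total batch size."""
    return reference_lr * actual_batch / reference_batch


# Assumed reference setup of the stock config (not stated in this article)
REFERENCE_LR = 0.001
REFERENCE_BATCH = 8 * 8   # 8 GPUs x 8 images per GPU

# Assumed setup for this run: a single GPU with 8 images per step
ACTUAL_BATCH = 1 * 8

lr = scale_learning_rate(REFERENCE_LR, REFERENCE_BATCH, ACTUAL_BATCH)
print(f"{lr:.6f}")  # 0.000125
```

If the batch size is changed to fit GPU memory, the same function gives the matching learning rate to pass with `-o LearningRate.base_lr=...`.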

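6 Model Evaluation

After training, run the evaluation command to measure the model's accuracy and confirm that training worked.

The command below uses the already-trained model. To evaluate your own model, change the value after weights= to the path of its .pdparams file.

# Evaluate the model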
!python tools/eval.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml -o weights=output/ppyoloe_plus_crn_s_80e_coco/best_model.pdparams

Warning: import ppdet from source directory without installing, run 'python setup.py install' to install ppdet firstly
W1016 22:54:41.591436 22455 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W1016 22:54:41.598824 22455 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
[10/16 22:54:43] ppdet.utils.checkpoint INFO: Finish loading model weights: output/ppyoloe_plus_crn_s_80e_coco/best_model.pdparams
[10/16 22:54:44] ppdet.engine INFO: Eval iter: 0
[10/16 22:54:55] ppdet.metrics.metrics INFO: The bbox result is saved to bbox.json.
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
[10/16 22:54:55] ppdet.metrics.coco_utils INFO: Start evaluate...
Loading and preparing results...
DONE (t=0.50s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=3.09s).
Accumulating evaluation results...
DONE (t=0.43s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.278
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.541
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.267
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.237
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.275
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.304
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.160
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.436
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.543
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.514
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.509
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.572
[10/16 22:54:59] ppdet.engine INFO: Total sample number: 200, averge FPS: 19.422192298181745
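The IoU thresholds in the report above (0.50, 0.75, and the 0.50:0.95 range) refer to the intersection over union between a predicted box and a ground-truth box. The following is a minimal sketch of that quantity for COCO-style [x, y, width, height] boxes; `iou_xywh` is an illustrative helper, not the implementation used by the evaluator.

```python
def iou_xywh(box_a, box_b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap along each axis, clamped at zero for disjoint boxes
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes shifted by half a width: overlap 50, union 150
print(iou_xywh([0, 0, 10, 10], [5, 0, 10, 10]))
```

A prediction counts as a match at `IoU=0.50` only if this value is at least 0.5, which is why AP at 0.50 (0.541) is much higher than AP at 0.75 (0.267): many boxes are roughly right but not tightly aligned.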

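7 Model Prediction

Here the trained model runs batch prediction over the whole validation set. The results are logged to VisualDL, where they can easily be compared with the original images to inspect prediction quality.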

!python tools/infer.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml -o weights=output/ppyoloe_plus_crn_s_80e_coco/best_model --infer_dir ./dataset/fish/val --use_vdl=True --vdl_log_dir=./output/image --output_dir ./output/results

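8 Model Export

A .pdparams file contains only the model parameters; deployment additionally requires an export step, as follows:

Note: the command below uses the already-trained model. To export your own model, change the value after weights= to the path of its .pdparams file. If --output_dir is not specified, the exported model is saved under output_inference/ by default.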

!python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml -o weights=output/ppyoloe_plus_crn_s_80e_coco/best_model.pdparams

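This completes the workflow from training to export for the underwater organism detection model. Next, we look at how the model performs when deployed with Paddle Inference.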

9 Model Speed Test

!python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_crn_s_80e_coco --image_file=dataset/fish/val/c000418.jpg --run_mode=paddle --device=gpu

-----------  Running Arguments -----------
action_file: None
batch_size: 1
camera_id: -1
combine_method: nms
cpu_threads: 1
device: gpu
enable_mkldnn: False
enable_mkldnn_bfloat16: False
image_dir: None
image_file: dataset/fish/val/c000418.jpg
match_metric: ios
match_threshold: 0.6
model_dir: output_inference/ppyoloe_plus_crn_s_80e_coco
output_dir: output
overlap_ratio: [0.25, 0.25]
random_pad: False
reid_batch_size: 50
reid_model_dir: None
run_benchmark: False
run_mode: paddle
save_images: True
save_mot_txt_per_img: False
save_mot_txts: False
save_results: False
scaled: False
slice_infer: False
slice_size: [640, 640]
threshold: 0.5
tracker_config: None
trt_calib_mode: False
trt_max_shape: 1280
trt_min_shape: 1
trt_opt_shape: 640
use_coco_category: False
use_dark: True
use_gpu: False
video_file: None
window_size: 50
------------------------------------------
-----------  Model Configuration -----------
Model Arch: YOLO
Transform Order: 
--transform op: Resize
--transform op: NormalizeImage
--transform op: Permute
--------------------------------------------
class_id:0, confidence:0.7396, left_top:[299.31,0.93],right_bottom:[331.32,32.59]
class_id:0, confidence:0.7280, left_top:[293.39,100.21],right_bottom:[332.52,138.38]
class_id:0, confidence:0.6212, left_top:[528.09,-1.20],right_bottom:[566.73,20.30]
class_id:0, confidence:0.5986, left_top:[71.87,-0.63],right_bottom:[111.20,26.91]
class_id:0, confidence:0.5766, left_top:[120.02,101.65],right_bottom:[181.98,152.03]
class_id:0, confidence:0.5554, left_top:[405.12,215.59],right_bottom:[458.40,272.60]
class_id:0, confidence:0.5486, left_top:[476.20,49.54],right_bottom:[511.81,85.39]
class_id:1, confidence:0.6370, left_top:[446.48,281.86],right_bottom:[504.50,342.79]
class_id:1, confidence:0.5986, left_top:[450.84,89.30],right_bottom:[494.58,123.99]
class_id:1, confidence:0.5715, left_top:[510.54,89.15],right_bottom:[570.02,138.44]
class_id:1, confidence:0.5151, left_top:[512.79,94.07],right_bottom:[562.67,133.47]
save result to: output/c000418.jpg
Test iter 0
------------------ Inference Time Info ----------------------
total_time(ms): 1072.8, img_num: 1
average latency time(ms): 1072.80, QPS: 0.932140
preprocess_time(ms): 18.20, inference_time(ms): 1054.50, postprocess_time(ms): 0.10
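The timing lines above are related by simple arithmetic: the total is the sum of the three stage times, and QPS is the reciprocal of the average latency. The sketch below reproduces both from the logged numbers; `qps_from_latency_ms` is an illustrative helper. Because the run processes a single image, the figure probably includes one-off start-up cost (an assumption, since the log does not break this out), so it should not be read as steady-state throughput.

```python
def qps_from_latency_ms(avg_latency_ms):
    """Queries per second implied by an average latency in milliseconds."""
    return 1000.0 / avg_latency_ms


# Stage times copied from the log above (milliseconds)
stages_ms = {"preprocess": 18.20, "inference": 1054.50, "postprocess": 0.10}
total_ms = sum(stages_ms.values())

print(f"total: {total_ms:.2f} ms")
print(f"QPS: {qps_from_latency_ms(total_ms):.6f}")
for name, ms in stages_ms.items():
    print(f"{name}: {100 * ms / total_ms:.1f}% of total")
```

Averaging over many images, for example by passing a directory of images instead of one file, would give a more representative latency.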
