Note: this paper is hot off the press, released April 5, 2019.
Abstract: We present a simple, fully convolutional model for real-time instance segmentation that achieves 29.8 mAP on MS COCO at 33 fps on a single Titan Xp, which is significantly faster than any previous algorithm. Moreover, we obtain this result after training on only one GPU. We accomplish this by breaking instance segmentation into two parallel subtasks: (1) generating a set of prototype masks, and (2) predicting per-instance mask coefficients. We then produce instance masks by linearly combining the prototypes with the mask coefficients. We find that because this process does not depend on repooling, this approach produces very high-quality masks. Furthermore, we analyze the emergent behavior of our prototypes and show that they learn to localize instances in a translation-variant manner, despite being fully convolutional. Finally, we also propose Fast NMS, which is 12 ms faster than standard NMS with only a marginal performance penalty.
Paper: YOLACT
GitHub: yolact
The results look decent.
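The key step from the abstract, combining prototypes with per-instance coefficients, is just a matrix multiply. Below is a minimal PyTorch sketch with made-up shapes (in the real model a protonet produces the prototypes and the prediction heads produce the coefficients; the tanh/sigmoid activations match the config reproduced later in this post):

```python
import torch

# Made-up shapes for illustration: k = 32 prototypes at 138x138,
# n = 10 detected instances, each with a k-dim coefficient vector.
prototypes = torch.randn(138, 138, 32)    # protonet output [h, w, k]
coeffs = torch.tanh(torch.randn(10, 32))  # mask coefficients [n, k]

# Linearly combine the prototypes with each instance's coefficients,
# then squash with a sigmoid to get n full-size instance masks.
masks = torch.sigmoid(prototypes @ coeffs.t())  # [h, w, n] = [138, 138, 10]
```

The Fast NMS mentioned in the abstract is also simple to sketch: rather than suppressing boxes sequentially, compute the pairwise IoU matrix once and drop any box that overlaps a higher-scoring box too much. Here is a single-class sketch, not the repo's actual implementation (which batches this across all classes at once); `torchvision.ops.box_iou` assumes torchvision >= 0.3:

```python
import torch
from torchvision.ops import box_iou

def fast_nms_sketch(boxes, scores, iou_threshold=0.5, top_k=200):
    """boxes: [n, 4] in (x1, y1, x2, y2); scores: [n]."""
    scores, idx = scores.sort(descending=True)
    idx = idx[:top_k]
    iou = box_iou(boxes[idx], boxes[idx])  # [k, k] pairwise IoU
    iou.triu_(diagonal=1)                  # only compare against higher-scoring boxes
    iou_max, _ = iou.max(dim=0)            # worst overlap with any better box
    # Unlike standard NMS, an already-suppressed box can still suppress others;
    # that is the accuracy-for-speed trade-off Fast NMS makes.
    keep = iou_max <= iou_threshold
    return idx[keep]
```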



Below is the walkthrough.
Installation
- Set up a Python3 environment.
- Install Pytorch 1.0.1 (or higher) and TorchVision.
- Install some other packages:
  # Cython needs to be installed before pycocotools
  pip install cython
  pip install opencv-python pillow pycocotools matplotlib
- Clone this repository and enter it:
  git clone https://github.com/dbolya/yolact.git
  cd yolact
- If you'd like to train YOLACT, download the COCO dataset and the 2014/2017 annotations. Note that this script will take a while and dump 21gb of files into ./data/coco.
  sh data/scripts/COCO.sh
- If you'd like to evaluate YOLACT on test-dev, download test-dev with this script:
  sh data/scripts/COCO_test.sh
Following the installation steps from the repo works fine. For the dataset, download it as COCO.sh does; I extracted the download URLs from the script myself and fetched them with wget.
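For reference, here is a sketch of the direct downloads (these are the standard COCO URLs; double-check them against what COCO.sh actually fetches before relying on this):

```
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
```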
A Baidu Netdisk download link is also provided here for convenience (extraction code: 2tw3).
Evaluation
As of April 5th, 2019, here are our latest models along with their FPS on a Titan Xp and mAP on test-dev:
| Image Size | Backbone | FPS | mAP | Weights | Mirror |
|---|---|---|---|---|---|
| 550 | Resnet50-FPN | 42.5 | 28.2 | yolact_resnet50_54_800000.pth | Mirror |
| 550 | Darknet53-FPN | 40.0 | 28.7 | yolact_darknet53_54_800000.pth | Mirror |
| 550 | Resnet101-FPN | 33.0 | 29.8 | yolact_base_54_800000.pth | Mirror |
| 700 | Resnet101-FPN | 23.6 | 31.2 | yolact_im700_54_800000.pth | Mirror |
To evaluate the model, put the corresponding weights file in the ./weights directory and run one of the following commands.
Evaluate on the entire dataset:
python3 eval.py --trained_model=weights/yolact_base_54_800000.pth --dataset=coco2017_dataset
Test on a single image:
python3 eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --image=my_image.png
Other usage is summarized below:
Quantitative Results on COCO
# Quantitatively evaluate a trained model on the entire validation set. Make sure you have COCO downloaded as above.
# This should get 29.92 validation mask mAP last time I checked.
python3 eval.py --trained_model=weights/yolact_base_54_800000.pth --dataset=coco2017_dataset
# Output a COCOEval json to submit to the website or to use the run_coco_eval.py script.
# This command will create './results/bbox_detections.json' and './results/mask_detections.json' for detection and instance segmentation respectively.
python3 eval.py --trained_model=weights/yolact_base_54_800000.pth --output_coco_json
# You can run COCOEval on the files created in the previous command. The performance should match my implementation in eval.py.
python run_coco_eval.py
# To output a coco json file for test-dev, make sure you have test-dev downloaded from above and go
python3 eval.py --trained_model=weights/yolact_base_54_800000.pth --output_coco_json --dataset=coco2017_testdev_dataset
# Display qualitative results on COCO. From here on I'll use a confidence threshold of 0.3.
python3 eval.py --trained_model=weights/yolact_base_54_800000.pth --dataset=coco2017_dataset --score_threshold=0.3 --top_k=100 --display
Benchmarking on COCO
# Run just the raw model on the first 1k images of the validation set
python eval.py --trained_model=weights/yolact_base_54_800000.pth --benchmark --max_images=1000
Images
# Display qualitative results on the specified image.
python3 eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --image=my_image.png
# Process an image and save it to another file.
python3 eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --image=input_image.png:output_image.png
# Process a whole folder of images.
python3 eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --images=path/to/input/folder:path/to/output/folder
For example: python3 eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --images=data/test:data/results
Video
# Display a video in real-time. "--video_multiframe" will process that many frames at once for improved performance.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --video_multiframe=2 --video=my_video.mp4
# Display a webcam feed in real-time. If you have multiple webcams pass the index of the webcam you want instead of 0.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --video_multiframe=2 --video=0
# Process a video and save it to another file. This is unoptimized.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --video=input_video.mp4:output_video.mp4
Training
# Trains using the base config with a batch size of 8 (the default).
python train.py --config=yolact_base_config
# Trains yolact_base_config with a batch_size of 5. For the 550px models, 1 batch takes up around 1.5 gigs of VRAM, so specify accordingly.
python3 train.py --config=yolact_base_config --batch_size=5
# Resume training yolact_base with a specific weight file and start from the iteration specified in the weight file's name.
python3 train.py --config=yolact_base_config --resume=weights/yolact_base_10_32100.pth --start_iter=-1
# Use the help option to see a description of all available command line arguments
python train.py --help
Now for the key part: how do you train on your own dataset?
The most important step, of course, is building the dataset; we use the labelme tool to create a custom dataset.
The final training dataset folder looks like this:

coarse_pointer_coco is the auto-generated COCO-format dataset, the source data lives in origin_dual, and labels.txt lists the instance classes.
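For example, a labels.txt for the two classes used in this post's config would look like the following (the first two entries are labelme conventions from its instance-segmentation example):

```
__ignore__
_background_
coarse_scale
pointer
```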

Here is how the dataset was made.
For instance segmentation with labelme, see: https://github.com/wkentaro/labelme/tree/master/examples/instance_segmentation
After installing it, run the labelme command directly in a terminal, annotate the instance objects you care about, and assign each a class ID.

The above is just an example, for reference only.
Once annotation is finished, each image has a corresponding JSON file.
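For orientation, here is a heavily trimmed sketch of what one of those JSON files contains (field names follow labelme's format; the values are made up):

```json
{
  "shapes": [
    {
      "label": "pointer",
      "points": [[102.0, 85.5], [110.0, 90.0], [98.5, 130.0]],
      "shape_type": "polygon"
    }
  ],
  "imagePath": "example.jpg",
  "imageHeight": 480,
  "imageWidth": 640
}
```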

When everything is annotated, generate the COCO format.
Convert to COCO-format Dataset
# It generates:
# - data_dataset_coco/JPEGImages
# - data_dataset_coco/annotations.json
./labelme2coco.py data_annotated data_dataset_coco --labels labels.txt
The generated contents are shown below. Note that labelme2coco.py writes annotations.json, while the dataset config later in this post points at instances_train.json, so rename the file (or adjust the config paths) accordingly.

The training command is:
python3 train.py --config=yolact_im400_custom_cfg --batch_size=6
The complete config file (data/config.py) is as follows:
from backbone import ResNetBackbone, VGGBackbone, ResNetBackboneGN, DarkNetBackbone
from math import sqrt
import torch
# for making bounding boxes pretty
COLORS = ((244, 67, 54),
(233, 30, 99),
(156, 39, 176),
(103, 58, 183),
( 63, 81, 181),
( 33, 150, 243),
( 3, 169, 244),
( 0, 188, 212),
( 0, 150, 136),
( 76, 175, 80),
(139, 195, 74),
(205, 220, 57),
(255, 235, 59),
(255, 193, 7),
(255, 152, 0),
(255, 87, 34),
(121, 85, 72),
(158, 158, 158),
( 96, 125, 139))
# These are in BGR and are for ImageNet
MEANS = (103.94, 116.78, 123.68)
STD = (57.38, 57.12, 58.40)
COCO_CLASSES = ('person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
'train', 'truck', 'boat', 'traffic light', 'fire hydrant',
'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog',
'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe',
'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat',
'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase',
'scissors', 'teddy bear', 'hair drier', 'toothbrush')
COCO_LABEL_MAP = { 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8,
9: 9, 10: 10, 11: 11, 13: 12, 14: 13, 15: 14, 16: 15, 17: 16,
18: 17, 19: 18, 20: 19, 21: 20, 22: 21, 23: 22, 24: 23, 25: 24,
27: 25, 28: 26, 31: 27, 32: 28, 33: 29, 34: 30, 35: 31, 36: 32,
37: 33, 38: 34, 39: 35, 40: 36, 41: 37, 42: 38, 43: 39, 44: 40,
46: 41, 47: 42, 48: 43, 49: 44, 50: 45, 51: 46, 52: 47, 53: 48,
54: 49, 55: 50, 56: 51, 57: 52, 58: 53, 59: 54, 60: 55, 61: 56,
62: 57, 63: 58, 64: 59, 65: 60, 67: 61, 70: 62, 72: 63, 73: 64,
74: 65, 75: 66, 76: 67, 77: 68, 78: 69, 79: 70, 80: 71, 81: 72,
82: 73, 84: 74, 85: 75, 86: 76, 87: 77, 88: 78, 89: 79, 90: 80}
# ----------------------- CONFIG CLASS ----------------------- #
class Config(object):
    """
    Holds the configuration for anything you want it to.
    To get the currently active config, call get_cfg().
    To use, just do cfg.x instead of cfg['x'].
    I made this because doing cfg['x'] all the time is dumb.
    """

    def __init__(self, config_dict):
        for key, val in config_dict.items():
            self.__setattr__(key, val)

    def copy(self, new_config_dict={}):
        """
        Copies this config into a new config object, making
        the changes given by new_config_dict.
        """
        ret = Config(vars(self))
        for key, val in new_config_dict.items():
            ret.__setattr__(key, val)
        return ret

    def replace(self, new_config_dict):
        """
        Copies new_config_dict into this config object.
        Note: new_config_dict can also be a config object.
        """
        if isinstance(new_config_dict, Config):
            new_config_dict = vars(new_config_dict)
        for key, val in new_config_dict.items():
            self.__setattr__(key, val)

    def print(self):
        for k, v in vars(self).items():
            print(k, ' = ', v)
# ----------------------- DATASETS ----------------------- #
dataset_base = Config({
'name': 'Base Dataset',
# Training images and annotations
'train_images': '/media/gavin/home/gavin/DataSet/pose_esm/coco/images/train2017/',
'train_info': 'path_to_annotation_file',
# Validation images and annotations.
'valid_images': '/media/gavin/home/gavin/DataSet/pose_esm/coco/images/val2017/',
'valid_info': 'path_to_annotation_file',
# Whether or not to load GT. If this is False, eval.py quantitative evaluation won't work.
'has_gt': True,
# A list of names for each of your classes.
'class_names': COCO_CLASSES,
# COCO class ids aren't sequential, so this is a bandage fix. If your ids aren't sequential,
# provide a map from category_id -> index in class_names + 1 (the +1 is there because it's 1-indexed).
# If not specified, this just assumes category ids start at 1 and increase sequentially.
'label_map': None
})
coco2014_dataset = dataset_base.copy({
'name': 'COCO 2014',
'train_info': './data/coco/annotations/instances_train2014.json',
'valid_info': './data/coco/annotations/instances_val2014.json',
'label_map': COCO_LABEL_MAP
})
coco2017_dataset = dataset_base.copy({
'name': 'COCO 2017',
'train_info': '/media/gavin/home/gavin/DataSet/pose_esm/coco/annotations/instances_train2017.json',
'valid_info': '/media/gavin/home/gavin/DataSet/pose_esm/coco/annotations/instances_val2017.json',
'label_map': COCO_LABEL_MAP
})
coco2017_testdev_dataset = dataset_base.copy({
'name': 'COCO 2017 Test-Dev',
'valid_info': '/media/gavin/home/gavin/DataSet/pose_esm/coco/annotations/image_info_test-dev2017.json',
'has_gt': False,
'label_map': COCO_LABEL_MAP
})
my_custom_dataset = dataset_base.copy({
'name': 'my_custom_dataset',
'train_images': '/home/gavin/Dataset/bdz/yolact/coarse_pointer_coco/',
'train_info': '/home/gavin/Dataset/bdz/yolact/coarse_pointer_coco/instances_train.json',
'valid_images': '/home/gavin/Dataset/bdz/yolact/coarse_pointer_coco/',
'valid_info': '/home/gavin/Dataset/bdz/yolact/coarse_pointer_coco/instances_train.json',
'has_gt': True,
'class_names': ('coarse_scale', 'pointer', )
})
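# Note: train_info/valid_info above point at the COCO-format JSON produced by
# labelme2coco.py (renamed from annotations.json). Train and val share the
# same file here, which is fine for a smoke test but not a real evaluation.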
# ----------------------- TRANSFORMS ----------------------- #
resnet_transform = Config({
'channel_order': 'RGB',
'normalize': True,
'subtract_means': False,
'to_float': False,
})
vgg_transform = Config({
# Note that though vgg is traditionally BGR,
# the channel order of vgg_reducedfc.pth is RGB.
'channel_order': 'RGB',
'normalize': False,
'subtract_means': True,
'to_float': False,
})
darknet_transform = Config({
'channel_order': 'RGB',
'normalize': False,
'subtract_means': False,
'to_float': True,
})
# ----------------------- BACKBONES ----------------------- #
backbone_base = Config({
'name': 'Base Backbone',
'path': 'path/to/pretrained/weights',
'type': object,
'args': tuple(),
'transform': resnet_transform,
'selected_layers': list(),
'pred_scales': list(),
'pred_aspect_ratios': list(),
'use_pixel_scales': False,
'preapply_sqrt': True,
'use_square_anchors': False,
})
resnet101_backbone = backbone_base.copy({
'name': 'ResNet101',
'path': 'resnet101_reducedfc.pth',
'type': ResNetBackbone,
'args': ([3, 4, 23, 3],),
'transform': resnet_transform,
'selected_layers': list(range(2, 8)),
'pred_scales': [[1]]*6,
'pred_aspect_ratios': [ [[0.66685089, 1.7073535, 0.87508774, 1.16524493, 0.49059086]] ] * 6,
})
resnet101_gn_backbone = backbone_base.copy({
'name': 'ResNet101_GN',
'path': 'R-101-GN.pkl',
'type': ResNetBackboneGN,
'args': ([3, 4, 23, 3],),
'transform': resnet_transform,
'selected_layers': list(range(2, 8)),
'pred_scales': [[1]]*6,
'pred_aspect_ratios': [ [[0.66685089, 1.7073535, 0.87508774, 1.16524493, 0.49059086]] ] * 6,
})
resnet50_backbone = resnet101_backbone.copy({
'name': 'ResNet50',
'path': 'resnet50-19c8e357.pth',
'type': ResNetBackbone,
'args': ([3, 4, 6, 3],),
'transform': resnet_transform,
})
darknet53_backbone = backbone_base.copy({
'name': 'DarkNet53',
'path': 'darknet53.pth',
'type': DarkNetBackbone,
'args': ([1, 2, 8, 8, 4],),
'transform': darknet_transform,
'selected_layers': list(range(3, 9)),
'pred_scales': [[3.5, 4.95], [3.6, 4.90], [3.3, 4.02], [2.7, 3.10], [2.1, 2.37], [1.8, 1.92]],
'pred_aspect_ratios': [ [[1, sqrt(2), 1/sqrt(2), sqrt(3), 1/sqrt(3)][:n], [1]] for n in [3, 5, 5, 5, 3, 3] ],
})
vgg16_arch = [[64, 64],
[ 'M', 128, 128],
[ 'M', 256, 256, 256],
[('M', {'kernel_size': 2, 'stride': 2, 'ceil_mode': True}), 512, 512, 512],
[ 'M', 512, 512, 512],
[('M', {'kernel_size': 3, 'stride': 1, 'padding': 1}),
(1024, {'kernel_size': 3, 'padding': 6, 'dilation': 6}),
(1024, {'kernel_size': 1})]]
vgg16_backbone = backbone_base.copy({
'name': 'VGG16',
'path': 'vgg16_reducedfc.pth',
'type': VGGBackbone,
'args': (vgg16_arch, [(256, 2), (128, 2), (128, 1), (128, 1)], [3]),
'transform': vgg_transform,
'selected_layers': [3] + list(range(5, 10)),
'pred_scales': [[5, 4]]*6,
'pred_aspect_ratios': [ [[1], [1, sqrt(2), 1/sqrt(2), sqrt(3), 1/sqrt(3)][:n]] for n in [3, 5, 5, 5, 3, 3] ],
})
# ----------------------- MASK BRANCH TYPES ----------------------- #
mask_type = Config({
# Direct produces masks directly as the output of each pred module.
# This is denoted as fc-mask in the paper.
# Parameters: mask_size, use_gt_bboxes
'direct': 0,
# Lincomb produces coefficients as the output of each pred module then uses those coefficients
# to linearly combine features from a prototype network to create image-sized masks.
# Parameters:
# - masks_to_train (int): Since we're producing (near) full image masks, it'd take too much
# vram to backprop on every single mask. Thus we select only a subset.
# - mask_proto_src (int): The input layer to the mask prototype generation network. This is an
# index in backbone.layers. Use None to use the image itself instead.
# - mask_proto_net (list<tuple>): A list of layers in the mask proto network with the last one
# being where the masks are taken from. Each conv layer is in
# the form (num_features, kernel_size, **kwdargs). An empty
# list means to use the source for prototype masks. If the
# kernel_size is negative, this creates a deconv layer instead.
# If the kernel_size is negative and the num_features is None,
# this creates a simple bilinear interpolation layer instead.
# - mask_proto_bias (bool): Whether to include an extra coefficient that corresponds to a proto
# mask of all ones.
# - mask_proto_prototype_activation (func): The activation to apply to each prototype mask.
# - mask_proto_mask_activation (func): After summing the prototype masks with the predicted
# coeffs, what activation to apply to the final mask.
# - mask_proto_coeff_activation (func): The activation to apply to the mask coefficients.
# - mask_proto_crop (bool): If True, crop the mask with the predicted bbox during training.
# - mask_proto_crop_expand (float): If cropping, the percent to expand the cropping bbox by
# in each direction. This is to make the model less reliant
# on perfect bbox predictions.
# - mask_proto_loss (str [l1|disj]): If not None, apply an l1 or disjunctive regularization
# loss directly to the prototype masks.
# - mask_proto_binarize_downsampled_gt (bool): Binarize GT after downsampling during training?
# - mask_proto_normalize_mask_loss_by_sqrt_area (bool): Whether to normalize mask loss by sqrt(sum(gt))
# - mask_proto_reweight_mask_loss (bool): Reweight mask loss such that background is divided by
# #background and foreground is divided by #foreground.
# - mask_proto_grid_file (str): The path to the grid file to use with the next option.
# This should be a numpy.dump file with shape [numgrids, h, w]
# where h and w are w.r.t. the mask_proto_src convout.
# - mask_proto_use_grid (bool): Whether to add extra grid features to the proto_net input.
# - mask_proto_coeff_gate (bool): Add an extra set of sigmoided coefficients that is multiplied
# into the predicted coefficients in order to "gate" them.
# - mask_proto_prototypes_as_features (bool): For each prediction module, downsample the prototypes
# to the convout size of that module and supply the prototypes as input
# in addition to the already supplied backbone features.
# - mask_proto_prototypes_as_features_no_grad (bool): If the above is set, don't backprop gradients
# to the prototypes from the network head.
# - mask_proto_remove_empty_masks (bool): Remove masks that are downsampled to 0 during loss calculations.
# - mask_proto_reweight_coeff (float): The coefficient to multiply the foreground pixels by when reweighting.
# - mask_proto_coeff_diversity_loss (bool): Apply coefficient diversity loss on the coefficients so that the same
# instance has similar coefficients.
# - mask_proto_coeff_diversity_alpha (float): The weight to use for the coefficient diversity loss.
# - mask_proto_normalize_emulate_roi_pooling (bool): Normalize the mask loss to emulate roi pooling's effect on the loss.
# - mask_proto_double_loss (bool): Whether to use the old loss in addition to any special new losses.
# - mask_proto_double_loss_alpha (float): The alpha to weight the above loss.
'lincomb': 1,
})
# ----------------------- ACTIVATION FUNCTIONS ----------------------- #
activation_func = Config({
'tanh': torch.tanh,
'sigmoid': torch.sigmoid,
'softmax': lambda x: torch.nn.functional.softmax(x, dim=-1),
'relu': lambda x: torch.nn.functional.relu(x, inplace=True),
'none': lambda x: x,
})
# ----------------------- FPN DEFAULTS ----------------------- #
fpn_base = Config({
# The number of features to have in each FPN layer
'num_features': 256,
# The upsampling mode used
'interpolation_mode': 'bilinear',
# The number of extra layers to be produced by downsampling starting at P5
'num_downsample': 1,
# Whether to down sample with a 3x3 stride 2 conv layer instead of just a stride 2 selection
'use_conv_downsample': False,
# Whether to pad the pred layers with 1 on each side (I forgot to add this at the start)
# This is just here for backwards compatibility
'pad': True,
})
# ----------------------- CONFIG DEFAULTS ----------------------- #
coco_base_config = Config({
'dataset': coco2014_dataset,
'num_classes': 81, # This should include the background class
'max_iter': 400000,
# The maximum number of detections for evaluation
'max_num_detections': 100,
# dw' = momentum * dw - lr * (grad + decay * w)
'lr': 1e-3,
'momentum': 0.9,
'decay': 5e-4,
# For each lr step, what to multiply the lr with
'gamma': 0.1,
'lr_steps': (280000, 360000, 400000),
# Initial learning rate to linearly warmup from (if until > 0)
'lr_warmup_init': 1e-4,
# If > 0 then increase the lr linearly from warmup_init to lr each iter for until iters
'lr_warmup_until': 500,
# The terms to scale the respective loss by
'conf_alpha': 1,
'bbox_alpha': 1.5,
'mask_alpha': 0.4 / 256 * 140 * 140, # Some funky equation. Don't worry about it.
# Eval.py sets this if you just want to run YOLACT as a detector
'eval_mask_branch': True,
# See mask_type for details.
'mask_type': mask_type.direct,
'mask_size': 16,
'masks_to_train': 100,
'mask_proto_src': None,
'mask_proto_net': [(256, 3, {}), (256, 3, {})],
'mask_proto_bias': False,
'mask_proto_prototype_activation': activation_func.relu,
'mask_proto_mask_activation': activation_func.sigmoid,
'mask_proto_coeff_activation': activation_func.tanh,
'mask_proto_crop': True,
'mask_proto_crop_expand': 0,
'mask_proto_loss': None,
'mask_proto_binarize_downsampled_gt': True,
'mask_proto_normalize_mask_loss_by_sqrt_area': False,
'mask_proto_reweight_mask_loss': False,
'mask_proto_grid_file': 'data/grid.npy',
'mask_proto_use_grid': False,
'mask_proto_coeff_gate': False,
'mask_proto_prototypes_as_features': False,
'mask_proto_prototypes_as_features_no_grad': False,
'mask_proto_remove_empty_masks': False,
'mask_proto_reweight_coeff': 1,
'mask_proto_coeff_diversity_loss': False,
'mask_proto_coeff_diversity_alpha': 1,
'mask_proto_normalize_emulate_roi_pooling': False,
'mask_proto_double_loss': False,
'mask_proto_double_loss_alpha': 1,
# SSD data augmentation parameters
# Randomize hue, vibrance, etc.
'augment_photometric_distort': True,
# Have a chance to scale down the image and pad (to emulate smaller detections)
'augment_expand': True,
# Potentially sample a random crop from the image and put it in a random place
'augment_random_sample_crop': True,
# Mirror the image with a probability of 1/2
'augment_random_mirror': True,
# Flip the image vertically with a probability of 1/2
'augment_random_flip': False,
# With uniform probability, rotate the image [0,90,180,270] degrees
'augment_random_rot90': False,
# If using batchnorm anywhere in the backbone, freeze the batchnorm layer during training.
# Note: any additional batch norm layers after the backbone will not be frozen.
'freeze_bn': False,
# Set this to a config object if you want an FPN (inherit from fpn_base). See fpn_base for details.
'fpn': None,
# Use the same weights for each network head
'share_prediction_module': False,
# For hard negative mining, instead of using the negatives that are least confidently background,
# use negatives that are most confidently not background.
'ohem_use_most_confident': False,
# Use focal loss as described in https://arxiv.org/pdf/1708.02002.pdf instead of OHEM
'use_focal_loss': False,
'focal_loss_alpha': 0.25,
'focal_loss_gamma': 2,
# The initial bias toward foreground objects, as specified in the focal loss paper
'focal_loss_init_pi': 0.01,
# Whether to use sigmoid focal loss instead of softmax, all else being the same.
'use_sigmoid_focal_loss': False,
# Use class[0] to be the objectness score and class[1:] to be the softmax predicted class.
# Note: at the moment this is only implemented if use_focal_loss is on.
'use_objectness_score': False,
# Adds a global pool + fc layer to the smallest selected layer that predicts the existence of each of the 80 classes.
# This branch is only evaluated during training time and is just there for multitask learning.
'use_class_existence_loss': False,
'class_existence_alpha': 1,
# Adds a 1x1 convolution directly to the biggest selected layer that predicts a semantic segmentations for each of the 80 classes.
# This branch is only evaluated during training time and is just there for multitask learning.
'use_semantic_segmentation_loss': False,
'semantic_segmentation_alpha': 1,
# Match gt boxes using the Box2Pix change metric instead of the standard IoU metric.
# Note that the threshold you set for iou_threshold should be negative with this setting on.
'use_change_matching': False,
# Uses the same network format as mask_proto_net, except this time it's for adding extra head layers before the final
# prediction in prediction modules. If this is none, no extra layers will be added.
'extra_head_net': None,
# What params should the final head layers have (the ones that predict box, confidence, and mask coeffs)
'head_layer_params': {'kernel_size': 3, 'padding': 1},
# Add extra layers between the backbone and the network heads
# The order is (bbox, conf, mask)
'extra_layers': (0, 0, 0),
# During training, to match detections with gt, first compute the maximum gt IoU for each prior.
# Then, any of those priors whose maximum overlap is over the positive threshold, mark as positive.
# For any priors whose maximum is less than the negative iou threshold, mark them as negative.
# The rest are neutral and not used in calculating the loss.
'positive_iou_threshold': 0.5,
'negative_iou_threshold': 0.5,
# If less than 1, anchors treated as a negative that have a crowd iou over this threshold with
# the crowd boxes will be treated as a neutral.
'crowd_iou_threshold': 1,
# This is filled in at runtime by Yolact's __init__, so don't touch it
'mask_dim': None,
# Input image size. If preserve_aspect_ratio is False, min_size is ignored.
'min_size': 200,
'max_size': 300,
# Whether or not to do post processing on the cpu at test time
'force_cpu_nms': True,
# Whether to use mask coefficient cosine similarity nms instead of bbox iou nms
'use_coeff_nms': False,
# Whether or not to have a separate branch whose sole purpose is to act as the coefficients for coeff_diversity_loss
# Remember to turn on coeff_diversity_loss, or these extra coefficients won't do anything!
# To see their effect, also remember to turn on use_coeff_nms.
'use_instance_coeff': False,
'num_instance_coeffs': 64,
# Whether or not to tie the mask loss / box loss to 0
'train_masks': True,
'train_boxes': True,
# If enabled, the gt masks will be cropped using the gt bboxes instead of the predicted ones.
# This speeds up training time considerably but results in much worse mAP at test time.
'use_gt_bboxes': False,
# Whether or not to preserve aspect ratio when resizing the image.
# If True, uses the faster r-cnn resizing scheme.
# If False, all images are resized to max_size x max_size
'preserve_aspect_ratio': False,
# Whether or not to use the prediction module (c) from DSSD
'use_prediction_module': False,
# Whether or not to use the predicted coordinate scheme from Yolo v2
'use_yolo_regressors': False,
# For training, bboxes are considered "positive" if their anchors have a 0.5 IoU overlap
# or greater with a ground truth box. If this is true, instead of using the anchor boxes
# for this IoU computation, the matching function will use the predicted bbox coordinates.
# Don't turn this on if you're not using yolo regressors!
'use_prediction_matching': False,
# A list of settings to apply after the specified iteration. Each element of the list should look like
# (iteration, config_dict) where config_dict is a dictionary you'd pass into a config object's init.
'delayed_settings': [],
# Use command-line arguments to set this.
'no_jit': False,
'backbone': None,
'name': 'base_config',
})
# ----------------------- YOLACT v1.0 CONFIGS ----------------------- #
yolact_base_config = coco_base_config.copy({
'name': 'yolact_base',
# Dataset stuff
'dataset': coco2017_dataset,
'num_classes': len(coco2017_dataset.class_names) + 1,
# Image Size
'max_size': 550,
# Training params
'lr_steps': (280000, 600000, 700000, 750000),
'max_iter': 800000,
# Backbone Settings
'backbone': resnet101_backbone.copy({
'selected_layers': list(range(1, 4)),
'use_pixel_scales': True,
'preapply_sqrt': False,
'use_square_anchors': True, # This is for backward compatibility with a bug
'pred_aspect_ratios': [ [[1, 1/2, 2]] ]*5,
'pred_scales': [[24], [48], [96], [192], [384]],
}),
# FPN Settings
'fpn': fpn_base.copy({
'use_conv_downsample': True,
'num_downsample': 2,
}),
# Mask Settings
'mask_type': mask_type.lincomb,
'mask_alpha': 6.125,
'mask_proto_src': 0,
'mask_proto_net': [(256, 3, {'padding': 1})] * 3 + [(None, -2, {}), (256, 3, {'padding': 1})] + [(32, 1, {})],
'mask_proto_normalize_emulate_roi_pooling': True,
# Other stuff
'share_prediction_module': True,
'extra_head_net': [(256, 3, {'padding': 1})],
'positive_iou_threshold': 0.5,
'negative_iou_threshold': 0.4,
'crowd_iou_threshold': 0.7,
'use_semantic_segmentation_loss': True,
})
yolact_im400_config = yolact_base_config.copy({
'name': 'yolact_im400',
'max_size': 400,
'backbone': yolact_base_config.backbone.copy({
'pred_scales': [[int(x[0] / yolact_base_config.max_size * 400)] for x in yolact_base_config.backbone.pred_scales],
}),
})
yolact_im700_config = yolact_base_config.copy({
'name': 'yolact_im700',
'masks_to_train': 300,
'max_size': 700,
'backbone': yolact_base_config.backbone.copy({
'pred_scales': [[int(x[0] / yolact_base_config.max_size * 700)] for x in yolact_base_config.backbone.pred_scales],
}),
})
yolact_darknet53_config = yolact_base_config.copy({
'name': 'yolact_darknet53',
'backbone': darknet53_backbone.copy({
'selected_layers': list(range(2, 5)),
'pred_scales': yolact_base_config.backbone.pred_scales,
'pred_aspect_ratios': yolact_base_config.backbone.pred_aspect_ratios,
'use_pixel_scales': True,
'preapply_sqrt': False,
'use_square_anchors': True, # This is for backward compatibility with a bug
}),
})
yolact_resnet50_config = yolact_base_config.copy({
'name': 'yolact_resnet50',
'backbone': resnet50_backbone.copy({
'selected_layers': list(range(1, 4)),
'pred_scales': yolact_base_config.backbone.pred_scales,
'pred_aspect_ratios': yolact_base_config.backbone.pred_aspect_ratios,
'use_pixel_scales': True,
'preapply_sqrt': False,
'use_square_anchors': True, # This is for backward compatibility with a bug
}),
})
yolact_im400_custom_cfg = yolact_base_config.copy({
'name': 'yolact_im400',
# Dataset stuff
'dataset': my_custom_dataset,
'num_classes': len(my_custom_dataset.class_names) + 1,
'max_size': 416,
'backbone': yolact_base_config.backbone.copy({
'pred_scales': [[int(x[0] / yolact_base_config.max_size * 400)] for x in yolact_base_config.backbone.pred_scales],
}),
})
# Default config
cfg = yolact_base_config.copy()
def set_cfg(config_name: str):
    """ Sets the active config. Works even if cfg is already imported! """
    global cfg

    # Note this is not just an eval because I'm lazy, but also because it can
    # be used like ssd300_config.copy({'max_size': 400}) for extreme fine-tuning
    cfg.replace(eval(config_name))

def set_dataset(dataset_name: str):
    """ Sets the dataset of the current config. """
    cfg.dataset = eval(dataset_name)
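Finally, a quick sketch of how these helpers get used: train.py effectively calls set_cfg with the value of --config. The import path assumes the upstream layout, where this file lives at data/config.py.

```python
from data.config import cfg, set_cfg

# Activate the custom config; equivalent to passing
# --config=yolact_im400_custom_cfg on the train.py command line.
set_cfg('yolact_im400_custom_cfg')

print(cfg.name)         # 'yolact_im400'
print(cfg.num_classes)  # 3: two custom classes + background
```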

