[Deep Learning] COCO Dataset Format and PASCAL VOC Data Format - 优快云 Blog

Article link: https://blog.youkuaiyun.com/Orthrus19/article/details/129801097

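This article introduces the annotation formats of the COCO and PASCAL VOC datasets. It covers COCO's JSON file structure, Object Instances, segmentation, categories, and keypoints, as well as the PASCAL VOC XML annotation format, and explains how both are used in object detection and image understanding tasks.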


Data Processing

1 COCO Dataset

1.1 Annotation Types

object instances (individual object instances)

object keypoints (keypoints on an object)

image captions (natural-language descriptions of an image)

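1.2 Annotation File Format: JSON

1.3 Shared Structures

All three annotation types share the same basic structures: info, image, and license. The annotation structure, by contrast, is polymorphic: its layout depends on the annotation type.

These three structures are identical across the different JSON files, so their definitions are shared. The structures that are not shared are annotation and category, which differ from one type of JSON file to another.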

{
    "info": info,
    "licenses": [license],
    "images": [image],
    "annotations": [annotation],
    "categories": [category]
}
    
info{
    "year": int,
    "version": str,
    "description": str,
    "contributor": str,
    "url": str,
    "date_created": datetime,
}
license{
    "id": int,
    "name": str,
    "url": str,
} 
image{
    "id": int,
    "width": int,
    "height": int,
    "file_name": str,
    "license": int,
    "flickr_url": str,
    "coco_url": str,
    "date_captured": datetime,
}
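To make the shared structures concrete, here is a minimal Python sketch that builds a COCO-style top-level dict with one license and one image, then round-trips it through the standard json module. All field values (file names, URLs, dates) are hypothetical placeholders, and annotations and categories are left empty because their layout depends on the annotation type.

```python
import json


def make_coco_skeleton():
    """Build a minimal COCO-style dataset dict; every value here is made up."""
    info = {
        "year": 2017,
        "version": "1.0",
        "description": "toy dataset (hypothetical)",
        "contributor": "example",
        "url": "http://example.com",
        "date_created": "2017/09/01",  # datetime values are written as strings in JSON
    }
    license_ = {"id": 1, "name": "example license", "url": "http://example.com/license"}
    image = {
        "id": 1,
        "width": 640,
        "height": 480,
        "file_name": "000000000001.jpg",
        "license": 1,  # refers to license_["id"]
        "flickr_url": "",
        "coco_url": "",
        "date_captured": "2013-11-14 16:28:13",
    }
    return {
        "info": info,
        "licenses": [license_],
        "images": [image],
        "annotations": [],  # layout depends on the annotation type
        "categories": [],   # not shared across annotation types
    }


dataset = make_coco_skeleton()
text = json.dumps(dataset, indent=2)  # what would be written to the .json file
loaded = json.loads(text)
print(sorted(loaded.keys()))  # ['annotations', 'categories', 'images', 'info', 'licenses']
```

Note that the image links to its license only through the integer id, so the two arrays can be stored and shared independently.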

1.4 Object Instances

An Object Instances file consists of the following sections, in order from top to bottom:

{
    "info": info,
    "licenses": [license],
    "images": [image],
    "annotations": [annotation],
    "categories": [category]
}

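The number of elements in the images array equals the number of images assigned to the training (or test) set;

The number of elements in the annotations array equals the number of bounding boxes in the training (or test) set;

The categories array has 80 elements (2017 release).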
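These counts can be checked directly on a loaded file. The sketch below is a hedged illustration: `summarize` and `annotations_per_image` are hypothetical helper names, and a tiny hand-made dataset stands in for a real annotation file. Because each annotation carries exactly one bbox, the annotation count is also the bounding-box count.

```python
import json
from collections import defaultdict


def summarize(coco):
    """Return (num_images, num_annotations, num_categories) for an instances dict."""
    return len(coco["images"]), len(coco["annotations"]), len(coco["categories"])


def annotations_per_image(coco):
    """Group annotation ids by the image they belong to (via image_id)."""
    index = defaultdict(list)
    for ann in coco["annotations"]:
        index[ann["image_id"]].append(ann["id"])
    return dict(index)


# Tiny made-up example; a real file would be read with
#   with open(path) as f: coco = json.load(f)
coco = json.loads("""
{
  "images": [{"id": 1, "width": 640, "height": 480, "file_name": "a.jpg"},
             {"id": 2, "width": 640, "height": 480, "file_name": "b.jpg"}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "bbox": [0, 0, 10, 10], "area": 100.0, "iscrowd": 0},
    {"id": 11, "image_id": 1, "category_id": 2, "bbox": [5, 5, 20, 10], "area": 200.0, "iscrowd": 0},
    {"id": 12, "image_id": 2, "category_id": 1, "bbox": [1, 2, 3, 4], "area": 12.0, "iscrowd": 0}
  ],
  "categories": [{"id": 1, "name": "person", "supercategory": "person"},
                 {"id": 2, "name": "bicycle", "supercategory": "vehicle"}]
}
""")

print(summarize(coco))              # (2, 3, 2)
print(annotations_per_image(coco))  # {1: [10, 11], 2: [12]}
```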

1.4.1 annotation

Basic format

The annotations field is an array containing multiple annotation objects.

annotation{
    "id": int,    
    "image_id": int,
    "category_id": int,
    "segmentation": RLE or [polygon],
    "area": float,
    "bbox": [x,y,width,height],
    "iscrowd": 0 or 1,
}
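The bbox field uses [x, y, width, height], where (x, y) is the top-left corner in pixels. Many tools expect corner coordinates instead, so a conversion is often the first step when consuming these files. The following is a minimal sketch with hypothetical helper names. Note that in COCO the area field generally describes the segmentation mask rather than the box, so it is usually not equal to width * height.

```python
def xywh_to_xyxy(bbox):
    """COCO [x, y, width, height] (x, y = top-left) -> [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def xyxy_to_xywh(box):
    """[x_min, y_min, x_max, y_max] -> COCO [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


def bbox_area(bbox):
    """Area of the COCO-style box itself (width * height), not the mask area."""
    _, _, w, h = bbox
    return w * h


ann = {"bbox": [73.0, 41.5, 100.0, 50.0]}  # made-up box
print(xywh_to_xyxy(ann["bbox"]))  # [73.0, 41.5, 173.0, 91.5]
print(bbox_area(ann["bbox"]))     # 5000.0
```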

The annotation structure itself contains a set of fields, such as the object's category id (category_id) and its segmentation mask.
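In polygon form, segmentation is a list of polygons, each stored as a flat list [x1, y1, x2, y2, ...]. As a rough illustration of that layout, the sketch below computes the enclosed area with the shoelace formula. This is an assumption-laden approximation, not how the dataset's area values are necessarily produced (they may come from a rasterized mask), and summing per-polygon areas assumes the polygons do not overlap. The helper names are hypothetical.

```python
def polygon_area(flat):
    """Shoelace formula for one polygon given as [x1, y1, x2, y2, ...]."""
    xs, ys = flat[0::2], flat[1::2]
    n = len(xs)
    s = 0.0
    for i in range(n):
        j = (i + 1) % n  # wrap around to close the polygon
        s += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(s) / 2.0


def segmentation_area(segmentation):
    """Total area of a polygon-format segmentation (a list of polygons).

    Assumes the polygons do not overlap, e.g. the visible pieces of an
    occluded object annotated as separate polygons.
    """
    return sum(polygon_area(poly) for poly in segmentation)


# Two disjoint squares standing in for one occluded object (made-up numbers).
seg = [[0, 0, 10, 0, 10, 10, 0, 10],
       [20, 0, 25, 0, 25, 5, 20, 5]]
print(segmentation_area(seg))  # 125.0
```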

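(1) The segmentation format depends on whether the instance is a single object or a group of objects (the iscrowd flag).

If iscrowd=0, segmentation is in polygon format; if iscrowd=1, segmentation is in RLE format. Note that a single object (iscrowd=0) may need several polygons to represent it, for example when the object is partially occluded in the image. When iscrowd=1 (a group of objects is annotated,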