COCO Format

Author: hi我是大嘴巴

Article link: https://blog.youkuaiyun.com/weixin_38740463/article/details/109182204

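This article describes the annotation format of the COCO dataset in detail. It covers the three annotation types (Object Instance, Object Keypoint, and Image Caption), the JSON structures involved, key elements such as annotation and categories, and how they are used in the different annotation tasks.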

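COCO stands for Common Objects in COntext. It is a dataset provided by a Microsoft team that can be used for image recognition. The images in the MS COCO dataset are split into training, validation, and test sets. COCO collected its images by searching Flickr for 80 object categories and a variety of scene types, and it relied on Amazon's Mechanical Turk (AMT).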

For example, the steps for annotating image captions look like this:

(Figure: COCO annotation steps on AMT)

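COCO gathers its data through heavy use of Amazon Mechanical Turk. The COCO dataset currently has 3 annotation types: object instances, object keypoints, and image captions, all stored as JSON files. For example, these are the annotation files from the COCO 2017 training set that Gemfield downloaded: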

All three types described above are present, and each type comes in a training and a validation version, so there are 6 JSON files in total.
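
As a minimal sketch, the six files can be enumerated from the three annotation types and the two splits. It assumes the usual naming of the 2017 annotation archive (`instances_*`, `person_keypoints_*`, `captions_*`); the helper name `annotation_files` and the `annotations` directory are illustrative and do not come from the original post.

```python
from pathlib import Path

# File-name prefix used for each annotation type (assumed 2017 naming).
ANNOTATION_PREFIXES = {
    "object instances": "instances",
    "object keypoints": "person_keypoints",
    "image captions": "captions",
}
SPLITS = ("train", "val")


def annotation_files(root="annotations", year=2017):
    """Return the expected annotation paths, one per (type, split) pair."""
    return [
        Path(root) / f"{prefix}_{split}{year}.json"
        for prefix in ANNOTATION_PREFIXES.values()
        for split in SPLITS
    ]


files = annotation_files()
for path in files:
    print(path.name)
```

Three types times two splits gives the six files mentioned above.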

Basic JSON structure types

The 3 types (object instances, object keypoints, and image captions) share these basic types: info, image, and license.

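The annotation type, by contrast, is polymorphic: its fields differ from one annotation task to another, while the overall file layout stays the same: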

{
    "info": info,
    "licenses": [license],
    "images": [image],
    "annotations": [annotation],
}
    
info{
    "year": int,
    "version": str,
    "description": str,
    "contributor": str,
    "url": str,
    "date_created": datetime,
}
license{
    "id": int,
    "name": str,
    "url": str,
} 
image{
    "id": int,
    "width": int,
    "height": int,
    "file_name": str,
    "license": int,
    "flickr_url": str,
    "coco_url": str,
    "date_captured": datetime,
}
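
To make the shared layout concrete, here is a minimal sketch that parses a small hand-written document and checks its four top-level keys. The sample values and the helper `check_top_level` are made up for illustration; a real annotation file would be read from disk with `json.load` instead.

```python
import json

# A tiny hand-written document in the layout above (values are invented).
SAMPLE = """
{
  "info": {"year": 2014, "version": "1.0", "description": "demo",
           "contributor": "demo", "url": "http://example.com",
           "date_created": "2015-01-27 09:11:52.357475"},
  "licenses": [{"id": 3, "name": "demo license",
                "url": "http://example.com/license"}],
  "images": [{"id": 1, "width": 640, "height": 360, "file_name": "demo.jpg",
              "license": 3, "flickr_url": "", "coco_url": "",
              "date_captured": "2013-11-14 11:18:45"}],
  "annotations": []
}
"""

# Expected Python type of each top-level key after json parsing.
REQUIRED_TOP_LEVEL = {
    "info": dict,
    "licenses": list,
    "images": list,
    "annotations": list,
}


def check_top_level(doc):
    """Return a list of layout problems (empty when the layout is as expected)."""
    problems = []
    for key, expected_type in REQUIRED_TOP_LEVEL.items():
        if key not in doc:
            problems.append(f"missing key: {key}")
        elif not isinstance(doc[key], expected_type):
            problems.append(f"{key} should be a {expected_type.__name__}")
    return problems


doc = json.loads(SAMPLE)
problems = check_top_level(doc)
print(problems or "top-level layout looks fine")
```

Note that `info` is a single object, while `licenses`, `images`, and `annotations` are arrays.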

1. The info type. An example instance of info:

"info":{
	"description":"This is stable 1.0 version of the 2014 MS COCO dataset.",
	"url":"http:\/\/mscoco.org",
	"version":"1.0","year":2014,
	"contributor":"Microsoft COCO group",
	"date_created":"2015-01-27 09:11:52.357475"
},
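
The sample can be read with Python's standard `json` module, as in the sketch below. The `strptime` format string is an assumption that matches this particular sample; other files may store the date differently.

```python
import json
from datetime import datetime

# The info sample from above, kept as a raw string so "\/" reaches the parser.
INFO_JSON = r'''
{
  "description": "This is stable 1.0 version of the 2014 MS COCO dataset.",
  "url": "http:\/\/mscoco.org",
  "version": "1.0", "year": 2014,
  "contributor": "Microsoft COCO group",
  "date_created": "2015-01-27 09:11:52.357475"
}
'''

info = json.loads(INFO_JSON)

# "\/" is a legal JSON escape for "/", so the parsed URL has plain slashes.
url = info["url"]

# date_created is stored as a string; this format matches the sample above.
created = datetime.strptime(info["date_created"], "%Y-%m-%d %H:%M:%S.%f")

print(url, created.year, type(info["year"]).__name__)
```

Here `year` comes back as an integer and `version` as a string, matching the `int` and `str` annotations in the schema.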

2. images is an array containing multiple image instances. An example instance of the image type:

{
	"license":3,
	"file_name":"COCO_val2014_000000391895.jpg",
	"coco_url":"http:\/\/mscoco.org\/images\/391895",
	"height":360,"width":640,"date_captured":"2013-11-14 11:18:45",
	"flickr_url":"http:\/\/farm9.staticflickr.com\/8186\/8119368305_4e622c8349_z.jpg",
	"id":391895
},
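
Since `images` is an array, finding one image by its `id` means scanning the list unless an index is built first. The sketch below builds such an index from the sample above; the name `image_by_id` is illustrative, and the file-name pattern is only an observation about this one sample.

```python
import json
from datetime import datetime

# An images array holding the single sample instance shown above.
IMAGES_JSON = r'''
[
  {
    "license": 3,
    "file_name": "COCO_val2014_000000391895.jpg",
    "coco_url": "http:\/\/mscoco.org\/images\/391895",
    "height": 360, "width": 640, "date_captured": "2013-11-14 11:18:45",
    "flickr_url": "http:\/\/farm9.staticflickr.com\/8186\/8119368305_4e622c8349_z.jpg",
    "id": 391895
  }
]
'''

images = json.loads(IMAGES_JSON)

# Index the array by image id so lookups do not need to scan the list.
image_by_id = {img["id"]: img for img in images}

img = image_by_id[391895]
captured = datetime.strptime(img["date_captured"], "%Y-%m-%d %H:%M:%S")

# In this sample the file name embeds the id, zero-padded to 12 digits.
expected_name = f"COCO_val2014_{img['id']:012d}.jpg"

print(img["file_name"], img["width"], "x", img["height"], captured.date())
```

The `license` field of an image is an integer that refers to the `id` of an entry in the `licenses` array.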

3. licenses is an array containing multiple license instances. An example instance of the license type: