Converting VisDrone2019 to a COCO-format dataset
The COCO dataset format
This hardly needs explaining; any seasoned CV practitioner is already more than familiar with it.
VisDrone2019 (DET)
Label fields
- The x coordinate of the top-left corner of the bounding box
- The y coordinate of the top-left corner of the bounding box
- The width of the bounding box
- The height of the bounding box
- In a DETECTION file, the score indicates the confidence of the predicted bounding box enclosing an object instance. In a GROUNDTRUTH file, the score is set to 1 or 0: 1 means the bounding box is considered during evaluation, 0 means it is ignored.
- Object category: ignored regions (0), pedestrian (1), people (2), bicycle (3), car (4), van (5), truck (6), tricycle (7), awning-tricycle (8), bus (9), motor (10), others (11)
- In a DETECTION file this field should be set to the constant -1. In a GROUNDTRUTH file it indicates the degree to which the object extends outside the frame: no truncation = 0 (truncation ratio 0%), partial truncation = 1 (truncation ratio 1%~50%).
- In a DETECTION file this field should be set to the constant -1. In a GROUNDTRUTH file it indicates the fraction of the object that is occluded: no occlusion = 0 (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1%~50%), heavy occlusion = 2 (occlusion ratio 50%~100%).
<bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>
Name Description
-------------------------------------------------------------------------------------------------------------------------------
<bbox_left> The x coordinate of the top-left corner of the predicted bounding box
<bbox_top> The y coordinate of the top-left corner of the predicted object bounding box
<bbox_width> The width in pixels of the predicted object bounding box
<bbox_height> The height in pixels of the predicted object bounding box
<score> The score in the DETECTION file indicates the confidence of the predicted bounding box enclosing
an object instance.
The score in GROUNDTRUTH file is set to 1 or 0. 1 indicates the bounding box is considered in evaluation,
while 0 indicates the bounding box will be ignored.
<object_category> The object category indicates the type of annotated object, (i.e., ignored regions(0), pedestrian(1),
people(2), bicycle(3), car(4), van(5), truck(6), tricycle(7), awning-tricycle(8), bus(9), motor(10),
others(11))
<truncation> The score in the DETECTION result file should be set to the constant -1.
The score in the GROUNDTRUTH file indicates the degree of object parts appears outside a frame
(i.e., no truncation = 0 (truncation ratio 0%), and partial truncation = 1 (truncation ratio 1% ~ 50%)).
<occlusion> The score in the DETECTION file should be set to the constant -1.
The score in the GROUNDTRUTH file indicates the fraction of objects being occluded (i.e., no occlusion = 0
(occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1% ~ 50%), and heavy occlusion = 2
(occlusion ratio 50% ~ 100%)).
Note: two useful annotations are truncation (truncation ratio) and occlusion (occlusion ratio). The occlusion ratio is defined by the fraction of the object that is occluded; the truncation ratio indicates the degree to which parts of the object appear outside the frame. It is worth mentioning that a target whose truncation ratio exceeds 50% is skipped during evaluation.
Conversion code
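To make the field layout above concrete, here is a minimal parsing sketch. The function name `parse_det_line` and the sample line are illustrative, not part of the VisDrone toolkit; the trailing-comma handling mirrors the filter used in the conversion script below.

```python
# A minimal sketch: parse one VisDrone-DET annotation line into the eight
# named fields listed above (the sample line is made up for illustration).
DET_FIELDS = ["bbox_left", "bbox_top", "bbox_width", "bbox_height",
              "score", "object_category", "truncation", "occlusion"]

def parse_det_line(line):
    # Some annotation lines carry a trailing comma; strip it before splitting.
    values = [int(v) for v in line.strip().rstrip(",").split(",")]
    return dict(zip(DET_FIELDS, values))

print(parse_det_line("684,8,273,116,1,1,0,0,"))
# → {'bbox_left': 684, 'bbox_top': 8, 'bbox_width': 273, 'bbox_height': 116,
#    'score': 1, 'object_category': 1, 'truncation': 0, 'occlusion': 0}
```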
import json
import os

import cv2
from tqdm import tqdm


def visdrone_det_to_coco():
    # Root folder; it must contain "annotations" and "images" subfolders.
    root = r'D:\pythonProjects\Test\visdrone2coco'
    annotations_path = os.path.join(root, "annotations")
    images_path = os.path.join(root, "images")
    print(annotations_path)

    categories = [
        {"id": 0, "name": "ignored regions"},
        {"id": 1, "name": "pedestrian"},
        {"id": 2, "name": "people"},
        {"id": 3, "name": "bicycle"},
        {"id": 4, "name": "car"},
        {"id": 5, "name": "van"},
        {"id": 6, "name": "truck"},
        {"id": 7, "name": "tricycle"},
        {"id": 8, "name": "awning-tricycle"},
        {"id": 9, "name": "bus"},
        {"id": 10, "name": "motor"},
        {"id": 11, "name": "others"}
    ]

    images = []
    annotations = []
    ann_id = 0

    for txt_name in tqdm(os.listdir(annotations_path)):
        name = txt_name.replace(".txt", "")
        file_name = name + ".jpg"
        # Read the matching image only to obtain its height and width.
        height, width = cv2.imread(os.path.join(images_path, file_name)).shape[:2]
        images.append({
            "file_name": file_name,
            "height": height,
            "width": width,
            "id": name,
        })
        with open(os.path.join(annotations_path, txt_name), "r") as f:
            for line in f:
                line = line.strip().rstrip(",")  # some lines end with a stray comma
                if not line:
                    continue
                line_list = [int(v) for v in line.split(",")]
                bbox_xywh = line_list[:4]  # <bbox_left>, <bbox_top>, <bbox_width>, <bbox_height>
                annotations.append({
                    "image_id": name,
                    "score": line_list[4],
                    "bbox": bbox_xywh,
                    "category_id": line_list[5],
                    "id": ann_id,
                    "iscrowd": 0,
                    "segmentation": [],
                    "area": bbox_xywh[2] * bbox_xywh[3],
                })
                ann_id += 1

    dataset_dict = {
        "images": images,
        "annotations": annotations,
        "categories": categories,
    }
    with open('./output.json', 'w') as json_file:
        json.dump(dataset_dict, json_file)
    print("json file write done...")


if __name__ == '__main__':
    visdrone_det_to_coco()
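A quick way to check the result is to reload the written JSON and verify its internal consistency. This is a sketch of my own, not part of the conversion script; it only relies on the keys the script above actually writes.

```python
# Sketch of a sanity check on the converted COCO dictionary: every annotation
# must reference a known image id, and "area" must equal bbox width * height.
def check_coco_dict(dataset):
    image_ids = {img["id"] for img in dataset["images"]}
    for ann in dataset["annotations"]:
        assert ann["image_id"] in image_ids, "dangling image_id"
        x, y, w, h = ann["bbox"]
        assert ann["area"] == w * h, "inconsistent area"
    return len(dataset["images"]), len(dataset["annotations"])

# Tiny hand-made example (not real VisDrone data):
demo = {
    "images": [{"file_name": "0000001.jpg", "height": 756, "width": 1344, "id": "0000001"}],
    "annotations": [{"image_id": "0000001", "bbox": [10, 20, 30, 40], "area": 1200,
                     "id": 0, "category_id": 4, "iscrowd": 0, "segmentation": [], "score": 1}],
    "categories": [{"id": 4, "name": "car"}],
}
print(check_coco_dict(demo))  # → (1, 1)
```

In practice you would pass `json.load(open('./output.json'))` instead of the demo dictionary.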
VisDrone2019 (VID)
Label fields
- The frame index of the video frame
- The target id, used to provide the temporal correspondence of bounding boxes across different frames
- The x coordinate of the top-left corner of the bounding box
- The y coordinate of the top-left corner of the bounding box
- The width of the bounding box
- The height of the bounding box
- In a DETECTION file, the score indicates the confidence of the predicted bounding box enclosing an object instance. In a GROUNDTRUTH file, the score is set to 1 or 0: 1 means the bounding box is considered during evaluation, 0 means it is ignored.
- Object category: ignored regions (0), pedestrian (1), people (2), bicycle (3), car (4), van (5), truck (6), tricycle (7), awning-tricycle (8), bus (9), motor (10), others (11)
- In a DETECTION file this field should be set to the constant -1. In a GROUNDTRUTH file it indicates the degree to which the object extends outside the frame: no truncation = 0 (truncation ratio 0%), partial truncation = 1 (truncation ratio 1%~50%).
- In a DETECTION file this field should be set to the constant -1. In a GROUNDTRUTH file it indicates the fraction of the object that is occluded: no occlusion = 0 (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1%~50%), heavy occlusion = 2 (occlusion ratio 50%~100%).
<frame_index>,<target_id>,<bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>
Name Description
----------------------------------------------------------------------------------------------------------------------------------
<frame_index> The frame index of the video frame
<target_id> In the DETECTION result file, the identity of the target should be set to the constant -1.
In the GROUNDTRUTH file, the identity of the target is used to provide the temporal corresponding relation of the bounding boxes in different frames.
<bbox_left> The x coordinate of the top-left corner of the predicted bounding box
<bbox_top> The y coordinate of the top-left corner of the predicted object bounding box
<bbox_width> The width in pixels of the predicted object bounding box
<bbox_height> The height in pixels of the predicted object bounding box
<score> The score in the DETECTION file indicates the confidence of the predicted bounding box enclosing an object instance.
The score in GROUNDTRUTH file is set to 1 or 0. 1 indicates the bounding box is considered in evaluation, while 0 indicates the bounding box will be ignored.
<object_category> The object category indicates the type of annotated object, (i.e., ignored regions (0), pedestrian (1), people (2), bicycle (3), car (4), van (5), truck (6), tricycle (7), awning-tricycle (8), bus (9), motor (10), others (11))
<truncation> The score in the DETECTION file should be set to the constant -1.
The score in the GROUNDTRUTH file indicates the degree of object parts appears outside a frame (i.e., no truncation = 0 (truncation ratio 0%), and partial truncation = 1 (truncation ratio 1% ~ 50%)).
<occlusion> The score in the DETECTION file should be set to the constant -1.
The score in the GROUNDTRUTH file indicates the fraction of objects being occluded (i.e., no occlusion = 0 (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1% ~ 50%), and heavy occlusion = 2 (occlusion ratio 50% ~ 100%)).
Note: two useful annotations are truncation (truncation ratio) and occlusion (occlusion ratio). The occlusion ratio is defined by the fraction of the object that is occluded; the truncation ratio indicates the degree to which parts of the object appear outside the frame. It is worth mentioning that a target whose truncation ratio exceeds 50% is skipped during evaluation.
Preparing the dataset
A quick look shows that the VisDrone-DET format pairs each image with one txt file. In VID, each video consists of many images (one per frame) and a single annotation txt covers the whole video, so the lines sharing the same frame_index should be gathered into a txt file of their own, named like 0000XXX.
Goal: one txt file per image, so that the DET conversion code can then be reused to convert VID to a COCO-format dataset.
- Expand the files in annotations into per-frame files
- Rename the images in sequences and copy them into the images folder
Each resulting txt contains the 8 fields that remain after dropping <frame_index> and <target_id>.
Conversion code
Click to download: https://download.youkuaiyun.com/download/qq_44824148/86814694?spm=1001.2014.3001.5501
import shutil

# Copy a file
def copyfile(old_file_path, new_folder_path):
    shutil.copy(old_file_path, new_folder_path)

# Conversion
......

# Rename
......
import json
import os

import cv2
from tqdm import tqdm


def visdrone_vid_to_coco():
    # Change this root path; its subfolders must include annotations and images.
    root = '/usr/ldw/visdrone2coco/'
    # annotations_path points to the per-frame annotation txt files
    # annotations_path = r'J:\Dataset\visdrone\Task 2_ Object Detection in Videos\VisDrone2019-VID-train\annotations'
    annotations_path = os.path.join(root, 'annotations')
    # images_path points to the renamed frame images
    # images_path = r'J:\Dataset\visdrone\Task 2_ Object Detection in Videos\VisDrone2019-VID-train\images'
    images_path = os.path.join(root, 'images')
    print(annotations_path)

    categories = [
        {"id": 0, "name": "ignored regions"},
        {"id": 1, "name": "pedestrian"},
        {"id": 2, "name": "people"},
        {"id": 3, "name": "bicycle"},
        {"id": 4, "name": "car"},
        {"id": 5, "name": "van"},
        {"id": 6, "name": "truck"},
        {"id": 7, "name": "tricycle"},
        {"id": 8, "name": "awning-tricycle"},
        {"id": 9, "name": "bus"},
        {"id": 10, "name": "motor"},
        {"id": 11, "name": "others"}
    ]

    images = []
    annotations = []
    ann_id = 0

    for txt_name in tqdm(os.listdir(annotations_path)):
        name = txt_name.replace(".txt", "")
        file_name = name + ".jpg"
        # Read the matching image only to obtain its height and width.
        height, width = cv2.imread(os.path.join(images_path, file_name)).shape[:2]
        images.append({
            "file_name": file_name,
            "height": height,
            "width": width,
            "id": name,
        })
        with open(os.path.join(annotations_path, txt_name), "r") as f:
            for line in f:
                line = line.strip().rstrip(",")  # some lines end with a stray comma
                if not line:
                    continue
                line_list = [int(v) for v in line.split(",")]
                bbox_xywh = line_list[:4]  # <bbox_left>, <bbox_top>, <bbox_width>, <bbox_height>
                annotations.append({
                    "image_id": name,
                    "score": line_list[4],
                    "bbox": bbox_xywh,
                    "category_id": line_list[5],
                    "id": ann_id,
                    "iscrowd": 0,
                    "segmentation": [],
                    "area": bbox_xywh[2] * bbox_xywh[3],
                })
                ann_id += 1

    dataset_dict = {
        "images": images,
        "annotations": annotations,
        "categories": categories,
    }
    # Change the output path; the file extension should be .json
    url = '/usr/ldw/visdrone2coco/annotations/a1.json'
    with open(url, 'w') as json_file:
        json.dump(dataset_dict, json_file)
    print("json file write done...")


if __name__ == '__main__':
    visdrone_vid_to_coco()
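One caveat worth knowing: both scripts store image ids as strings (the file stem), while some COCO consumers, such as pycocotools, generally expect integer ids. A hedged post-processing sketch (`intify_image_ids` is my own name, not part of the scripts above):

```python
# Sketch: remap string image ids to sequential integers and rewrite the
# annotations' image_id references accordingly.
def intify_image_ids(dataset):
    id_map = {img["id"]: idx for idx, img in enumerate(dataset["images"], start=1)}
    for img in dataset["images"]:
        img["id"] = id_map[img["id"]]
    for ann in dataset["annotations"]:
        ann["image_id"] = id_map[ann["image_id"]]
    return dataset

# Tiny hand-made example (not real VisDrone data):
demo = {"images": [{"id": "img_a"}, {"id": "img_b"}],
        "annotations": [{"image_id": "img_b"}], "categories": []}
intify_image_ids(demo)
print([img["id"] for img in demo["images"]], demo["annotations"][0]["image_id"])
# → [1, 2] 2
```

Run it on the loaded dictionary before dumping the JSON if your downstream tooling complains about non-integer ids.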