Converting VisDrone2019 to a COCO-format dataset
The COCO dataset format
This hardly needs explaining; any seasoned CV practitioner is already more than familiar with it.
VisDrone2019 (DET)
Label fields
- The x coordinate of the top-left corner of the bounding box
- The y coordinate of the top-left corner of the bounding box
- The width of the bounding box
- The height of the bounding box
- In a DETECTION file, the score indicates the confidence of the predicted bounding box enclosing an object instance. In a GROUNDTRUTH file, the score is set to 1 or 0: 1 means the bounding box is considered during evaluation, 0 means it is ignored.
- Object category: ignored regions (0), pedestrian (1), people (2), bicycle (3), car (4), van (5), truck (6), tricycle (7), awning-tricycle (8), bus (9), motor (10), others (11)
- In a DETECTION file this field should be set to the constant -1. In a GROUNDTRUTH file it indicates the degree to which the object extends outside the frame: no truncation = 0 (truncation ratio 0%), partial truncation = 1 (truncation ratio 1%~50%).
- In a DETECTION file this field should be set to the constant -1. In a GROUNDTRUTH file it indicates the fraction of the object that is occluded: no occlusion = 0 (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1%~50%), heavy occlusion = 2 (occlusion ratio 50%~100%).
<bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>
Name Description
-------------------------------------------------------------------------------------------------------------------------------
<bbox_left> The x coordinate of the top-left corner of the predicted bounding box
<bbox_top> The y coordinate of the top-left corner of the predicted object bounding box
<bbox_width> The width in pixels of the predicted object bounding box
<bbox_height> The height in pixels of the predicted object bounding box
<score> The score in the DETECTION file indicates the confidence of the predicted bounding box enclosing
an object instance.
The score in GROUNDTRUTH file is set to 1 or 0. 1 indicates the bounding box is considered in evaluation,
while 0 indicates the bounding box will be ignored.
<object_category> The object category indicates the type of annotated object, (i.e., ignored regions(0), pedestrian(1),
people(2), bicycle(3), car(4), van(5), truck(6), tricycle(7), awning-tricycle(8), bus(9), motor(10),
others(11))
<truncation> The score in the DETECTION result file should be set to the constant -1.
The score in the GROUNDTRUTH file indicates the degree of object parts appears outside a frame
(i.e., no truncation = 0 (truncation ratio 0%), and partial truncation = 1 (truncation ratio 1% ~ 50%)).
<occlusion> The score in the DETECTION file should be set to the constant -1.
The score in the GROUNDTRUTH file indicates the fraction of objects being occluded (i.e., no occlusion = 0
(occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1% ~ 50%), and heavy occlusion = 2
(occlusion ratio 50% ~ 100%)).
Note: two useful annotations are truncation (truncation ratio) and occlusion (occlusion ratio). The occlusion ratio is defined by the fraction of the object that is occluded; the truncation ratio indicates the degree to which parts of the object appear outside the frame. It is worth mentioning that a target whose truncation ratio exceeds 50% is skipped during evaluation.
Conversion code
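To make the field layout above concrete, here is a minimal parsing sketch. The function name `parse_det_line` and the sample line are illustrative, not part of the VisDrone toolkit; the trailing-comma handling mirrors the filter used in the conversion script below.

```python
# A minimal sketch: parse one VisDrone-DET annotation line into the eight
# named fields listed above (the sample line is made up for illustration).
DET_FIELDS = ["bbox_left", "bbox_top", "bbox_width", "bbox_height",
              "score", "object_category", "truncation", "occlusion"]

def parse_det_line(line):
    # Some annotation lines carry a trailing comma; strip it before splitting.
    values = [int(v) for v in line.strip().rstrip(",").split(",")]
    return dict(zip(DET_FIELDS, values))

print(parse_det_line("684,8,273,116,1,1,0,0,"))
# → {'bbox_left': 684, 'bbox_top': 8, 'bbox_width': 273, 'bbox_height': 116,
#    'score': 1, 'object_category': 1, 'truncation': 0, 'occlusion': 0}
```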
import json
import os

import cv2
from tqdm import tqdm


def visdrone_det_to_coco():
    # Root folder; it must contain "annotations" and "images" subfolders.
    root = r'D:\pythonProjects\Test\visdrone2coco'
    annotations_path = os.path.join(root, "annotations")
    images_path = os.path.join(root, "images")
    print(annotations_path)

    categories = [
        {"id": 0, "name": "ignored regions"},
        {"id": 1, "name": "pedestrian"},
        {"id": 2, "name": "people"},
        {"id": 3, "name": "bicycle"},
        {"id": 4, "name": "car"},
        {"id": 5, "name": "van"},
        {"id": 6, "name": "truck"},
        {"id": 7, "name": "tricycle"},
        {"id": 8, "name": "awning-tricycle"},
        {"id": 9, "name": "bus"},
        {"id": 10, "name": "motor"},
        {"id": 11, "name": "others"}
    ]

    images = []
    annotations = []
    ann_id = 0

    for txt_name in tqdm(os.listdir(annotations_path)):
        name = txt_name.replace(".txt", "")
        file_name = name + ".jpg"
        # Read the matching image only to obtain its height and width.
        height, width = cv2.imread(os.path.join(images_path, file_name)).shape[:2]
        images.append({
            "file_name": file_name,
            "height": height,
            "width": width,
            "id": name,
        })
        with open(os.path.join(annotations_path, txt_name), "r") as f:
            for line in f:
                line = line.strip().rstrip(",")  # some lines end with a stray comma
                if not line:
                    continue
                line_list = [int(v) for v in line.split(",")]
                bbox_xywh = line_list[:4]  # <bbox_left>, <bbox_top>, <bbox_width>, <bbox_height>
                annotations.append({
                    "image_id": name,
                    "score": line_list[4],
                    "bbox": bbox_xywh,
                    "category_id": line_list[5],
                    "id": ann_id,
                    "iscrowd": 0,
                    "segmentation": [],
                    "area": bbox_xywh[2] * bbox_xywh[3],
                })
                ann_id += 1

    dataset_dict = {
        "images": images,
        "annotations": annotations,
        "categories": categories,
    }
    with open('./output.json', 'w') as json_file:
        json.dump(dataset_dict, json_file)
    print("json file write done...")


if __name__ == '__main__':
    visdrone_det_to_coco()
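A quick way to check the result is to reload the written JSON and verify its internal consistency. This is a sketch of my own, not part of the conversion script; it only relies on the keys the script above actually writes.

```python
# Sketch of a sanity check on the converted COCO dictionary: every annotation
# must reference a known image id, and "area" must equal bbox width * height.
def check_coco_dict(dataset):
    image_ids = {img["id"] for img in dataset["images"]}
    for ann in dataset["annotations"]:
        assert ann["image_id"] in image_ids, "dangling image_id"
        x, y, w, h = ann["bbox"]
        assert ann["area"] == w * h, "inconsistent area"
    return len(dataset["images"]), len(dataset["annotations"])

# Tiny hand-made example (not real VisDrone data):
demo = {
    "images": [{"file_name": "0000001.jpg", "height": 756, "width": 1344, "id": "0000001"}],
    "annotations": [{"image_id": "0000001", "bbox": [10, 20, 30, 40], "area": 1200,
                     "id": 0, "category_id": 4, "iscrowd": 0, "segmentation": [], "score": 1}],
    "categories": [{"id": 4, "name": "car"}],
}
print(check_coco_dict(demo))  # → (1, 1)
```

In practice you would pass `json.load(open('./output.json'))` instead of the demo dictionary.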
VisDrone2019 (VID)
Label fields
- The frame index of the video frame
- The target id, used to provide the temporal correspondence of bounding boxes across different frames
- The x coordinate of the top-left corner of the bounding box
- The y coordinate of the top-left corner of the bounding box
- The width of the bounding box
- The height of the bounding box
- In a DETECTION file, the score indicates the confidence of the predicted bounding box enclosing an object instance. In a GROUNDTRUTH file, the score is set to 1 or 0: 1 means the bounding box is considered during evaluation, 0 means it is ignored.
- Object category: ignored regions (0), pedestrian (1), people (2), bicycle (3), car (4), van (5), truck (6), tricycle (7), awning-tricycle (8), bus (9), motor (10), others (11)
- In a DETECTION file this field should be set to the constant -1. In a GROUNDTRUTH file it indicates the degree to which the object extends outside the frame: no truncation = 0 (truncation ratio 0%), partial truncation = 1 (truncation ratio 1%~50%).
- In a DETECTION file this field should be set to the constant -1. In a GROUNDTRUTH file it indicates the fraction of the object that is occluded: no occlusion = 0 (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1%~50%), heavy occlusion = 2 (occlusion ratio 50%~100%).
<frame_index>,<target_id>,<bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>
Name Description
----------------------------------------------------------------------------------------------------------------------------------
<frame_index> The frame index of the video frame
<target_id> In the DETECTION result file, the identity of the target should be set to the constant -1.
In the GROUNDTRUTH file, the identity of the target is used to provide the temporal corresponding relation of the bounding boxes in different frames.
<bbox_left> The x coordinate of the top-left corner of the predicted bounding box
<bbox_top> The y coordinate of the top-left corner of the predicted object bounding box
<bbox_width> The width in pixels of the predicted object bounding box
<bbox_height> The height in pixels of the predicted object bounding box
<score> The score in the DETECTION file indicates the confidence of the predicted bounding box enclosing an object instance.
The score in GROUNDTRUTH file is set to 1 or 0. 1 indicates the bounding box is considered in evaluation, while 0 indicates the bounding box will be ignored.
<object_category> The object category indicates the type of annotated object, (i.e., ignored regions (0), pedestrian (1), people (2), bicycle (3), car (4), van (5), truck (6), tricycle (7), awning-tricycle (8), bus (9), motor (10), others (11))
<truncation> The score in the DETECTION file should be set to the constant -1.
The score in the GROUNDTRUTH file indicates the degree of object parts appears outside a frame (i.e., no truncation = 0 (truncation ratio 0%), and partial truncation = 1 (truncation ratio 1% ~ 50%)).
<occlusion> The score in the DETECTION file should be set to the constant -1.
The score in the GROUNDTRUTH file indicates the fraction of objects being occluded (i.e., no occlusion = 0 (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1% ~ 50%), and heavy occlusion = 2 (occlusion ratio 50% ~ 100%)).
Note: two useful annotations are truncation (truncation ratio) and occlusion (occlusion ratio). The occlusion ratio is defined by the fraction of the object that is occluded; the truncation ratio indicates the degree to which parts of the object appear outside the frame. It is worth mentioning that a target whose truncation ratio exceeds 50% is skipped during evaluation.
Preparing the dataset
A quick look shows that the VisDrone-DET format pairs each image with one txt file. In VID, each video consists of many images (one per frame) and a single annotation txt covers the whole video, so the lines sharing the same frame_index should be gathered into a txt file of their own, named like 0000XXX.
Goal: one txt file per image, so that the DET conversion code can then be reused to convert VID to a COCO-format dataset.
- Expand the files in annotations into per-frame files
- Rename the images in sequences and copy them into the images folder
Each resulting txt contains the 8 fields that remain after dropping <frame_index> and <target_id>.
Conversion code
Click to download: https://download.youkuaiyun.com/download/qq_44824148/86814694?spm=1001.2014.3001.5501
import shutil

# Copy a file
def copyfile(old_file_path, new_folder_path):
    shutil.copy(old_file_path, new_folder_path)

# Conversion
......

# Rename
......
import json
import os

import cv2
from tqdm import tqdm


def visdrone_vid_to_coco():
    # Change this root path; its subfolders must include annotations and images.
    root = '/usr/ldw/visdrone2coco/'
    # annotations_path points to the per-frame annotation txt files
    # annotations_path = r'J:\Dataset\visdrone\Task 2_ Object Detection in Videos\VisDrone2019-VID-train\annotations'
    annotations_path = os.path.join(root, 'annotations')
    # images_path points to the renamed frame images
    # images_path = r'J:\Dataset\visdrone\Task 2_ Object Detection in Videos\VisDrone2019-VID-train\images'
    images_path = os.path.join(root, 'images')
    print(annotations_path)

    categories = [
        {"id": 0, "name": "ignored regions"},
        {"id": 1, "name": "pedestrian"},
        {"id": 2, "name": "people"},
        {"id": 3, "name": "bicycle"},
        {"id": 4, "name": "car"},
        {"id": 5, "name": "van"},
        {"id": 6, "name": "truck"},
        {"id": 7, "name": "tricycle"},
        {"id": 8, "name": "awning-tricycle"},
        {"id": 9, "name": "bus"},
        {"id": 10, "name": "motor"},
        {"id": 11, "name": "others"}
    ]

    images = []
    annotations = []
    ann_id = 0

    for txt_name in tqdm(os.listdir(annotations_path)):
        name = txt_name.replace(".txt", "")
        file_name = name + ".jpg"
        # Read the matching image only to obtain its height and width.
        height, width = cv2.imread(os.path.join(images_path, file_name)).shape[:2]
        images.append({
            "file_name": file_name,
            "height": height,
            "width": width,
            "id": name,
        })
        with open(os.path.join(annotations_path, txt_name), "r") as f:
            for line in f:
                line = line.strip().rstrip(",")  # some lines end with a stray comma
                if not line:
                    continue
                line_list = [int(v) for v in line.split(",")]
                bbox_xywh = line_list[:4]  # <bbox_left>, <bbox_top>, <bbox_width>, <bbox_height>
                annotations.append({
                    "image_id": name,
                    "score": line_list[4],
                    "bbox": bbox_xywh,
                    "category_id": line_list[5],
                    "id": ann_id,
                    "iscrowd": 0,
                    "segmentation": [],
                    "area": bbox_xywh[2] * bbox_xywh[3],
                })
                ann_id += 1

    dataset_dict = {
        "images": images,
        "annotations": annotations,
        "categories": categories,
    }
    # Change the output path; the file extension should be .json
    url = '/usr/ldw/visdrone2coco/annotations/a1.json'
    with open(url, 'w') as json_file:
        json.dump(dataset_dict, json_file)
    print("json file write done...")


if __name__ == '__main__':
    visdrone_vid_to_coco()
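One caveat worth knowing: both scripts store image ids as strings (the file stem), while some COCO consumers, such as pycocotools, generally expect integer ids. A hedged post-processing sketch (`intify_image_ids` is my own name, not part of the scripts above):

```python
# Sketch: remap string image ids to sequential integers and rewrite the
# annotations' image_id references accordingly.
def intify_image_ids(dataset):
    id_map = {img["id"]: idx for idx, img in enumerate(dataset["images"], start=1)}
    for img in dataset["images"]:
        img["id"] = id_map[img["id"]]
    for ann in dataset["annotations"]:
        ann["image_id"] = id_map[ann["image_id"]]
    return dataset

# Tiny hand-made example (not real VisDrone data):
demo = {"images": [{"id": "img_a"}, {"id": "img_b"}],
        "annotations": [{"image_id": "img_b"}], "categories": []}
intify_image_ids(demo)
print([img["id"] for img in demo["images"]], demo["annotations"][0]["image_id"])
# → [1, 2] 2
```

Run it on the loaded dictionary before dumping the JSON if your downstream tooling complains about non-integer ids.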