YOLO-World Data Format Conversion: Tools for Converting Between COCO, VOC, and LVIS Annotation Formats - CSDN Blog

[Free download] YOLO-World project URL: https://gitcode.com/gh_mirrors/yo/YOLO-World

Pain Points of Data Annotation Formats

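In computer vision, inconsistent annotation formats are a persistent problem for algorithm engineers and researchers. Different detection models and frameworks (YOLO-family detectors, Faster R-CNN, Mask R-CNN) and datasets (COCO, VOC, LVIS) each use their own annotation format, so training, evaluating, or migrating a model often requires repeated format conversion. Some industry estimates put data preprocessing at about 60% of a deep learning project's timeline, with format conversion taking more than 40% of that preprocessing time.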

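Common pain points by format:

COCO (Common Objects in Context): a JSON file that supports detection, segmentation, keypoints and other tasks, but has a complex structure with several top-level fields (info, licenses, images, annotations, categories).
VOC (Visual Object Classes): XML, with one file per image; annotations are scattered across many files, which is inconvenient for large-scale processing.
LVIS (Large Vocabulary Instance Segmentation): builds on the COCO format and covers 1203 categories; it adds category-level attributes such as frequency (r/c/f for rare, common, frequent) and synonyms, and its annotation files are large.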

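YOLO-World is a recent real-time open-vocabulary object detector. It ships dataset classes for several annotation formats, but in practice you still have to convert your own data into a format the framework accepts. This article shows how to convert between COCO, VOC, and LVIS using YOLO-World's dataset classes together with plain Python scripts, with practical code examples and best practices.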

Comparison of Core Data Formats

1. Structural comparison

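Feature	COCO	VOC	LVIS
File format	Single JSON file	One XML file per image	Single JSON file
Supported tasks	Detection, segmentation, keypoints	Detection	Detection, segmentation
Number of categories	80	20	1203
Annotation granularity	Instance level	Instance level	Instance level + category attributes
Data volume	Medium	Large (many files)	Large (complex fields)
Compatibility	Widely supported	Supported by traditional frameworks	Supported by some frameworks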

2. Core field comparison

COCO format example:

{
  "images": [
    {"id": 1, "width": 640, "height": 480, "file_name": "image1.jpg"}
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 5,
      "bbox": [100, 200, 150, 250],
      "area": 37500,
      "iscrowd": 0
    }
  ],
  "categories": [{"id": 5, "name": "airplane"}]
}

VOC format example (XML):

<annotation>
  <filename>image1.jpg</filename>
  <size>
    <width>640</width>
    <height>480</height>
  </size>
  <object>
    <name>airplane</name>
    <bndbox>
      <xmin>100</xmin>
      <ymin>200</ymin>
      <xmax>250</xmax>
      <ymax>450</ymax>
    </bndbox>
  </object>
</annotation>
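The VOC box stores corner coordinates, while COCO stores the top-left corner plus width and height. A minimal sketch that parses the VOC example above with the standard library and derives the equivalent COCO box; it assumes numeric box fields and ignores optional VOC flags such as `difficult` and `truncated`:

```python
import xml.etree.ElementTree as ET

VOC_XML = """<annotation>
  <filename>image1.jpg</filename>
  <size><width>640</width><height>480</height></size>
  <object>
    <name>airplane</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>250</xmax><ymax>450</ymax></bndbox>
  </object>
</annotation>"""


def voc_box_to_coco(bndbox):
    """Convert a VOC <bndbox> element (corner format) to COCO [x, y, w, h]."""
    xmin, ymin, xmax, ymax = (float(bndbox.find(tag).text)
                              for tag in ("xmin", "ymin", "xmax", "ymax"))
    return [xmin, ymin, xmax - xmin, ymax - ymin]


root = ET.fromstring(VOC_XML)
for obj in root.iter("object"):
    bbox = voc_box_to_coco(obj.find("bndbox"))
    area = bbox[2] * bbox[3]
    print(obj.find("name").text, bbox, area)
# prints: airplane [100.0, 200.0, 150.0, 250.0] 37500.0
```

This reproduces the `bbox` and `area` values of the COCO example above.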

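LVIS format example (simplified; official LVIS v1 files reference images via coco_url and carry extra category fields such as synset, def, image_count and instance_count):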

{
  "images": [
    {
      "id": 1, "width": 640, "height": 480, "file_name": "image1.jpg",
      "neg_category_ids": [],
      "not_exhaustive_category_ids": []
    }
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 5,
      "bbox": [100, 200, 150, 250],
      "area": 37500,
      "segmentation": [[100, 200, 250, 200, 250, 450, 100, 450]]
    }
  ],
  "categories": [
    {
      "id": 5,
      "name": "airplane",
      "frequency": "f",
      "synonyms": ["aeroplane", "plane"]
    }
  ]
}
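In LVIS the frequency bucket (r = rare, c = common, f = frequent) belongs to each category, so per-annotation statistics have to go through the category table. A small sketch for summarizing an already loaded LVIS-style dict; the sample below, including the `rare_thing` category, is made-up data standing in for a real LVIS annotation file:

```python
from collections import Counter


def frequency_summary(lvis_data):
    """Count categories per LVIS frequency bucket ('r', 'c', 'f')."""
    return Counter(cat.get("frequency", "unknown") for cat in lvis_data["categories"])


def annotations_per_frequency(lvis_data):
    """Count annotations per frequency bucket by looking up each annotation's category."""
    freq_by_cat = {cat["id"]: cat.get("frequency", "unknown") for cat in lvis_data["categories"]}
    return Counter(freq_by_cat.get(ann["category_id"], "unknown")
                   for ann in lvis_data["annotations"])


sample = {
    "categories": [
        {"id": 5, "name": "airplane", "frequency": "f"},
        {"id": 7, "name": "rare_thing", "frequency": "r"},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 5},
        {"id": 2, "image_id": 1, "category_id": 5},
        {"id": 3, "image_id": 2, "category_id": 7},
    ],
}
print(frequency_summary(sample))
print(annotations_per_frequency(sample))
```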

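YOLO-World Data Loading API

YOLO-World's dataset classes live in the yolo_world/datasets/ directory. They read COCO-style JSON annotations (which also covers LVIS and V3Det); relevant classes include YOLOv5LVISV1Dataset and V3DetDataset. These classes inherit from MMDetection's dataset classes and add YOLO-specific behavior such as MMYOLO's batch-shape policy.

1. COCO format support

In yolo_world/datasets/yolov5_v3det.py, the V3DetDataset class wraps the COCO-style loading logic (V3Det annotations use the COCO JSON layout). A simplified excerpt: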

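import copy
from typing import List

from mmdet.datasets import CocoDataset
from mmdet.datasets.api_wrappers import COCO
from mmengine.fileio import get_local_path

class V3DetDataset(CocoDataset):
    """Objects365 v1 dataset for detection."""
    COCOAPI = COCO
    ANN_ID_UNIQUE = True

    def load_data_list(self) -> List[dict]:
        with get_local_path(self.ann_file, backend_args=self.backend_args) as local_path:
            self.coco = self.COCOAPI(local_path)
        # Map category IDs to contiguous label indices (used by parse_data_info)
        self.cat_ids = self.coco.get_cat_ids(cat_names=self.metainfo['classes'])
        self.cat2label = {cat_id: i for i, cat_id in enumerate(self.cat_ids)}
        self.cat_img_map = copy.deepcopy(self.coco.cat_img_map)
        # Load image and annotation records
        img_ids = self.coco.get_img_ids()
        data_list = []
        for img_id in img_ids:
            raw_img_info = self.coco.load_imgs([img_id])[0]
            raw_img_info['img_id'] = img_id
            ann_ids = self.coco.get_ann_ids(img_ids=[img_id])
            raw_ann_info = self.coco.load_anns(ann_ids)
            # Convert the raw COCO records into MMDetection's data_info dict
            parsed_data_info = self.parse_data_info({
                'raw_ann_info': raw_ann_info,
                'raw_img_info': raw_img_info
            })
            data_list.append(parsed_data_info)
        del self.coco  # free the COCO API object once parsing is done
        return data_list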

2. LVIS format support

yolo_world/datasets/yolov5_lvis.py defines the YOLOv5LVISV1Dataset class for loading LVIS data:

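from mmdet.datasets import LVISV1Dataset
from mmyolo.datasets.yolov5_coco import BatchShapePolicyDataset
from mmyolo.registry import DATASETS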

@DATASETS.register_module()
class YOLOv5LVISV1Dataset(BatchShapePolicyDataset, LVISV1Dataset):
    """Dataset for YOLOv5 LVIS Dataset.
    We only add `BatchShapePolicy` function compared with Objects365V1Dataset.
    See `mmyolo/datasets/utils.py#BatchShapePolicy` for details
    """
    pass

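The class inherits MMDetection's LVISV1Dataset and mixes in MMYOLO's BatchShapePolicyDataset. When a batch_shapes_cfg is supplied, images are grouped by aspect ratio and each batch gets its own padded input shape, which reduces wasted padding; MMYOLO configs typically enable this for validation and testing rather than for training.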
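The idea behind a batch-shape policy (as in YOLOv5's rectangular inference) can be sketched in plain Python: sort images by aspect ratio, split them into batches, and give each batch the smallest stride-aligned shape that fits its members. This is a simplified illustration of the idea, not MMYOLO's `BatchShapePolicy` code; the parameter names `img_size`, `stride` and `pad` are assumptions for this sketch (MMYOLO's `batch_shapes_cfg` exposes similar settings such as `extra_pad_ratio`):

```python
import math


def batch_shapes(sizes, batch_size, img_size=640, stride=32, pad=0.5):
    """Compute one padded (h, w) input shape per batch, YOLOv5-rect style.

    sizes: list of (width, height) per image.
    Returns (order, shapes): image indices sorted by aspect ratio, and one
    (h, w) shape for each consecutive batch of `batch_size` images in that order.
    """
    ratios = [h / w for w, h in sizes]  # aspect ratio h / w
    order = sorted(range(len(sizes)), key=lambda i: ratios[i])
    shapes = []
    for start in range(0, len(order), batch_size):
        batch = [ratios[i] for i in order[start:start + batch_size]]
        lo, hi = min(batch), max(batch)
        if hi < 1:        # every image is wider than tall: shrink the height
            rel = (hi, 1.0)
        elif lo > 1:      # every image is taller than wide: shrink the width
            rel = (1.0, 1.0 / lo)
        else:             # mixed orientations: keep a square canvas
            rel = (1.0, 1.0)
        # Scale to img_size, add padding, round up to a multiple of stride
        shapes.append(tuple(math.ceil(r * img_size / stride + pad) * stride for r in rel))
    return order, shapes


order, shapes = batch_shapes([(640, 480), (640, 360), (480, 640)], batch_size=2)
print(order, shapes)  # [1, 0, 2] [(512, 672), (672, 512)]
```

The two landscape images share a 512x672 canvas instead of a square 672x672 one, which is where the savings come from.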

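3. VOC format support

Despite its docstring, the YOLOv5V3DetDataset class in yolo_world/datasets/yolov5_v3det.py is not a VOC loader: it wraps V3DetDataset, which reads V3Det's COCO-style JSON, and the "VOC" wording appears to be copied from MMYOLO's VOC dataset class. VOC XML annotations are read by MMDetection's VOCDataset, which MMYOLO wraps as YOLOv5VOCDataset; alternatively, convert VOC to COCO JSON first, as in Scenario 1 below. For reference, the YOLO-World class: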

@DATASETS.register_module()
class YOLOv5V3DetDataset(BatchShapePolicyDataset, V3DetDataset):
    """Dataset for YOLOv5 VOC Dataset.
    We only add `BatchShapePolicy` function compared with Objects365V1Dataset.
    See `mmyolo/datasets/utils.py#BatchShapePolicy` for details
    """
    pass

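Format Conversion Tools and Practice

Overview

YOLO-World does not ship standalone format conversion scripts, but conversion is easy to do yourself. The scripts below use only the Python standard library (json, xml.etree) plus tqdm, and their output can then be loaded by YOLO-World's dataset classes. The general approach is: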

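Read the data with a loader for the source format
Convert it into an intermediate representation (such as a list of dicts)
Write it out with a writer for the target format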
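One way to realize the intermediate representation step is a small format-neutral record per image, with boxes kept in a single convention (here absolute corner coordinates) and converted only when reading or writing a specific format. A minimal sketch; the class and function names are hypothetical and not part of YOLO-World or MMDetection:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Box:
    label: str
    x1: float
    y1: float
    x2: float
    y2: float


@dataclass
class ImageRecord:
    file_name: str
    width: int
    height: int
    boxes: List[Box] = field(default_factory=list)


def box_from_coco(label, bbox):
    """COCO [x, y, w, h] -> corner-format Box."""
    x, y, w, h = bbox
    return Box(label, x, y, x + w, y + h)


def box_from_voc(label, xmin, ymin, xmax, ymax):
    """VOC corner coordinates -> Box (already the internal convention)."""
    return Box(label, xmin, ymin, xmax, ymax)


def box_to_coco(box):
    """Box -> COCO [x, y, w, h]."""
    return [box.x1, box.y1, box.x2 - box.x1, box.y2 - box.y1]


def box_to_voc(box):
    """Box -> VOC (xmin, ymin, xmax, ymax), rounded to whole pixels."""
    return tuple(int(round(v)) for v in (box.x1, box.y1, box.x2, box.y2))


rec = ImageRecord("image1.jpg", 640, 480, [box_from_coco("airplane", [100, 200, 150, 250])])
print(box_to_voc(rec.boxes[0]))   # (100, 200, 250, 450)
print(box_to_coco(rec.boxes[0]))  # [100, 200, 150, 250]
```

Keeping one internal convention means each format needs only one reader and one writer, instead of a converter for every pair of formats.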

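The following sections implement several common conversion scenarios.

Scenario 1: VOC to COCO

VOC data usually consists of many XML files; converting to COCO means merging them into a single JSON file.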

import os
import json
import xml.etree.ElementTree as ET
from tqdm import tqdm

def voc_to_coco(voc_img_dir, voc_ann_dir, output_json_path, classes):
    """
    Convert VOC annotations to a single COCO JSON file.
    Args:
        voc_img_dir: VOC image directory (not read by this function)
        voc_ann_dir: VOC annotation directory (XML files)
        output_json_path: path of the output COCO JSON file
        classes: list of class names
    """
    # Initialize the COCO data structure
    coco_data = {
        "info": {},
        "licenses": [],
        "images": [],
        "annotations": [],
        "categories": []
    }
    
    # Add category entries
    for idx, cls in enumerate(classes):
        coco_data["categories"].append({
            "id": idx + 1,  # COCO category IDs start at 1
            "name": cls,
            "supercategory": "none"
        })
    
    img_id = 1
    ann_id = 1
    
    # Iterate over the XML files in sorted order so that IDs are reproducible
    for ann_file in tqdm(sorted(os.listdir(voc_ann_dir))):
        if not ann_file.endswith(".xml"):
            continue
        
        # Parse the XML file
        tree = ET.parse(os.path.join(voc_ann_dir, ann_file))
        root = tree.getroot()
        
        # Read image information
        filename = root.find("filename").text
        width = int(root.find("size/width").text)
        height = int(root.find("size/height").text)
        
        # Add the image entry
        coco_data["images"].append({
            "id": img_id,
            "file_name": filename,
            "width": width,
            "height": height
        })
        
        # Process each object
        for obj in root.iter("object"):
            cls_name = obj.find("name").text
            if cls_name not in classes:
                continue
            cls_id = classes.index(cls_name) + 1  # category IDs start at 1
            
            # Read the bounding box (VOC stores corner coordinates)
            bbox = obj.find("bndbox")
            xmin = float(bbox.find("xmin").text)
            ymin = float(bbox.find("ymin").text)
            xmax = float(bbox.find("xmax").text)
            ymax = float(bbox.find("ymax").text)
            
            # COCO bboxes are [x, y, width, height]
            bbox_coco = [xmin, ymin, xmax - xmin, ymax - ymin]
            area = bbox_coco[2] * bbox_coco[3]
            
            # Add the annotation entry
            coco_data["annotations"].append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": cls_id,
                "bbox": bbox_coco,
                "area": area,
                "iscrowd": 0,
                "segmentation": []
            })
            
            ann_id += 1
        
        img_id += 1
    
    # Write the COCO JSON file
    with open(output_json_path, "w") as f:
        json.dump(coco_data, f, indent=2)

# Usage example
CLASSES = ["aeroplane", "bicycle", "bird", "boat", "bottle", 
           "bus", "car", "cat", "chair", "cow", 
           "diningtable", "dog", "horse", "motorbike", "person", 
           "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

voc_to_coco(
    voc_img_dir="data/VOC2007/JPEGImages",
    voc_ann_dir="data/VOC2007/Annotations",
    output_json_path="data/voc2coco_train.json",
    classes=CLASSES
)
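The scenarios in this article do not cover the COCO-to-VOC direction. A minimal sketch using only the standard library, writing one XML file per image; it assumes the COCO data is already loaded as a dict and emits only the fields from the VOC example earlier, omitting optional VOC fields such as `depth`, `pose`, `truncated` and `difficult`. `coco_to_voc` is a hypothetical helper, and images whose file names share a stem would overwrite each other's XML file:

```python
import os
import xml.etree.ElementTree as ET
from collections import defaultdict


def coco_to_voc(coco_data, out_dir):
    """Write one VOC-style XML file per COCO image into out_dir; return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    cat_names = {c["id"]: c["name"] for c in coco_data["categories"]}
    anns_by_img = defaultdict(list)
    for ann in coco_data["annotations"]:
        anns_by_img[ann["image_id"]].append(ann)

    written = []
    for img in coco_data["images"]:
        root = ET.Element("annotation")
        ET.SubElement(root, "filename").text = img["file_name"]
        size = ET.SubElement(root, "size")
        ET.SubElement(size, "width").text = str(img["width"])
        ET.SubElement(size, "height").text = str(img["height"])
        for ann in anns_by_img[img["id"]]:
            x, y, w, h = ann["bbox"]
            obj = ET.SubElement(root, "object")
            ET.SubElement(obj, "name").text = cat_names[ann["category_id"]]
            box = ET.SubElement(obj, "bndbox")
            # COCO [x, y, w, h] -> VOC corner coordinates, rounded to whole pixels
            for tag, val in (("xmin", x), ("ymin", y), ("xmax", x + w), ("ymax", y + h)):
                ET.SubElement(box, tag).text = str(int(round(val)))
        stem = os.path.splitext(os.path.basename(img["file_name"]))[0]
        path = os.path.join(out_dir, stem + ".xml")
        ET.ElementTree(root).write(path, encoding="utf-8")
        written.append(path)
    return written
```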

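Scenario 2: COCO to LVIS

On top of COCO, the LVIS format adds category-level information (frequency, synonyms, image and instance counts) and image-level lists of negative and not-exhaustively-annotated categories. The example below maps COCO categories to LVIS categories by name or synonym and fills in those fields: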

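import json
from collections import defaultdict

def coco_to_lvis(coco_json_path, lvis_json_path, lvis_categories_path):
    """
    Convert a COCO JSON file to the LVIS format.
    Args:
        coco_json_path: path of the COCO JSON file
        lvis_json_path: path of the output LVIS JSON file
        lvis_categories_path: LVIS category definitions (frequency, synonyms, ...),
            either a bare list of categories or a full LVIS annotation file
    """
    # Load the COCO data
    with open(coco_json_path, "r") as f:
        coco_data = json.load(f)
    
    # Load the LVIS categories (accept a bare list or a full LVIS file)
    with open(lvis_categories_path, "r") as f:
        loaded = json.load(f)
    lvis_categories = loaded["categories"] if isinstance(loaded, dict) else loaded
    
    # Map COCO category IDs to names, and LVIS names/synonyms to LVIS categories
    coco_cat_id_to_name = {cat["id"]: cat["name"] for cat in coco_data["categories"]}
    lvis_name_to_cat = {}
    for cat in lvis_categories:
        for name in [cat["name"], *cat.get("synonyms", [])]:
            lvis_name_to_cat.setdefault(name, cat)
    
    # Convert annotations. LVIS has no iscrowd field, so crowd regions are dropped;
    # frequency lives on categories, not on individual annotations.
    annotations = []
    image_ids_per_cat = defaultdict(set)
    instances_per_cat = defaultdict(int)
    for ann in coco_data["annotations"]:
        lvis_cat = lvis_name_to_cat.get(coco_cat_id_to_name.get(ann["category_id"]))
        if lvis_cat is None or ann.get("iscrowd", 0):
            continue
        lvis_ann = {k: v for k, v in ann.items() if k != "iscrowd"}
        lvis_ann["category_id"] = lvis_cat["id"]
        annotations.append(lvis_ann)
        image_ids_per_cat[lvis_cat["id"]].add(ann["image_id"])
        instances_per_cat[lvis_cat["id"]] += 1
    
    # Recompute per-category counts for this dataset
    categories = [
        dict(cat,
             image_count=len(image_ids_per_cat[cat["id"]]),
             instance_count=instances_per_cat[cat["id"]])
        for cat in lvis_categories
    ]
    
    # LVIS images list verified-negative and not-exhaustively-annotated categories.
    # COCO has no such information, so both lists are left empty; the LVIS
    # evaluator then ignores detections of categories absent from an image.
    images = [dict(img, neg_category_ids=[], not_exhaustive_category_ids=[])
              for img in coco_data["images"]]
    
    lvis_data = {
        "info": coco_data.get("info", {}),
        "licenses": coco_data.get("licenses", []),
        "images": images,
        "annotations": annotations,
        "categories": categories
    }
    
    # Write the LVIS JSON file
    with open(lvis_json_path, "w") as f:
        json.dump(lvis_data, f, indent=2)

# Usage example (writes a new file instead of overwriting the official LVIS annotations)
coco_to_lvis(
    coco_json_path="data/coco/annotations/instances_train2017.json",
    lvis_json_path="data/lvis/coco2lvis_train.json",
    lvis_categories_path="data/lvis/lvis_v1_val.json"
)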

Scenario 3: LVIS to COCO

Converting LVIS to COCO mainly means dropping LVIS-specific fields and, optionally, merging or filtering categories:

import json

def lvis_to_coco(lvis_json_path, coco_json_path, coco_categories_path=None):
    """
    Convert an LVIS JSON file to the COCO format.
    Args:
        lvis_json_path: path of the LVIS JSON file
        coco_json_path: path of the output COCO JSON file
        coco_categories_path: optional COCO category definitions used to filter
            categories, either a bare list of categories or a full COCO annotation file
    """
    # Load the LVIS data
    with open(lvis_json_path, "r") as f:
        lvis_data = json.load(f)
    
    # Drop LVIS-only image fields. COCO-style loaders expect file_name, which
    # LVIS v1 images lack, so derive it from coco_url (e.g. "val2017/xxx.jpg")
    images = []
    for img in lvis_data["images"]:
        img = {k: v for k, v in img.items()
               if k not in ("neg_category_ids", "not_exhaustive_category_ids")}
        if "file_name" not in img and "coco_url" in img:
            img["file_name"] = "/".join(img["coco_url"].split("/")[-2:])
        images.append(img)
    
    # Process categories
    lvis_id_to_name = {cat["id"]: cat["name"] for cat in lvis_data["categories"]}
    if coco_categories_path:
        with open(coco_categories_path, "r") as f:
            loaded = json.load(f)
        coco_categories = loaded["categories"] if isinstance(loaded, dict) else loaded
    else:
        # Keep the LVIS categories but drop LVIS-specific fields
        coco_categories = [
            {"id": cat["id"], "name": cat["name"],
             "supercategory": cat.get("supercategory", "none")}
            for cat in lvis_data["categories"]
        ]
    coco_name_to_id = {cat["name"]: cat["id"] for cat in coco_categories}
    
    # Process annotations; names must match exactly, and annotations whose
    # category is not in the COCO list are dropped (see Q3 for name mappings)
    annotations = []
    for ann in lvis_data["annotations"]:
        cat_name = lvis_id_to_name.get(ann["category_id"])
        if cat_name not in coco_name_to_id:
            continue
        coco_ann = dict(ann, category_id=coco_name_to_id[cat_name])
        coco_ann.setdefault("iscrowd", 0)  # COCO tools expect an iscrowd flag
        annotations.append(coco_ann)
    
    coco_data = {
        "info": lvis_data.get("info", {}),
        "licenses": lvis_data.get("licenses", []),
        "images": images,
        "annotations": annotations,
        "categories": coco_categories
    }
    
    # Write the COCO JSON file
    with open(coco_json_path, "w") as f:
        json.dump(coco_data, f, indent=2)

# Usage example
lvis_to_coco(
    lvis_json_path="data/lvis/lvis_v1_val.json",
    coco_json_path="data/coco/annotations/lvis2coco_val.json",
    coco_categories_path="data/coco/annotations/instances_val2017.json"
)

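Advanced: Custom Dataset Loaders

For data in other formats, you can subclass one of YOLO-World's dataset classes and implement your own loading logic. The example below assumes the custom annotation file is a JSON list of image records, each with id, file_name, height, width and an annotations list whose boxes are [x, y, w, h] and whose category IDs run from 1 to N in the order of the configured classes: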

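import json
import os.path as osp
from collections import defaultdict
from typing import List

from mmyolo.registry import DATASETS
from yolo_world.datasets.yolov5_v3det import V3DetDataset

@DATASETS.register_module()
class CustomDataset(V3DetDataset):
    """Custom dataset loader."""
    
    def load_data_list(self) -> List[dict]:
        # 1. Load the data in the custom format
        custom_data = self._load_custom_data()
        
        # Assumption: category IDs run from 1 to N in the order of metainfo['classes'].
        # MMDetection expects 0-based label indices, and filter_data() needs
        # cat_ids and cat_img_map to be set.
        self.cat_ids = list(range(1, len(self.metainfo["classes"]) + 1))
        self.cat2label = {cat_id: i for i, cat_id in enumerate(self.cat_ids)}
        self.cat_img_map = defaultdict(list)
        
        # 2. Convert to the data_info layout used by MMDetection / YOLO-World
        data_list = []
        for item in custom_data:
            data_info = {
                "img_id": item["id"],
                "img_path": osp.join(self.data_prefix["img"], item["file_name"]),
                "height": item["height"],
                "width": item["width"],
                "instances": []
            }
            
            for ann in item["annotations"]:
                x, y, w, h = ann["bbox"]  # the custom format stores [x, y, w, h]
                instance = {
                    "bbox": [x, y, x + w, y + h],  # MMDetection expects [x1, y1, x2, y2]
                    "bbox_label": self.cat2label[ann["category_id"]],
                    "ignore_flag": 0,
                }
                if ann.get("mask") is not None:
                    instance["mask"] = ann["mask"]
                data_info["instances"].append(instance)
                self.cat_img_map[ann["category_id"]].append(item["id"])
            
            data_list.append(data_info)
        
        return data_list
    
    def _load_custom_data(self):
        """Load the custom annotation file (assumed: a JSON list of image records)."""
        with open(self.ann_file, "r") as f:
            return json.load(f)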

Then reference the custom dataset in your config file:

train_dataset = dict(
    type='CustomDataset',
    data_root='data/custom/',
    ann_file='train.json',
    data_prefix=dict(img='images/'),
    metainfo=dict(classes=('class_a', 'class_b')),  # hypothetical class names, in category-ID order
    filter_cfg=dict(filter_empty_gt=True, min_size=32),
    pipeline=train_pipeline
)
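For YOLO-World training specifically, the project's own configs typically wrap a detection dataset in `MultiModalDataset`, passing a `class_text_path` (a JSON file of per-class text prompts such as `data/texts/coco_class_texts.json`) and setting the pipeline on the wrapper; check the configs shipped with your version for the exact keys. The sketch below writes such a text file from COCO- or LVIS-style categories, assuming the list-of-lists layout `[["person"], ["bicycle"], ...]` ordered like `metainfo['classes']`; `write_class_texts` is a hypothetical helper:

```python
import json


def write_class_texts(categories, out_path, use_synonyms=False):
    """Write one list of text prompts per class, ordered by category ID.

    Underscores are replaced with spaces so names such as LVIS's
    "dining_table" read like natural text. Make sure the ID order matches
    the order of the classes configured for the dataset.
    """
    texts = []
    for cat in sorted(categories, key=lambda c: c["id"]):
        names = [cat["name"]]
        if use_synonyms:
            names += [s for s in cat.get("synonyms", []) if s != cat["name"]]
        texts.append([n.replace("_", " ") for n in names])
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(texts, f, ensure_ascii=False)
    return texts
```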

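Performance Optimization and Best Practices

1. Handling large datasets

For datasets with more than 100,000 images, consider the following optimizations:

Chunked processing: split large JSON files into smaller pieces and convert them one at a time
Parallel loading: use concurrent.futures to parallelize file I/O across many annotation files
Incremental parsing: for very large XML files, use an iterative parser (lxml's or the standard library's iterparse) instead of loading the whole tree into memory at once

Example (multithreaded VOC to COCO, with IDs assigned after parsing so that no shared counter is needed):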

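import json
import os
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

def process_xml(ann_file, voc_ann_dir, classes):
    """Parse one XML file; return (image_info, annotations) without IDs."""
    root = ET.parse(os.path.join(voc_ann_dir, ann_file)).getroot()
    img_info = {
        "file_name": root.find("filename").text,
        "width": int(root.find("size/width").text),
        "height": int(root.find("size/height").text),
    }
    anns = []
    for obj in root.iter("object"):
        cls_name = obj.find("name").text
        if cls_name not in classes:
            continue
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (float(box.find(t).text) for t in ("xmin", "ymin", "xmax", "ymax"))
        w, h = xmax - xmin, ymax - ymin
        anns.append({"category_id": classes.index(cls_name) + 1,
                     "bbox": [xmin, ymin, w, h], "area": w * h,
                     "iscrowd": 0, "segmentation": []})
    return img_info, anns

def voc_to_coco_multithread(voc_img_dir, voc_ann_dir, output_json_path, classes, max_workers=4):
    # Initialize the COCO data structure
    coco_data = {
        "info": {},
        "licenses": [],
        "images": [],
        "annotations": [],
        "categories": [{"id": i + 1, "name": cls} for i, cls in enumerate(classes)]
    }
    
    # Collect all XML files (sorted so IDs are reproducible)
    ann_files = sorted(f for f in os.listdir(voc_ann_dir) if f.endswith(".xml"))
    
    # Parse in parallel. executor.map yields results in input order, so IDs can be
    # assigned sequentially in this thread. Threads mainly overlap file I/O; for
    # CPU-bound parsing a ProcessPoolExecutor may be faster.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = executor.map(lambda f: process_xml(f, voc_ann_dir, classes), ann_files)
        ann_id = 1
        for img_id, (img_info, anns) in enumerate(results, start=1):
            coco_data["images"].append({"id": img_id, **img_info})
            for ann in anns:
                coco_data["annotations"].append({"id": ann_id, "image_id": img_id, **ann})
                ann_id += 1
    
    # Write the result
    with open(output_json_path, "w") as f:
        json.dump(coco_data, f, indent=2)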

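2. Data validation and cleaning

Validating the data before and after conversion is key to good training results. Check at least the following:

Image paths: every image referenced by the annotations exists
Bounding boxes: coordinates lie inside the image and widths and heights are positive
Category consistency: every category ID in the annotations appears in the category list
Segmentation masks: for segmentation tasks, masks are in a valid format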
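A minimal validation sketch covering the first three checks for an already loaded COCO-style dict; mask validation is omitted, and `validate_coco` and `missing_images` are hypothetical helper names:

```python
import os


def validate_coco(coco_data):
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = []
    images = {img["id"]: img for img in coco_data["images"]}
    cat_ids = {cat["id"] for cat in coco_data["categories"]}
    if len(images) != len(coco_data["images"]):
        problems.append("duplicate image ids")
    ann_ids = [ann["id"] for ann in coco_data["annotations"]]
    if len(set(ann_ids)) != len(ann_ids):
        problems.append("duplicate annotation ids")
    for ann in coco_data["annotations"]:
        img = images.get(ann["image_id"])
        if img is None:
            problems.append(f"ann {ann['id']}: unknown image_id {ann['image_id']}")
            continue
        if ann["category_id"] not in cat_ids:
            problems.append(f"ann {ann['id']}: unknown category_id {ann['category_id']}")
        x, y, w, h = ann["bbox"]
        if w <= 0 or h <= 0:
            problems.append(f"ann {ann['id']}: non-positive width/height")
        if x < 0 or y < 0 or x + w > img["width"] or y + h > img["height"]:
            problems.append(f"ann {ann['id']}: bbox outside image")
    return problems


def missing_images(coco_data, img_root):
    """List image file names that do not exist under img_root."""
    return [img["file_name"] for img in coco_data["images"]
            if not os.path.exists(os.path.join(img_root, img["file_name"]))]
```

Returning a list of problems rather than raising on the first one makes it easy to log everything and decide afterwards whether to drop or fix the offending annotations.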

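For dataset analysis you can run an analysis script against a training config. The script path and flags below may differ between versions; check what your YOLO-World or MMYOLO checkout provides under tools/analysis_tools/ before running: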

python tools/analysis_tools/analyze_dataset.py \
    configs/pretrain/yolo_world_v2_l_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py \
    --out-dir analysis_results

3. Incremental conversion

For datasets that keep growing, converting only new or modified files saves time:

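import os

def incremental_convert(voc_ann_dir, last_convert_time):
    """Collect the XML files added or modified since last_convert_time (a Unix timestamp)."""
    new_files = []
    for file in sorted(os.listdir(voc_ann_dir)):
        if file.endswith(".xml"):
            modify_time = os.path.getmtime(os.path.join(voc_ann_dir, file))
            if modify_time > last_convert_time:
                new_files.append(file)
    
    # Only these files need to be parsed and merged into the existing JSON
    return new_files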
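The remaining step, merging the newly parsed files into the existing JSON, has to continue the existing ID sequences and replace stale entries for files that were edited. A minimal sketch; `merge_incremental` is a hypothetical helper, and each record is assumed to be an `(image_info, annotations)` pair without IDs, such as the output of `process_xml` in the multithreaded example:

```python
def merge_incremental(coco_data, records):
    """Merge newly parsed images into an existing COCO dict (modified in place).

    records: list of (img_info, anns); img_info has file_name/width/height and
    each ann has at least category_id/bbox, with no IDs yet. Images whose
    file_name is already present are replaced, so re-edited files leave no
    stale boxes behind.
    """
    # Continue numbering after the largest IDs ever used, before removing anything
    next_img = max((img["id"] for img in coco_data["images"]), default=0) + 1
    next_ann = max((a["id"] for a in coco_data["annotations"]), default=0) + 1

    # Drop old entries for files that were re-parsed
    changed = {img_info["file_name"] for img_info, _ in records}
    stale = {img["id"] for img in coco_data["images"] if img["file_name"] in changed}
    coco_data["images"] = [img for img in coco_data["images"] if img["id"] not in stale]
    coco_data["annotations"] = [a for a in coco_data["annotations"] if a["image_id"] not in stale]

    for img_info, anns in records:
        coco_data["images"].append({**img_info, "id": next_img})
        for ann in anns:
            w, h = ann["bbox"][2], ann["bbox"][3]
            # area/iscrowd are defaults; values already in ann take precedence
            coco_data["annotations"].append(
                {"area": w * h, "iscrowd": 0, **ann, "id": next_ann, "image_id": next_img})
            next_ann += 1
        next_img += 1
    return coco_data
```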

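Common Problems and Solutions

Q1: What if I run out of memory while converting a large COCO file?

A1: Parse the JSON as a stream with the ijson library instead of loading the whole file at once: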

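import ijson
from collections import Counter

def stream_load_coco(coco_json_path, section="annotations"):
    """Yield the records of one top-level COCO list ("images", "annotations",
    "categories") one at a time, without loading the whole file."""
    with open(coco_json_path, "rb") as f:
        # "annotations.item" addresses each element of the top-level list;
        # use_float=True returns floats instead of decimal.Decimal (ijson >= 3.1)
        yield from ijson.items(f, f"{section}.item", use_float=True)

# Example: count annotations per category while holding one record at a time
counts = Counter(ann["category_id"] for ann in
                 stream_load_coco("data/coco/annotations/instances_train2017.json"))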

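Q2: After converting VOC to COCO, the category IDs are not contiguous. Does that affect training?

A2: Usually not. MMDetection-based datasets, which YOLO-World builds on, map category IDs to contiguous label indices internally (cat2label), so gaps in the IDs are tolerated. Remapping to contiguous IDs during conversion still makes it easier to keep the annotation file, class lists and class-text files in sync: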

def remap_category_ids(coco_data):
    """Remap category IDs to contiguous integers starting at 1."""
    cat_ids = sorted({cat["id"] for cat in coco_data["categories"]})
    id_map = {old_id: new_id + 1 for new_id, old_id in enumerate(cat_ids)}  # start at 1
    
    # Update categories
    new_categories = []
    for cat in coco_data["categories"]:
        new_cat = cat.copy()
        new_cat["id"] = id_map[cat["id"]]
        new_categories.append(new_cat)
    coco_data["categories"] = new_categories
    
    # Update annotations (modified in place)
    for ann in coco_data["annotations"]:
        ann["category_id"] = id_map[ann["category_id"]]
    
    return coco_data

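Q3: How can rare LVIS categories be merged into common COCO categories?

A3: Specify the merge rules manually in a category mapping file. The source names below are illustrative; check them against the category names in your LVIS file: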

{
    "lvis_to_coco": {
        "tiger_cat": "cat",
        "Persian_cat": "cat",
        "airliner": "airplane",
        "biplane": "airplane"
    }
}

Then apply the mapping during conversion:

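import json

# Load the LVIS-to-COCO name mapping
with open("category_mapping.json", "r") as f:
    mapping = json.load(f)["lvis_to_coco"]

def map_lvis_category(lvis_cat_name, mapping, coco_name_to_id):
    """Return the COCO category ID for an LVIS category name, or None to drop it."""
    coco_cat_name = mapping.get(lvis_cat_name, lvis_cat_name)
    return coco_name_to_id.get(coco_cat_name)

# Inside the annotation loop of lvis_to_coco:
#     coco_id = map_lvis_category(cat_name, mapping, coco_name_to_id)
#     if coco_id is None:
#         continue
#     coco_ann["category_id"] = coco_id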

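Summary and Outlook

This article covered converting between the COCO, VOC, and LVIS formats for YOLO-World, including the relevant dataset classes, practical conversion scripts, and performance tips. With these tools you can bring your own datasets into YOLO-World's training pipeline efficiently.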

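Possible future improvements to data conversion tooling include:

Visual conversion tool: a web UI with drag-and-drop upload and live preview
Batch multi-format conversion: process several datasets at once with automatic format detection
Smart annotation completion: use pretrained models to fill in missing annotations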

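Data format conversion is a foundational skill in deep learning projects. Hopefully this article helps clear away preprocessing obstacles so that you can spend more time on model tuning and applications.

After reading this article, you should be able to:

Understand the core differences between the COCO, VOC, and LVIS formats
Load datasets in different formats with YOLO-World's dataset classes
Write your own scripts to convert between formats
Speed up conversion for large datasets
Solve common data format problems

If you run into problems, open an issue in the project's GitHub repository or ask in the community discussions.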

Author's statement: parts of this article were generated with AI assistance (AIGC) and are for reference only.