json_to_dataset explained (written as comments), with added code to filter out invalid JSON files

Original article · CC 4.0 BY-SA license

First published 2023-09-28 10:20:29 · last modified 2023-09-28 19:11:49

Tags: #json #python #windows #pytorch #deep-learning

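This article shows how to use a Python script to filter out JSON files that contain invalid polygons with fewer than 3 points, and how to handle the 8-bit color label images generated by LabelMe, so that the dataset stays clean.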

1. Code that filters out unusable JSON files:

import json
import os


def filter_json_files(json_dir):
    """
    Split the JSON files in a directory into valid and invalid ones.

    A file counts as invalid if any of its shapes has fewer than 3 points.

    Args:
        json_dir (str): Path to the directory containing the JSON files.

    Returns:
        tuple: A tuple containing two lists:
            - valid_json_files (list): A list of valid JSON file paths.
            - invalid_json_files (list): A list of invalid JSON file paths.
    """
    valid_json_files = []
    invalid_json_files = []
    for filename in os.listdir(json_dir):
        # Skip anything that is not an annotation file.
        if not filename.endswith('.json'):
            continue
        json_path = os.path.join(json_dir, filename)
        with open(json_path, 'r') as f:
            data = json.load(f)
        # The file is closed here; only the parsed data is needed below.
        # Note: the check applies to every shape, so 2-point shapes such as
        # rectangles are flagged as well, not only polygons.
        is_valid = all(len(shape['points']) >= 3 for shape in data['shapes'])
        if is_valid:
            valid_json_files.append(json_path)
        else:
            invalid_json_files.append(json_path)
    return valid_json_files, invalid_json_files
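
The filter above treats every shape the same way. As a possible refinement, the sketch below checks only polygon shapes and reports which labels are too short. It assumes each shape may carry a `shape_type` key and treats a missing key as a polygon; `find_short_polygons` and `min_points` are hypothetical names introduced here for illustration, not part of the original script.

```python
import json
import os
import tempfile


def find_short_polygons(json_path, min_points=3):
    """Return (label, point_count) for each polygon shape with too few points.

    Assumption: shapes without a 'shape_type' key are treated as polygons.
    """
    with open(json_path, 'r') as f:
        data = json.load(f)
    problems = []
    for shape in data.get('shapes', []):
        # Only polygons need 3+ points; a rectangle or line has just 2.
        if shape.get('shape_type', 'polygon') != 'polygon':
            continue
        n_points = len(shape.get('points', []))
        if n_points < min_points:
            problems.append((shape.get('label'), n_points))
    return problems


if __name__ == '__main__':
    # Build a throwaway annotation file to demonstrate the check.
    sample = {
        'shapes': [
            {'label': 'cat', 'shape_type': 'polygon',
             'points': [[0, 0], [10, 0], [10, 10]]},
            {'label': 'dog', 'shape_type': 'polygon',
             'points': [[0, 0], [5, 5]]},
            {'label': 'box', 'shape_type': 'rectangle',
             'points': [[0, 0], [5, 5]]},
        ]
    }
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, 'sample.json')
        with open(path, 'w') as f:
            json.dump(sample, f)
        print(find_short_polygons(path))
```

Returning the offending labels rather than a single boolean makes it easier to reopen the file in labelme and fix or delete the specific shape.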

2. Full code:

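import base64  # Provides functions for encoding binary data as printable ASCII characters and decoding it back to binary data
import json  # Encodes Python objects as JSON for output or storage, and decodes JSON back into Python objects
import os  # Short for "operating system"; provides the interfaces a Python program uses to interact with the operating system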
import os.path as osp

import numpy as np
import PIL.Image
from labelme import utils

'''
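Points to note when building your own semantic segmentation dataset:
1. I use labelme version 3.16.7 and recommend that version; some other labelme versions raise an error,
   specifically: Too many dimensions: 3 > 2
   Install it from the command line with pip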

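
Since the note above ties the script to one labelme version, it may help to check the installed version before running the conversion. The following is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); `installed_version` and `RECOMMENDED_LABELME` are hypothetical names introduced here, not part of the original script.

```python
from importlib import metadata

RECOMMENDED_LABELME = '3.16.7'  # version recommended in the note above


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == '__main__':
    found = installed_version('labelme')
    if found is None:
        print('labelme is not installed')
    elif found != RECOMMENDED_LABELME:
        print(f'labelme {found} found; the article recommends {RECOMMENDED_LABELME}')
    else:
        print(f'labelme {found} matches the recommended version')
```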