Shrinking Polygon Annotations: A Python Implementation

License: CC 4.0 BY-SA

Tags: #polygon #shrink


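This article describes annotation shrinking, a way to reduce the merging of vertically adjacent text lines in text detection. Shrinking the original annotations by a computed offset can reduce the merging between neighboring text regions and improve detection accuracy. The article explains how to implement this with the Polygon and Pyclipper libraries and provides example code.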


1. Overview

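Text detection work often uses a segmentation network. Training directly on the original annotations tends to make the text lines above and below each other merge in the predicted mask. A fairly effective remedy is to shrink the annotations, which alleviates the merging. In this example, the images are placed in the src_imgs folder (as jpg files), and the corresponding annotations are placed in the labels_txt folder (as txt files).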
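The folder layout above can be checked before running anything else. The following is a minimal sketch, assuming each label file shares its base name with a jpg image; the helper name list_pairs and its parameters are hypothetical and not part of the original code.

```python
import os


def list_pairs(root_path, img_dirname="src_imgs", label_dirname="labels_txt"):
    """Return (image_path, label_path) pairs for which both files exist."""
    img_dir = os.path.join(root_path, img_dirname)
    label_dir = os.path.join(root_path, label_dirname)
    pairs = []
    for label_name in sorted(os.listdir(label_dir)):
        stem, ext = os.path.splitext(label_name)
        if ext.lower() != ".txt":
            # Ignore anything that is not an annotation file.
            continue
        img_path = os.path.join(img_dir, stem + ".jpg")
        if os.path.isfile(img_path):
            pairs.append((img_path, os.path.join(label_dir, label_name)))
    return pairs
```

Label files without a matching image are skipped silently here; raising an error instead may be preferable when the dataset is expected to be complete.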

Each line of an annotation file follows this format:

x1 y1 x2 y2 ... xn yn label_type

The values are separated by spaces, and the last one is the annotation category.
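As an illustration of this format, the sketch below parses one line into a list of points and a label using only the standard library. The function name parse_annotation_line is hypothetical, and the requirement of at least three points is an assumption about what counts as a valid polygon.

```python
def parse_annotation_line(line):
    """Split one annotation line into a list of (x, y) points and its label."""
    parts = line.split()
    # An even number of coordinates plus one label gives an odd count of at least 7.
    if len(parts) < 7 or len(parts) % 2 == 0:
        raise ValueError("expected at least 3 x/y pairs followed by a label")
    *coords, label = parts
    # int(float(...)) truncates fractional coordinates toward zero.
    values = [int(float(v)) for v in coords]
    points = list(zip(values[0::2], values[1::2]))
    return points, label


points, label = parse_annotation_line("10 20 110 20 110 60 10 60 text")
```

Here points is a list of four (x, y) tuples and label is the trailing category string.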

2. Implementation

# -*- coding=utf-8 -*-
import os
import cv2
import Polygon as plg
import pyclipper
import numpy as np

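def dist(a, b):
    # Euclidean distance between two points given as NumPy arrays.
    return np.sqrt(np.sum((a - b) ** 2))

def perimeter(bbox):
    # Sum of the edge lengths, wrapping from the last vertex back to the first.
    peri = 0.0
    for i in range(bbox.shape[0]):
        peri += dist(bbox[i], bbox[(i + 1) % bbox.shape[0]])
    return peri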


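def shrink(bboxes, rate, max_shr=20):
    """Shrink each polygon; keep the original polygon when shrinking fails."""
    # The offset formula below uses the squared rate.
    rate = rate * rate
    shrinked_bboxes = []
    for bbox in bboxes:
        area = plg.Polygon(bbox).area()
        peri = perimeter(bbox)

        pco = pyclipper.PyclipperOffset()
        pco.AddPath(bbox, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
        # Offset in pixels: area * (1 - rate^2) / perimeter, rounded and capped at max_shr.
        offset = min(int(area * (1 - rate) / (peri + 0.001) + 0.5), max_shr)

        # A negative delta shrinks the polygon.
        shrinked_bbox = pco.Execute(-offset)
        if len(shrinked_bbox) == 0:
            shrinked_bboxes.append(bbox)
            continue

        # Keep only the first result polygon. Converting it on its own avoids
        # building a ragged array when Execute returns several polygons.
        shrinked_bbox = np.array(shrinked_bbox[0], dtype=np.int32)
        if shrinked_bbox.shape[0] <= 2:
            shrinked_bboxes.append(bbox)
            continue

        shrinked_bboxes.append(shrinked_bbox)

    return shrinked_bboxes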


def demo(root_path):
    gt_dir = os.path.join(root_path, 'labels_txt')
    img_dir = os.path.join(root_path, 'src_imgs')

    label_list = os.listdir(gt_dir)
    # Only the first 10 label files are visualized.
    for label_file in label_list[:10]:
        img_name = os.path.splitext(label_file)[0] + ".jpg"
        src_img = cv2.imread(os.path.join(img_dir, img_name))
        if src_img is None:
            # cv2.imread returns None when the image is missing or unreadable.
            continue
        with open(os.path.join(gt_dir, label_file), 'r') as f:
            label_lines = f.readlines()
        gt_boxes = []
        for line in label_lines:
            line = line.strip()
            if not line:
                continue
            # Drop the trailing label_type and keep the coordinates.
            box_points = [int(float(item)) for item in line.split()[:-1]]
            # cv2.fillPoly expects int32 points.
            box_info = np.array(box_points, dtype=np.int32).reshape((-1, 2))
            gt_boxes.append(box_info)
        H, W = src_img.shape[:2]
        ori_mask_img = np.zeros((H, W), dtype=np.uint8)
        for box in gt_boxes:
            cv2.fillPoly(ori_mask_img, [box], 255)

        shrink_mask_img = np.zeros((H, W), dtype=np.uint8)
        new_gt_boxes = shrink(gt_boxes, 0.9)
        for box in new_gt_boxes:
            cv2.fillPoly(shrink_mask_img, [np.asarray(box, dtype=np.int32)], 255)

        cv2.imshow("ori_mask", ori_mask_img)
        cv2.imshow("shrink mask", shrink_mask_img)
        cv2.waitKey(0)

if __name__ == "__main__":
    root_path = os.getcwd()
    demo(root_path)
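To make the offset computed in shrink concrete, the following sketch recomputes it with the standard library only, so it runs without Polygon or pyclipper. The helper names polygon_area, polygon_perimeter, and shrink_offset are hypothetical and not part of the code above; using the shoelace formula in place of plg.Polygon(...).area() is an assumption made for illustration.

```python
import math


def polygon_area(points):
    """Absolute polygon area via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of edge lengths, wrapping from the last vertex to the first."""
    n = len(points)
    total = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += math.hypot(x2 - x1, y2 - y1)
    return total


def shrink_offset(points, rate, max_shr=20):
    """Same arithmetic as in shrink: area * (1 - rate^2) / perimeter, capped."""
    area = polygon_area(points)
    peri = polygon_perimeter(points)
    return min(int(area * (1 - rate * rate) / (peri + 0.001) + 0.5), max_shr)


rect = [(0, 0), (100, 0), (100, 40), (0, 40)]
offset = shrink_offset(rect, 0.9)
print(offset)
```

For this 100 x 40 rectangle with rate 0.9, the area is 4000 and the perimeter is 280, so the offset comes out as 3 pixels. For very large polygons the max_shr cap takes over and limits the offset to 20 pixels.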
