[Internship] Keyboard Instance Segmentation

是Winky啊

Last modified: 2025-02-22 16:37:00

Category: Computer Vision. Tags: computer peripherals

First published: 2025-02-09 13:08:07

Article link: https://blog.youkuaiyun.com/Winkyyyyyy/article/details/145143899

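Step 1: Keyboard detection

Approach 1: Canny edge detection

Canny edge detection gives unstable results: with a cluttered background or changing lighting, it easily picks up objects other than the keyboard.

The figure shows the four red dots that mark the corners found from the edges detected with Canny.

I followed the 优快云 blog post "OpenCV实战之三 | 基于OpenCV实现图像校正" (OpenCV in Practice, Part 3: Image Rectification with OpenCV).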

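Approach 2: Mask R-CNN

Paper: arXiv 1703.06870

Reference: the 优快云 blog post "Mask Rcnn目标分割-训练自己数据集-详细步骤" (Mask R-CNN object segmentation: training on your own dataset, detailed steps).

1. Download the code and set up the environment

Remote server + source archive approach

Releases · matterport/Mask_RCNN

Click the upload button to upload the zip file to the server.

Directory structure at different levels of a Linux system

Contents of the root directory /. The root directory / is the top of the Linux file system; every other directory is a subdirectory of it.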

root@autodl-container-9403468337-223581eb:~# cd /
root@autodl-container-9403468337-223581eb:/# ls
bin  boot  dev  etc  home  init  lib  lib32  lib64  libx32  media  mnt  NGC-DL-CONTAINER-LICENSE  opt  proc  root  run  sbin  srv  sys  tmp  usr  var

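Contents of the user home directory ~. The home directory is each user's own working directory, where the user can freely create, modify and delete files and folders. What you see there usually depends on that user's own actions and environment: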

root@autodl-container-9403468337-223581eb:/# cd ~
root@autodl-container-9403468337-223581eb:~# ls
autodl-pub  autodl-tmp  Mask_RCNN-master.zip  miniconda3  tf-logs
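
The same two locations can be inspected from Python. This is a small sketch using only the standard library `pathlib` module; the listing it prints depends entirely on the machine it runs on.

```python
from pathlib import Path

home = Path.home()        # the user's home directory, "~" in the shell
root = Path(home.anchor)  # the filesystem root that contains it, "/" on Linux

print("root:", root)
print("home:", home)

# Equivalent of `ls ~`: names of the entries directly under the home directory
if home.is_dir():
    for name in sorted(p.name for p in home.iterdir()):
        print(name)
```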

Unzip the code archive

root@autodl-container-9403468337-223581eb:~# unzip Mask_RCNN-master.zip

After that, install the required libraries listed in requirements.txt.

Local machine + git clone approach

(keyboard) C:\Users\吴伊晴>git clone https://github.com/matterport/Mask_RCNN.git
Cloning into 'Mask_RCNN'...
remote: Enumerating objects: 956, done.
remote: Total 956 (delta 0), reused 0 (delta 0), pack-reused 956 (from 1)
Receiving objects: 100% (956/956), 137.67 MiB | 16.28 MiB/s, done.
Resolving deltas: 100% (558/558), done.

(keyboard) C:\Users\吴伊晴>cd Mask_RCNN

(keyboard) C:\Users\吴伊晴\Mask_RCNN>python setup.py install
setup.py:9: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  import pkg_resources
WARNING:root:Fail load requirements file, so using default ones.
D:\Env\ANACONDA\envs\keyboard\lib\site-packages\setuptools\dist.py:452: SetuptoolsDeprecationWarning: Invalid dash-separated options
!!

        ********************************************************************************
        Usage of dash-separated 'description-file' will not be supported in future
        versions. Please use the underscore name 'description_file' instead.

        This deprecation is overdue, please update your project and remove deprecated
        calls to avoid build errors in the future.

        See https://setuptools.pypa.io/en/latest/userguide/declarative_config.html for details.
        ********************************************************************************

!!
  opt = self.warn_dash_deprecation(opt, section)
D:\Env\ANACONDA\envs\keyboard\lib\site-packages\setuptools\dist.py:452: SetuptoolsDeprecationWarning: Invalid dash-separated options
!!

        ********************************************************************************
        Usage of dash-separated 'license-file' will not be supported in future
        versions. Please use the underscore name 'license_file' instead.

        This deprecation is overdue, please update your project and remove deprecated
        calls to avoid build errors in the future.

        See https://setuptools.pypa.io/en/latest/userguide/declarative_config.html for details.
        ********************************************************************************

!!
  opt = self.warn_dash_deprecation(opt, section)
D:\Env\ANACONDA\envs\keyboard\lib\site-packages\setuptools\dist.py:452: SetuptoolsDeprecationWarning: Invalid dash-separated options
!!

        ********************************************************************************
        Usage of dash-separated 'requirements-file' will not be supported in future
        versions. Please use the underscore name 'requirements_file' instead.

        This deprecation is overdue, please update your project and remove deprecated
        calls to avoid build errors in the future.

        See https://setuptools.pypa.io/en/latest/userguide/declarative_config.html for details.
        ********************************************************************************

!!
  opt = self.warn_dash_deprecation(opt, section)
INFO:root:running install
D:\Env\ANACONDA\envs\keyboard\lib\site-packages\setuptools\_distutils\cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!

        ********************************************************************************
        Please avoid running ``setup.py`` directly.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
        ********************************************************************************

!!
  self.initialize_options()
D:\Env\ANACONDA\envs\keyboard\lib\site-packages\setuptools\_distutils\cmd.py:66: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!

        ********************************************************************************
        Please avoid running ``setup.py`` and ``easy_install``.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See https://github.com/pypa/setuptools/issues/917 for details.
        ********************************************************************************

!!
  self.initialize_options()
INFO:root:running bdist_egg
INFO:root:running egg_info
INFO:root:creating mask_rcnn.egg-info
INFO:root:writing mask_rcnn.egg-info\PKG-INFO
INFO:root:writing dependency_links to mask_rcnn.egg-info\dependency_links.txt
INFO:root:writing top-level names to mask_rcnn.egg-info\top_level.txt
INFO:root:writing manifest file 'mask_rcnn.egg-info\SOURCES.txt'
INFO:root:reading manifest file 'mask_rcnn.egg-info\SOURCES.txt'
INFO:root:reading manifest template 'MANIFEST.in'
INFO:root:adding license file 'LICENSE'
INFO:root:writing manifest file 'mask_rcnn.egg-info\SOURCES.txt'
INFO:root:installing library code to build\bdist.win-amd64\egg
INFO:root:running install_lib
INFO:root:running build_py
INFO:root:creating build\lib\mrcnn
INFO:root:copying mrcnn\config.py -> build\lib\mrcnn
INFO:root:copying mrcnn\model.py -> build\lib\mrcnn
INFO:root:copying mrcnn\parallel_model.py -> build\lib\mrcnn
INFO:root:copying mrcnn\utils.py -> build\lib\mrcnn
INFO:root:copying mrcnn\visualize.py -> build\lib\mrcnn
INFO:root:copying mrcnn\__init__.py -> build\lib\mrcnn
INFO:root:creating build\bdist.win-amd64\egg
INFO:root:creating build\bdist.win-amd64\egg\mrcnn
INFO:root:copying build\lib\mrcnn\config.py -> build\bdist.win-amd64\egg\mrcnn
INFO:root:copying build\lib\mrcnn\model.py -> build\bdist.win-amd64\egg\mrcnn
INFO:root:copying build\lib\mrcnn\parallel_model.py -> build\bdist.win-amd64\egg\mrcnn
INFO:root:copying build\lib\mrcnn\utils.py -> build\bdist.win-amd64\egg\mrcnn
INFO:root:copying build\lib\mrcnn\visualize.py -> build\bdist.win-amd64\egg\mrcnn
INFO:root:copying build\lib\mrcnn\__init__.py -> build\bdist.win-amd64\egg\mrcnn
INFO:root:byte-compiling build\bdist.win-amd64\egg\mrcnn\config.py to config.cpython-38.pyc
INFO:root:byte-compiling build\bdist.win-amd64\egg\mrcnn\model.py to model.cpython-38.pyc
build\bdist.win-amd64\egg\mrcnn\model.py:2359: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if os.name is 'nt':
INFO:root:byte-compiling build\bdist.win-amd64\egg\mrcnn\parallel_model.py to parallel_model.cpython-38.pyc
INFO:root:byte-compiling build\bdist.win-amd64\egg\mrcnn\utils.py to utils.cpython-38.pyc
INFO:root:byte-compiling build\bdist.win-amd64\egg\mrcnn\visualize.py to visualize.cpython-38.pyc
INFO:root:byte-compiling build\bdist.win-amd64\egg\mrcnn\__init__.py to __init__.cpython-38.pyc
INFO:root:creating build\bdist.win-amd64\egg\EGG-INFO
INFO:root:copying mask_rcnn.egg-info\PKG-INFO -> build\bdist.win-amd64\egg\EGG-INFO
INFO:root:copying mask_rcnn.egg-info\SOURCES.txt -> build\bdist.win-amd64\egg\EGG-INFO
INFO:root:copying mask_rcnn.egg-info\dependency_links.txt -> build\bdist.win-amd64\egg\EGG-INFO
INFO:root:copying mask_rcnn.egg-info\top_level.txt -> build\bdist.win-amd64\egg\EGG-INFO
WARNING:root:zip_safe flag not set; analyzing archive contents...
INFO:root:creating dist
INFO:root:creating 'dist\mask_rcnn-2.1-py3.8.egg' and adding 'build\bdist.win-amd64\egg' to it
INFO:root:removing 'build\bdist.win-amd64\egg' (and everything under it)
INFO:root:Processing mask_rcnn-2.1-py3.8.egg
INFO:root:Copying mask_rcnn-2.1-py3.8.egg to d:\env\anaconda\envs\keyboard\lib\site-packages
INFO:root:Adding mask-rcnn 2.1 to easy-install.pth file
INFO:root:
Installed d:\env\anaconda\envs\keyboard\lib\site-packages\mask_rcnn-2.1-py3.8.egg
INFO:root:Processing dependencies for mask-rcnn==2.1
INFO:root:Finished processing dependencies for mask-rcnn==2.1

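2. Prepare the dataset

To train Mask R-CNN, I prepared an annotated dataset of keyboards. I first collected keyboard images and then annotated the keyboards by hand with LabelMe. (LabelMe did not seem to open on the cloud server, so I annotated the dataset on my local machine.)

This is what the annotated dataset looks like.

Because the dataset format has to be compatible with what Mask R-CNN expects, the labels need to be converted to the COCO dataset format. I used the following conversion code, a batch version of LabelMe's json_to_dataset script: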

import argparse
import base64
import json
import os
import os.path as osp
 
import imgviz
import PIL.Image
 
from labelme.logger import logger
from labelme import utils
 
 
import glob

# Added import: yaml is needed to write info.yaml
import yaml
 
 
def main():
    logger.warning(
        "This script is aimed to demonstrate how to convert the "
        "JSON file to a single image dataset."
    )
    logger.warning(
        "It won't handle multiple JSON files to generate a "
        "real-use dataset."
    )
 
    parser = argparse.ArgumentParser()
    ############################################### added lines ##############################
    # parser.add_argument("json_file")
    parser.add_argument("--json_dir", default="D:/2021file/Biye/Mask_RCNN-master/samples/Mydata")
    ############################################### end ###################################
    parser.add_argument("-o", "--out", default=None)
    args = parser.parse_args()
 
    ############################################### added lines ##############################
    assert args.json_dir is not None and len(args.json_dir) > 0
    # json_file = args.json_file
    json_dir = args.json_dir

    # Accept either a single .json file or a directory of .json files
    if osp.isfile(json_dir):
        json_list = [json_dir] if json_dir.endswith('.json') else []
    else:
        json_list = glob.glob(os.path.join(json_dir, '*.json'))
    ############################################### end ###################################
 
    for json_file in json_list:
        # File name without extension, used as the per-sample folder name
        json_name = osp.splitext(osp.basename(json_file))[0]
        # One output folder per JSON file, so samples do not overwrite each other
        base_dir = args.out if args.out is not None else osp.dirname(json_file)
        out_dir = osp.join(base_dir, json_name)
        os.makedirs(out_dir, exist_ok=True)

        with open(json_file, encoding="utf-8") as f:
            data = json.load(f)
        imageData = data.get("imageData")
 
        if not imageData:
            imagePath = os.path.join(os.path.dirname(json_file), data["imagePath"])
            with open(imagePath, "rb") as f:
                imageData = f.read()
                imageData = base64.b64encode(imageData).decode("utf-8")
        img = utils.img_b64_to_arr(imageData)
 
        label_name_to_value = {"_background_": 0}
        for shape in sorted(data["shapes"], key=lambda x: x["label"]):
            label_name = shape["label"]
            if label_name in label_name_to_value:
                label_value = label_name_to_value[label_name]
            else:
                label_value = len(label_name_to_value)
                label_name_to_value[label_name] = label_value
        lbl, _ = utils.shapes_to_label(
            img.shape, data["shapes"], label_name_to_value
        )
 
        label_names = [None] * (max(label_name_to_value.values()) + 1)
        for name, value in label_name_to_value.items():
            label_names[value] = name
 
        lbl_viz = imgviz.label2rgb(
            lbl, imgviz.asgray(img), label_names=label_names, loc="rb"
        )
 
        PIL.Image.fromarray(img).save(osp.join(out_dir, "img.png"))
        utils.lblsave(osp.join(out_dir, "label.png"), lbl)
        PIL.Image.fromarray(lbl_viz).save(osp.join(out_dir, "label_viz.png"))
 
        with open(osp.join(out_dir, "label_names.txt"), "w") as f:
            for lbl_name in label_names:
                f.write(lbl_name + "\n")
 
        logger.info("Saved to: {}".format(out_dir))
        #######
        # Added: also write info.yaml next to label_names.txt
        logger.warning('info.yaml is being replaced by label_names.txt')
        info = dict(label_names=label_names)
        with open(osp.join(out_dir, 'info.yaml'), 'w') as f:
            yaml.safe_dump(info, f, default_flow_style=False)
        logger.info('Saved to: {}'.format(out_dir))
 
 
 
 
if __name__ == "__main__":
    main()

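Batch-convert your own .jpg and .json files. Each sample gets its own output folder containing five files: img.png, label.png, label_viz.png, label_names.txt and info.yaml.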
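
A quick way to confirm that every sample folder really contains those five files is sketched below. The helper name `find_incomplete_samples` is hypothetical, and the demo builds a throwaway directory instead of touching a real dataset.

```python
import tempfile
from pathlib import Path

# File names written by the conversion script above
EXPECTED_FILES = {"img.png", "label.png", "label_viz.png", "label_names.txt", "info.yaml"}


def find_incomplete_samples(dataset_dir):
    """Map each sample folder name to the set of expected files it lacks."""
    incomplete = {}
    for folder in sorted(Path(dataset_dir).iterdir()):
        if not folder.is_dir():
            continue
        present = {p.name for p in folder.iterdir()}
        missing = EXPECTED_FILES - present
        if missing:
            incomplete[folder.name] = missing
    return incomplete


# Demo on a temporary directory: one complete sample, one missing info.yaml
with tempfile.TemporaryDirectory() as tmp:
    for sample, files in {"a": EXPECTED_FILES, "b": EXPECTED_FILES - {"info.yaml"}}.items():
        sample_dir = Path(tmp) / sample
        sample_dir.mkdir()
        for name in files:
            (sample_dir / name).touch()
    report = find_incomplete_samples(tmp)

print(report)
```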

However, I could never get this code to run, so I switched to a different approach.

Approach 3: YOLOv8

Dataset

├─images
│  ├─test
│  ├─train
│  └─val
└─labels
    ├─test
    ├─train
    └─val
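
The post does not say how the LabelMe annotations were turned into the files under labels/. One possible way is sketched below, assuming LabelMe polygon annotations and the YOLO segmentation label format (one line per object: class index followed by polygon coordinates normalized to the 0 to 1 range). The function name `labelme_to_yolo_seg` and the class mapping are hypothetical.

```python
def labelme_to_yolo_seg(labelme_json, class_ids):
    """Convert one LabelMe annotation dict to YOLO segmentation label lines."""
    width = labelme_json["imageWidth"]
    height = labelme_json["imageHeight"]
    lines = []
    for shape in labelme_json["shapes"]:
        # Only polygons carry an outline that maps directly to this format
        if shape.get("shape_type", "polygon") != "polygon":
            continue
        class_id = class_ids[shape["label"]]
        coords = []
        for x, y in shape["points"]:
            coords.append(f"{x / width:.6f}")
            coords.append(f"{y / height:.6f}")
        lines.append(" ".join([str(class_id)] + coords))
    return lines


# Made-up annotation: one rectangular keyboard polygon in a 200x100 image
example = {
    "imageWidth": 200,
    "imageHeight": 100,
    "shapes": [
        {
            "label": "keyboard",
            "shape_type": "polygon",
            "points": [[20, 10], [180, 10], [180, 90], [20, 90]],
        }
    ],
}
label_lines = labelme_to_yolo_seg(example, {"keyboard": 0})
print("\n".join(label_lines))
# 0 0.100000 0.100000 0.900000 0.100000 0.900000 0.900000 0.100000 0.900000
```

Each image in images/train would then get a .txt file with the same base name in labels/train, and likewise for val and test.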

Training

from ultralytics import YOLO


def main():
    # Load a model
    model = YOLO("yolov8n-seg.pt")  # load a pretrained model (recommended for training)

    # Train the model
    results = model.train(data="./keyboard.yaml", epochs=100, plots=True, batch=4)


if __name__ == '__main__':
    main()
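
The training call reads ./keyboard.yaml, which the post does not show. Below is a sketch of what such a dataset file might contain for the folder layout above. The dataset root "./datasets" and the single class name "keyboard" are assumptions, and `build_dataset_yaml` is a hypothetical helper.

```python
def build_dataset_yaml(root, class_names):
    """Build the text of an Ultralytics-style dataset YAML file."""
    lines = [
        f"path: {root}",        # dataset root directory
        "train: images/train",  # relative to path
        "val: images/val",
        "test: images/test",
        "names:",
    ]
    lines += [f"  {index}: {name}" for index, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


yaml_text = build_dataset_yaml("./datasets", ["keyboard"])
print(yaml_text)
```

Saving the printed text as keyboard.yaml next to the training script would match the `data="./keyboard.yaml"` argument used above.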

Testing

from ultralytics import YOLO
import numpy as np
from pathlib import Path
import cv2

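model = YOLO("best.pt")

# Make sure the output directory exists; cv2.imwrite does not create it
out_dir = Path("./runs/crop")
out_dir.mkdir(parents=True, exist_ok=True)

results = model(r"E:\0\keyboard\datasets\train_data\images\train")
for result in results:
    img = np.copy(result.orig_img)
    img_name = Path(result.path).stem  # base name of the source image

    # Black canvas the same size as the original image
    # (not truly transparent: it has no alpha channel)
    transparent_img = np.zeros_like(img, dtype=np.uint8)

    for ci, c in enumerate(result):
        # Class name of the detected object
        label = c.names[c.boxes.cls.tolist().pop()]

        # Polygon outline(s) of the segmentation mask, in pixel coordinates
        masks = c.masks.xy

        for i, mask in enumerate(masks):
            # Rasterize the polygon into a binary mask
            b_mask = np.zeros(img.shape[:2], np.uint8)
            contour = mask.astype(np.int32).reshape(-1, 1, 2)
            cv2.drawContours(b_mask, [contour], -1, (255), cv2.FILLED)

            # Copy the masked region onto the canvas
            transparent_img[b_mask == 255] = img[b_mask == 255]

            # Save the masked image; ci keeps file names unique per instance
            mask_img_name = out_dir / f"{img_name}_{label}_mask_{ci + 1}_{i + 1}.png"
            cv2.imwrite(str(mask_img_name), transparent_img)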
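
The script above pastes the keyboard onto a black canvas. If a genuinely transparent background is wanted, one option is to add an alpha channel built from the binary mask. This is a NumPy-only sketch on made-up data; `cutout_with_alpha` is a hypothetical helper, not part of the original script.

```python
import numpy as np


def cutout_with_alpha(img_bgr, mask):
    """Return a BGRA image: opaque where mask is non-zero, transparent elsewhere."""
    alpha = np.where(mask > 0, 255, 0).astype(np.uint8)
    return np.dstack([img_bgr, alpha])


# Tiny made-up example: 4x4 gray image with a 2x2 masked region in the middle
img = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 255
bgra = cutout_with_alpha(img, mask)
print(bgra.shape)   # (4, 4, 4)
print(bgra[0, 0])   # [200 200 200   0]
print(bgra[1, 1])   # [200 200 200 255]
```

A four-channel array like this can be written with cv2.imwrite to a .png file, which keeps the alpha channel.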

Step 2: Image cropping + perspective transform + grayscale processing

I modified the test script slightly:

from ultralytics import YOLO
import numpy as np
from pathlib import Path
import cv2

model = YOLO("best.pt")
# User home directory
home_dir = Path.home()
# Build the full path to the test images
source_path = home_dir / 'YOLO' / 'datasets' / 'images' / 'test'
# Make sure the output directory exists; cv2.imwrite does not create it
out_dir = Path("./runs/crop")
out_dir.mkdir(parents=True, exist_ok=True)
# Run prediction
results = model(str(source_path))

for result in results:
    img = np.copy(result.orig_img)
    img_name = Path(result.path).stem  # base name of the source image

    for ci, c in enumerate(result):
        # Class name of the detected object
        label = c.names[c.boxes.cls.tolist().pop()]

        # Polygon outline(s) of the segmentation mask, in pixel coordinates
        masks = c.masks.xy

        for i, mask in enumerate(masks):
            # Rasterize the polygon into a binary mask
            b_mask = np.zeros(img.shape[:2], np.uint8)
            contour = mask.astype(np.int32).reshape(-1, 1, 2)
            cv2.drawContours(b_mask, [contour], -1, (255), cv2.FILLED)

            # Find the largest contour
            contours, _ = cv2.findContours(b_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
            if not contours:
                continue
            max_contour = max(contours, key=cv2.contourArea)

            # Approximate the contour by a quadrilateral; skip it if it does not have 4 vertices
            epsilon = 0.02 * cv2.arcLength(max_contour, True)
            approx = cv2.approxPolyDP(max_contour, epsilon, True)
            if len(approx) != 4:
                continue

            # Order the vertices: top-left, top-right, bottom-right, bottom-left
            pts = approx.reshape(4, 2)
            rect = np.zeros((4, 2), dtype="float32")
            s = pts.sum(axis=1)
            rect[0] = pts[np.argmin(s)]  # top-left has the smallest x + y
            rect[2] = pts[np.argmax(s)]  # bottom-right has the largest x + y
            diff = np.diff(pts, axis=1)
            rect[1] = pts[np.argmin(diff)]  # top-right has the smallest y - x
            rect[3] = pts[np.argmax(diff)]  # bottom-left has the largest y - x

            # Size of the target rectangle for the perspective transform
            (tl, tr, br, bl) = rect
            widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
            widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
            maxWidth = max(int(widthA), int(widthB))
            heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
            heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
            maxHeight = max(int(heightA), int(heightB))
            dst = np.array([
                [0, 0],
                [maxWidth - 1, 0],
                [maxWidth - 1, maxHeight - 1],
                [0, maxHeight - 1]], dtype="float32")

            # Compute the perspective transform matrix
            M = cv2.getPerspectiveTransform(rect, dst)

            # Apply the perspective transform
            warped_img = cv2.warpPerspective(img, M, (maxWidth, maxHeight))

            # Grayscale conversion and binarization (currently disabled)
            # gray_img = cv2.cvtColor(warped_img, cv2.COLOR_BGR2GRAY)
            # _, binary_img = cv2.threshold(gray_img, 100, 255, cv2.THRESH_BINARY)

            # Save the rectified crop (in color, since binarization is disabled);
            # ci keeps file names unique per instance
            mask_img_name = out_dir / f"{img_name}_{label}_mask_{ci + 1}_{i + 1}.png"
            cv2.imwrite(str(mask_img_name), warped_img)
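
The vertex-ordering step in the script is easier to follow in isolation. The sketch below repeats the same sum/difference heuristic on four made-up corner points; `order_corners` is a hypothetical helper name and the coordinates are illustrative only.

```python
import numpy as np


def order_corners(pts):
    """Order 4 points as top-left, top-right, bottom-right, bottom-left."""
    pts = np.asarray(pts, dtype="float32")
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)               # x + y for each point
    rect[0] = pts[np.argmin(s)]       # top-left: smallest x + y
    rect[2] = pts[np.argmax(s)]       # bottom-right: largest x + y
    d = np.diff(pts, axis=1).ravel()  # y - x for each point
    rect[1] = pts[np.argmin(d)]       # top-right: smallest y - x
    rect[3] = pts[np.argmax(d)]       # bottom-left: largest y - x
    return rect


# Corners of a slightly skewed quadrilateral, given in arbitrary order
shuffled = [(300, 220), (40, 30), (20, 200), (310, 50)]
ordered = order_corners(shuffled)
print(ordered.tolist())
# [[40.0, 30.0], [310.0, 50.0], [300.0, 220.0], [20.0, 200.0]]
```

The heuristic assumes the keyboard is roughly upright in the image. For a quadrilateral rotated by about 45 degrees, the sums or differences of two corners can tie and the same point may be picked twice.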

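Because the training set was put together quickly and is not very large, and because of lighting issues, the results still fall somewhat short of the requirements.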