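Step 1: Keyboard detection
Option 1: Canny edge detection
Canny edge detection gives unstable results: with a cluttered background or changing lighting it easily picks up objects other than the keyboard.
The figure shows the four red points marking the edges found with Canny edge detection.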
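Option 2: Mask R-CNN
Reference: the CSDN blog post "Mask Rcnn目标分割-训练自己数据集-详细步骤" (Mask R-CNN object segmentation: training on your own dataset, detailed steps).
1. Download the code and set up the environment
Remote server with uploaded source code
Click to upload the zip file.
Directory structure at different levels of a Linux system
Files under the root directory /. The root directory / is the top level of the Linux file system; every other directory sits below it.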
root@autodl-container-9403468337-223581eb:~# cd /
root@autodl-container-9403468337-223581eb:/# ls
bin boot dev etc home init lib lib32 lib64 libx32 media mnt NGC-DL-CONTAINER-LICENSE opt proc root run sbin srv sys tmp usr var
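Files under the user home directory ~. The home directory is each user's own working directory, where the user can freely create, modify, and delete files and folders. What appears in it usually reflects that user's own operations and environment: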
root@autodl-container-9403468337-223581eb:/# cd ~
root@autodl-container-9403468337-223581eb:~# ls
autodl-pub autodl-tmp Mask_RCNN-master.zip miniconda3 tf-logs
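The same two locations can also be reached from Python, which is handy when a script should not hard-code paths. This is a minimal sketch; the helper name `list_names` is made up for illustration.

```python
from pathlib import Path

root_dir = Path("/")    # top of the file system
home_dir = Path.home()  # home directory of the current user, i.e. ~


def list_names(directory):
    """Return the sorted, non-hidden entry names of a directory."""
    return sorted(
        entry.name for entry in directory.iterdir() if not entry.name.startswith(".")
    )


print(home_dir)
print(list_names(root_dir))
```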
Unzip the code archive
root@autodl-container-9403468337-223581eb:~# unzip Mask_RCNN-master.zip
Then install the required libraries listed in requirements.txt.
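The install itself is normally done with `pip install -r requirements.txt`. To see afterwards which packages are still missing, a small check like the one below can help. It is a sketch only: the helper names are hypothetical, and the `sample` text is an illustrative stand-in, not the real contents of the repository's requirements.txt.

```python
import re
from importlib import metadata


def requirement_names(text):
    """Extract bare package names from requirements.txt-style text."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


sample = """
numpy
tensorflow>=1.3.0
IPython[all]
# a comment line
"""
names = requirement_names(sample)
print(names)  # ['numpy', 'tensorflow', 'IPython']
print(missing_packages(names))
```

For the real file, read it with `open("requirements.txt").read()` and pass the text to `requirement_names`.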
Local machine with code cloned from GitHub
(keyboard) C:\Users\吴伊晴>git clone https://github.com/matterport/Mask_RCNN.git
Cloning into 'Mask_RCNN'...
remote: Enumerating objects: 956, done.
remote: Total 956 (delta 0), reused 0 (delta 0), pack-reused 956 (from 1)
Receiving objects: 100% (956/956), 137.67 MiB | 16.28 MiB/s, done.
Resolving deltas: 100% (558/558), done.
(keyboard) C:\Users\吴伊晴>cd Mask_RCNN
(keyboard) C:\Users\吴伊晴\Mask_RCNN>python setup.py install
setup.py:9: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
import pkg_resources
WARNING:root:Fail load requirements file, so using default ones.
D:\Env\ANACONDA\envs\keyboard\lib\site-packages\setuptools\dist.py:452: SetuptoolsDeprecationWarning: Invalid dash-separated options
!!
********************************************************************************
Usage of dash-separated 'description-file' will not be supported in future
versions. Please use the underscore name 'description_file' instead.
This deprecation is overdue, please update your project and remove deprecated
calls to avoid build errors in the future.
See https://setuptools.pypa.io/en/latest/userguide/declarative_config.html for details.
********************************************************************************
!!
opt = self.warn_dash_deprecation(opt, section)
D:\Env\ANACONDA\envs\keyboard\lib\site-packages\setuptools\dist.py:452: SetuptoolsDeprecationWarning: Invalid dash-separated options
!!
********************************************************************************
Usage of dash-separated 'license-file' will not be supported in future
versions. Please use the underscore name 'license_file' instead.
This deprecation is overdue, please update your project and remove deprecated
calls to avoid build errors in the future.
See https://setuptools.pypa.io/en/latest/userguide/declarative_config.html for details.
********************************************************************************
!!
opt = self.warn_dash_deprecation(opt, section)
D:\Env\ANACONDA\envs\keyboard\lib\site-packages\setuptools\dist.py:452: SetuptoolsDeprecationWarning: Invalid dash-separated options
!!
********************************************************************************
Usage of dash-separated 'requirements-file' will not be supported in future
versions. Please use the underscore name 'requirements_file' instead.
This deprecation is overdue, please update your project and remove deprecated
calls to avoid build errors in the future.
See https://setuptools.pypa.io/en/latest/userguide/declarative_config.html for details.
********************************************************************************
!!
opt = self.warn_dash_deprecation(opt, section)
INFO:root:running install
D:\Env\ANACONDA\envs\keyboard\lib\site-packages\setuptools\_distutils\cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!
********************************************************************************
Please avoid running ``setup.py`` directly.
Instead, use pypa/build, pypa/installer or other
standards-based tools.
See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
********************************************************************************
!!
self.initialize_options()
D:\Env\ANACONDA\envs\keyboard\lib\site-packages\setuptools\_distutils\cmd.py:66: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!
********************************************************************************
Please avoid running ``setup.py`` and ``easy_install``.
Instead, use pypa/build, pypa/installer or other
standards-based tools.
See https://github.com/pypa/setuptools/issues/917 for details.
********************************************************************************
!!
self.initialize_options()
INFO:root:running bdist_egg
INFO:root:running egg_info
INFO:root:creating mask_rcnn.egg-info
INFO:root:writing mask_rcnn.egg-info\PKG-INFO
INFO:root:writing dependency_links to mask_rcnn.egg-info\dependency_links.txt
INFO:root:writing top-level names to mask_rcnn.egg-info\top_level.txt
INFO:root:writing manifest file 'mask_rcnn.egg-info\SOURCES.txt'
INFO:root:reading manifest file 'mask_rcnn.egg-info\SOURCES.txt'
INFO:root:reading manifest template 'MANIFEST.in'
INFO:root:adding license file 'LICENSE'
INFO:root:writing manifest file 'mask_rcnn.egg-info\SOURCES.txt'
INFO:root:installing library code to build\bdist.win-amd64\egg
INFO:root:running install_lib
INFO:root:running build_py
INFO:root:creating build\lib\mrcnn
INFO:root:copying mrcnn\config.py -> build\lib\mrcnn
INFO:root:copying mrcnn\model.py -> build\lib\mrcnn
INFO:root:copying mrcnn\parallel_model.py -> build\lib\mrcnn
INFO:root:copying mrcnn\utils.py -> build\lib\mrcnn
INFO:root:copying mrcnn\visualize.py -> build\lib\mrcnn
INFO:root:copying mrcnn\__init__.py -> build\lib\mrcnn
INFO:root:creating build\bdist.win-amd64\egg
INFO:root:creating build\bdist.win-amd64\egg\mrcnn
INFO:root:copying build\lib\mrcnn\config.py -> build\bdist.win-amd64\egg\mrcnn
INFO:root:copying build\lib\mrcnn\model.py -> build\bdist.win-amd64\egg\mrcnn
INFO:root:copying build\lib\mrcnn\parallel_model.py -> build\bdist.win-amd64\egg\mrcnn
INFO:root:copying build\lib\mrcnn\utils.py -> build\bdist.win-amd64\egg\mrcnn
INFO:root:copying build\lib\mrcnn\visualize.py -> build\bdist.win-amd64\egg\mrcnn
INFO:root:copying build\lib\mrcnn\__init__.py -> build\bdist.win-amd64\egg\mrcnn
INFO:root:byte-compiling build\bdist.win-amd64\egg\mrcnn\config.py to config.cpython-38.pyc
INFO:root:byte-compiling build\bdist.win-amd64\egg\mrcnn\model.py to model.cpython-38.pyc
build\bdist.win-amd64\egg\mrcnn\model.py:2359: SyntaxWarning: "is" with a literal. Did you mean "=="?
if os.name is 'nt':
INFO:root:byte-compiling build\bdist.win-amd64\egg\mrcnn\parallel_model.py to parallel_model.cpython-38.pyc
INFO:root:byte-compiling build\bdist.win-amd64\egg\mrcnn\utils.py to utils.cpython-38.pyc
INFO:root:byte-compiling build\bdist.win-amd64\egg\mrcnn\visualize.py to visualize.cpython-38.pyc
INFO:root:byte-compiling build\bdist.win-amd64\egg\mrcnn\__init__.py to __init__.cpython-38.pyc
INFO:root:creating build\bdist.win-amd64\egg\EGG-INFO
INFO:root:copying mask_rcnn.egg-info\PKG-INFO -> build\bdist.win-amd64\egg\EGG-INFO
INFO:root:copying mask_rcnn.egg-info\SOURCES.txt -> build\bdist.win-amd64\egg\EGG-INFO
INFO:root:copying mask_rcnn.egg-info\dependency_links.txt -> build\bdist.win-amd64\egg\EGG-INFO
INFO:root:copying mask_rcnn.egg-info\top_level.txt -> build\bdist.win-amd64\egg\EGG-INFO
WARNING:root:zip_safe flag not set; analyzing archive contents...
INFO:root:creating dist
INFO:root:creating 'dist\mask_rcnn-2.1-py3.8.egg' and adding 'build\bdist.win-amd64\egg' to it
INFO:root:removing 'build\bdist.win-amd64\egg' (and everything under it)
INFO:root:Processing mask_rcnn-2.1-py3.8.egg
INFO:root:Copying mask_rcnn-2.1-py3.8.egg to d:\env\anaconda\envs\keyboard\lib\site-packages
INFO:root:Adding mask-rcnn 2.1 to easy-install.pth file
INFO:root:
Installed d:\env\anaconda\envs\keyboard\lib\site-packages\mask_rcnn-2.1-py3.8.egg
INFO:root:Processing dependencies for mask-rcnn==2.1
INFO:root:Finished processing dependencies for mask-rcnn==2.1
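2. Prepare the dataset
To train Mask R-CNN, I prepared an annotated dataset of keyboards. I first collected keyboard photos and then annotated the keyboards by hand with the LabelMe tool. (LabelMe did not seem to open on the cloud server, so I annotated the dataset on my local machine.)
The annotated dataset looks like this.
Because the dataset format has to be compatible with what Mask R-CNN expects, the labels need to be converted to the COCO dataset format. The conversion code used: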
import argparse
import base64
import glob
import json
import os
import os.path as osp

import imgviz
import PIL.Image
import yaml  # added import, needed to write info.yaml
from labelme import utils
from labelme.logger import logger


def main():
    logger.warning(
        "This script is aimed to demonstrate how to convert the "
        "JSON file to a single image dataset."
    )
    logger.warning(
        "It won't handle multiple JSON files to generate a "
        "real-use dataset."
    )

    parser = argparse.ArgumentParser()
    # Added: take a directory of JSON files instead of a single json_file
    # parser.add_argument("json_file")
    parser.add_argument("--json_dir", default="D:/2021file/Biye/Mask_RCNN-master/samples/Mydata")
    parser.add_argument("-o", "--out", default=None)
    args = parser.parse_args()

    # Added: collect every .json file under json_dir (a single .json file also works)
    assert args.json_dir is not None and len(args.json_dir) > 0
    json_dir = args.json_dir
    if osp.isfile(json_dir):
        json_list = [json_dir] if json_dir.endswith(".json") else []
    else:
        json_list = glob.glob(os.path.join(json_dir, "*.json"))

    for json_file in json_list:
        json_name = osp.basename(json_file).split(".")[0]
        # One output folder per JSON file, so --out does not make samples overwrite each other
        out_root = args.out if args.out is not None else osp.dirname(json_file)
        out_dir = osp.join(out_root, json_name)
        if not osp.exists(out_dir):
            os.makedirs(out_dir)

        with open(json_file) as f:
            data = json.load(f)
        imageData = data.get("imageData")
        if not imageData:
            # The image is not embedded in the JSON, so read it from disk
            imagePath = os.path.join(os.path.dirname(json_file), data["imagePath"])
            with open(imagePath, "rb") as f:
                imageData = f.read()
                imageData = base64.b64encode(imageData).decode("utf-8")
        img = utils.img_b64_to_arr(imageData)

        # Assign an integer id to every label name; 0 is the background
        label_name_to_value = {"_background_": 0}
        for shape in sorted(data["shapes"], key=lambda x: x["label"]):
            label_name = shape["label"]
            if label_name in label_name_to_value:
                label_value = label_name_to_value[label_name]
            else:
                label_value = len(label_name_to_value)
                label_name_to_value[label_name] = label_value
        lbl, _ = utils.shapes_to_label(
            img.shape, data["shapes"], label_name_to_value
        )

        label_names = [None] * (max(label_name_to_value.values()) + 1)
        for name, value in label_name_to_value.items():
            label_names[value] = name

        lbl_viz = imgviz.label2rgb(
            lbl, imgviz.asgray(img), label_names=label_names, loc="rb"
        )

        PIL.Image.fromarray(img).save(osp.join(out_dir, "img.png"))
        utils.lblsave(osp.join(out_dir, "label.png"), lbl)
        PIL.Image.fromarray(lbl_viz).save(osp.join(out_dir, "label_viz.png"))

        with open(osp.join(out_dir, "label_names.txt"), "w") as f:
            for lbl_name in label_names:
                f.write(lbl_name + "\n")

        # Added: also write info.yaml
        logger.warning("info.yaml is being replaced by label_names.txt")
        info = dict(label_names=label_names)
        with open(osp.join(out_dir, "info.yaml"), "w") as f:
            yaml.safe_dump(info, f, default_flow_style=False)

        logger.info("Saved to: {}".format(out_dir))


if __name__ == "__main__":
    main()
Batch-convert your own .jpg and .json files; the folder generated for each sample contains 5 files in total.
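A quick way to confirm that a generated folder is complete is to check for the five file names the conversion script writes. The sketch below is hypothetical helper code (the name `missing_outputs` is not part of LabelMe), and the demo uses a temporary folder with one file deliberately left out.

```python
import tempfile
from pathlib import Path

# The five files written per sample by the conversion script
EXPECTED_FILES = ("img.png", "label.png", "label_viz.png", "label_names.txt", "info.yaml")


def missing_outputs(sample_dir):
    """Return the expected output files that are absent from sample_dir."""
    sample_dir = Path(sample_dir)
    return [name for name in EXPECTED_FILES if not (sample_dir / name).is_file()]


# Demo: create only the first four files, so info.yaml is reported as missing
with tempfile.TemporaryDirectory() as tmp:
    for name in EXPECTED_FILES[:4]:
        (Path(tmp) / name).touch()
    demo_missing = missing_outputs(tmp)

print(demo_missing)  # ['info.yaml']
```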
This code never ran successfully for me, so I switched to a different approach.
Option 3: YOLOv8
Dataset
├─images
│  ├─test
│  ├─train
│  └─val
└─labels
   ├─test
   ├─train
   └─val
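The notes do not say how the files under labels were produced. One plausible route, assuming the LabelMe polygon annotations from the earlier attempt are reused, is to convert each JSON file into the YOLO segmentation label format: one text line per object, holding the class id followed by the polygon coordinates normalized to the 0 to 1 range. The function name and the class mapping below are assumptions for illustration.

```python
import json


def labelme_to_yolo_seg(labelme_json, class_ids):
    """Convert one LabelMe annotation (dict or JSON string) to YOLO segmentation lines."""
    data = json.loads(labelme_json) if isinstance(labelme_json, str) else labelme_json
    width, height = data["imageWidth"], data["imageHeight"]
    lines = []
    for shape in data["shapes"]:
        if shape.get("shape_type", "polygon") != "polygon":
            continue  # only polygons map directly to a segmentation label
        coords = []
        for x, y in shape["points"]:
            coords += [x / width, y / height]  # normalize to [0, 1]
        fields = [str(class_ids[shape["label"]])] + [f"{v:.6f}" for v in coords]
        lines.append(" ".join(fields))
    return lines


# Hand-made annotation standing in for a real LabelMe file
example = {
    "imageWidth": 200,
    "imageHeight": 100,
    "shapes": [
        {
            "label": "keyboard",
            "shape_type": "polygon",
            "points": [[20, 10], [180, 10], [180, 90], [20, 90]],
        }
    ],
}
yolo_lines = labelme_to_yolo_seg(example, {"keyboard": 0})
print(yolo_lines[0])
# 0 0.100000 0.100000 0.900000 0.100000 0.900000 0.900000 0.100000 0.900000
```

Each image then gets a .txt file with the same base name, placed in the matching labels/train, labels/val, or labels/test folder.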
Training
from ultralytics import YOLO


def main():
    # Load a pretrained segmentation model (recommended for training)
    model = YOLO("yolov8n-seg.pt")
    # Train the model
    results = model.train(data="./keyboard.yaml", epochs=100, plots=True, batch=4)


if __name__ == '__main__':
    main()
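The training call reads ./keyboard.yaml, whose contents are not shown in these notes. The sketch below generates a dataset file in the layout Ultralytics expects (path, train, val, test, names). The dataset root "./datasets" and the single class name "keyboard" are assumptions; adjust them to the actual folder and labels.

```python
def dataset_yaml(root, class_names):
    """Build the text of an Ultralytics dataset YAML for the images/labels layout."""
    lines = [
        f"path: {root}",        # dataset root folder (assumed location)
        "train: images/train",  # relative to path
        "val: images/val",
        "test: images/test",
        "names:",
    ]
    lines += [f"  {index}: {name}" for index, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


yaml_text = dataset_yaml("./datasets", ["keyboard"])
print(yaml_text)
# To save it next to the training script:
# open("keyboard.yaml", "w").write(yaml_text)
```

The labels folders are not listed in the file; Ultralytics locates them by replacing images with labels in each image path.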
Testing
from pathlib import Path

import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO("best.pt")
results = model(r"E:\0\keyboard\datasets\train_data\images\train")

# cv2.imwrite fails silently when the folder is missing, so create it first
out_dir = Path("./runs/crop")
out_dir.mkdir(parents=True, exist_ok=True)

for result in results:
    img = np.copy(result.orig_img)
    img_name = Path(result.path).stem  # base name of the source image
    # Black canvas with the same size as the original image
    transparent_img = np.zeros_like(img, dtype=np.uint8)
    for ci, c in enumerate(result):
        # Class name of the detected object
        label = c.names[c.boxes.cls.tolist().pop()]
        # Polygon outlines of the segmentation masks
        masks = c.masks.xy
        for i, mask in enumerate(masks):
            # Rasterize the polygon into a binary mask
            b_mask = np.zeros(img.shape[:2], np.uint8)
            contour = mask.astype(np.int32).reshape(-1, 1, 2)
            cv2.drawContours(b_mask, [contour], -1, (255), cv2.FILLED)
            # Copy the masked region onto the black canvas
            transparent_img[b_mask == 255] = img[b_mask == 255]
            # Save the masked image
            mask_img_name = out_dir / f"{img_name}_{label}_mask_{i + 1}.png"
            cv2.imwrite(str(mask_img_name), transparent_img)
Step 2: Image cropping + perspective transform + grayscale processing
The test script was modified slightly:
from pathlib import Path

import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO("best.pt")
# User home directory
home_dir = Path.home()
# Build the full path to the test images
source_path = home_dir / 'YOLO' / 'datasets' / 'images' / 'test'
# Run prediction
results = model(str(source_path))

# cv2.imwrite fails silently when the folder is missing, so create it first
out_dir = Path("./runs/crop")
out_dir.mkdir(parents=True, exist_ok=True)

for result in results:
    img = np.copy(result.orig_img)
    img_name = Path(result.path).stem  # base name of the source image
    for ci, c in enumerate(result):
        # Class name of the detected object
        label = c.names[c.boxes.cls.tolist().pop()]
        # Polygon outlines of the segmentation masks
        masks = c.masks.xy
        for i, mask in enumerate(masks):
            # Rasterize the polygon into a binary mask
            b_mask = np.zeros(img.shape[:2], np.uint8)
            contour = mask.astype(np.int32).reshape(-1, 1, 2)
            cv2.drawContours(b_mask, [contour], -1, (255), cv2.FILLED)
            # Find the largest contour
            contours, _ = cv2.findContours(b_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
            if not contours:
                continue
            max_contour = max(contours, key=cv2.contourArea)
            # Approximate the contour with a quadrilateral
            epsilon = 0.02 * cv2.arcLength(max_contour, True)
            approx = cv2.approxPolyDP(max_contour, epsilon, True)
            if len(approx) != 4:
                continue  # skip masks that do not reduce to four corners
            # Reorder the vertices: top-left, top-right, bottom-right, bottom-left
            pts = approx.reshape(4, 2)
            rect = np.zeros((4, 2), dtype="float32")
            s = pts.sum(axis=1)
            rect[0] = pts[np.argmin(s)]
            rect[2] = pts[np.argmax(s)]
            diff = np.diff(pts, axis=1)
            rect[1] = pts[np.argmin(diff)]
            rect[3] = pts[np.argmax(diff)]
            # Compute the target rectangle of the perspective transform
            (tl, tr, br, bl) = rect
            widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
            widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
            maxWidth = max(int(widthA), int(widthB))
            heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
            heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
            maxHeight = max(int(heightA), int(heightB))
            dst = np.array([
                [0, 0],
                [maxWidth - 1, 0],
                [maxWidth - 1, maxHeight - 1],
                [0, maxHeight - 1]], dtype="float32")
            # Compute the perspective transform matrix
            M = cv2.getPerspectiveTransform(rect, dst)
            # Apply the perspective transform
            warped_img = cv2.warpPerspective(img, M, (maxWidth, maxHeight))
            # # Convert to grayscale
            # gray_img = cv2.cvtColor(warped_img, cv2.COLOR_BGR2GRAY)
            # # Binarize
            # _, binary_img = cv2.threshold(gray_img, 100, 255, cv2.THRESH_BINARY)
            # Save the cropped, rectified image
            mask_img_name = out_dir / f"{img_name}_{label}_mask_{i + 1}.png"
            cv2.imwrite(str(mask_img_name), warped_img)
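The vertex reordering and target-size steps in the script are easier to follow on concrete numbers. The sketch below isolates them as two NumPy-only helpers (the function names and the corner coordinates are made up for illustration) and applies them to four corners given in scrambled order.

```python
import numpy as np


def order_corners(pts):
    """Order 4 points as top-left, top-right, bottom-right, bottom-left."""
    pts = np.asarray(pts, dtype="float32")
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)               # x + y
    d = np.diff(pts, axis=1).ravel()  # y - x
    rect[0] = pts[np.argmin(s)]  # top-left: smallest x + y
    rect[2] = pts[np.argmax(s)]  # bottom-right: largest x + y
    rect[1] = pts[np.argmin(d)]  # top-right: smallest y - x
    rect[3] = pts[np.argmax(d)]  # bottom-left: largest y - x
    return rect


def target_size(rect):
    """Width and height of the rectified image: the longer of each pair of opposite edges."""
    tl, tr, br, bl = rect
    width = max(int(np.linalg.norm(br - bl)), int(np.linalg.norm(tr - tl)))
    height = max(int(np.linalg.norm(tr - br)), int(np.linalg.norm(tl - bl)))
    return width, height


corners = [[300, 220], [40, 30], [310, 40], [30, 200]]  # scrambled order
rect = order_corners(corners)
print(rect.tolist())      # [[40.0, 30.0], [310.0, 40.0], [300.0, 220.0], [30.0, 200.0]]
print(target_size(rect))  # (270, 180)
```

The sum and difference heuristic assumes the keyboard is roughly upright in the photo. For a quadrilateral rotated by about 45 degrees the sums or differences can tie, and the corners may be assigned incorrectly.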
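The training set was put together in a hurry and is not very large, and the lighting varied between photos, so the results differ somewhat from what is needed.
Improvements:
1. Enlarge the dataset
2. Balance the lighting across photos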