I. Environment Setup
- Python
- opencv-python package
- OpenCV DNN module
- OpenCV ML module
- PyCharm 2019
Project repository: https://github.com/zxinyang38/opencv-
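As a quick sanity check of the environment, a minimal sketch (assuming a pip install of opencv-python, which bundles the DNN and ML modules):

import cv2 as cv

# the DNN and ML modules ship inside the opencv-python wheel
print(cv.__version__)
print(hasattr(cv, 'dnn'), hasattr(cv, 'ml'))  # both should be True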
II. Results Preview
Digits are recognized from a given image of printed material.
III. Experiment Steps
1. EAST text detection model (text region detection with the EAST network)
- EAST network architecture
- Load the network and retrieve information about each layer
The EAST weight file is available in my GitHub repository: https://github.com/zxinyang38/opencv-
import cv2 as cv
import numpy as np

# load the frozen EAST model and print the name of every layer
net = cv.dnn.readNet('C:/Program Files (x86)/pycharm/pycharm/PycharmProjects/ML/ocr_demo/frozen_east_text_detection.pb')
names = net.getLayerNames()
for name in names:
    print(name)
The code above prints the name of every layer. feature_fusion/Conv_7/Sigmoid corresponds to the score map branch of the EAST network, and feature_fusion/concat_3 corresponds to the RBOX geometry branch on the far right of the EAST architecture diagram.
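On recent OpenCV versions the DNN module can also report the output layers directly, instead of scanning the full layer list by eye; a minimal sketch reusing the net object loaded above:

# the unconnected output layers are exactly the two EAST output heads
print(net.getUnconnectedOutLayersNames())
# expected: ('feature_fusion/Conv_7/Sigmoid', 'feature_fusion/concat_3')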
- Using the network
def detect(self, image):
    (H, W) = image.shape[:2]
    # ratios to map detections from the 320x320 network input back to the original image
    rH = H / float(320)
    rW = W / float(320)
    # (123.68, 116.78, 103.94) is the per-channel mean subtracted at training time
    blob = cv.dnn.blobFromImage(image, 1.0, (320, 320), (123.68, 116.78, 103.94), swapRB=True, crop=False)
    self.net.setInput(blob)
    (scores, geometry) = self.net.forward(self.layerNames)
    print(scores)
See text_area_detect.py for the full implementation.
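For a 320×320 input, the two outputs have fixed shapes, which is what the decoding loop in the next step relies on. A minimal sketch that checks them (assuming frozen_east_text_detection.pb sits in the working directory; a black dummy image stands in for a real one):

import cv2 as cv
import numpy as np

net = cv.dnn.readNet('frozen_east_text_detection.pb')
dummy = np.zeros((320, 320, 3), dtype=np.uint8)
blob = cv.dnn.blobFromImage(dummy, 1.0, (320, 320), (123.68, 116.78, 103.94), swapRB=True, crop=False)
net.setInput(blob)
scores, geometry = net.forward(["feature_fusion/Conv_7/Sigmoid", "feature_fusion/concat_3"])
print(scores.shape)    # (1, 1, 80, 80): one text/no-text score per 4x4 feature-map cell
print(geometry.shape)  # (1, 5, 80, 80): 4 box distances plus 1 rotation angle per cell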
2. Non-Maximum Suppression (NMS)
The raw detection result may look like the figure below, with many overlapping candidate boxes around the same text:
To prune them, the cv.dnn.NMSBoxes API is introduced (non-maximum suppression keeps the highest-scoring box and discards weaker overlapping regions). A standalone sketch of the call follows, then the full detector class:
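The boxes and scores below are made-up values for illustration, not real EAST output; each box is (x, y, w, h):

import cv2 as cv

# two heavily overlapping candidates for the same text, plus one separate box
boxes = [[10, 10, 100, 40], [12, 12, 100, 40], [200, 50, 80, 30]]
scores = [0.9, 0.6, 0.8]
# keep boxes scoring >= 0.5 and suppress overlaps whose IoU exceeds 0.4
indices = cv.dnn.NMSBoxes(boxes, scores, 0.5, 0.4)
print(indices)  # surviving box indices: box 1 is suppressed by the stronger box 0

The full detector class: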
import cv2 as cv
import numpy as np

class TextAreaDetector:
    def __init__(self, model_path):
        self.net = cv.dnn.readNet(model_path)
        names = self.net.getLayerNames()
        for name in names:
            print(name)
        self.threshold = 0.5
        self.layerNames = ["feature_fusion/Conv_7/Sigmoid", "feature_fusion/concat_3"]

    def detect(self, image):
        (H, W) = image.shape[:2]
        rH = H / float(320)
        rW = W / float(320)
        blob = cv.dnn.blobFromImage(image, 1.0, (320, 320), (123.68, 116.78, 103.94), swapRB=True, crop=False)
        self.net.setInput(blob)
        (scores, geometry) = self.net.forward(self.layerNames)
        print(scores)
        (numRows, numCols) = scores.shape[2:4]
        rects = []
        confidences = []
        # start to decode the output
        for y in range(0, numRows):
            scoresData = scores[0, 0, y]
            xData0 = geometry[0, 0, y]
            xData1 = geometry[0, 1, y]
            xData2 = geometry[0, 2, y]
            xData3 = geometry[0, 3, y]
            anglesData = geometry[0, 4, y]
            # loop over the number of columns
            for x in range(0, numCols):
                # if our score does not have sufficient probability, ignore it
                if scoresData[x] < self.threshold:
                    continue
                # compute the offset factor as our resulting feature maps will
                # be 4x smaller than the input image
                (offsetX, offsetY) = (x * 4.0, y * 4.0)
                # extract the rotation angle for the prediction and then
                # compute the sin and cosine