Using Deep Learning to Binarize Photographed Documents with Complex Backgrounds

Original article. First published 2021-12-14 10:54:56; last modified 2024-04-11 09:36:52.

License: CC 4.0 BY-SA

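This article examines the limitations of traditional image-processing techniques such as Otsu's method and adaptive thresholding in document processing. It then focuses on how integral-image thresholding and a U-Net-based deep learning approach cope with complex lighting and interference, with practical examples. A side-by-side comparison shows how well deep learning performs at document binarization and how much potential it still has.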

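Preface

1. When a document is photographed with a handheld device, uneven lighting, shadows, and underexposure are common. Old documents and ancient books add wormholes, bleed-through, and faded strokes. Whether the goal is easier reading, printing, or OCR, all of these disturbances degrade the result.
2. A document-processing pipeline usually consists of image preprocessing, document image binarization, layout analysis, and text detection and recognition. For the binarization step there are many traditional algorithms to choose from, such as Otsu's method and adaptive thresholding. In practice, though, these algorithms have to be tuned heavily for one specific kind of disturbance and are not robust across all documents.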
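To make the limitation concrete, here is a minimal sketch of Otsu's method in plain C++ (standard library only, not the OpenCV implementation). The function name `otsuThreshold` and the flat pixel vector are assumptions made for illustration. The point is that the method picks one global threshold from the histogram, so a shadowed region and a brightly lit region of the same page are judged against the same value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns a threshold t such that pixels <= t form one class and pixels > t
// the other, chosen to maximize the between-class variance (Otsu's criterion).
// Hypothetical helper for illustration; the image is a flat vector of gray values.
int otsuThreshold(const std::vector<std::uint8_t>& pixels)
{
    std::array<long long, 256> hist{};
    for (std::uint8_t v : pixels) ++hist[v];

    const long long total = static_cast<long long>(pixels.size());
    long long sumAll = 0;
    for (int v = 0; v < 256; ++v) sumAll += v * hist[v];

    long long countLow = 0, sumLow = 0;
    double bestVar = -1.0;
    int bestT = 0;
    for (int t = 0; t < 256; ++t)
    {
        countLow += hist[t];
        sumLow += t * hist[t];
        const long long countHigh = total - countLow;
        if (countLow == 0 || countHigh == 0) continue;  // one class is empty

        const double meanLow = static_cast<double>(sumLow) / countLow;
        const double meanHigh = static_cast<double>(sumAll - sumLow) / countHigh;
        const double diff = meanLow - meanHigh;
        // Between-class variance up to a constant factor of 1 / total^2
        const double var = static_cast<double>(countLow) * countHigh * diff * diff;
        if (var > bestVar) { bestVar = var; bestT = t; }
    }
    return bestT;
}
```

For a cleanly bimodal input the returned threshold falls between the two modes. When lighting varies across the page the histogram is no longer bimodal, and that is the case where a single global threshold breaks down.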

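I. Traditional methods

1. Traditional digital image processing offers many ways to binarize an image. I have tried several myself, from the most common ones, Otsu's method and adaptive thresholding, to the less common integral-image thresholding (called integral binarization below). The following compares their results.

In each result figure, the first image is the grayscale input, the second is adaptive thresholding, the third is Otsu's method, and the fourth is integral binarization.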

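Scenario 1: a handwritten document under uneven lighting with some light shadows.
Original image: [figure]
Binarization results: [figure]

Scenario 2: a printed document with a large, clearly visible shadow.
Original image: [figure]
Binarization results: [figure]

Scenario 3: a document with heavy bleed-through.
Original image: [figure]
Binarization results: [figure]

Scenario 4: a hand-copied ancient document on tinted paper.
Original image: [figure]
Binarization results: [figure]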

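2. As the results show, when the scene contains interference or the lighting varies, traditional image processing cannot fully solve document binarization. Integral binarization (the fourth image in each result figure) does somewhat better, but it still cannot handle most environments. When the usage scenario is relatively stable, however, integral binarization is a reasonable choice. Its code is below, written in C++ with OpenCV.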


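/// <summary>
/// Integral-image thresholding (integral binarization)
/// </summary>
/// <param name="inputMat">Input image (8-bit, single channel)</param>
/// <param name="thre">Threshold scale factor (1.0)</param>
/// <param name="outputMat">Output image</param>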
void thresholdIntegral(cv::Mat& inputMat, double thre, cv::Mat& outputMat)
{
    // accept only char type matrices
    CV_Assert(!inputMat.empty());
    CV_Assert(inputMat.depth() == CV_8U);
    CV_Assert(inputMat.channels() == 1);
   
    outputMat = cv::Mat(inputMat.size(), CV_8UC1, 1);

    // rows -> height -> y
    int nRows = inputMat.rows;
    // cols -> width -> x
    int nCols = inputMat.cols;

    // create the integral image
    cv::Mat sumMat;
    cv::integral(inputMat, sumMat);

    CV_Assert(sumMat.depth() == CV_32S);
    CV_Assert(sizeof(int) == 4);

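    // S: side length of the local window (1/8 of the larger image dimension)
    int S = MAX(nRows, nCols) / 8;
    // T: with thre == 1.0, a pixel turns black when it is more than 15% darker than the local mean
    double T = 0.15;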

    // perform thresholding
    int s2 = S / 2;
    int x1, y1, x2, y2, count, sum;

    // CV_Assert(sizeof(int) == 4);
    int* p_y1, * p_y2;
    uchar* p_inputMat, * p_outputMat;

    for (int i = 0; i < nRows; ++i)
    {
        y1 = i - s2;
        y2 = i + s2;

        if (y1 < 0)
        {
            y1 = 0;
        }
        if (y2 >= nRows)
        {
            y2 = nRows - 1;
        }

        p_y1 = sumMat.ptr<int>(y1);
        p_y2 = sumMat.ptr<int>(y2);
        p_inputMat = inputMat.ptr<uchar>(i);
        p_outputMat = outputMat.ptr<uchar>(i);

        for (int j = 0; j < nCols; ++j)
        {
            // set the SxS region
            x1 = j - s2;
            x2 = j + s2;

            if (x1 < 0)
            {
                x1 = 0;
            }
            if (x2 >= nCols)
            {
                x2 = nCols - 1;
            }

            count = (x2 - x1) * (y2 - y1);

            // window sum = s(x2,y2) - s(x2,y1) - s(x1,y2) + s(x1,y1)
            sum = p_y2[x2] - p_y1[x2] - p_y2[x1] + p_y1[x1];

            if ((int)(p_inputMat[j] * count) < (int)(sum * (1.0 - T) * thre))
                p_outputMat[j] = 0;
            else
                p_outputMat[j] = 255;
        }
    }
}

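3. When the environment is unknown or complex, no amount of parameter tuning makes the traditional methods work well. At that point deep learning is the option to consider.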
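The window sum in `thresholdIntegral` relies on a summed-area table, which `cv::integral` returns with one extra row and one extra column of zeros. The sketch below rebuilds that idea with the standard library only. The names `buildIntegral` and `boxSum` are hypothetical helpers for illustration, not OpenCV functions. It shows why the sum over any rectangle takes just four lookups, regardless of the window size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Summed-area table with an extra leading row and column of zeros:
// sat[y * (w + 1) + x] is the sum of all pixels with row < y and column < x.
std::vector<long long> buildIntegral(const std::vector<std::uint8_t>& img, int w, int h)
{
    std::vector<long long> sat(static_cast<std::size_t>(w + 1) * (h + 1), 0);
    for (int y = 0; y < h; ++y)
    {
        long long rowSum = 0;  // running sum of the current row
        for (int x = 0; x < w; ++x)
        {
            rowSum += img[y * w + x];
            sat[(y + 1) * (w + 1) + (x + 1)] = sat[y * (w + 1) + (x + 1)] + rowSum;
        }
    }
    return sat;
}

// Sum of the pixels in rows [y1, y2) and columns [x1, x2), using four lookups.
long long boxSum(const std::vector<long long>& sat, int w, int x1, int y1, int x2, int y2)
{
    const int stride = w + 1;
    return sat[y2 * stride + x2] - sat[y1 * stride + x2]
         - sat[y2 * stride + x1] + sat[y1 * stride + x1];
}
```

For a 3x3 image holding the values 1 to 9 row by row, `boxSum(sat, 3, 1, 1, 3, 3)` covers the lower-right 2x2 block (5, 6, 8, 9). Dividing such a sum by the pixel count gives the local mean that each pixel is compared against.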

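II. U-Net-based image binarization

1. The U-Net network
U-Net was originally designed for biomedical image segmentation, and to this day many medical image segmentation networks still use U-Net as their backbone.
For the algorithm I referred to a U-Net project for retinal vessel segmentation, on GitHub at https://github.com/orobix/retina-unet . Its retinal vessel segmentation result looks like this:
[figure: retinal vessel segmentation result]
2. The deep learning framework is PyTorch, and I referred to this project: https://github.com/milesial/Pytorch-UNet .
3. However, this U-Net setup only trains on images of size 512. If photographed documents are all shrunk to 512 for labeling and training, a lot of fine detail is lost, so I adjusted the network along the lines of the ENet structure.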
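A quick calculation shows how much detail such a resize can cost. The sketch below is only an illustration: the function names are hypothetical, and the 3000x4000 photo and the 4-pixel stroke are made-up example values, not measurements from this project.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Scale factor applied when the longer side of a (width x height) image
// is resized down to targetSide pixels.
double downscaleFactor(int width, int height, int targetSide)
{
    return static_cast<double>(targetSide) / std::max(width, height);
}

// Approximate width of a pen stroke, in pixels, after that resize.
double strokeWidthAfterResize(double strokeWidthPx, int width, int height, int targetSide)
{
    return strokeWidthPx * downscaleFactor(width, height, targetSide);
}
```

With these example values, `downscaleFactor(3000, 4000, 512)` is 0.128, so a stroke 4 pixels wide in the photo shrinks to roughly half a pixel. At that scale thin strokes blend into the background before the network ever sees them.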

III. Model inference

1. My test environment is Windows 10, VS2019, and OpenCV 4.5. For convenience, inference uses OpenCV's dnn module directly.
2. Code:

The DirtyDocUnet class:

#pragma once
#include <iostream>
#include <string>
#include <opencv2/opencv.hpp>
#include <opencv2/dnn/dnn.hpp>

class DirtyDocUnet
{
public:
	DirtyDocUnet(std::string _model_path);
	
	void dnnInference(cv::Mat &cv_src, cv::Mat &cv_dst);

	void docBin(const cv::Mat& cv_src, cv::Mat& cv_dst);
private:
	std::string model_path;
	cv::dnn::Net doc_net;
	int target_w = 1560;
	int target_h = 1560;
};

#include "DirtyDocUnet.h"

DirtyDocUnet::DirtyDocUnet(std::string _model_path)
{
	model_path = _model_path;
	doc_net = cv::dnn::readNet(model_path);
}

void DirtyDocUnet::dnnInference(cv::Mat &cv_src, cv::Mat &cv_dst)
{
	cv::Size reso(this->target_w,this->target_h);

	cv::Mat cv_gray;
	cv::cvtColor(cv_src, cv_gray, cv::COLOR_BGR2GRAY);
	// Scale pixel values to [0, 1] and resize to the network input size
	cv::Mat blob = cv::dnn::blobFromImage(cv_gray, 1.0 / 255, reso, cv::Scalar(0, 0, 0), false, false);
	doc_net.setInput(blob);

	cv::Mat cv_out = doc_net.forward();

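	// View the 1x1xHxW float output as an HxW matrix, then scale it to 8 bits.
	// convertTo saturates, so values slightly outside [0, 1] cannot wrap around.
	cv::Mat cv_prob(cv_out.size[2], cv_out.size[3], CV_32FC1, cv_out.ptr<float>(0, 0));
	cv::Mat cv_seg;
	cv_prob.convertTo(cv_seg, CV_8UC1, 255.0);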
	cv::resize(cv_seg, cv_dst, cv_src.size());
}

/// <summary>
/// Smooths the edges of a binary image
/// </summary>
/// <param name="src">Input image</param>
/// <param name="dst">Output image</param>
/// <param name="uthreshold">Width threshold</param>
/// <param name="vthreshold">Height threshold</param>
/// <param name="type">Color of the protrusions to remove: 0 for black, 1 for white</param>
void deleteZigzag(cv::Mat& src, cv::Mat& dst, int uthreshold, int vthreshold, int type)
{
    //int threshold;
    src.copyTo(dst);
    int height = dst.rows;
    int width = dst.cols;
    int k;  // loop index declared outside the loops so it can be checked after they exit
    for (int i = 0; i < height - 1; i++)
    {
        uchar* p = dst.ptr<uchar>(i);
        for (int j = 0; j < width - 1; j++)
        {
            if (type == 0)
            {
                // Row pass: fill short horizontal black runs
                if (p[j] == 255 && p[j + 1] == 0)
                {
                    if (j + uthreshold >= width)
                    {
                        for (int k = j + 1; k < width; k++)
                        {
                            p[k] = 255;
                        }
                    }
                    else
                    {
                        for (k = j + 2; k <= j + uthreshold; k++)
                        {
                            if (p[k] == 255)
                            {
                                break;
                            }
                        }
                        if (k < width && p[k] == 255)  // guard: k may have run past the end of the row
                        {
                            for (int h = j + 1; h < k; h++)
                            {
                                p[h] = 255;
                            }
                        }
                    }
                }
                // Column pass: p[j + width] is the pixel one row down (relies on dst being stored continuously)
                if (p[j] == 255 && p[j + width] == 0)
                {
                    if (i + vthreshold >= height)
                    {
                        for (k = j + width; k < j + (height - i) * width; k += width)
                        {
                            p[k] = 255;
                        }
                    }
                    else
                    {
                        for (k = j + 2 * width; k <= j + vthreshold * width; k += width)
                        {
                            if (p[k] == 255) break;
                        }
                        if (k < j + (height - i) * width && p[k] == 255)  // guard: k may point past the last row
                        {
                            for (int h = j + width; h < k; h += width)
                                p[h] = 255;
                        }
                    }
                }
            }
            else  //type = 1
            {
                // Row pass: fill short horizontal white runs
                if (p[j] == 0 && p[j + 1] == 255)
                {
                    if (j + uthreshold >= width)
                    {
                        for (int k = j + 1; k < width; k++)
                            p[k] = 0;
                    }
                    else
                    {
                        for (k = j + 2; k <= j + uthreshold; k++)
                        {
                            if (p[k] == 0) break;
                        }
                        if (k < width && p[k] == 0)  // guard: k may have run past the end of the row
                        {
                            for (int h = j + 1; h < k; h++)
                                p[h] = 0;
                        }
                    }
                }
                // Column pass: p[j + width] is the pixel one row down (relies on dst being stored continuously)
                if (p[j] == 0 && p[j + width] == 255)
                {
                    if (i + vthreshold >= height)
                    {
                        for (k = j + width; k < j + (height - i) * width; k += width)
                            p[k] = 0;
                    }
                    else
                    {
                        for (k = j + 2 * width; k <= j + vthreshold * width; k += width)
                        {
                            if (p[k] == 0) break;
                        }
                        if (k < j + (height - i) * width && p[k] == 0)  // guard: k may point past the last row
                        {
                            for (int h = j + width; h < k; h += width)
                                p[h] = 0;
                        }
                    }
                }
            }
        }
    }
}

void  DirtyDocUnet::docBin(const cv::Mat& cv_src, cv::Mat& cv_dst)
{
	if (cv_src.empty())
	{
		return;
	}
	std::vector<cv::Mat> cv_pieces;
	cv_pieces.push_back(cv_src(cv::Rect(0, 0, cv_src.cols, cv_src.rows / 2)));
	cv_pieces.push_back(cv_src(cv::Rect(0, cv_src.rows / 2, cv_src.cols, cv_src.rows / 2)));

	cv::Mat cv_pars;
	for (auto v : cv_pieces)
	{
		cv::Mat cv_temp;

		dnnInference(v, cv_temp);

		cv_pars.push_back(cv_temp);
	}
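    // Invert and upscale before smoothing. The interpolation flag is the sixth
    // argument of cv::resize; the fourth and fifth are the scale factors fx and fy.
    cv::Mat cv_resize;
    cv::resize(~cv_pars, cv_resize, cv::Size(4096, 4096), 0, 0, cv::INTER_CUBIC);
    cv::Mat cv_zig;
    deleteZigzag(cv_resize, cv_zig, 5, 5, 0);

    // Invert again and resize back to the input size
    cv::resize(~cv_zig, cv_dst, cv::Size(cv_src.cols, cv_src.rows), 0, 0, cv::INTER_LINEAR);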
}
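`docBin` cuts the image into two halves of `cv_src.rows / 2` rows each, so when the row count is odd the last row is never passed to the network (the final resize still restores the original size). The sketch below shows one way to split rows so that every row is covered and the number of strips is configurable. `RowRange` and `splitRows` are hypothetical names used for illustration and are not part of the original class.

```cpp
#include <cassert>
#include <vector>

// Half-open range of image rows: [begin, end)
struct RowRange { int begin; int end; };

// Split `rows` image rows into `parts` consecutive strips that cover every row.
// The first (rows % parts) strips get one extra row, so nothing is dropped
// when rows is not divisible by parts.
std::vector<RowRange> splitRows(int rows, int parts)
{
    std::vector<RowRange> ranges;
    if (rows <= 0 || parts <= 0) return ranges;
    if (parts > rows) parts = rows;  // never create empty strips

    const int base = rows / parts;
    const int extra = rows % parts;
    int start = 0;
    for (int i = 0; i < parts; ++i)
    {
        const int len = base + (i < extra ? 1 : 0);
        ranges.push_back({start, start + len});
        start += len;
    }
    return ranges;
}
```

With OpenCV, each range would map to a region such as `cv::Rect(0, r.begin, cols, r.end - r.begin)`. For 5 rows and 2 parts the strips are [0, 3) and [3, 5).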

Calling the class
main.cpp

#include <iostream>
#include "DirtyDocUnet.h"

void thresholdIntegral(cv::Mat& inputMat, double thre, cv::Mat& outputMat);
void mergeImages(const cv::Mat& cv_src1, const cv::Mat& cv_src2, cv::Mat& cv_dst);

void imshow(std::string name, const cv::Mat& cv_src)
{
    cv::namedWindow(name, 0);
    int max_rows = 800;
    int max_cols = 800;
    if (cv_src.rows >= cv_src.cols && cv_src.rows > max_rows)
    {
        cv::resizeWindow(name, cv::Size(cv_src.cols * max_rows / cv_src.rows, max_rows));
    }
    else if (cv_src.cols >= cv_src.rows && cv_src.cols > max_cols)
    {
        cv::resizeWindow(name, cv::Size(max_cols, cv_src.rows * max_cols / cv_src.cols));
    }
    cv::imshow(name, cv_src);
}

int main(void)
{
    std::string path = "images";
    std::vector<std::string> filenames;
    cv::glob(path, filenames, false);
    std::string model_path = "models/unetv2.onnx";

    DirtyDocUnet doc_bin(model_path);
    int i = 0;

    for (auto v : filenames)
    {
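        cv::Mat cv_src = cv::imread(v);
        if (cv_src.empty())
        {
            continue;  // skip files that cannot be read as images
        }
        cv::Mat cv_bin, cv_otsu, cv_gray, cv_integral;
        cv::cvtColor(cv_src, cv_gray, cv::COLOR_BGR2GRAY);

        // Fixed global threshold at 127, Otsu's method, and integral binarization
        cv::threshold(cv_gray, cv_bin, 127, 255, cv::THRESH_BINARY);
        cv::threshold(cv_gray, cv_otsu, 0, 255, cv::THRESH_OTSU);
        thresholdIntegral(cv_gray, 1.0, cv_integral);

        cv::Mat cv_unet;
        doc_bin.docBin(cv_src, cv_unet);

        // Build a 2x2 grid: the left column stacks the fixed threshold over Otsu,
        // the right column stacks integral binarization over the U-Net result
        cv_bin.push_back(cv_otsu);
        cv_integral.push_back(cv_unet);

        cv::Mat cv_all;
        mergeImages(cv_bin, cv_integral, cv_all);
        // Note: this overwrites the input file with the comparison grid
        cv::imwrite(v, cv_all);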
    }
}

// thresholdIntegral: the definition is identical to the integral binarization
// listing in the traditional-methods section above and is not repeated here.

void mergeImages(const cv::Mat& cv_src1, const cv::Mat& cv_src2, cv::Mat& cv_dst)
{
    CV_Assert(!(cv_src1.rows != cv_src2.rows || cv_src1.cols != cv_src2.cols));
    CV_Assert(!(cv_src1.empty() || cv_src2.empty()));

    cv_dst.create(cv_src1.rows, cv_src1.cols * 2, cv_src1.type());
    cv::Mat r1 = cv_dst(cv::Rect(0, 0, cv_src1.cols, cv_src1.rows));
    cv_src1.copyTo(r1);

    cv::Mat r2 = cv_dst(cv::Rect(cv_src1.cols, 0, cv_src1.cols, cv_src1.rows));
    cv_src2.copyTo(r2);
}

3. Comparing the results
In each binarization figure, the first image is adaptive thresholding, the second is integral binarization, the third is Otsu's method, and the fourth is the U-Net result.

Example 1
Original image: [figure]
Binarization results: [figure]

Example 2
Original image: [figure]
Binarization results: [figure]

Example 3
Original image: [figure]
Binarization results: [figure]

Example 4
Original image: [figure]
Binarization results: [figure]

Example 5
Original image: [figure]
Binarization results: [figure]
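4. Overall, the deep learning approach gives a good result in every environment shown here. There is still room for improvement: my current training set has roughly 2,000 samples, and adding samples from more environments should make the model generalize better.
5. Some mobile scanning apps offer a similar feature, usually called ink-saving mode or black-and-white scan. We have ported this algorithm to both Android and iOS. Below is the result in our iOS app. If you are interested in mobile scanning apps, you can try the app 《扫描家》.
[figure: result in the iOS app]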