I recently wrapped up my Hough transform trilogy and my summary of the discrete Fourier transform, and learned that combining the two yields a rather interesting application: correcting the rotation of text images. Drawing on a few reference blogs, I'm writing it down here.
Main reference posts: standard Hough line detection, and the Fourier transform of images.
Key reference: http://johnhany.net/2013/11/dft-based-text-rotation-correction/
For the theory behind the Fourier transform, see my previous post, which also lays the groundwork for this article.
A text image:
The key to correcting the rotation is obtaining the rotation angle. Once the angle is known, an affine transform can rotate the image back; the rotation code is given below.
How do we get the rotation angle? By taking the Fourier transform of the image. With the theory from the previous article, this should not be hard to follow.
The obvious feature of a text image is its line spacing: the gray levels in the gaps between lines of text vary much less sharply than within and between the characters themselves, so those regions contribute the low-frequency part of the spectrum while the characters contribute the high frequencies, because the Fourier transform characterizes how rapidly the intensity changes at each point. When the text image is rotated, the spectrum in the frequency domain changes accordingly, and that is the property we use to compute the angle.
Samples:
/*
* Author: John Hany
* Website: http://johnhany.net
* Source code updates: https://github.com/johnhany/textRotCorrect
* If you have any advice, you could contact me at: johnhany@163.com
* Need OpenCV environment!
*
*/
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <iostream>
using namespace cv;
using namespace std;
#define GRAY_THRESH 150
#define HOUGH_VOTE 100
//#define DEGREE 27
int main(int argc, char **argv)
{
    //Read a single-channel image
    const char* filename = "imageText_02_R.jpg";
    Mat srcImg = imread(filename, CV_LOAD_IMAGE_GRAYSCALE);
    if(srcImg.empty())
        return -1;
    imshow("source", srcImg);

    Point center(srcImg.cols/2, srcImg.rows/2);

#ifdef DEGREE
    //Rotate source image
    Mat rotMatS = getRotationMatrix2D(center, DEGREE, 1.0);
    warpAffine(srcImg, srcImg, rotMatS, srcImg.size(), 1, 0, Scalar(255,255,255));
    imshow("RotatedSrc", srcImg);
    //imwrite("imageText_R.jpg",srcImg);
#endif

    //Expand image to an optimal size, for faster processing speed
    //Set widths of borders in four directions
    //If borderType==BORDER_CONSTANT, fill the borders with (0,0,0)
    Mat padded;
    int opWidth = getOptimalDFTSize(srcImg.rows);
    int opHeight = getOptimalDFTSize(srcImg.cols);
    copyMakeBorder(srcImg, padded, 0, opWidth-srcImg.rows, 0, opHeight-srcImg.cols, BORDER_CONSTANT, Scalar::all(0));

    Mat planes[] = {Mat_<float>(padded), Mat::zeros(padded.size(), CV_32F)};
    Mat comImg;
    //Merge into a double-channel image
    merge(planes,2,comImg);

    //Use the same image as input and output,
    //so that the results can fit in Mat well
    dft(comImg, comImg);

    //Compute the magnitude
    //planes[0]=Re(DFT(I)), planes[1]=Im(DFT(I))
    //magnitude=sqrt(Re^2+Im^2)
    split(comImg, planes);
    magnitude(planes[0], planes[1], planes[0]);

    //Switch to logarithmic scale, for better visual results
    //M2=log(1+M1)
    Mat magMat = planes[0];
    magMat += Scalar::all(1);
    log(magMat, magMat);

    //Crop the spectrum
    //Width and height of magMat should be even, so that they can be divided by 2
    //-2 is 11111110 in binary system, operator & makes sure width and height are always even
    magMat = magMat(Rect(0, 0, magMat.cols & -2, magMat.rows & -2));

    //Rearrange the quadrants of Fourier image,
    //so that the origin is at the center of image,
    //and move the high frequency to the corners
    int cx = magMat.cols/2;
    int cy = magMat.rows/2;

    Mat q0(magMat, Rect(0, 0, cx, cy));
    Mat q1(magMat, Rect(0, cy, cx, cy));
    Mat q2(magMat, Rect(cx, cy, cx, cy));
    Mat q3(magMat, Rect(cx, 0, cx, cy));

    Mat tmp;
    q0.copyTo(tmp);
    q2.copyTo(q0);
    tmp.copyTo(q2);

    q1.copyTo(tmp);
    q3.copyTo(q1);
    tmp.copyTo(q3);

    //Normalize the magnitude to [0,1], then to [0,255]
    normalize(magMat, magMat, 0, 1, CV_MINMAX);
    Mat magImg(magMat.size(), CV_8UC1);
    magMat.convertTo(magImg,CV_8UC1,255,0);
    imshow("magnitude", magImg);
    //imwrite("imageText_mag.jpg",magImg);

    //Turn into binary image
    threshold(magImg,magImg,GRAY_THRESH,255,CV_THRESH_BINARY);
    imshow("mag_binary", magImg);
    //imwrite("imageText_bin.jpg",magImg);

    //Find lines with Hough Transformation
    vector<Vec2f> lines;
    float pi180 = (float)CV_PI/180;
    Mat linImg = Mat::zeros(magImg.size(), CV_8UC3);    //start from a black canvas so only the detected lines are drawn
    HoughLines(magImg,lines,1,pi180,HOUGH_VOTE,0,0);
    int numLines = lines.size();
    for(int l=0; l<numLines; l++)
    {
        float rho = lines[l][0], theta = lines[l][1];
        Point pt1, pt2;
        double a = cos(theta), b = sin(theta);
        double x0 = a*rho, y0 = b*rho;
        pt1.x = cvRound(x0 + 1000*(-b));
        pt1.y = cvRound(y0 + 1000*(a));
        pt2.x = cvRound(x0 - 1000*(-b));
        pt2.y = cvRound(y0 - 1000*(a));
        line(linImg,pt1,pt2,Scalar(255,0,0),3,8,0);
    }
    imshow("lines",linImg);
    //imwrite("imageText_line.jpg",linImg);

    if(lines.size() == 3){
        cout << "found three angles:" << endl;
        cout << lines[0][1]*180/CV_PI << endl << lines[1][1]*180/CV_PI << endl << lines[2][1]*180/CV_PI << endl << endl;
    }

    //Find the proper angle from the three found angles
    float angel=0;
    float piThresh = (float)CV_PI/90;
    float pi2 = CV_PI/2;
    for(int l=0; l<numLines; l++)
    {
        float theta = lines[l][1];
        if(abs(theta) < piThresh || abs(theta-pi2) < piThresh)
            continue;
        else{
            angel = theta;
            break;
        }
    }

    //Calculate the rotation angle
    //The image has to be square,
    //so that the rotation angle can be calculated correctly
    angel = angel<pi2 ? angel : angel-CV_PI;
    if(angel != pi2){
        float angelT = srcImg.rows*tan(angel)/srcImg.cols;
        angel = atan(angelT);
    }
    float angelD = angel*180/(float)CV_PI;
    cout << "the rotation angle to be applied:" << endl << angelD << endl << endl;

    //Rotate the image to recover
    Mat rotMat = getRotationMatrix2D(center,angelD,1.0);
    Mat dstImg = Mat::ones(srcImg.size(),CV_8UC3);
    warpAffine(srcImg,dstImg,rotMat,srcImg.size(),1,0,Scalar(255,255,255));
    imshow("result",dstImg);
    //imwrite("imageText_D.jpg",dstImg);

    waitKey(0);

    return 0;
}
Walkthrough:
Reading the image
Mat srcImg = imread(filename, CV_LOAD_IMAGE_GRAYSCALE);
if(srcImg.empty())
    return -1;
srcImg.empty() checks whether the image was loaded successfully; if srcImg contains no data, later steps will trigger memory errors.
Since we are processing text, color carries no extra information, so CV_LOAD_IMAGE_GRAYSCALE is used to load the image in grayscale.
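Side note: CV_LOAD_IMAGE_GRAYSCALE comes from the legacy OpenCV 2.x constants; if you build against OpenCV 3 or later, the equivalent flag should be IMREAD_GRAYSCALE (a minimal sketch, only this call changes):

// OpenCV 3+ spelling of the same call (assumption: the rest of the code is unchanged)
Mat srcImg = imread(filename, IMREAD_GRAYSCALE);
if(srcImg.empty())
    return -1;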
Assume the loaded image looks like this:
Rotating the image (optional)
#ifdef DEGREE
//Rotate source image
Mat rotMatS = getRotationMatrix2D(center, DEGREE, 1.0);
warpAffine(srcImg, srcImg, rotMatS, srcImg.size(), 1, 0, Scalar(255,255,255));
imshow("RotatedSrc", srcImg);
//imwrite("imageText_R.jpg",srcImg);
#endif
Next comes the combination of the DFT and the Hough line detection, both of which I described in detail in earlier posts, so I won't go over them again here.
A quick note on the two parameters involved:
GRAY_THRESH and HOUGH_VOTE have to be set by hand; different images need different values, and even the same text rotated by different angles may need different values. The larger GRAY_THRESH is, the higher the binarization threshold; the larger HOUGH_VOTE is, the more votes (collinear points) the Hough transform requires before accepting a line. In plain terms: if the binarized spectrum shows many scattered points around the lines, raise GRAY_THRESH; if several lines with nearly identical angles are detected from what is really a single line in the binary image, raise HOUGH_VOTE.
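If the manual tuning gets tedious, one possible workaround (my own rough sketch, not part of the reference code; the starting value, lower bound, and step size are arbitrary assumptions) is to start from a high vote threshold and relax it until a handful of lines is found:

// Sketch: relax the Hough vote threshold until at least a few lines are detected.
vector<Vec2f> lines;
int voteThresh = 300;                               // assumed starting value
while(lines.size() < 3 && voteThresh > 50)          // assumed lower bound
{
    lines.clear();
    HoughLines(magImg, lines, 1, (float)CV_PI/180, voteThresh, 0, 0);
    voteThresh -= 20;                               // assumed step size
}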
The detection results:
DFT magnitude spectrum
Binarized spectrum
Hough lines
Computing the tilt angle
The step above yields three angles: one is 0°, one is 90°, and the remaining one is the tilt angle we want. We need to pick it out while allowing for some error.
//Find the proper angle from the three found angles
float angel=0;
float piThresh = (float)CV_PI/90;
float pi2 = CV_PI/2;
for(int l=0; l<numLines; l++)
{
    float theta = lines[l][1];
    if(abs(theta) < piThresh || abs(theta-pi2) < piThresh)
        continue;
    else{
        angel = theta;
        break;
    }
}

//Calculate the rotation angle
//The image has to be square,
//so that the rotation angle can be calculated correctly
angel = angel<pi2 ? angel : angel-CV_PI;
if(angel != pi2){
    float angelT = srcImg.rows*tan(angel)/srcImg.cols;
    angel = atan(angelT);
}
float angelD = angel*180/(float)CV_PI;
cout << "the rotation angle to be applied:" << endl << angelD << endl << endl;
Because of how the DFT works, the detected angle equals the text's true rotation angle only when the input image is square. Our input is generally not square, so the angle has to be adjusted according to the image's aspect ratio. (I'm hoping an expert can give a rigorous explanation of this step.)
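As a concrete illustration of what that correction does (hypothetical numbers of my own, simply plugged into the formula the code uses):

// Hypothetical example of the aspect-ratio correction from the code above:
float rows = 600.0f, cols = 800.0f;            // assumed image size (not from the post)
float detected = 10.0f * (float)CV_PI / 180;   // assumed detected angle of 10 degrees
float corrected = atan(rows * tan(detected) / cols);
// tan(10°) ≈ 0.176; scaled by 600/800 gives ≈ 0.132, so corrected ≈ 7.5 degrees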
One more detail to watch: although HoughLines() reports angles in [0°, 180°), the meaning of the angle differs between [0°, 90°] and (90°, 180°). See the illustration:
When the tilt angle is greater than 90°, (180° − angle) is the line's deviation from the vertical direction. In OpenCV, positive angles rotate counterclockwise, so to rotate the image back, the angle becomes (angle − 180°).
The tilt angle:
Correcting the image
Mat rotMat = getRotationMatrix2D(center,angelD,1.0);
Mat dstImg = Mat::ones(srcImg.size(),CV_8UC3);
warpAffine(srcImg,dstImg,rotMat,srcImg.size(),1,0,Scalar(255,255,255));
First, getRotationMatrix2D() builds a 2×3 affine transformation matrix, which is then passed to warpAffine() to perform a pure rotation. The final argument to warpAffine(), Scalar(255,255,255), fills the blank regions created by the rotation with white.
The final result:
You can clearly see that the sharpness has dropped quite a bit; it feels like some kind of energy loss (I'm not sure of the exact cause).
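Part of that blur probably comes from the default bilinear interpolation (the flag value 1 passed to warpAffine above is INTER_LINEAR); one tweak worth trying (my own suggestion, not from the original post) is a higher-order interpolation:

// Same rotation, but with bicubic interpolation, which usually keeps glyph edges a bit crisper
warpAffine(srcImg, dstImg, rotMat, srcImg.size(), INTER_CUBIC, BORDER_CONSTANT, Scalar(255,255,255));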
As for other kinds of text images, you can try them out and check the results yourself.
Image download link: https://pan.baidu.com/s/1bnaNmmb
Complete project, with the images included: http://download.youkuaiyun.com/download/qq_37059483/9981626