Captcha Recognition, Level 01

Latest recommended article published 2025-11-26 15:49:07

License: CC 4.0 BY-SA

Tags:

#artificial-intelligence #java

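This post shows how to process captcha images in Java: the image is converted to black and white to strip background interference, then the digits are located and extracted. It explains how RGB values separate the background from the digit regions, and how the image is then split and matched against a reference library to recognize each digit.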

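I have been interested in image recognition for a while, so I recently gathered some material on the subject and decided to start with captcha recognition. It is the simplest first step, and I am using it to open the road into image recognition as a whole.

I have only just started on captcha recognition, so for now I am writing a few Java functions that strip out noise. Today I begin with the simplest kind of captcha, like the image below.

This is the simplest kind of captcha: there is little interference and every digit sits at a fixed position. The main task today, and the key step, is extracting the digits, which means working with the RGB values of the image. I read a little computer graphics today and learned a bit about how displays work. The plan is to convert every captcha to black and white: fill the digits with black and everything else with white. This captcha is easy to convert because the background is light, so removing the background is enough. The rule to know is that the larger the sum of the three RGB components, the lighter the color, and the smaller the sum, the darker. You can look up an RGB table online, so I won't go into detail here. Now to remove the background; the code is as follows: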

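	// Imports needed by all the snippets in this post;
	// the methods are meant to sit together inside one class
	import java.awt.Color;
	import java.awt.image.BufferedImage;
	import java.io.File;
	import java.io.IOException;
	import java.util.ArrayList;
	import java.util.HashMap;
	import java.util.List;
	import java.util.Map;
	import javax.imageio.ImageIO;

	// Returns 1 if the pixel is light (to be painted white), otherwise 0
	public static int isWhite(int colorInt){
		// colorInt is a packed RGB value; a Color object can be built from it directly (see the API docs)
		Color color = new Color(colorInt);
		if(color.getRed()+color.getGreen()+color.getBlue() >= 100){
			return 1;
		}
		return 0;
	}
	// Returns 1 if the pixel is dark (to be painted black), otherwise 0
	public static int isBlack(int colorInt){
		Color color = new Color(colorInt);
		if(color.getRed()+color.getGreen()+color.getBlue() < 100){
			return 1;
		}
		return 0;
	}
	// Convert the image to black and white
	public static BufferedImage removeBackground(String path) throws IOException{
		// Read the image into a BufferedImage for processing
		BufferedImage img = ImageIO.read(new File(path));
		// Get the dimensions and process every pixel
		int imgWidth = img.getWidth();
		int imgHeight = img.getHeight();
		for(int i = 0;i<imgWidth;++i){
			for(int j = 0;j<imgHeight;++j){
				if(isWhite(img.getRGB(i, j)) == 1){
					img.setRGB(i, j, Color.WHITE.getRGB());
				}
				// Test the pixel itself here, not the image height
				else if(isBlack(img.getRGB(i, j)) == 1){
					img.setRGB(i, j, Color.BLACK.getRGB());
				}
			}
		}
		return img;
	}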
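To make the "sum of RGB components" rule concrete, here is a minimal sketch that applies the same threshold to a tiny image built in memory, so it runs without any captcha file. The class name `BinarizeSketch`, the helpers `channelSum` and `binarize`, and the two sample colors are illustrative assumptions, not part of the original code.

```java
import java.awt.image.BufferedImage;

class BinarizeSketch {
    // Sum of the red, green and blue channels of a packed RGB int (range 0..765)
    public static int channelSum(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return r + g + b;
    }

    // Returns a new image: pixels whose channel sum is >= threshold become white, the rest black
    public static BufferedImage binarize(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int x = 0; x < src.getWidth(); x++) {
            for (int y = 0; y < src.getHeight(); y++) {
                boolean light = channelSum(src.getRGB(x, y)) >= threshold;
                out.setRGB(x, y, light ? 0xFFFFFF : 0x000000);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical two-pixel image: one light background pixel, one dark digit pixel
        BufferedImage img = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        img.setRGB(0, 0, 0xE0E0F0); // channel sum 688, treated as background
        img.setRGB(1, 0, 0x101020); // channel sum 64, treated as ink
        BufferedImage bw = binarize(img, 100);
        System.out.println(Integer.toHexString(bw.getRGB(0, 0) & 0xFFFFFF));
        System.out.println(Integer.toHexString(bw.getRGB(1, 0) & 0xFFFFFF));
    }
}
```

With 8-bit channels the sum ranges from 0 to 765, so a threshold of 100 keeps only quite dark pixels as ink. That suits this captcha's dark digits on a light background, but the value would likely need tuning for other captchas.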

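With that done, the next job is reading the images in. The reference image files are first collected into a File array, then loaded into BufferedImage objects and handled one at a time. After the captcha has been converted to black and white, each digit is cut out and compared with your own standard digit library: the library digit that shares the most pixels with the cut-out (that is, differs in the fewest) is the answer.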

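	// Load the reference images into a HashMap from image (BufferedImage) to digit label
	public static Map<BufferedImage,String> loadImageData() throws IOException{
		Map<BufferedImage,String> map = new HashMap<BufferedImage,String>();
		// I keep all the reference images in a local folder named img
		File dir = new File("img");
		File[] files = dir.listFiles();
		for(File file : files){
			// The first character of the file name is the digit the image shows
			BufferedImage bi = ImageIO.read(file);
			// ImageIO.read returns null for files it cannot decode, so skip those
			if(bi != null){
				map.put(bi, file.getName().charAt(0)+"");
			}
		}
		return map;
	}
	// Recognize one digit cut from the captcha by comparing it with the reference library
	public static String getSingleCharOcr(BufferedImage img,Map<BufferedImage,String> map){
		String result = "";
		int width = img.getWidth();
		int height = img.getHeight();
		// Smallest number of differing pixels seen so far; starts at the total pixel count
		int min = width * height;
		// Compare the digit against every reference image
		for(BufferedImage bi : map.keySet()){
			// Only compare the area both images share, in case the sizes differ slightly
			int w = Math.min(width, bi.getWidth());
			int h = Math.min(height, bi.getHeight());
			int count = 0;
			Label1: for(int x = 0;x<w;++x){
				for(int y = 0;y<h;++y){
					// Count pixels where the digit and the reference image disagree
					if(isWhite(img.getRGB(x, y)) != isWhite(bi.getRGB(x, y))){
						count++;
						// Already no better than the best match so far, stop early
						if(count >= min){
							break Label1;
						}
					}
				}
			}
			if(count < min){
				min = count;
				result = map.get(bi);
			}
		}
		return result;
	}
	// Recognize every digit, and save the images named after the match results
	public static String getAllOcr(String path) throws IOException {
		BufferedImage img = removeBackground(path);
		List<BufferedImage> listImg = splitImage(img);
		Map<BufferedImage,String> map = loadImageData();
		String result = "";
		for(BufferedImage bi : listImg){
			result += getSingleCharOcr(bi,map);
			// result grows with each digit, so each file is named after the digits recognized so far;
			// the output and result folders must already exist
			ImageIO.write(bi, "JPG", new File("output/"+result+"and"+".jpg"));
		}
		ImageIO.write(img, "JPG", new File("result/"+result+".jpg"));
		return result;
	}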
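The matching rule used above (pick the reference digit with the fewest differing pixels) can be tried without any image files. The following is a minimal sketch under stated assumptions: glyphs are represented as small boolean grids, and the class `TemplateMatchSketch`, its helpers, and the 3x3 sample shapes are hypothetical, made up only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TemplateMatchSketch {
    // Counts the positions where two equally sized binary grids disagree
    public static int countDifferences(boolean[][] a, boolean[][] b) {
        int diff = 0;
        for (int y = 0; y < a.length; y++) {
            for (int x = 0; x < a[y].length; x++) {
                if (a[y][x] != b[y][x]) {
                    diff++;
                }
            }
        }
        return diff;
    }

    // Returns the label of the template with the fewest differing pixels
    public static String bestMatch(boolean[][] glyph, Map<String, boolean[][]> templates) {
        String best = "";
        int min = Integer.MAX_VALUE;
        for (Map.Entry<String, boolean[][]> e : templates.entrySet()) {
            int diff = countDifferences(glyph, e.getValue());
            if (diff < min) {
                min = diff;
                best = e.getKey();
            }
        }
        return best;
    }

    // Builds a grid from text rows: '#' is ink, anything else is background
    public static boolean[][] parse(String... rows) {
        boolean[][] grid = new boolean[rows.length][];
        for (int y = 0; y < rows.length; y++) {
            grid[y] = new boolean[rows[y].length()];
            for (int x = 0; x < rows[y].length(); x++) {
                grid[y][x] = rows[y].charAt(x) == '#';
            }
        }
        return grid;
    }

    public static void main(String[] args) {
        Map<String, boolean[][]> templates = new LinkedHashMap<>();
        templates.put("1", parse(".#.", ".#.", ".#."));
        templates.put("7", parse("###", "..#", "..#"));
        // A "7" with one stray ink pixel: 1 difference from "7", 5 from "1"
        boolean[][] noisy = parse("###", "..#", ".##");
        System.out.println(bestMatch(noisy, templates)); // prints 7
    }
}
```

Counting differences rather than requiring an exact match is what lets a digit with a few noisy pixels still land on the right reference image.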

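The standard library here is a set of images of the individual digits 1 to 9; a screenshot is below:

Matching each single digit cut from the captcha against this library gives the result. For a clean, fixed-position captcha like this one, the recognition rate should in principle reach 100% as long as the crop positions in the split function are set properly: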

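	// Split the black-and-white image into single digits.
	// The coordinates seem a little imprecise, so adjust them for your own captcha:
	// as written, the fourth crop (x = 35) overlaps the third (x = 30 to 40).
	public static List<BufferedImage> splitImage(BufferedImage image){
		List<BufferedImage> subImg = new ArrayList<BufferedImage>();
		// getSubimage(x, y, width, height)
		subImg.add(image.getSubimage(7, 0, 11, 20));
		subImg.add(image.getSubimage(18, 0, 11, 20));
		subImg.add(image.getSubimage(30, 0, 11, 20));
		subImg.add(image.getSubimage(35, 0, 11, 20));
		return subImg;
	}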
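Since the hard-coded crop positions are easy to get wrong, one option is to compute them. The sketch below assumes the digits are equally wide and sit side by side starting at a known x offset, which may not hold for every captcha. The class `SliceSketch`, the method `sliceBounds`, and the 60x20 image size are hypothetical values chosen for illustration.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;

class SliceSketch {
    // Computes `count` side-by-side crop rectangles of charWidth x charHeight,
    // starting at startX, and rejects layouts that do not fit inside the image
    public static List<Rectangle> sliceBounds(int imageWidth, int imageHeight,
                                              int startX, int charWidth,
                                              int charHeight, int count) {
        if (startX < 0 || charWidth <= 0 || charHeight <= 0
                || charHeight > imageHeight
                || startX + count * charWidth > imageWidth) {
            throw new IllegalArgumentException("slices do not fit inside the image");
        }
        List<Rectangle> bounds = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            bounds.add(new Rectangle(startX + i * charWidth, 0, charWidth, charHeight));
        }
        return bounds;
    }

    public static void main(String[] args) {
        // Hypothetical 60x20 captcha with four 11-pixel-wide digits starting at x = 7
        for (Rectangle r : sliceBounds(60, 20, 7, 11, 20, 4)) {
            System.out.println(r.x + "," + r.y + " " + r.width + "x" + r.height);
        }
    }
}
```

Each rectangle can then be passed to `image.getSubimage(r.x, r.y, r.width, r.height)`. Because the slices are computed from one offset and one width, they cannot overlap, and the bounds check fails early instead of letting `getSubimage` throw on an out-of-range crop.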