Webdriver配合Tesseract-OCR 自动识别简单的验证码

本文介绍了一种利用Tesseract-OCR工具自动识别网页上的验证码的方法。主要步骤包括:通过浏览器自动化工具定位并截图验证码图片,使用Tesseract-OCR识别验证码字符,并读取识别结果。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

在进行自动化测试,遇到验证码的问题,一般有两种方式 :
1.找开发去掉验证码或者使用万能验证码
2.使用OCR自动识别
使用OCR自动化识别,一般识别率不是太高,处理一般简单验证码还是没问题
这里使用的是Tesseract-OCR,下载地址:https://github.com/A9T9/Free-Ocr-Windows-Desktop/releases
怎么使用呢?
进入安装后的目录:
tesseract.exe test.png test -1
这里写图片描述

准备一份网页,上面使用该验证码

<html>
<head>
<title>Table test by Young</title>
</head>
<body>
 </br>
<h1> Test </h1>
 <img src="http://csujwc.its.csu.edu.cn/sys/ValidateCode.aspx?t=1">
 </br>
</body>
</html>

要识别验证码,首先得取得验证码,这两款采取对 页面元素部分截图的方式,首先获取整个页面的截图,然后找到页面元素坐标进行截取。

/**
     * This method for screen shot element
     * 
     * @param driver
     * @param element
     * @param path
     * @throws InterruptedException
     */
    public static void screenShotForElement(WebDriver driver,
            WebElement element, String path) throws InterruptedException {
        File scrFile = ((TakesScreenshot) driver)
                .getScreenshotAs(OutputType.FILE);
        try {
            Point p = element.getLocation();
            int width = element.getSize().getWidth();
            int height = element.getSize().getHeight();
            Rectangle rect = new Rectangle(width, height);
            BufferedImage img = ImageIO.read(scrFile);
            BufferedImage dest = img.getSubimage(p.getX(), p.getY(),
                    rect.width, rect.height);
            ImageIO.write(dest, "png", scrFile);
            Thread.sleep(1000);
            FileUtils.copyFile(scrFile, new File(path));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

截取完元素,就可以调用Tesseract-OCR生成text

// use Tesseract to get strings
        Runtime rt = Runtime.getRuntime();
        rt.exec("cmd.exe /C  tesseract.exe D:\\Tesseract-OCR\\test.png  D:\\Tesseract-OCR\\test -1 ");

接下来通过java读取txt

/**
     * This method for read TXT file
     * 
     * @param filePath
     */
    public static void readTextFile(String filePath) {
        try {
            String encoding = "GBK";
            File file = new File(filePath);
            if (file.isFile() && file.exists()) { // 判断文件是否存在
                InputStreamReader read = new InputStreamReader(
                        new FileInputStream(file), encoding);// 考虑到编码格式
                BufferedReader bufferedReader = new BufferedReader(read);
                String lineTxt = null;
                while ((lineTxt = bufferedReader.readLine()) != null) {
                    System.out.println(lineTxt);
                }
                read.close();
            } else {
                System.out.println("找不到指定的文件");
            }
        } catch (Exception e) {
            System.out.println("读取文件内容出错");
            e.printStackTrace();
        }
    }

整体代码如下:

package com.dbyl.tests;

import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.concurrent.TimeUnit;

import javax.imageio.ImageIO;

import org.apache.commons.io.FileUtils;
import org.openqa.selenium.By;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.Point;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

import com.dbyl.libarary.utils.DriverFactory;

public class TesseractTest {

    public static void main(String[] args) throws IOException,
            InterruptedException {

        WebDriver driver = DriverFactory.getChromeDriver();
        driver.get("file:///C:/Users/validation.html");
        driver.manage().timeouts().pageLoadTimeout(30, TimeUnit.SECONDS);
        WebElement element = driver.findElement(By.xpath("//img"));

        // take screen shot for element
        screenShotForElement(driver, element, "D:\\Tesseract-OCR\\test.png");

        driver.quit();

        // use Tesseract to get strings
        Runtime rt = Runtime.getRuntime();
        rt.exec("cmd.exe /C  tesseract.exe D:\\Tesseract-OCR\\test.png  D:\\Tesseract-OCR\\test -1 ");

        Thread.sleep(1000);
        // Read text
        readTextFile("D:\\Tesseract-OCR\\test.txt");
    }

    /**
     * This method for read TXT file
     * 
     * @param filePath
     */
    public static void readTextFile(String filePath) {
        try {
            String encoding = "GBK";
            File file = new File(filePath);
            if (file.isFile() && file.exists()) { // 判断文件是否存在
                InputStreamReader read = new InputStreamReader(
                        new FileInputStream(file), encoding);// 考虑到编码格式
                BufferedReader bufferedReader = new BufferedReader(read);
                String lineTxt = null;
                while ((lineTxt = bufferedReader.readLine()) != null) {
                    System.out.println(lineTxt);
                }
                read.close();
            } else {
                System.out.println("找不到指定的文件");
            }
        } catch (Exception e) {
            System.out.println("读取文件内容出错");
            e.printStackTrace();
        }
    }

    /**
     * This method for screen shot element
     * 
     * @param driver
     * @param element
     * @param path
     * @throws InterruptedException
     */
    public static void screenShotForElement(WebDriver driver,
            WebElement element, String path) throws InterruptedException {
        File scrFile = ((TakesScreenshot) driver)
                .getScreenshotAs(OutputType.FILE);
        try {
            Point p = element.getLocation();
            int width = element.getSize().getWidth();
            int height = element.getSize().getHeight();
            Rectangle rect = new Rectangle(width, height);
            BufferedImage img = ImageIO.read(scrFile);
            BufferedImage dest = img.getSubimage(p.getX(), p.getY(),
                    rect.width, rect.height);
            ImageIO.write(dest, "png", scrFile);
            Thread.sleep(1000);
            FileUtils.copyFile(scrFile, new File(path));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

原文:http://www.cnblogs.com/tobecrazy/p/4691045.html

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值