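For captcha recognition in Python, the libraries most often covered online are pytesser and pytesseract.
I installed pytesseract.
Goal: install a Python captcha recognition library.
Background: pytesseract depends on PIL and tesseract-ocr, so both must be installed first. PIL is an image processing library, and tesseract-ocr is Google's OCR engine.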
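To see which of the Python-side dependencies are still missing, a small check like the one below can help. It is only a sketch: missing_modules is a hypothetical helper name, and it checks importable Python packages only, not whether the tesseract-ocr engine itself is installed.

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# 'PIL' is the import name provided by the pillow package.
print(missing_modules(['PIL', 'pytesseract']))
```

An empty list means both packages import cleanly; any name that is printed still needs a pip install.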
Installation:
Step 1:
pip install pillow
Step 2:
pip install pytesseract
Step 3:
Download and install
tesseract-ocr-setup-3.02.02.exe
Step 4:
Open C:\Python27\Lib\site-packages\pytesseract\pytesseract.py
and find the following lines:
# CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY
tesseract_cmd = 'tesseract'
Change them as follows, using the path where tesseract.exe was actually installed on your machine:
# CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY
#tesseract_cmd = 'tesseract'
tesseract_cmd = 'D:/Program Files (x86)/Tesseract-OCR/tesseract.exe'
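As an alternative to editing the installed pytesseract.py, the same setting can usually be overridden from your own script, because tesseract_cmd is a module-level variable. The sketch below assumes that layout. find_tesseract and CANDIDATES are hypothetical names introduced for illustration, and the second path in the list is only a guess at another common install location.

```python
import os


def find_tesseract(candidates):
    """Return the first path in `candidates` that is an existing file, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


# Hypothetical install locations; adjust them to your machine.
CANDIDATES = [
    'D:/Program Files (x86)/Tesseract-OCR/tesseract.exe',
    'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe',
]

try:
    import pytesseract
except ImportError:
    pytesseract = None  # pytesseract is not installed yet

found = find_tesseract(CANDIDATES)
if pytesseract is not None and found is not None:
    # Override the setting at runtime instead of editing pytesseract.py.
    pytesseract.pytesseract.tesseract_cmd = found
```

The advantage of this approach is that the change survives reinstalling or upgrading pytesseract, since nothing inside site-packages is modified.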
Step 5: verify that the environment was installed correctly:
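#!/usr/bin/env python
# coding=utf-8
from __future__ import print_function  # print() behaves the same on Python 2 and 3

try:
    import pytesseract
    from PIL import Image
except ImportError:
    print('Module import failed. Install with pip; pytesseract depends on pillow (PIL) and tesseract-ocr.')
    raise SystemExit(1)

# 2.jpg is the captcha image to recognize, located in the current directory
image = Image.open('2.jpg')
vcode = pytesseract.image_to_string(image)
print(vcode)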
The environment is now set up correctly, but recognition of the captcha itself failed.