Today I tried out pytesseract for recognizing CAPTCHAs and ran into a small problem. Here are my notes.
try:
    import Image
except ImportError:
    from PIL import Image
import pytesseract

img = Image.open('a.jpg', 'r')
print(pytesseract.image_to_string(img))
Running it produced this error:
Traceback (most recent call last):
  File "D:\tests\test_pytesseract.py", line 30, in <module>
    print(pytesseract.image_to_string(img))
  File "C:\Python27\lib\site-packages\pytesseract\pytesseract.py", line 164, in image_to_string
    config=config)
  File "C:\Python27\lib\site-packages\pytesseract\pytesseract.py", line 95, in run_tesseract
    stderr=subprocess.PIPE,
  File "C:\Python27\lib\subprocess.py", line 702, in __init__
    errread, errwrite), to_close = self._get_handles(stdin, stdout, stderr)
  File "C:\Python27\lib\subprocess.py", line 850, in _get_handles
    c2pwrite = self._make_inheritable(c2pwrite)
  File "C:\Python27\lib\subprocess.py", line 884, in _make_inheritable
    _subprocess.DUPLICATE_SAME_ACCESS)
WindowsError: [Error 6]
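It turned out to be a subprocess problem. Error 6 is the Windows "invalid handle" error, and the traceback shows it being raised while Popen prepares the child's stdout handle (c2pwrite). This typically happens when the script is started without a usable console, for example from some IDEs. A small change to run_tesseract in pytesseract's pytesseract.py fixes it. The original function: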
def run_tesseract(input_filename, output_filename_base, lang=None, boxes=False, config=None):
    '''
    runs the command:
        `tesseract_cmd` `input_filename` `output_filename_base`
    returns the exit status of tesseract, as well as tesseract's stderr output
    '''
    command = [tesseract_cmd, input_filename, output_filename_base]
    if lang is not None:
        command += ['-l', lang]
    if boxes:
        command += ['batch.nochop', 'makebox']
    if config:
        command += shlex.split(config)
    proc = subprocess.Popen(command, stderr=subprocess.PIPE)
    return (proc.wait(), proc.stderr.read())
Change it to:
def run_tesseract(input_filename, output_filename_base, lang=None, boxes=False, config=None):
    '''
    runs the command:
        `tesseract_cmd` `input_filename` `output_filename_base`
    returns the exit status of tesseract, as well as tesseract's stderr output
    '''
    command = [tesseract_cmd, input_filename, output_filename_base]
    if lang is not None:
        command += ['-l', lang]
    if boxes:
        command += ['batch.nochop', 'makebox']
    if config:
        command += shlex.split(config)
    # Give the child its own stdout pipe instead of letting Popen
    # duplicate the parent's stdout handle, and run it through the shell.
    proc = subprocess.Popen(command,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            shell=True
                            )
    return (proc.wait(), proc.stderr.read())
That is all it takes.
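One caveat with the change above: stdout is now a pipe that nothing reads, so a child that prints a lot could in principle fill the pipe and stall proc.wait(). Below is a minimal sketch of a variant that avoids this by using communicate(). It assumes the root cause really is an invalid inherited standard handle, so it redirects stdin as well. The name run_command is hypothetical and not part of pytesseract, and a child Python process stands in for the tesseract command line so the sketch runs without tesseract installed.

```python
import subprocess
import sys


def run_command(command):
    """Run `command` (a list of arguments) and return (exit_status, stderr_bytes).

    Hypothetical helper for illustration; not part of pytesseract.
    """
    proc = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,   # do not inherit the parent's stdin handle
        stdout=subprocess.PIPE,  # do not inherit the parent's stdout handle
        stderr=subprocess.PIPE,
    )
    # communicate() drains stdout and stderr while waiting for the child,
    # so the child cannot block on a full pipe as it could with
    # wait() followed by stderr.read().
    _, stderr_data = proc.communicate()
    return proc.returncode, stderr_data


if __name__ == "__main__":
    # Stand-in for the tesseract command: writes to stderr, exits with status 3.
    child = [sys.executable, "-c",
             "import sys; sys.stderr.write('oops'); sys.exit(3)"]
    status, err = run_command(child)
    print(status, err)
```

The return value has the same shape as run_tesseract's (exit status, stderr output), so the body of the Popen call in run_tesseract could be adapted along these lines if the wait()/read() version ever hangs.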