Tesseract OCR is an open-source OCR library. A note on its recognition language codes: Simplified Chinese is chi_sim. Tesseract uses 3-character ISO 639-2 language codes.
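As a minimal sketch of where a code such as chi_sim is used: the tesseract command-line tool takes the language through its -l option, and several languages can be combined with "+" (for example chi_sim+eng). The helper names build_tesseract_command and recognize, and the file name scan.png, are hypothetical and used only for illustration. The sketch assumes the tesseract executable is on PATH and that the matching traineddata file (e.g. chi_sim.traineddata) is installed.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, langs):
    """Build a tesseract CLI argument list that prints recognized text to stdout.

    Hypothetical helper for illustration; it is not part of Tesseract itself.
    """
    if not langs:
        raise ValueError("at least one language code is required")
    # Several language codes are joined with '+', e.g. chi_sim+eng.
    return ["tesseract", image_path, "stdout", "-l", "+".join(langs)]


def recognize(image_path, langs=("chi_sim",)):
    """Run Tesseract on image_path and return the recognized text.

    Assumes the tesseract binary and the requested traineddata are installed.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, list(langs)),
        capture_output=True,
        encoding="utf-8",  # Tesseract writes its text output as UTF-8
        check=True,        # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    # "scan.png" is a placeholder file name.
    print(build_tesseract_command("scan.png", ["chi_sim", "eng"]))
```

Calling recognize("scan.png") would then run the equivalent of `tesseract scan.png stdout -l chi_sim`. Building the argument list separately keeps the language handling easy to check without having Tesseract installed.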
The list below is excerpted from its GitHub repository: https://github.com/tesseract-ocr/tesseract/blob/a75ab450a8cc9a2b69cf05f5c4f7a39bc44cbacc/doc/tesseract.1.asc
=======================
afr (Afrikaans) amh (Amharic) ara (Arabic) asm (Assamese) aze (Azerbaijani) aze_cyrl (Azerbaijani - Cyrillic) bel (Belarusian) ben (Bengali) bod (Tibetan) bos (Bosnian) bul (Bulgarian) cat (Cat