Deep/Machine Learning Fundamentals: The CTC Algorithm

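This article takes a close look at the Connectionist Temporal Classification (CTC) algorithm: its use in speech recognition and handwritten character recognition, the input-output alignment problem and how CTC addresses it, the CTC loss function and prediction rule, the dynamic-programming solution, and CTC's properties and limitations.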
  • Connectionist Temporal Classification (CTC)

    CTC is well suited to speech recognition and handwritten character recognition tasks.

  • Definition
    Input: a symbol sequence $X = [x_1, x_2, \ldots, x_T]$
    Output: a symbol sequence $Y = [y_1, y_2, \ldots, y_U]$

    Goal: find an accurate mapping from the input X to the output Y.

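    • Difficulties:

      1. Both X and Y vary in length.
      2. The ratio between the lengths of X and Y also varies.
      3. There is no strict alignment between corresponding elements of X and Y (that is, $x_t$ and $y_u$ are not necessarily aligned).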
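
To make the alignment problem concrete, here is a minimal sketch of the usual CTC collapse rule (merge consecutive repeats, then drop blanks), together with a brute-force enumeration of every length-T alignment that maps to a given output. The blank symbol `-` and the names `collapse` and `alignments_for` are assumptions chosen for illustration; they do not come from the article.

```python
from itertools import product

BLANK = "-"  # hypothetical blank symbol, chosen only for this illustration


def collapse(alignment):
    """Merge consecutive repeated symbols, then drop blanks."""
    out = []
    prev = None
    for s in alignment:
        # Keep a symbol only if it starts a new run and is not the blank.
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return "".join(out)


def alignments_for(target, T, alphabet):
    """Enumerate every length-T alignment over alphabet + blank that collapses to target."""
    symbols = list(alphabet) + [BLANK]
    return ["".join(a) for a in product(symbols, repeat=T) if collapse(a) == target]


valid = alignments_for("ab", 3, "ab")
print(valid)  # several different alignments all map to the same output "ab"
```

Even for three input frames and a two-symbol output there are several valid alignments, which is why no single frame-to-symbol correspondence can be assumed. The enumeration is exponential in T and is only meant for tiny examples.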

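  • Definition of the loss function
    For a given input $X$, we train the model to maximize the posterior probability $P(Y|X)$ of $Y$. In practice this means minimizing the negative log-likelihood $-\log P(Y|X)$. $P(Y|X)$ should be differentiable, so that the model can be trained with gradient descent.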
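
As a sketch of how $P(Y|X)$ can be computed by dynamic programming, the function below implements the standard CTC forward recursion in NumPy. It assumes the model outputs a `(T, V)` matrix of per-frame probabilities with the blank at index 0; the function name and argument layout are hypothetical. It works in plain probability space for readability, so it would underflow on long sequences, where a log-space version would be needed.

```python
import numpy as np


def ctc_neg_log_likelihood(probs, target, blank=0):
    """Return -log P(Y|X) via the CTC forward recursion.

    probs:  (T, V) array, probs[t, k] = probability of symbol k at frame t.
    target: list of label ids (no blanks).
    """
    T = probs.shape[0]
    # Extended target: a blank before, between, and after every label.
    ext = [blank]
    for y in target:
        ext += [y, blank]
    S = len(ext)

    # alpha[t, s] = total probability of all prefixes that end at ext[s] at frame t.
    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]

    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]                # stay on the same symbol
            if s >= 1:
                a += alpha[t - 1, s - 1]       # advance by one position
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]
            alpha[t, s] = a * probs[t, ext[s]]

    # A valid alignment ends on the last label or on the trailing blank.
    p = alpha[T - 1, S - 1] + (alpha[T - 1, S - 2] if S > 1 else 0.0)
    return -np.log(p)


# Uniform distribution over {blank, 1, 2} for 3 frames, target [1, 2].
probs = np.full((3, 3), 1.0 / 3.0)
loss = ctc_neg_log_likelihood(probs, [1, 2])
print(loss)
```

With this uniform input, every length-3 alignment has probability 1/27 and exactly five of them collapse to the target, so the recursion should give $P(Y|X) = 5/27$. Because the recursion uses only sums and products, it is differentiable with respect to `probs`.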

  • Prediction
    Once a model has been trained, given an input $X$ we want the output $Y$ with the highest conditional probability, that is,

    $$Y^* = \mathop{\arg\max}_{Y} P(Y|X)$$
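
Searching over all possible $Y$ exactly is expensive, so a common approximation is greedy (best-path) decoding: take the most probable symbol at each frame, merge consecutive repeats, and remove blanks. The sketch below assumes a `(T, V)` probability matrix with the blank at index 0; `greedy_decode` is a hypothetical name. This is not the exact arg max, because the single most probable alignment does not necessarily collapse to the most probable output.

```python
import numpy as np
from itertools import groupby


def greedy_decode(probs, blank=0):
    """Best-path decoding: per-frame argmax, merge repeats, drop blanks."""
    best_path = np.argmax(probs, axis=1)               # most probable symbol per frame
    merged = [int(k) for k, _ in groupby(best_path)]   # merge consecutive repeats
    return [k for k in merged if k != blank]           # remove blanks


# Made-up probabilities for 5 frames over {blank, 1, 2}.
probs = np.array([
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.6, 0.2, 0.2],
    [0.1, 0.1, 0.8],
    [0.3, 0.1, 0.6],
])
decoded = greedy_decode(probs)
print(decoded)
```

Here the per-frame arg max is `[1, 1, 0, 2, 2]`, which collapses to `[1, 2]`. Beam search that sums over alignments sharing the same prefix is the usual way to get closer to the true arg max.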
