OCR Paper Reproduction: Multi-Granularity Prediction for Scene Text Recognition


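  • Paper walkthrough [ECCV 2022] MGP-STR: a multi-granularity text recognition method based on Vision Transformer (open source)
    • Paper: https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136880336.pdf
    • Reference repo: https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/OCR/MGP-STR
    • Acceptance criterion: train on the MJ and ST datasets and reach an average accuracy of 93.35% when evaluated on IC13, SVT, IIIT, IC15, SVTP, and CUTE (Table 6 in the paper).
  • Deliverables:
    • Code, model, and training logs
    • A PR to PaddleOCR containing the code and the Chinese and English documentation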

README contents:

Multi-Granularity Prediction for Scene Text Recognition

The official PyTorch implementation of MGP-STR (ECCV 2022).

MGP-STR is a conceptually SIMPLE yet POWERFUL vision STR model, which is built upon Vision Transformer (ViT). To integrate linguistic knowledge, Multi-Granularity Prediction (MGP) strategy is proposed to inject information from the language modality into the model in an implicit way. With NO independent language model (LM), MGP-STR outperforms previous state-of-the-art models for scene text recognition, including both pure vision models and language-augmented methods.
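
The model produces predictions at several granularities (the Char, BPE, and WP outputs in the benchmark table), plus a fused result. This README does not describe the fusion rule, so the following is only a minimal sketch under one assumption: each head's confidence is the product of its per-token probabilities, and the most confident head wins. The names `sequence_confidence` and `fuse_predictions` and the sample probabilities are hypothetical, not taken from the repository.

```python
from math import prod
from typing import Dict, List, Tuple

# Hypothetical container: granularity name -> (decoded text, per-token probabilities).
Prediction = Tuple[str, List[float]]


def sequence_confidence(token_probs: List[float]) -> float:
    """Score a prediction as the product of its per-token probabilities."""
    return prod(token_probs) if token_probs else 0.0


def fuse_predictions(heads: Dict[str, Prediction]) -> Tuple[str, str, float]:
    """Return (granularity, text, confidence) of the most confident head."""
    best_name = max(heads, key=lambda name: sequence_confidence(heads[name][1]))
    text, probs = heads[best_name]
    return best_name, text, sequence_confidence(probs)


# Made-up example: the character head misreads one letter with low confidence.
example = {
    "char": ("c0ffee", [0.9, 0.6, 0.9, 0.9, 0.9, 0.9]),
    "bpe": ("coffee", [0.95, 0.9]),
    "wp": ("coffee", [0.8]),
}
winner, text, confidence = fuse_predictions(example)
print(winner, text, round(confidence, 3))
```

In this made-up example the subword head overrules the character head, which illustrates how coarser granularities could correct character-level errors without a separate language model.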

Paper


Install requirements

  • This work was tested with PyTorch 1.7.0, CUDA 10.1, Python 3.6, and Ubuntu 16.04.
pip3 install -r requirements.txt

Dataset

Download the LMDB datasets from the repository of Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition, and arrange them as follows:

data
├── evaluation
│   ├── CUTE80
│   ├── IC13_857
│   ├── IC15_1811
│   ├── IIIT5k_3000
│   ├── SVT
│   └── SVTP
├── training
│   ├── MJ
│   │   ├── MJ_test
│   │   ├── MJ_train
│   │   └── MJ_valid
│   └── ST

At this time, both the training and evaluation datasets are in LMDB format.
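
Before launching training, it can help to confirm that the directory tree above is in place. The snippet below is a small sketch using only the standard library; the helper name `missing_dataset_dirs` is hypothetical, and it checks directory presence only, not the LMDB contents.

```python
from pathlib import Path
from typing import List

# Sub-directories taken from the tree above, relative to the data root.
EXPECTED_DIRS = [
    "evaluation/CUTE80",
    "evaluation/IC13_857",
    "evaluation/IC15_1811",
    "evaluation/IIIT5k_3000",
    "evaluation/SVT",
    "evaluation/SVTP",
    "training/MJ/MJ_test",
    "training/MJ/MJ_train",
    "training/MJ/MJ_valid",
    "training/ST",
]


def missing_dataset_dirs(data_root: str) -> List[str]:
    """Return the expected sub-directories that are absent under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


# Check the default "data" folder relative to the current working directory.
missing = missing_dataset_dirs("data")
print("missing:", missing if missing else "none")
```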

Pretrained Models

Available model weights:

| Tiny | Small | Base |
|---|---|---|
| MGP-STR-Tiny | MGP-STR-Small | MGP-STR-Base |

Benchmarks (top-1 accuracy, %)

The performance of the reproduced pretrained models is summarized as follows:

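| Model | Output | IC13 | SVT | IIIT | IC15 | SVTP | CUTE | AVG |
|---|---|---|---|---|---|---|---|---|
| MGP-STR-tiny | Char | 94.6 | 91.2 | 94.1 | 82.7 | 84.7 | 81.9 | 89.7 |
| | BPE | 86.3 | 86.4 | 83.6 | 73.2 | 80.0 | 70.1 | 80.7 |
| | WP | 53.7 | 43.1 | 56.8 | 52.0 | 39.2 | 44.1 | 51.9 |
| | Fuse | 95.3 | 92.1 | 94.3 | 83.1 | 85.9 | 81.6 | 90.2 |
| MGP-STR-small | Char | 95.8 | 91.8 | 95.0 | 84.9 | 86.7 | 87.5 | 91.2 |
| | BPE | 97.0 | 94.0 | 88.8 | 80.5 | 87.4 | 84.0 | 87.8 |
| | WP | 79.5 | 76.4 | 77.0 | 70.2 | 72.7 | 64.9 | 74.7 |
| | Fuse | 96.6 | 93.2 | 95.1 | 86.4 | 88.1 | 88.5 | 92.0 |
| MGP-STR-base | Char | 96.3 | 93.0 | 95.9 | 86.0 | 87.4 | 88.5 | 92.2 |
| | BPE | 97.1 | 95.1 | 90.0 | 82.1 | 89.9 | 84.0 | 89.1 |
| | WP | 97.8 | 94.6 | 89.1 | 81.6 | 90.4 | 81.6 | 88.6 |
| | Fuse | 97.6 | 94.9 | 96.2 | 87.9 | 90.2 | 89.2 | 93.4 |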
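
The AVG column does not appear to be the plain mean of the six benchmark columns; the numbers are consistent with a mean weighted by the number of test images per benchmark. The sketch below checks this for two rows of the table. The sizes for IC13, IC15, and IIIT match the directory names in the Dataset section; the sizes for SVT, SVTP, and CUTE are the commonly cited test-set sizes and are an assumption here. Because the per-benchmark values are already rounded, the result is only approximate.

```python
from typing import Dict

# Number of test images per benchmark (SVT, SVTP, CUTE sizes are assumed).
NUM_SAMPLES: Dict[str, int] = {
    "IC13": 857,
    "SVT": 647,
    "IIIT": 3000,
    "IC15": 1811,
    "SVTP": 645,
    "CUTE": 288,
}


def weighted_average(accuracy: Dict[str, float]) -> float:
    """Average per-benchmark accuracies, weighted by benchmark size."""
    total = sum(NUM_SAMPLES.values())
    return sum(accuracy[name] * NUM_SAMPLES[name] for name in NUM_SAMPLES) / total


# Rows copied from the table above.
tiny_char = {"IC13": 94.6, "SVT": 91.2, "IIIT": 94.1, "IC15": 82.7, "SVTP": 84.7, "CUTE": 81.9}
base_fuse = {"IC13": 97.6, "SVT": 94.9, "IIIT": 96.2, "IC15": 87.9, "SVTP": 90.2, "CUTE": 89.2}

print(round(weighted_average(tiny_char), 1))  # 89.7, matching the AVG column
print(round(weighted_average(base_fuse), 1))  # 93.4, matching the AVG column
```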

Run demo with pretrained model

  1. Download pretrained model
  2. Add image files to test into demo_imgs/
  3. Run demo.py
mkdir demo_imgs/attens
CUDA_VISIBLE_DEVICES=0 python3 demo.py --Transformer mgp-str \
--TransformerModel=mgp_str_base_patch4_3_32_128 --model_dir mgp_str_base.pth --demo_imgs demo_imgs/

Train

MGP-STR-base

CUDA_VISIBLE_DEVICES=0 python3 -m torch.distributed.launch --nproc_per_node=1 --nnodes=1 --master_port 29501 train_final_dist.py --train_data data/training \
--valid_data data/evaluation  --select_data MJ-ST  \
--batch_ratio 0.5-0.5  --Transformer mgp-str \
--TransformerModel=mgp_str_base_patch4_3_32_128 --imgH 32 --imgW 128 \
--manualSeed=226 --workers=12 --isrand_aug --scheduler --batch_size=100 --rgb \
--saved_path <path/to/save/dir> --exp_name mgp_str_patch4_3_32_128 --valInterval 5000 --num_iter 2000000 --lr 1
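
As a quick sanity check on the input settings, the snippet below computes how many patch tokens a ViT backbone would see for a 32 x 128 image. It assumes that `patch4` in the model name `mgp_str_base_patch4_3_32_128` denotes 4 x 4 patches, which the name suggests but this README does not state; `num_patch_tokens` is a hypothetical helper.

```python
def num_patch_tokens(img_h: int, img_w: int, patch_size: int) -> int:
    """Count non-overlapping square patches that tile an img_h x img_w image."""
    if img_h % patch_size or img_w % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    return (img_h // patch_size) * (img_w // patch_size)


# --imgH 32 --imgW 128 with the assumed 4 x 4 patch size: an 8 x 32 patch grid.
print(num_patch_tokens(32, 128, 4))  # 256
```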

Multi-GPU training

MGP-STR-base on a 2-GPU machine

It is recommended to train larger networks such as MGP-STR-Small and MGP-STR-Base on a multi-GPU machine. To keep the total batch size fixed at 100, set the --batch_size option to 100 divided by the number of GPUs. For example, on a 2-GPU machine this is --batch_size=50.

CUDA_VISIBLE_DEVICES=0,1 python3 -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 --master_port 29501 train_final_dist.py --train_data data/training \
--valid_data data/evaluation  --select_data MJ-ST  \
--batch_ratio 0.5-0.5  --Transformer mgp-str \
--TransformerModel=mgp_str_base_patch4_3_32_128 --imgH 32 --imgW 128 \
--manualSeed=226 --workers=12 --isrand_aug --scheduler --batch_size=50 --rgb \
--saved_path <path/to/save/dir> --exp_name mgp_str_patch4_3_32_128 --valInterval 5000 --num_iter 2000000 --lr 1
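
The batch-size rule for multi-GPU training is simple arithmetic, sketched below. The helper `per_gpu_batch_size` is hypothetical, and rejecting GPU counts that do not divide the total evenly is a choice of this sketch, not documented behavior of the training script.

```python
def per_gpu_batch_size(total_batch_size: int, num_gpus: int) -> int:
    """Return the --batch_size value that keeps the total batch size fixed."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if total_batch_size % num_gpus:
        raise ValueError("total batch size is not divisible by the number of GPUs")
    return total_batch_size // num_gpus


for gpus in (1, 2, 4):
    print(f"{gpus} GPU(s): --batch_size={per_gpu_batch_size(100, gpus)}")
```

With 1, 2, and 4 GPUs this yields 100, 50, and 25, matching the `--batch_size` values used in the two training commands above.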

Test

Find the path to best_accuracy.pth checkpoint file (usually in saved_path folder).

CUDA_VISIBLE_DEVICES=0 python3 test_final.py --eval_data data/evaluation --benchmark_all_eval --Transformer mgp-str  --data_filtering_off --rgb --fast_acc --TransformerModel=mgp_str_base_patch4_3_32_128 --model_dir <path_to/best_accuracy.pth>
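
For reference, word-level accuracy of the kind reported in the benchmark table can be sketched as follows. This assumes a prediction counts as correct when it matches the label after lowercasing and dropping non-alphanumeric characters, which is a common convention for scene text benchmarks; the normalization actually applied by `test_final.py` may differ, and both helper names are hypothetical.

```python
import re
from typing import Iterable, Tuple


def normalize(text: str) -> str:
    """Lowercase and keep only the characters 0-9 and a-z."""
    return re.sub(r"[^0-9a-z]", "", text.lower())


def word_accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Percentage of (prediction, label) pairs that match after normalization."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    correct = sum(normalize(pred) == normalize(label) for pred, label in pairs)
    return 100.0 * correct / len(pairs)


# Made-up predictions: two of the three match their labels after normalization.
samples = [("Coffee", "coffee"), ("sh0p", "shop"), ("open!", "open")]
print(round(word_accuracy(samples), 2))
```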

Visualization

Illustration of the spatial attention masks from the Character A3 module, the BPE A3 module, and the WordPiece A3 module, respectively.


Acknowledgements

This implementation is based on the following repositories: ViTSTR, CLOVA AI Deep Text Recognition Benchmark, and TokenLearner.

Citation

If you find this work useful, please cite:

@inproceedings{ECCV2022mgp_str,
  title={Multi-Granularity Prediction for Scene Text Recognition},
  author={Peng Wang and Cheng Da and Cong Yao},
  booktitle = {ECCV},
  year={2022}
}

License

MGP-STR is released under the terms of the Apache License, Version 2.0.

MGP-STR is an algorithm for scene text recognition, and the code and models herein created by the authors from Alibaba can only be used for research purposes.
Copyright (C) 1999-2022 Alibaba Group Holding Ltd.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
