BERT Pre-training
ViLBERT, LXMERT, VisualBERT, Unicoder-VL, VL-BERT, ImageBERT
TextVQA
- M4C | Paper | Code | Notes
- SA-M4C | Paper | Notes
- SMA | Paper | Notes
- MM-GNN | Paper
- LoRRA | Paper | Code
- QA R-CNN | Paper
- Simple is not Easy | Paper | Code | Notes
Document Understanding
LayoutLMFT, StructuralLM
Large Document Models
- UDOP
  Unifying Vision, Text, and Layout for Universal Document Processing
  Venue: CVPR 2023
  Paper: https://arxiv.org/abs/2212.02623
  Code: https://github.com/microsoft/i-Code/tree/main/i-Code-Doc
  Commentary: https://blog.youkuaiyun.com/m0_38007695/article/details/130218532?spm=1001.2014.3001.5501
- FlexDM
  Towards Flexible Multi-modal Document Models
  Venue: CVPR 2023
  Paper: https://arxiv.org/abs/2303.18248
  Code: https://cyberagentailab.github.io/flex-dm
- GeoLayoutLM
  GeoLayoutLM: Geometric Pre-training for Visual Information Extraction
  Venue: CVPR 2023
  Paper: https://arxiv.org/abs/2304.10759
  Code: https://github.com/AlibabaResearch/AdvancedLiterateMachinery