Transformers in CV
Google's ViT splits a 224×224 image into 196 patches of 16×16 pixels, linearly embeds each patch, and thereby obtains an input sequence, so the Transformer can process an image the same way it processes a sequence of tokens. To preserve the spatial relationships between patches, a position embedding with the same dimension as the patch embeddings is added to the sequence. DeiT improved ViT's training efficiency: pre-training on a huge dataset (such as JFT-300M) is no longer a prerequisite, and the Transformer can be trained directly on ImageNet.
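The patch-embedding step described above can be sketched in NumPy. The shapes follow the paper (224×224 image, 16×16 patches, 196 tokens); the projection matrix and position embeddings below are random stand-ins for parameters that ViT learns during training:

```python
import numpy as np

def patchify(img, patch=16):
    """Split an (H, W, C) image into a grid of flattened (patch*patch*C)-dim patches."""
    h, w, c = img.shape
    gh, gw = h // patch, w // patch
    # (gh, patch, gw, patch, c) -> (gh, gw, patch, patch, c) -> (gh*gw, patch*patch*c)
    blocks = img.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return blocks.reshape(gh * gw, patch * patch * c)

rng = np.random.default_rng(0)
img = rng.standard_normal((224, 224, 3))       # one 224x224 RGB image
patches = patchify(img)                        # (196, 768): a 14x14 grid of 16x16x3 patches

d_model = 768
w_embed = 0.02 * rng.standard_normal((patches.shape[1], d_model))  # learned linear projection (random here)
pos = 0.02 * rng.standard_normal((patches.shape[0], d_model))      # learned position embeddings (random here)
tokens = patches @ w_embed + pos               # (196, 768) input sequence for the Transformer
```

Each row of `tokens` plays the role of one "word" in the sequence; ViT additionally prepends a learnable class token before feeding the sequence to the encoder.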
Paper collection:
2021
- An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale (ICLR 2021, under review at collection time)
- Deformable DETR: Deformable Transformers for End-to-End Object Detection
2020
- End-to-End Object Detection with Transformers (ECCV 2020)
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers (preprint, 2020)
- Do We Really Need Explicit Position Encodings for Vision Transformers? (Meituan, preprint)
- Training Data-Efficient Image Transformers & Distillation through Attention (Facebook, preprint)
- On the Relationship between Self-Attention and Convolutional Layers (ICLR 2020)
- Feature Pyramid Transformer (ECCV 2020)
- Learning Texture Transformer Network for Image Super-Resolution (CVPR 2020)
2019
- Attention Augmented Convolutional Networks (ICCV 2019)
Others
Paper collection from a Tsinghua lab:
- Tsinghua AMiner's collection on Transformer applications in CV
A survey on visual Transformers:
- A Survey on Visual Transformer
Blog
- "An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale" (Zhihu)
- Article 2
- Transformers in computer vision:
  - Object detection: https://arxiv.org/pdf/2005.12872.pdf
  - Video classification: https://arxiv.org/pdf/1711.07971.pdf
  - Image generation: https://arxiv.org/pdf/1802.05751.pdf
  - Image classification: https://arxiv.org/pdf/2010.11929.pdf
- Google developers' introduction to ViT
For the latest developments in this area, see the thread by Gabriel Ilharco:
https://arxiv.org/pdf/1911.03584.pdf
References
Image Classification
Top resource
- paperswithcode: image classification
  Covers the best results on each dataset and the papers the models are based on.
Paper collection
2021
2020
- Towards Robust Image Classification Using Sequential Attention Models (CVPR 2020)
- AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty (ICLR 2020)
- Spatially Attentive Output Layer for Image Classification (CVPR 2020)
- Self-training with Noisy Student Improves ImageNet Classification (CVPR 2020)
- Image Matching across Wide Baselines: From Paper to Practice (CVPR 2020)
- Making Better Mistakes: Leveraging Class Hierarchies with Deep Networks (CVPR 2020)
- Generative Pretraining from Pixels (ICML 2020) — iGPT, GPT applied to images
2019
- Regularized Evolution for Image Classifier Architecture Search
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (ICML 2019) — SOTA on ImageNet at the time of writing (Zhihu introduction)
Pre-2019
- Collection of classic image classification papers (AlexNet, VGG, ResNet, …)
Others
Latest image classification survey (2020):
- A Survey on Semi-, Self- and Unsupervised Learning for Image Classification
Datasets
Blog
Image Generation
Generative Adversarial Networks
Top resource
- A popular PyTorch library for image generation covering 18+ SOTA GAN implementations
Others:
- Rob-GAN: Generator, Discriminator, and Adversarial Attacker
- AugGAN: Cross Domain Adaptation with GAN-based Data Augmentation