ViT (Vision Transformer) Series: A Paper Roundup

In recent years, the Vision Transformer (ViT) and its many variants have been adopted across a wide range of computer-vision tasks. The transformer offers strong global feature extraction, but it lacks the inductive biases of convolutions (so its local feature extraction is limited), and its self-attention is expensive, with compute that grows quadratically with input resolution; a short sketch of this cost follows this paragraph. Researchers have kept refining the architecture to address these problems, producing a steady stream of optimized structures and papers (one widely used fix, linearized attention, is sketched after the list below). In my view, to learn transformers for CV well, apply them to your own work or projects, and eventually propose new architectures, you should first build a thorough understanding of the transformer's strengths and weaknesses, then follow how transformer-based architectures have evolved, how each generation improved on the last, and how the designs relate to one another. Reading a large number of the relevant papers, together with their code, is how you build a reasonably complete picture. Below is a list of transformer-related CV papers from roughly the last three to five years. When I get the chance, I will also share summaries of some of these papers.
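
To make the complexity point concrete, here is a minimal sketch of plain single-head self-attention over patch tokens. It assumes PyTorch; the shapes and the identity q/k/v projections are simplifications for illustration, not taken from any of the papers below.

```python
# Minimal single-head self-attention over patch tokens (illustrative sketch).
import torch

def self_attention(x):
    # x: (batch, N, d), where N = (H/ps) * (W/ps) patch tokens.
    d = x.shape[-1]
    q, k, v = x, x, x                        # real models use learned projections
    attn = q @ k.transpose(-2, -1) / d**0.5  # (batch, N, N): the quadratic term
    attn = attn.softmax(dim=-1)
    return attn @ v                          # (batch, N, d)

# Doubling the image resolution at a fixed patch size quadruples N, so the
# N x N attention map (and the matmuls behind it) cost ~16x more: O(N^2 * d).
x = torch.randn(1, (224 // 16) ** 2, 64)     # 196 tokens: 224px image, 16px patches
print(self_attention(x).shape)               # torch.Size([1, 196, 64])
```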

1. EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction

2. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

3. MetaFormer Is Actually What You Need for Vision

4. MViTv2: Improved Multiscale Vision Transformers for Classification and Detection

5. Multiscale Vision Transformers

6. PVT v2: Improved Baselines with Pyramid Vision Transformer

7. SOFT: Softmax-free Transformer with Linear Complexity

8. SimA: Simple Softmax-free Attention for Vision Transformers

9. FlowFormer: A Transformer Architecture for Optical Flow

10. Hydra Attention: Efficient Attention with Many Heads

11. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

12. Orthogonal Transformer: An Efficient Vision Transformer Backbone with Token Orthogonalization

13. EcoFormer: Energy-Saving Attention with Linear Complexity

14. Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference

15. CMT: Convolutional Neural Networks Meet Vision Transformers

16. Rethinking Spatial Dimensions of Vision Transformers

17. Focal Self-attention for Local-Global Interactions in Vision Transformers

18. QuadTree Attention for Vision Transformers

19. Scalable Vision Transformers with Hierarchical Pooling

20. CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification

21. Co-Scale Conv-Attentional Image Transformers

22. FasterViT: Fast Vision Transformers with Hierarchical Attention

23. EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention

24. Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

25. LeViT: A Vision Transformer in ConvNet’s Clothing for Faster Inference

26. MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer

27. LocalViT: Bringing Locality to Vision Transformers

28. Twins: Revisiting the Design of Spatial Attention in Vision Transformers

29. RegionViT: Regional-to-Local Attention for Vision Transformers

30. KVT: k-NN Attention for Boosting Vision Transformers

31. Fast Vision Transformers with HiLo Attention

32. CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows

33. FLatten Transformer: Vision Transformer Using Focused Linear Attention

34. Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer
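
Several of the papers above (e.g., SOFT, EcoFormer, FLatten Transformer, "Transformers are RNNs", and the multi-scale linear attention in EfficientViT) attack the quadratic cost by linearizing attention. As a rough orientation, here is a minimal sketch of the generic reordering trick: replace softmax(QKᵀ)V with φ(Q)(φ(K)ᵀV), so nothing of size N×N is ever materialized. The φ used here is the simple elu(x)+1 feature map from "Transformers are RNNs"; each paper chooses its own feature map and adds further structure, so treat this as illustrative only.

```python
# Generic linearized attention: O(N * d^2) instead of O(N^2 * d).
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (batch, N, d)
    q = F.elu(q) + 1                      # positive feature map phi(q)
    k = F.elu(k) + 1                      # positive feature map phi(k)
    kv = k.transpose(-2, -1) @ v          # (batch, d, d): size independent of N
    # Normalizer phi(q_i)^T * sum_j phi(k_j), shape (batch, N, 1).
    z = q @ k.sum(dim=1, keepdim=True).transpose(-2, -1) + eps
    return (q @ kv) / z                   # (batch, N, d)

x = torch.randn(1, 196, 64)
print(linear_attention(x, x, x).shape)    # torch.Size([1, 196, 64])
```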
