Paper link: A ConvNet for the 2020s | IEEE Conference Publication | IEEE Xplore
Paper code: GitHub - facebookresearch/ConvNeXt: Code release for ConvNeXt model
The English below is typed entirely by hand, summarizing and paraphrasing the original paper. Unavoidable typos and grammar mistakes may appear; if you spot any, feel free to point them out in the comments. This post is written as personal notes, so read with that in mind.
1. Thoughts
(1) Why does this paper leave so much blank space above the title? Give it to me, I'd fill all of it with writing
(2) Starting a new paper at 4 a.m.! How nice it would be if there were no deadlines in this world; I could read papers without a care
(3) This thing is actually this recent!! I'm genuinely shocked; can CNNs really still be innovated on in 2022? Then I glanced at the author affiliations and they're basically all from Facebook. Sorry for intruding. Why am I always the one intruding? Probably because I'm just too lowly
(4) Honestly, this kind of paper reads much more pleasantly in English; it feels more literary and readable. I recommend reading the English original, it gives a different impression. I think this is partly because these authors are generally more confident (they really are that good, so they don't write timidly), which makes the writing in the paper bold rather than dry
(5) Wait, why does the later part of this read like a hyperparameter-tuning exercise?
2. Section-by-Section Close Reading of the Paper
2.1. Abstract
①They aim to explore the limits of what a pure ConvNet can achieve, by gradually "modernizing" a standard ResNet toward the design of a vision Transformer
2.2. Introduction
①The biggest challenge for ViT is the quadratic complexity of global attention with respect to the input size
②They aim to identify how design decisions in Transformers impact the performance of ConvNets
precipitate — v. to hasten (something bad); to plunge suddenly into (a state); to bring about abruptly; adj. hasty, rash; n. precipitate (a deposited solid)
odyssey — n. a long, eventful journey; an arduous trek
2.3. Modernizing a ConvNet: A Roadmap
①They use ResNet-50 / Swin-T, models with roughly 4.5×10⁹ FLOPs, to present the results
②Performance of ConvNeXt on ImageNet under the different designs:
③Comparison diagram on ImageNet:
2.3.1. Training Techniques
①Modernized training techniques alone (switching to the AdamW optimizer, training for 300 epochs, and stronger augmentation/regularization) already enhance the performance of the plain ResNet-50 (76.1% → 78.8%)
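A minimal sketch of this kind of modernized recipe, assuming a dummy dataset and the stock torchvision ResNet-50; the hyperparameter values follow the recipe described in the paper, but the heavier augmentations (Mixup, CutMix, RandAugment, Random Erasing) and stochastic depth are omitted here:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet50

# Dummy data so the sketch is self-contained; replace with a real ImageNet loader.
train_loader = DataLoader(TensorDataset(torch.randn(8, 3, 224, 224),
                                        torch.randint(0, 1000, (8,))), batch_size=4)

model = resnet50()
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)            # label smoothing
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-3,      # AdamW instead of SGD
                              weight_decay=0.05)
EPOCHS = 300                                                     # the paper trains 300 epochs
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)

for epoch in range(EPOCHS):
    for images, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()                                             # cosine learning-rate decay
```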
2.3.2. Macro Design
①They adjust the number of blocks per stage of ResNet-50 from (3, 4, 6, 3) to (3, 3, 9, 3), aligning the stage compute ratio and FLOPs with Swin-T
②They replace ResNet's stem (a 7×7 conv with stride 2, followed by max pooling) with a "patchify" stem: a 4×4 conv with stride 4
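A minimal sketch of these two macro changes (the stage-ratio tuple and the two stems); BN/ReLU in the ResNet stem are omitted for brevity:

```python
import torch
import torch.nn as nn

# Stage compute ratio: blocks per stage go from ResNet-50's (3, 4, 6, 3)
# to (3, 3, 9, 3), roughly matching Swin-T's 1:1:3:1 ratio.
depths = (3, 3, 9, 3)

# ResNet-style stem: 7x7 conv with stride 2, then 3x3 max pooling with stride 2.
resnet_stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)

# "Patchify" stem: non-overlapping 4x4 conv with stride 4, like a patch embedding.
patchify_stem = nn.Conv2d(3, 96, kernel_size=4, stride=4)

x = torch.randn(1, 3, 224, 224)
print(resnet_stem(x).shape)    # torch.Size([1, 64, 56, 56])
print(patchify_stem(x).shape)  # torch.Size([1, 96, 56, 56])
```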
2.3.3. ResNeXt-ify
①They adopt depthwise convolution (a grouped conv with groups equal to the number of channels) and widen the network from 64 to 96 channels (same as Swin-T), which improves accuracy and brings the network to 5.3 GFLOPs
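In code, the ResNeXt-ify step boils down to setting `groups` equal to the channel count; a minimal sketch with the widened 96-dim setting:

```python
import torch.nn as nn

dim = 96  # width increased from ResNet's 64 to Swin-T's 96

# Depthwise conv: a grouped conv with groups == channels, so each channel is
# filtered independently (ResNeXt's grouped conv taken to the extreme).
depthwise = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)

# Cross-channel mixing is then left to the 1x1 (pointwise) convs in the block.
pointwise = nn.Conv2d(dim, dim, kernel_size=1)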
2.3.4. Inverted Bottleneck
①Bottleneck design:
where (a) is the ResNeXt block, (b) is their inverted bottleneck, and (c) is the inverted bottleneck with the depthwise conv layer moved up
②The inverted bottleneck design reduces the network to 4.6 GFLOPs
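A minimal sketch of (a) versus (b), assuming a 96-dim block with a 4× expansion (96 ↔ 384); normalization and activation layers are omitted for brevity:

```python
import torch.nn as nn

dim, hidden = 96, 384  # 4x expansion, as in a Transformer MLP block

# (a) ResNeXt-style bottleneck: wide -> narrow -> wide
bottleneck = nn.Sequential(
    nn.Conv2d(hidden, dim, 1),                                # 1x1, 384 -> 96
    nn.Conv2d(dim, dim, 3, padding=1, groups=dim),            # depthwise 3x3, 96 -> 96
    nn.Conv2d(dim, hidden, 1),                                # 1x1, 96 -> 384
)

# (b) Inverted bottleneck: narrow -> wide -> narrow (hidden part is 4x wider)
inverted = nn.Sequential(
    nn.Conv2d(dim, hidden, 1),                                # 1x1, 96 -> 384
    nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden),   # depthwise 3x3, 384 -> 384
    nn.Conv2d(hidden, dim, 1),                                # 1x1, 384 -> 96
)
```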
2.3.5. Large Kernel Sizes
①Moving the depthwise conv layer up, i.e. from (b) to (c) in the figure above, reduces the network to 4.1 GFLOPs
②With the depthwise conv now in the narrow 96-dim part, its kernel size is increased from 3×3 to 7×7 (matching Swin's 7×7 windows) at little extra compute (a rough cost count is sketched below)
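A back-of-the-envelope count of the depthwise conv's cost per spatial position (roughly channels × k² multiply-adds) shows why moving it to the 96-dim side is cheaper and why the 7×7 kernel then stays affordable; the numbers are illustrative only and ignore the 1×1 layers:

```python
# Depthwise conv cost per output position: roughly channels * k * k multiply-adds.
def dw_macs(channels, k):
    return channels * k * k

print(dw_macs(384, 3))  # 3456 -- dw conv inside the wide 384-dim part, 3x3 kernel
print(dw_macs(96, 3))   # 864  -- moved up to the 96-dim side: ~4x cheaper
print(dw_macs(96, 7))   # 4704 -- 7x7 kernel at 96 dims: back in the same ballpark
```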
2.3.6. Micro Design
①Replace ReLU with GELU
②Use fewer activation functions: remove all GELU layers except the one between the two 1×1 layers
③Use fewer normalization layers (only one remains, before the 1×1 layers), and replace the remaining BatchNorm (BN) with Layer Normalization (LN)
④Use separate downsampling layers between stages: a 2×2 conv with stride 2 (with an LN layer added before it for training stability)
⑤The FLOPs, #params., throughput, and memory use of Swin Transformer and ConvNeXt are similar, but ConvNeXt does not need shifted window attention or relative position biases (the micro-design changes are put together in the block sketch below)
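Putting the micro-design changes together, here is a minimal sketch of a ConvNeXt-style block (7×7 depthwise conv → LN → 1×1 expand → GELU → 1×1 reduce, plus a residual connection); the official implementation additionally uses LayerScale and stochastic depth, which are omitted here:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Sketch of a ConvNeXt-style block: dw 7x7 -> LN -> 1x1 expand -> GELU -> 1x1 reduce."""
    def __init__(self, dim):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)  # depthwise 7x7
        self.norm = nn.LayerNorm(dim)            # single LN instead of several BNs
        self.pwconv1 = nn.Linear(dim, 4 * dim)   # 1x1 conv written as a Linear on channels-last
        self.act = nn.GELU()                     # single GELU instead of several ReLUs
        self.pwconv2 = nn.Linear(4 * dim, dim)

    def forward(self, x):                        # x: (N, C, H, W)
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)                # to channels-last for LN / Linear
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)                # back to channels-first
        return shortcut + x

print(Block(96)(torch.randn(1, 96, 56, 56)).shape)  # torch.Size([1, 96, 56, 56])
```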
2.4. Empirical Evaluations on ImageNet
①All the configurations (per-stage channel counts C and block numbers B; a build sketch follows the table):
| Model | Channels C per stage | Blocks B per stage |
| --- | --- | --- |
| ConvNeXt-T | (96, 192, 384, 768) | (3, 3, 9, 3) |
| ConvNeXt-S | (96, 192, 384, 768) | (3, 3, 27, 3) |
| ConvNeXt-B | (128, 256, 512, 1024) | (3, 3, 27, 3) |
| ConvNeXt-L | (192, 384, 768, 1536) | (3, 3, 27, 3) |
| ConvNeXt-XL | (256, 512, 1024, 2048) | (3, 3, 27, 3) |
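A rough sketch of how one of these configs assembles into a hierarchical model: patchify stem, then for each stage a 2×2 stride-2 downsampling conv followed by B blocks; the "blocks" here are plain depthwise convs standing in for full ConvNeXt blocks, and the LN before each downsampling layer is omitted:

```python
import torch
import torch.nn as nn

# Per-stage (blocks B, channels C) from the table above; ConvNeXt-T shown here.
depths, dims = (3, 3, 9, 3), (96, 192, 384, 768)

layers = [nn.Conv2d(3, dims[0], kernel_size=4, stride=4)]                       # patchify stem
for i, (d, c) in enumerate(zip(depths, dims)):
    if i > 0:
        layers.append(nn.Conv2d(dims[i - 1], c, kernel_size=2, stride=2))       # 2x2 stride-2 downsampling
    layers += [nn.Conv2d(c, c, kernel_size=7, padding=3, groups=c) for _ in range(d)]  # stand-ins for d blocks

model = nn.Sequential(*layers)
print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 768, 7, 7])
```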
2.4.1. Results
①Performance table:
2.4.2. Isotropic ConvNeXt vs. ViT
①They remove the hierarchical downsampling structure and keep a constant feature resolution throughout (a ViT-style isotropic design); the isotropic ConvNeXt still performs comparably to ViT:
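A minimal sketch of what "isotropic" means structurally: one patch-embedding stem and then a stack of identical blocks at constant resolution, with no downsampling in between; the width/depth below are placeholders (not the paper's exact isotropic configs) and the blocks are again simple depthwise-conv stand-ins:

```python
import torch
import torch.nn as nn

dim, depth = 384, 18  # placeholder width/depth for an isotropic model

isotropic = nn.Sequential(
    nn.Conv2d(3, dim, kernel_size=16, stride=16),   # ViT-style 16x16 patch embedding
    # constant-resolution stack: the same block repeated, no downsampling stages
    *[nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim) for _ in range(depth)],
)
print(isotropic(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 384, 14, 14])
```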
2.5. Empirical Evaluation on Downstream Tasks
①Performance on COCO:
②Performance on ADE20K:
2.6. Related Work
①Other models are larger
2.7. Conclusions
~
3. Supplementary Knowledge
3.1. Inductive bias
(1) Reference for further reading: 【机器学习】浅谈 归纳偏置 (Inductive Bias) - CSDN blog
4. Reference
Liu, Z. et al. (2022). A ConvNet for the 2020s. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.