[CVPR 2022] A ConvNet for the 2020s

Paper: A ConvNet for the 2020s | IEEE Conference Publication | IEEE Xplore

Code: GitHub - facebookresearch/ConvNeXt: Code release for ConvNeXt model

The English here is hand-typed, summarizing and paraphrasing the original paper, so occasional spelling or grammar mistakes are hard to avoid; if you spot any, corrections in the comments are welcome. This post leans toward personal notes, so read with that in mind.

Contents

1. Thoughts

2. Close Reading of the Paper

2.1. Abstract

2.2. Introduction

2.3. Modernizing a ConvNet: A Roadmap

2.3.1. Training Techniques

2.3.2. Macro Design

2.3.3. ResNeXt-Ify

2.3.4. Inverted Bottleneck

2.3.5. Large Kernel Sizes

2.3.6. Micro Design

2.4. Empirical Evaluations on ImageNet

2.4.1. Results

2.4.2. Isotropic ConvNeXt vs. ViT

2.5. Empirical Evaluation on Downstream Tasks

2.6. Related Work

2.7. Conclusions

3. Supplementary Knowledge

3.1. Inductive bias

4. Reference


1. Thoughts

(1) Why does this paper leave so much blank space above the title? I would have used all of it for writing.

(2) Starting a new paper at 4 a.m.! How nice the world would be without deadlines; I could read papers without a care.

(3) I am shocked by how recent this is!! Could CNNs really still be innovated on in 2022? One look at the author affiliations shows they are basically all from Facebook; sorry for the intrusion. Why am I always intruding? Probably because I am just too humble.

(4) I have to say, papers like this read much better in English; the writing feels more literary and readable, so I recommend the original English for a different experience. I suspect this is largely because these authors tend to be confident (they really are that good, so they are not timid), which makes their academic writing bolder and far less dry.

(5) Wait, why does the second half read like a hyperparameter-tuning report?

2. Close Reading of the Paper

2.1. Abstract

        ① They aim to explore what a pure ConvNet can achieve, gradually "modernizing" a standard ResNet toward the design of a vision Transformer.

2.2. Introduction

        ① The biggest challenge for ViT is the quadratic complexity of global self-attention with respect to input size (see the note after ②).

        ② They aim to identify how design decisions in Transformers impact the performance of ConvNets.
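A quick side note of mine on ① (standard ViT accounting, not a formula quoted from the paper): splitting an H×W image into P×P patches yields N = HW/P² tokens, and global self-attention forms an N×N attention map, so its cost grows quadratically in N:

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right)V,\qquad N=\frac{HW}{P^{2}}\;\Longrightarrow\;\mathcal{O}(N^{2}d)$$

Doubling the image side length quadruples N and thus raises the attention cost roughly 16×, which is why hierarchical designs like Swin fall back on local windowed attention.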

precipitate  vt. to hasten (something bad); to plunge suddenly into (a state)  adj. hasty; rash  n. precipitate (a deposited substance)

odyssey  n. a long, eventful journey; an arduous trek

2.3. Modernizing a ConvNet: A Roadmap

        ① They present results using ResNet-50 / Swin-T models in the roughly 4.5×10⁹ FLOPs regime.

        ② Performance of the modernized network on ImageNet after each design change:

        ③ Comparison diagram on ImageNet:

2.3.1. Training Techniques

        ① Modern training techniques alone (AdamW instead of SGD, a longer 300-epoch schedule, stronger augmentation and regularization) already lift ResNet-50's top-1 accuracy from 76.1% to 78.8%; a sketch of the recipe follows.
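A minimal sketch of that modernized recipe as I read it from the paper (Python; the learning rate, weight decay, and schedule values below are illustrative assumptions, not quotes):

```python
# Modernized (DeiT/Swin-style) training recipe for ResNet-50: a sketch.
train_recipe = {
    "epochs": 300,                     # extended from the classic 90-epoch schedule
    "optimizer": "AdamW",              # replaces SGD with momentum
    "lr": 4e-3,                        # assumed base learning rate
    "weight_decay": 0.05,              # assumed
    "lr_schedule": "cosine + warmup",  # assumed
    "augmentation": ["Mixup", "CutMix", "RandAugment", "Random Erasing"],
    "regularization": ["Stochastic Depth", "Label Smoothing"],
}
```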

2.3.2. Macro Design

        ① They adjust the number of blocks per stage in ResNet-50 from (3, 4, 6, 3) to (3, 3, 9, 3), aligning the stage compute ratio (and FLOPs) with Swin-T.

        ② They replace the ResNet stem (a 7×7 conv with stride 2, followed by max pooling) with a "patchify" layer: a 4×4 conv with stride 4, as sketched below.
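A PyTorch sketch of both macro changes (variable names are mine); note that the two stems produce the same 56×56 output resolution from a 224×224 input:

```python
import torch
import torch.nn as nn

# Stage compute ratio: blocks per stage, ResNet-50 -> modernized
blocks_per_stage = (3, 3, 9, 3)  # was (3, 4, 6, 3); matches Swin-T's 1:1:3:1 ratio

# Original ResNet stem: 7x7 stride-2 conv followed by a stride-2 max pool
resnet_stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)

# "Patchify" stem: non-overlapping 4x4 stride-4 conv, like ViT patch embedding
patchify_stem = nn.Conv2d(3, 96, kernel_size=4, stride=4)

x = torch.randn(1, 3, 224, 224)
print(resnet_stem(x).shape)    # torch.Size([1, 64, 56, 56])
print(patchify_stem(x).shape)  # torch.Size([1, 96, 56, 56])
```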

2.3.3. ResNeXt-Ify

        ① Using depthwise convolution (ResNeXt's grouped conv taken to its extreme) and widening the network from 64 to 96 channels (same as Swin-T) improves accuracy, with FLOPs rising to 5.3G; a sketch follows.
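In a depthwise conv the group count equals the channel count, so each channel is mixed only spatially, loosely analogous to the per-location weighted sum in self-attention. A minimal sketch:

```python
import torch.nn as nn

dim = 96  # widened from ResNet's 64 to Swin-T's 96

# Depthwise conv: groups == channels, so spatial mixing happens per channel;
# cross-channel mixing is left to the 1x1 (pointwise) convs of the block.
depthwise = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
```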

2.3.4. Inverted Bottleneck

        ① Bottleneck designs:

where (a) is a ResNeXt block, (b) is their inverted bottleneck, and (c) is the inverted bottleneck with the depthwise conv layer moved up

        ② The inverted bottleneck reduces the whole network's FLOPs to 4.6G (mainly from the downsampling blocks' shortcut 1×1 convs) while slightly improving performance; a conceptual sketch follows.
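A conceptual PyTorch sketch of (a) vs. (b) (simplified: normalization, activations, and shortcuts are omitted, and the channel sizes are illustrative):

```python
import torch.nn as nn

dim = 96

# (a) Bottleneck (ResNe(X)t style): wide -> narrow -> wide
bottleneck = nn.Sequential(
    nn.Conv2d(dim, dim // 4, kernel_size=1),                       # reduce
    nn.Conv2d(dim // 4, dim // 4, 3, padding=1, groups=dim // 4),  # spatial mixing
    nn.Conv2d(dim // 4, dim, kernel_size=1),                       # expand
)

# (b) Inverted bottleneck (MobileNetV2 / Transformer-MLP style, 4x expansion):
# narrow -> wide -> narrow
inverted_bottleneck = nn.Sequential(
    nn.Conv2d(dim, 4 * dim, kernel_size=1),                        # expand
    nn.Conv2d(4 * dim, 4 * dim, 3, padding=1, groups=4 * dim),     # spatial mixing
    nn.Conv2d(4 * dim, dim, kernel_size=1),                        # reduce
)
```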

2.3.5. Large Kernel Sizes

        ① Moving the depthwise conv layer up, i.e. from (b) to (c), reduces the FLOPs to 4.1G (with a temporary performance drop).

        ② Increasing the depthwise kernel size to 7×7 then recovers accuracy; the benefit saturates beyond 7×7. A sketch follows.
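A sketch of (c): with the depthwise conv moved to the top of the block (echoing the Transformer's attention-before-MLP ordering), the expensive spatial mixing runs on the narrow 96-d stream, so its kernel can be enlarged to 7×7 cheaply:

```python
import torch.nn as nn

dim = 96

# (c) Inverted bottleneck with the depthwise conv moved up and enlarged to 7x7
block_c = nn.Sequential(
    nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim),  # d7x7, 96 -> 96
    nn.Conv2d(dim, 4 * dim, kernel_size=1),                     # expand, 96 -> 384
    nn.Conv2d(4 * dim, dim, kernel_size=1),                     # reduce, 384 -> 96
)
```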

2.3.6. Micro Design

        ① Replace ReLU with GELU.

        ② Remove all but one GELU per block, mirroring a Transformer MLP, which uses a single activation:

        ③ Use fewer normalization layers, and substitute the remaining BatchNorm (BN) with Layer Normalization (LN).

        ④ Perform spatial downsampling with separate 2×2 stride-2 conv layers between stages (with a norm layer before each for training stability).

        ⑤ The FLOPs, #params., throughput, and memory use of Swin Transformer and ConvNeXt are similar, but ConvNeXt needs neither shifted-window attention nor relative position biases. A sketch of the final block follows.
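Putting 2.3.3-2.3.6 together, a sketch of the resulting ConvNeXt block (simplified from the official repo: LayerScale and stochastic depth are omitted), with the single LN and single GELU visible:

```python
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    """Simplified ConvNeXt block: d7x7 -> LN -> 1x1 expand -> GELU -> 1x1 reduce."""
    def __init__(self, dim: int):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)            # the block's only norm layer
        self.pwconv1 = nn.Linear(dim, 4 * dim)   # 1x1 conv as Linear (channels-last)
        self.act = nn.GELU()                     # the block's only activation
        self.pwconv2 = nn.Linear(4 * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C, H, W)
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)                # (N, H, W, C) for LN / Linear
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)                # back to (N, C, H, W)
        return shortcut + x

# Separate downsampling between stages: a 2x2 stride-2 conv
# (the paper also adds a norm layer before it to stabilize training).
downsample = nn.Conv2d(96, 192, kernel_size=2, stride=2)
```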

2.4. Empirical Evaluations on ImageNet

        ① All the configurations (C = per-stage channel widths, B = per-stage block counts):

ConvNeXt-T:  C = (96, 192, 384, 768),    B = (3, 3, 9, 3)
ConvNeXt-S:  C = (96, 192, 384, 768),    B = (3, 3, 27, 3)
ConvNeXt-B:  C = (128, 256, 512, 1024),  B = (3, 3, 27, 3)
ConvNeXt-L:  C = (192, 384, 768, 1536),  B = (3, 3, 27, 3)
ConvNeXt-XL: C = (256, 512, 1024, 2048), B = (3, 3, 27, 3)
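The same scheme expressed as data, for reference (just restating the list above):

```python
# ConvNeXt variants differ only in per-stage channels C and block counts B.
convnext_configs = {
    "T":  {"C": (96, 192, 384, 768),    "B": (3, 3, 9, 3)},
    "S":  {"C": (96, 192, 384, 768),    "B": (3, 3, 27, 3)},
    "B":  {"C": (128, 256, 512, 1024),  "B": (3, 3, 27, 3)},
    "L":  {"C": (192, 384, 768, 1536),  "B": (3, 3, 27, 3)},
    "XL": {"C": (256, 512, 1024, 2048), "B": (3, 3, 27, 3)},
}
```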

2.4.1. Results

        ① Performance table:

2.4.2. Isotropic ConvNeXt vs. ViT

        ① Removing the hierarchical (downsampling) structure yields an isotropic, ViT-style architecture with a constant feature resolution throughout; a sketch follows:
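A sketch of the isotropic variant (my construction from the ViT analogy: a ViT-style patch embedding, then a stack of identical blocks at fixed width and resolution; the ViT-S-like width/depth pairing is an assumption):

```python
import torch.nn as nn

# Isotropic ConvNeXt sketch: a 16x16 stride-16 patch embedding (14x14 tokens
# from a 224x224 input), then same-width blocks with no downsampling.
# ConvNeXtBlock is the class sketched in section 2.3.6 above.
dim, depth = 384, 18  # assumed ViT-S-like width/depth
isotropic = nn.Sequential(
    nn.Conv2d(3, dim, kernel_size=16, stride=16),
    *[ConvNeXtBlock(dim) for _ in range(depth)],
)
```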

2.5. Empirical Evaluation on Downstream Tasks

        ① Performance on COCO:

        ② Performance on ADE20K:

2.6. Related Work

        ① Other models are larger.

2.7. Conclusions

        ~

3. Supplementary Knowledge

3.1. Inductive bias

(1) Further reading: 【机器学习】浅谈 归纳偏置 (Inductive Bias), a CSDN blog post
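In short: an inductive bias is the set of assumptions a model bakes into its architecture before seeing any data. ConvNets hard-code locality and translation equivariance, priors that plain ViTs largely lack. A minimal sketch of translation equivariance (shifting the input shifts the output identically, away from borders):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)

x = torch.zeros(1, 1, 8, 8)
x[0, 0, 2, 2] = 1.0                                    # an impulse at (2, 2)
x_shifted = torch.roll(x, shifts=(1, 1), dims=(2, 3))  # the same impulse at (3, 3)

y = conv(x)
y_shifted = conv(x_shifted)

# Conv output of the shifted input equals the shifted conv output
print(torch.allclose(torch.roll(y, (1, 1), (2, 3)), y_shifted))  # True
```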

4. Reference

Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T. and Xie, S. (2022) A ConvNet for the 2020s, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, LA, USA.
