google
ICML-2021
Table of Contents
- 1 Background and Motivation
- 2 Related Work
- 3 Advantages / Contributions
- 4 Method
- 4.1 Understanding Training Efficiency
- 4.2 Training-Aware NAS and Scaling
- 4.3 Progressive learning
- 5 Experiments
- 5.1 Datasets and Metrics
- 5.2 ImageNet ILSVRC2012 and ImageNet21k
- 5.3 Transfer Learning Datasets
- 5.4 Ablation Studies
- Comparison to EfficientNet
- Progressive Learning for Different Networks
- Importance of Adaptive Regularization
- 6 Conclusion(own)
1 Background and Motivation
An upgrade built on EfficientNet v1 (【EfficientNet】《EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks》), targeting faster training speed and better parameter efficiency.
The authors observe that:
- training with very large image sizes is slow (hence progressive training: the input image size is gradually increased as training proceeds)
- depthwise convolutions are slow in early layers (hence the proposed Fused-MBConv; the NAS result uses Fused-MBConv, which has no dw conv, in all early layers)
- equally scaling up every stage is sub-optimal
2 Related Work
- Training and parameter efficiency
- Progressive training
- Neural architecture search (NAS)
3 Advantages / Contributions
Proposes EfficientNetV2, obtained with training-aware neural architecture search and scaling; further improves speed and accuracy with progressive learning (jointly scheduling image size and regularization); faster and better results on public datasets.
4 Method
The method builds on the EfficientNet (v1) baseline.
4.1 Understanding Training Efficiency
(1)Training with very large image sizes is slow
(2)Depthwise convolutions are slow in early layers but effective in later stages
Although depthwise convs have fewer parameters and FLOPs, they cannot fully utilize modern accelerators.
Fused-MBConv is more hardware-efficient and more accurate, but it increases parameters and FLOPs.
So how should MBConv and Fused-MBConv be combined in the network for the best efficiency? The authors leverage neural architecture search to automatically search for the best combination.
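To make the trade-off concrete, here is a minimal PyTorch sketch of the two building blocks (simplified: no SE module, no stochastic depth; the class names and the `expand_ratio` default are my own choices, not the official implementation):

```python
import torch
import torch.nn as nn

class MBConv(nn.Module):
    """Expand (1x1) -> depthwise 3x3 -> project (1x1). Fewer FLOPs/params,
    but the depthwise conv under-utilizes modern accelerators in early stages."""
    def __init__(self, in_ch, out_ch, expand_ratio=4, stride=1):
        super().__init__()
        mid = in_ch * expand_ratio
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False), nn.BatchNorm2d(mid), nn.SiLU(),
            nn.Conv2d(mid, mid, 3, stride, 1, groups=mid, bias=False),  # depthwise
            nn.BatchNorm2d(mid), nn.SiLU(),
            nn.Conv2d(mid, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch),
        )
        self.use_residual = stride == 1 and in_ch == out_ch

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out

class FusedMBConv(nn.Module):
    """Replaces the expand 1x1 + depthwise 3x3 with a single regular 3x3 conv:
    more FLOPs/params, but runs faster on accelerators in early stages."""
    def __init__(self, in_ch, out_ch, expand_ratio=4, stride=1):
        super().__init__()
        mid = in_ch * expand_ratio
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 3, stride, 1, bias=False),  # fused regular conv
            nn.BatchNorm2d(mid), nn.SiLU(),
            nn.Conv2d(mid, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch),
        )
        self.use_residual = stride == 1 and in_ch == out_ch

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out
```

In the searched EfficientNetV2-S, the early stages end up using Fused-MBConv while the later stages keep MBConv (with SE).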
(3)Equally scaling up every stage is sub-optimal
Scaling the network's depth or width by the same factor in every stage is not optimal.
EfficientNet-S -> EfficientNet-M -> EfficientNet-L
4.2 Training-Aware NAS and Scaling
The NAS is built on top of v1; the searched EfficientNetV2-S architecture is shown below (Fused-MBConv in the early stages, MBConv with SE in the later stages).
When scaling EfficientNetV2-S up to M and L, the authors gradually add more layers to the later stages (e.g., stage 5 and 6) instead of scaling every stage equally; a toy sketch of this idea follows.
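In the sketch below, the per-stage layer counts, `depth_mult`, and the `later_stage_bonus` factor are hypothetical placeholders to illustrate non-uniform depth scaling, not the paper's actual S/M/L configurations or scaling rule:

```python
# Hypothetical per-stage layer counts (stages 1..6); NOT the real EfficientNetV2-S layout.
base_layers = [2, 4, 4, 6, 9, 15]

def scale_depth(base_layers, depth_mult, later_stage_bonus):
    """Scale every stage's depth by depth_mult, but give the later stages
    (here, stages 5 and 6) an extra multiplier, mimicking the idea of
    'gradually add more layers to later stages'."""
    scaled = []
    for idx, n in enumerate(base_layers, start=1):
        mult = depth_mult * (later_stage_bonus if idx >= 5 else 1.0)
        scaled.append(max(1, round(n * mult)))
    return scaled

# e.g. a hypothetical "M"-like depth config derived from the "S"-like base
print(scale_depth(base_layers, depth_mult=1.4, later_stage_bonus=1.3))
```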
Training Speed Comparison:
EffNet (reprod), the authors' reproduction, is trained with an image size about 30% smaller.
4.3 Progressive learning
in the early training epochs, we train the network with small image size and weak regularization (e.g., dropout and data augmentation), then we gradually increase image size and add stronger regularization.
The novelty is that not only the image size is progressive, but the regularization is as well.
The larger the input size, the stronger the data augmentation should be; this combination gives better results.
The progressive learning strategy is as follows.
Not only does the image size grow as training proceeds, the regularization strength also increases. The algorithm uses the notation below:
- $S_i$: image size at stage $i$; the initial size is $S_0$ and the final (target) size is $S_e$.
- $\phi_i^k$: regularization magnitude of the $k$-th regularization type at stage $i$ (see 【Randaugment】《Randaugment: Practical automated data augmentation with a reduced search space》 and 【AutoAugment for OD】《Learning Data Augmentation Strategies for Object Detection》). The authors use Dropout, RandAugment, and Mixup; the minimum magnitude is $\phi_0^k$ and the maximum is $\phi_e^k$.
- $M$: the training process is divided into $M$ stages (here four stages with about 87 epochs per stage); note these are training stages, not the backbone's stages.
- $N$: total training steps, which can be read as epochs or per-batch iterations.
Progressive learning uses the simplest form, linear interpolation; the detailed per-model parameter ranges are listed in the table below.
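As a concrete illustration of the linear schedule, here is a minimal Python sketch; the `linear_interp` helper and the concrete ranges (image size 128→300, RandAugment 5→15, Mixup 0→0.2, Dropout 0.1→0.3) are illustrative placeholders rather than the paper's exact table values:

```python
def linear_interp(start, end, stage, num_stages):
    """Linearly interpolate a value for training stage i in [0, M-1]."""
    if num_stages == 1:
        return end
    return start + (end - start) * stage / (num_stages - 1)

# Illustrative ranges (placeholders, not the paper's exact table values).
M = 4                                   # training split into 4 stages
image_size = (128, 300)                 # S_0 -> S_e
regularizers = {                        # phi_0^k -> phi_e^k
    "randaug_magnitude": (5, 15),
    "mixup_alpha": (0.0, 0.2),
    "dropout_rate": (0.1, 0.3),
}

for i in range(M):
    size = int(linear_interp(*image_size, i, M))
    regs = {k: round(linear_interp(lo, hi, i, M), 3)
            for k, (lo, hi) in regularizers.items()}
    print(f"stage {i}: image_size={size}, {regs}")
```

Each of the $M$ stages then trains for roughly $N/M$ steps with its interpolated image size and regularization magnitudes.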
5 Experiments
5.1 Datasets and Metrics
- ImageNet ILSVRC2012:about 1.28M training images and 50,000 validation images with 1000 classes
- ImageNet21k:13M training images with 21,841 classes
- CIFAR-10
- CIFAR-100
- Flowers
- Cars
5.2 ImageNet ILSVRC2012 and ImageNet21k
Results
Both faster and more accurate.
From the figure, though, the speed advantage does not look dramatic, while the accuracy advantage is very clear.
The authors' takeaways after adding ImageNet21k:
- Scaling up data size is more effective than simply scaling up model size in the high-accuracy regime. Here, data size means the number of training images, not the image size fed into the network.
- Pretraining on ImageNet21k could be quite efficient: since the proposed method speeds up training (and inference), pre-training on the larger dataset costs roughly the same time as the earlier setup without progressive learning, while further improving accuracy.
5.3 Transfer Learning Datasets
The gain on CIFAR-10 is modest, while the lead on CIFAR-100 and Cars is much clearer.
5.4 Ablation Studies
Comparison to EfficientNet
Both training speed (reduced from 139h to 54h) and accuracy (improved from 84.7% to 85.0%) are better than in the original EfficientNet paper.
Starting from EfficientNetV2-S, the authors also scale down to several smaller models and check their performance.
The main selling point is speed.
Progressive Learning for Different Networks
The speedup is significant.
Importance of Adaptive Regularization
This is one of the authors' key contributions.
6 Conclusion(own)
- Unlike transformers, whose weights (e.g., position embedding) may depend on the input length, CNN weights do not depend on the input image size, which is what makes progressive resizing straightforward (see the sketch after this list).
- Does progressive learning also work for object detection?
- Use depthwise convolutions in the later stages, keeping them out of the early ones.
- data size vs model size
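As a quick sanity check of the first bullet, the PyTorch sketch below shows that the same conv weights handle different input sizes unchanged, while a ViT-style position embedding is tied to a fixed patch grid and has to be interpolated when the input size changes (the toy model and all numbers are mine, not from the paper):

```python
import torch
import torch.nn as nn

# A tiny CNN: convolution weights are independent of the spatial size,
# so the same model runs on 128x128 and 300x300 inputs unchanged.
cnn = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
)
print(cnn(torch.randn(1, 3, 128, 128)).shape)   # torch.Size([1, 10])
print(cnn(torch.randn(1, 3, 300, 300)).shape)   # torch.Size([1, 10])

# A ViT-style position embedding is sized for a fixed number of patches
# (e.g. 14x14 for a 224x224 input with 16x16 patches); changing the input
# size changes the token count, so the embedding must be interpolated.
pos_embed = torch.randn(1, 14 * 14, 768)        # fixed-length by construction
new_grid = 19                                   # e.g. 304x304 / 16 = 19x19 patches
resized = nn.functional.interpolate(
    pos_embed.reshape(1, 14, 14, 768).permute(0, 3, 1, 2),
    size=(new_grid, new_grid), mode="bicubic", align_corners=False,
).permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, 768)
print(resized.shape)                            # torch.Size([1, 361, 768])
```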