ECCV-2018
Facebook AI Research
Table of Contents
- 1 Background and Motivation
- 2 Related Work
- 3 Advantages / Contributions
- 4 Method
- 5 Experiments
- 5.1 Datasets and Metrics
- 5.2 Image Classification in ImageNet
- 5.3 Object Detection and Segmentation in COCO
- 5.4 Video Classification in Kinetics
- 6 Conclusion (own) / Future work
1 Background and Motivation
Batch Normalization (BN) degrades considerably when the batch size is small. In tasks such as object detection and segmentation, the input resolution is high and the networks are large, so the batch size is often small and BN's benefit is weakened.
Motivated by the fact that many classical features like SIFT and HOG are group-wise features and involve group-wise normalization,
the authors propose Group Normalization (GN) to reduce the impact of small batch sizes on normalization.
2 Related Work
- Normalization
LRN / BN / LN / IN / WN (weight normalization)
LN and IN are two extreme cases of GN; they are effective for training sequential models (RNN/LSTM) or generative models (GANs), but have limited success in visual recognition.
- Addressing small batches
Batch Renormalization (it also fails when the batch size is too small)
- Group-wise computation
AlexNet / ResNeXt / MobileNet / Xception / ShuffleNet
3 Advantages / Contributions
Proposes Group Normalization (GN).
4 Method
GN's computation is independent of batch size. LN, IN, and GN all perform independent computations along the batch axis; in fact, LN and IN are the two extreme cases of GN: with $G = 1$ (all channels in one group) GN becomes LN, and with $G = C$ (one channel per group) GN becomes IN.
The formula: subtract the mean and divide by the standard deviation, then take away with one hand and give back with the other, learning two parameters $\gamma$ and $\beta$ to restore the representation:

$$\hat{x}_i = \frac{1}{\sigma_i}\,(x_i - \mu_i), \qquad y_i = \gamma \hat{x}_i + \beta$$

For 2D images the index is a 4-vector $i = (i_N, i_C, i_H, i_W)$, with

$$\mu_i = \frac{1}{m}\sum_{k \in S_i} x_k, \qquad \sigma_i = \sqrt{\frac{1}{m}\sum_{k \in S_i} (x_k - \mu_i)^2 + \epsilon}$$

$S_i$ is the set of pixels over which the mean and std are computed, $m$ is the size of this set, and $\epsilon$ prevents division by zero.
- BN: per channel, statistics over (N, H, W)
- LN: per sample, statistics over (C, H, W)
- IN: per sample and per channel, statistics over (H, W)
- GN: per sample, statistics over one group of channels and (H, W)
$G$ is the number of groups; the default is 32.
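Written out, the four methods differ only in how $S_i$ is defined (these are the paper's set definitions; $C/G$ is the number of channels per group):

$$
\begin{aligned}
\text{BN:}\quad & S_i = \{k \mid k_C = i_C\} \\
\text{LN:}\quad & S_i = \{k \mid k_N = i_N\} \\
\text{IN:}\quad & S_i = \{k \mid k_N = i_N,\; k_C = i_C\} \\
\text{GN:}\quad & S_i = \{k \mid k_N = i_N,\; \lfloor k_C/(C/G) \rfloor = \lfloor i_C/(C/G) \rfloor\}
\end{aligned}
$$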
TensorFlow code:
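Below is a minimal sketch along the lines of the snippet given in the paper (the paper's TF1 version uses `keep_dims`; `keepdims` here assumes the TF2 API):

```python
import tensorflow as tf

def GroupNorm(x, gamma, beta, G=32, eps=1e-5):
    # x: input features with shape [N, C, H, W]
    # gamma, beta: learned scale and offset, each with shape [1, C, 1, 1]
    # G: number of groups (C must be divisible by G)
    N, C, H, W = x.shape
    # split the C channels into G groups of C // G channels each
    x = tf.reshape(x, [N, G, C // G, H, W])
    # mean and variance per sample and per group, over (C // G, H, W)
    mean, var = tf.nn.moments(x, axes=[2, 3, 4], keepdims=True)
    x = (x - mean) / tf.sqrt(var + eps)
    x = tf.reshape(x, [N, C, H, W])
    # per-channel affine transform restores representation capacity
    return x * gamma + beta

# example: normalize a hypothetical [N=2, C=64, H=56, W=56] feature map
x = tf.random.normal([2, 64, 56, 56])
y = GroupNorm(x, gamma=tf.ones([1, 64, 1, 1]), beta=tf.zeros([1, 64, 1, 1]))
```

Setting `G=1` recovers LN and `G=C` recovers IN, matching the two extreme cases above.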
5 Experiments
5.1 Datasets and Metrics
ImageNet: top-1 classification error
COCO Detection: box mAP
COCO Segmentation: mask mAP
Kinetics: accuracy
5.2 Image Classification in ImageNet
(1) Comparison of feature normalization methods
At bs = 32, GN has the lowest training error, but its validation error is worse than BN's, i.e., GN generalizes less well than BN here.
The authors' explanation:
"BN's mean and variance computation introduces uncertainty caused by the stochastic batch sampling, which helps regularization."
With G = 32 groups, the number of channels per group depends on the layer; if a group happened to hold 32 channels, the size of the normalization set would match that of BN at bs = 32, the difference being that one 32 runs along the batch axis and the other along the channel axis.
In short, at bs = 32 GN does not beat BN.
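To make that counting argument concrete (a worked example with a hypothetical layer of $C = 1024$ channels):

$$|S_i|_{\text{GN},\,G=32} = \frac{C}{G} \cdot H \cdot W = 32 \cdot H \cdot W, \qquad |S_i|_{\text{BN},\,\text{bs}=32} = N \cdot H \cdot W = 32 \cdot H \cdot W$$

The two set sizes coincide, but GN's 32 lies along the channel axis while BN's lies along the batch axis.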
(2) Small batch sizes
With small batch sizes GN's advantage shows, and GN is largely insensitive to the batch size.
Benefit: "This will make it possible to train higher capacity models that would be otherwise bottlenecked by memory limitation."
(3) Comparison with Batch Renorm (BR)
With a batch size of 4, ResNet-50 error rates: BR 26.3%, BN 27.3%, GN 24.2%.
(4) Group division
Compares results under different settings of G versus channels per group.
(6) Deeper models
ResNet-101: at bs = 32 GN is worse than BN; at bs = 2 GN is better than BN.
(7) Results and analysis of VGG models
Comparing features from conv5_3 (the last convolutional layer): normalization clearly matters for VGG, and GN works better than BN.
5.3 Object Detection and Segmentation in COCO
Detection and segmentation run with small batch sizes, so this is GN's home turf.
(1) Results of C4 backbone
The backbone's C4 feature map feeds the classification / regression / mask heads.
(2) Results of FPN backbone
FPN features feed the classification / regression / mask heads.
"long" schedule: training iterations increased from 180k to 270k.
(3) Training Mask R-CNN from scratch
Compared with the Table 6 results, training from scratch with GN also beats fine-tuning from BN pre-trained weights.
5.4 Video Classification in Kinetics
6 Conclusion (own) / Future work
- BN's drawback: "BN's error increases rapidly when the batch size becomes smaller", the reason being that "reducing the batch size can have dramatic impact on the estimated batch statistics".
- GN could be used in place of LN and IN and thus is applicable for sequential or generative models
- With large batch sizes GN is not as strong as BN; with small batch sizes GN beats BN.