ECCV-2018
Facebook AI Research
Table of Contents
- 1 Background and Motivation
- 2 Related Work
- 3 Advantages / Contributions
- 4 Method
- 5 Experiments
- 5.1 Datasets and Metrics
- 5.2 Image Classification in ImageNet
- 5.3 Object Detection and Segmentation in COCO
- 5.4 Video Classification in Kinetics
- 6 Conclusion (own) / Future work
1 Background and Motivation
Batch Normalization (BN) degrades considerably when the batch size is small. In tasks such as object detection and segmentation, the input resolution is high and the networks are large, so the batch size is often small and BN's benefit is weakened.
Motivated by the fact that many classical features like SIFT and HOG are group-wise features and involve group-wise normalization,
the authors propose Group Normalization (GN) to reduce the impact of small batch sizes on normalization.
2 Related Work
- Normalization
LRN / BN / LN / IN / WN (weight normalization)
LN and IN are two extreme cases of GN; they are effective for training sequential models (RNN/LSTM) or generative models (GANs), but have limited success in visual recognition.
- Addressing small batches
Batch Renormalization (it also fails when the batch size is too small)
- Group-wise computation
AlexNet / ResNeXt / MobileNet / Xception / ShuffleNet
3 Advantages / Contributions
Proposes Group Normalization (GN).
4 Method
GN's computation is independent of batch size. LN, IN, and GN all perform independent computations along the batch axis; in fact, LN and IN are the two extreme cases of GN: with $G = 1$ (all channels in one group) GN becomes LN, and with $G = C$ (one channel per group) GN becomes IN.
The formula: subtract the mean and divide by the standard deviation, then take away with one hand and give back with the other, learning two parameters $\gamma$ and $\beta$ to restore the representation:

$$\hat{x}_i = \frac{1}{\sigma_i}\,(x_i - \mu_i), \qquad y_i = \gamma \hat{x}_i + \beta$$

For 2D images the index is a 4-vector $i = (i_N, i_C, i_H, i_W)$, with

$$\mu_i = \frac{1}{m}\sum_{k \in S_i} x_k, \qquad \sigma_i = \sqrt{\frac{1}{m}\sum_{k \in S_i} (x_k - \mu_i)^2 + \epsilon}$$

$S_i$ is the set of pixels over which the mean and std are computed, $m$ is the size of this set, and $\epsilon$ prevents division by zero.
- BN: per channel, statistics over (N, H, W)
- LN: per sample, statistics over (C, H, W)
- IN: per sample and per channel, statistics over (H, W)
- GN: per sample, statistics over one group of channels and (H, W)
$G$ is the number of groups; the default is 32.
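Written out, the four methods differ only in how $S_i$ is defined (these are the paper's set definitions; $C/G$ is the number of channels per group):

$$
\begin{aligned}
\text{BN:}\quad & S_i = \{k \mid k_C = i_C\} \\
\text{LN:}\quad & S_i = \{k \mid k_N = i_N\} \\
\text{IN:}\quad & S_i = \{k \mid k_N = i_N,\; k_C = i_C\} \\
\text{GN:}\quad & S_i = \{k \mid k_N = i_N,\; \lfloor k_C/(C/G) \rfloor = \lfloor i_C/(C/G) \rfloor\}
\end{aligned}
$$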
TensorFlow code:
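Below is a minimal sketch along the lines of the snippet given in the paper (the paper's TF1 version uses `keep_dims`; `keepdims` here assumes the TF2 API):

```python
import tensorflow as tf

def GroupNorm(x, gamma, beta, G=32, eps=1e-5):
    # x: input features with shape [N, C, H, W]
    # gamma, beta: learned scale and offset, each with shape [1, C, 1, 1]
    # G: number of groups (C must be divisible by G)
    N, C, H, W = x.shape
    # split the C channels into G groups of C // G channels each
    x = tf.reshape(x, [N, G, C // G, H, W])
    # mean and variance per sample and per group, over (C // G, H, W)
    mean, var = tf.nn.moments(x, axes=[2, 3, 4], keepdims=True)
    x = (x - mean) / tf.sqrt(var + eps)
    x = tf.reshape(x, [N, C, H, W])
    # per-channel affine transform restores representation capacity
    return x * gamma + beta

# example: normalize a hypothetical [N=2, C=64, H=56, W=56] feature map
x = tf.random.normal([2, 64, 56, 56])
y = GroupNorm(x, gamma=tf.ones([1, 64, 1, 1]), beta=tf.zeros([1, 64, 1, 1]))
```

Setting `G=1` recovers LN and `G=C` recovers IN, matching the two extreme cases above.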
5 Experiments
5.1 Datasets and Metrics
ImageNet: top-1 classification error
COCO Detection: box mAP
COCO Segmentation: mask mAP
Kinetics: accuracy
5.2 Image Classification in ImageNet
(1) Comparison of feature normalization methods
At bs = 32, GN has the lowest training error, but its validation error is worse than BN's, i.e., GN generalizes less well than BN here.
The authors' explanation:
"BN's mean and variance computation introduces uncertainty caused by the stochastic batch sampling, which helps regularization."
With G = 32 groups, the number of channels per group depends on the layer; if a group happened to hold 32 channels, the size of the normalization set would match that of BN at bs = 32, the difference being that one 32 runs along the batch axis and the other along the channel axis.
In short, at bs = 32 GN does not beat BN.
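To make that counting argument concrete (a worked example with a hypothetical layer of $C = 1024$ channels):

$$|S_i|_{\text{GN},\,G=32} = \frac{C}{G} \cdot H \cdot W = 32 \cdot H \cdot W, \qquad |S_i|_{\text{BN},\,\text{bs}=32} = N \cdot H \cdot W = 32 \cdot H \cdot W$$

The two set sizes coincide, but GN's 32 lies along the channel axis while BN's lies along the batch axis.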
(2) Small batch sizes
With small batch sizes GN's advantage shows, and GN is largely insensitive to the batch size.
Benefit: "This will make it possible to train higher capacity models that would be otherwise bottlenecked by memory limitation."
(3) Comparison with Batch Renorm (BR)
With a batch size of 4, ResNet-50 error rates: BR 26.3%, BN 27.3%, GN 24.2%.
(4) Group division
Compares results under different settings of G versus channels per group.
(6) Deeper models
ResNet-101: at bs = 32 GN is worse than BN; at bs = 2 GN is better than BN.
(7) Results and analysis of VGG models
Comparing features from conv5_3 (the last convolutional layer): normalization clearly matters for VGG, and GN works better than BN.
5.3 Object Detection and Segmentation in COCO
Detection and segmentation run with small batch sizes, so this is GN's home turf.
(1) Results of C4 backbone
The backbone's C4 feature map feeds the classification / regression / mask heads.
(2) Results of FPN backbone
FPN features feed the classification / regression / mask heads.
"long" schedule: training iterations increased from 180k to 270k.
(3) Training Mask R-CNN from scratch
Compared with the Table 6 results, training from scratch with GN also beats fine-tuning from BN pre-trained weights.
5.4 Video Classification in Kinetics
6 Conclusion (own) / Future work
- BN's drawback: "BN's error increases rapidly when the batch size becomes smaller", the reason being that "reducing the batch size can have dramatic impact on the estimated batch statistics".
- GN could be used in place of LN and IN and thus is applicable for sequential or generative models
- With large batch sizes GN is not as strong as BN; with small batch sizes GN beats BN.