常见的数据增强项目和论文介绍

点击上方“算法猿的成长“，关注公众号，选择加“星标“或“置顶”

总第 121 篇文章，本文大约 1100 字，阅读大约需要 3 分钟

在机器学习项目中，数据集对算法模型的性能是有很大的影响的，特别是现在深度学习，对于数据集的要求就更高了，经常我们都可能会遇到数据集数量太少的情况，这时候除了再人工搜集新的数据，另外一个做法就是数据增强，从已有的数据通过一些方法，包括一些算法来进行拓展，得到更多数量的数据集。

今天介绍的是一个介绍了几个常见的数据增强项目和其他相关的论文代码的 Github 项目，其 github 地址：

https://github.com/CrazyVertigo/awesome-data-augmentation

常见的项目

imgaug

这个项目的 star 数量是已经有 8k 多了，显示使用的数量也有 2k多，克隆仓库的有1k多，的GitHub地址：

https://github.com/aleju/imgaug

它可以实现的效果包括添加噪音、仿射变换、裁剪、翻转、旋转等，其效果图如下所示：

Albumentations

这第二个项目是 2018年的一篇论文《Albumentations: fast and flexible image augmentations》的实现代码，论文地址：

https://arxiv.org/abs/1809.06839v1

github 项目已经有 4k 的star，地址如下：

https://github.com/albumentations-team/albumentations

该项目的特点有：

速度比大部分的库都要快；
基于 numpy 和 OpenCV 两个库，并选择最合适的一个
接口简单，灵活
大量的多种的变换方法实现
易于拓展应用到其他的任务或者库
支持的变换操作对象有图像、masks、关键点、边界框
支持 python 3.5-3.7 的版本
可以和 PyTorch 结合使用
已经被应用到很多深度学习的竞赛中，包括 Kaggle、topcoder，CVPR，MICCAI
作者是 Kaggle Masters

其效果如下所示，可以看到能实现的方法包括颜色空间的变换、亮度调整、模糊、压缩、黑白

Augmentor

第三个项目同样来自一篇论文《Biomedical image augmentation using Augmentor》，其论文地址：

https://www.ncbi.nlm.nih.gov/pubmed/30989173

github star 的数量也有 3.8k了，其地址：

https://github.com/mdbloice/Augmentor

官方文档：

http://augmentor.readthedocs.io/

实现的效果如下所示：

论文和代码

Mixup

来自 ICLR 2018 的论文：《Mixup: BEYOND EMPIRICAL RISK MINIMIZATION》

论文地址：https://arxiv.org/abs/1710.09412

GitHub 地址：https://github.com/facebookresearch/mixup-cifar10

效果如下所示：

Cutout

2017年的论文：《Improved Regularization of Convolutional Neural Networks with Cutout》

论文地址：https://arxiv.org/abs/1708.04552

github 地址：https://github.com/uoguelph-mlrg/Cutout

Cutmix

ICCV 2019 的论文：《CutMix:Regularization Strategy to Train Strong Classiﬁers with Localizable Features》

论文地址：https://arxiv.org/pdf/1905.04899.pdf

github地址: https://github.com/clovaai/CutMix-PyTorch

Augmix

ICLR 2020 的论文：《AUGMIX: A SIMPLE DATA PROCESSING METHOD TO IMPROVE ROBUSTNESS AND UNCERTAINTY》

论文地址：https://arxiv.org/pdf/1912.02781.pdf

github 地址：https://github.com/google-research/augmix

fast-autoaugment

NeurlIPS 2019 的论文《 Fast AutoAugment》

论文地址: https://arxiv.org/abs/1905.00397 github 地址: https://github.com/kakaobrain/fast-autoaugment

AutoAugment

CVPR 2019 的论文《AutoAugment:Learning Augmentation Strategies from Data》

论文地址: https://arxiv.org/pdf/1805.09501v3.pdf

github地址: https://github.com/DeepVoltaire/AutoAugment

RandAugment

ICLR 2020 的论文《RandAugment: Practical automated data augmentation with a reduced search space》

论文地址：https://arxiv.org/pdf/1912.02781.pdf github地址: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet

GridMask

2020年的论文题目《GridMask Data Augmentation》

论文地址：https://arxiv.org/abs/2001.04086 github地址: https://github.com/akuxcw/GridMask 知乎的论文解读: https://zhuanlan.zhihu.com/p/103992528

imagecorruptions

2019 年的论文《Benchmarking Robustness in Object Detection:Autonomous Driving when Winter is Coming》

论文地址：https://arxiv.org/pdf/1912.02781.pdf

github 地址：https://github.com/CrazyVertigo/imagecorruptions

CycleGAN

ICCV 2017 年的一篇论文《Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networkss》，也是 GAN 领域非常有名的一篇论文

论文地址：https://arxiv.org/pdf/1703.10593.pdf

github 地址：

https://github.com/junyanz/CycleGAN
https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix

Small Object Augmentation

2019 年的论文《Augmentation for small object detection》

论文地址：https://arxiv.org/pdf/1902.07296.pdf

github 地址：https://github.com/gmayday1997/SmallObjectAugmentation

知乎阅读笔记：https://zhuanlan.zhihu.com/p/64635490

标注工具

labelImg

2017 年的一个标注工具，目前有超过 9k star 的github项目，地址为：

https://github.com/tzutalin/labelImg

它可以进行图片的标注，以及绘制边界框，如下所示：

labelme

同样是2017年开源的一个标注工具，目前有 4k+ 的star，github 地址：

https://github.com/wkentaro/labelme

这是一个可以实现多种形状的标注，比如多边形、圆形、矩形、直线、点等，如下所示：

这个介绍数据增强方面的项目和论文代码，以及标注工具的 GitHub 项目就介绍到这里，再次放上其github 地址：

https://github.com/CrazyVertigo/awesome-data-augmentation

可以点击下方“阅读原文”直接跳转。

精选AI文章

1. 机器学习入门学习资料推荐

2.初学者的机器学习入门实战教程！

3.常用机器学习算法汇总比较(完）

4.特征工程之数据预处理（上）

5.实战|手把手教你训练一个基于Keras的多标签图像分类器

精选python文章

1. Python 基础入门--简介和环境配置

2. python版代码整洁之道

3. 快速入门 Jupyter notebook

4. Jupyter 进阶教程

5. 10个高效的pandas技巧

精选教程资源文章

1. [资源分享] TensorFlow 官方中文版教程来了

2. [资源]推荐一些Python书籍和教程，入门和进阶的都有！

3. [Github项目推荐] 推荐三个助你更好利用Github的工具

4. Github上的各大高校资料以及国外公开课视频

5. GitHub上有哪些比较好的计算机视觉/机器视觉的项目？

欢迎关注我的微信公众号--算法猿的成长，或者扫描下方的二维码，大家一起交流，学习和进步！

如果觉得不错，在看、转发就是对小编的一个支持！