Bounding boxes augmentation for object detection

Different annotations formats¶

Bounding boxes are rectangles that mark objects on an image. There are multiple formats of bounding boxes annotations. Each format uses its specific representation of bouning boxes coordinates 每种格式都使用其特定的边界框坐标表示。. Albumentations supports four formats: pascal_vocalbumentationscoco, and yolo .

Let's take a look at each of those formats and how they represent coordinates 坐标 of bounding boxes.

As an example, we will use an image from the dataset named Common Objects in Context. It contains one bounding box that marks a cat. The image width is 640 pixels, and its height is 480 pixels. The width of the bounding box is 322 pixels, and its height is 117 pixels.

 An example image with a bounding box from the COCO dataset

pascal_voc¶

pascal_voc is a format used by the Pascal VOC dataset. Coordinates of a bounding box are encoded with four values in pixels: [x_min, y_min, x_max, y_max]. x_min and y_min are coordinates of the top-left corner of the bounding box. x_max and y_max are coordinates of bottom-right corner of the bounding box.

Coordinates of the example bounding box in this format are [98, 345, 420, 462].

albumentations¶

albumentations is similar to pascal_voc, because it also uses four values [x_min, y_min, x_max, y_max] to represent a bounding box. But unlike pascal_vocalbumentations uses normalized values. To normalize values, we divide coordinates in pixels for the x- and y-axis by the width and the height of the image.

Coordinates of the example bounding box in this format are [98 / 640, 345 / 480, 420 / 640, 462 / 480] which are [0.153125, 0.71875, 0.65625, 0.9625].

Albumentations uses this format internally 内部 to work with bounding boxes and augment them.

coco¶

coco is a format used by the Common Objects in Context COCOCOCO dataset.

In coco, a bounding box is defined by four values in pixels [x_min, y_min, width, height]. They are coordinates of the top-left corner along with the width and height of the bounding box.

Coordinates of the example bounding box in this format are [98, 345, 322, 117].

yolo¶

In yolo, a bounding box is represented by four values [x_center, y_center, width, height]x_center and y_center are the normalized coordinates of the center of the bounding box. To make coordinates normalized, we take pixel values of x and y, which marks the center of the bounding box on the x- and y-axis. Then we divide the value of x by the width of the image and value of y by the height of the image. width and height represent the width and the height of the bounding box. They are normalized as well.

Coordinates of the example bounding box in this format are [((420 + 98) / 2) / 640, ((462 + 345) / 2) / 480, 322 / 640, 117 / 480] which are [0.4046875, 0.840625, 0.503125, 0.24375].

Bounding boxes augmentation¶

Just like with images and masks augmentation, the process of augmenting bounding boxes consists of 4 steps.

  1. You import the required libraries.
  2. You define an augmentation pipeline.
  3. You read images and bounding boxes from the disk.
  4. You pass an image and bounding boxes to the augmentation pipeline and receive augmented images and boxes.

Note

Some transforms in Albumentation don't support bounding boxes. If you try to use them you will get an exception. Please refer to this article to check whether a transform can augment bounding boxes.

Step 1. Import the required libraries.¶

import albumentations as A
import cv2

Step 2. Define an augmentation pipeline.¶

Here an example of a minimal declaration of an augmentation pipeline that works with bounding boxes.

transform = A.Compose([A.RandomCrop(width=450, height=450),A.HorizontalFlip(p=0.5),A.RandomBrightnessContrast(p=0.2),
], bbox_params=A.BboxParams(format='coco'))

Note that unlike image and masks augmentation, Compose now has an additional parameter bbox_params. You need to pass an instance of A.BboxParams to that argument. A.BboxParams specifies settings for working with bounding boxes. format sets the format for bounding boxes coordinates.

It can either be pascal_vocalbumentationscoco or yolo. This value is required because Albumentation needs to know the coordinates' source format for bounding boxes to apply augmentations correctly.

Besides formatA.BboxParams supports a few more settings.

Here is an example of Compose that shows all available settings with A.BboxParams:

transform = A.Compose([A.RandomCrop(width=450, height=450),A.HorizontalFlip(p=0.5),A.RandomBrightnessContrast(p=0.2),
], bbox_params=A.BboxParams(format='coco', min_area=1024, min_visibility=0.1, label_fields=['class_labels']))

min_area and min_visibility

min_area and min_visibility parameters control what Albumentations should do to the augmented bounding boxes if their size has changed after augmentation. The size of bounding boxes could change if you apply spatial augmentations 空间增强 , for example, when you crop 裁剪 a part of an image or when you resize an image.

min_area is a value in pixels 是以像素为单位的值. If the area of a bounding box after augmentation becomes smaller than min_area, Albumentations will drop that box. So the returned list of augmented bounding boxes won't contain that bounding box.

min_visibility is a value between 0 and 1. If the ratio of the bounding box area after augmentation to the area of the bounding box before augmentation becomes smaller than min_visibility, Albumentations will drop that box. So if the augmentation process cuts the most of the bounding box, that box won't be present in the returned list of the augmented bounding boxes.

Here is an example image that contains two bounding boxes. Bounding boxes coordinates are declared using the coco format.

 An example image with two bounding boxes

First, we apply the CenterCrop augmentation without declaring parameters min_area and min_visibility. The augmented image contains two bounding boxes.

 An example image with two bounding boxes after applying augmentation

Next, we apply the same CenterCrop augmentation, but now we also use the min_area parameter. Now, the augmented image contains only one bounding box, because the other bounding box's area after augmentation became smaller than min_area, so Albumentations dropped that bounding box.

 An example image with one bounding box after applying augmentation with 'min_area'

Finally, we apply the CenterCrop augmentation with the min_visibility. After that augmentation, the resulting image doesn't contain any bounding box, because visibility of all bounding boxes after augmentation are below threshold set by min_visibility.

An example image with zero bounding boxes after applying augmentation with 'min_visibility' 

Class labels for bounding boxes¶

Besides coordinates, each bounding box should have an associated class label that tells which object lies inside the bounding box. There are two ways to pass a label for a bounding box.

Let's say you have an example image with three objects: dogcat, and sports ball. Bounding boxes coordinates in the coco format for those objects are [23, 74, 295, 388][377, 294, 252, 161], and [333, 421, 49, 49].

An example image with 3 bounding boxes from the COCO dataset

1. You can pass labels along with bounding boxes coordinates by adding them as additional values to the list of coordinates.¶

For the image above, bounding boxes with class labels will become [23, 74, 295, 388, 'dog'][377, 294, 252, 161, 'cat'], and [333, 421, 49, 49, 'sports ball'].

Class labels could be of any type: integer, string, or any other Python data type. For example, integer values as class labels will look the following: [23, 74, 295, 388, 18][377, 294, 252, 161, 17], and [333, 421, 49, 49, 37].

 Also, you can use multiple class values for each bounding box, for example [23, 74, 295, 388, 'dog', 'animal'][377, 294, 252, 161, 'cat', 'animal'], and [333, 421, 49, 49, 'sports ball', 'item'].

2.You can pass labels for bounding boxes as a separate list (the preferred way).¶

 For example, if you have three bounding boxes like [23, 74, 295, 388][377, 294, 252, 161], and [333, 421, 49, 49] you can create a separate list with values like ['cat', 'dog', 'sports ball'], or [18, 17, 37] that contains class labels for those bounding boxes. Next, you pass that list with class labels as a separate argument to the transform function. Albumentations needs to know the names of all those lists with class labels to join them with augmented bounding boxes correctly. Then, if a bounding box is dropped after augmentation because it is no longer visible, Albumentations will drop the class label for that box as well. Use label_fields parameter to set names for all arguments in transform that will contain label descriptions for bounding boxes (more on that in Step 4).

Step 3. Read images and bounding boxes from the disk.¶

Read an image from the disk.

image = cv2.imread("/path/to/image.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

Bounding boxes can be stored on the disk in different serialization formats: JSON, XML, YAML, CSV, etc. So the code to read bounding boxes depends on the actual format of data on the disk.

After you read the data from the disk, you need to prepare bounding boxes for Albumentations.

Albumentations expects that bounding boxes will be represented 表示 as a list of lists. Each list contains information about a single bounding box. A bounding box definition should have at list four elements that represent the coordinates of that bounding box. The actual meaning of those four values depends on the format of bounding boxes (either pascal_vocalbumentationscoco, or yolo). Besides four coordinates, each definition of a bounding box may contain one or more extra values. You can use those extra values to store additional information about the bounding box, such as a class label of the object inside the box. During augmentation, Albumentations will not process those extra values. The library will return them as is along with the updated coordinates of the augmented bounding box 库将按原样返回它们以及增强边界框的更新坐标.

Step 4. Pass an image and bounding boxes to the augmentation pipeline and receive augmented images and boxes.¶

As discussed in Step 2, there are two ways of passing class labels along with bounding boxes coordinates:

1. Pass class labels along with coordinates.¶

So, if you have coordinates of three bounding boxes that look like this:

bboxes = [[23, 74, 295, 388],[377, 294, 252, 161],[333, 421, 49, 49],
]

you can add a class label for each bounding box as an additional element of the list along with four coordinates. So now a list with bounding boxes and their coordinates will look the following:

bboxes = [[23, 74, 295, 388, 'dog'],[377, 294, 252, 161, 'cat'],[333, 421, 49, 49, 'sports ball'],
]

or with multiple labels per each bounding box:

bboxes = [[23, 74, 295, 388, 'dog', 'animal'],[377, 294, 252, 161, 'cat', 'animal'],[333, 421, 49, 49, 'sports ball', 'item'],
]

You can use any data type for declaring class labels. It can be string, integer, or any other Python data type.

Next, you pass an image and bounding boxes for it to the transform function and receive the augmented image and bounding boxes.

transformed = transform(image=image, bboxes=bboxes)
transformed_image = transformed['image']
transformed_bboxes = transformed['bboxes']

 Example input and output data for bounding boxes augmentation

2. Pass class labels in a separate argument to transform (the preferred way).¶

 Let's say you have coordinates of three bounding boxes

bboxes = [[23, 74, 295, 388],[377, 294, 252, 161],[333, 421, 49, 49],
]

You can create a separate list that contains class labels for those bounding boxes:

class_labels = ['cat', 'dog', 'parrot']

Then you pass both bounding boxes and class labels to transform. Note that to pass class labels, you need to use the name of the argument that you declared in label_fields when creating an instance of Compose in step 2. In our case, we set the name of the argument to class_labels.

transformed = transform(image=image, bboxes=bboxes, class_labels=class_labels)
transformed_image = transformed['image']
transformed_bboxes = transformed['bboxes']
transformed_class_labels = transformed['class_labels']

Example input and output data for bounding boxes augmentation with a separate argument for class labels 

Note that label_fields expects a list, so you can set multiple fields that contain labels for your bounding boxes. So if you declare Compose like

transform = A.Compose([A.RandomCrop(width=450, height=450),A.HorizontalFlip(p=0.5),A.RandomBrightnessContrast(p=0.2),
], bbox_params=A.BboxParams(format='coco', label_fields=['class_labels', 'class_categories'])))

you can use those multiple arguments to pass info about class labels, like

class_labels = ['cat', 'dog', 'parrot']
class_categories = ['animal', 'animal', 'item']transformed = transform(image=image, bboxes=bboxes, class_labels=class_labels, class_categories=class_categories)
transformed_image = transformed['image']
transformed_bboxes = transformed['bboxes']
transformed_class_labels = transformed['class_labels']
transformed_class_categories = transformed['class_categories']

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/news/205622.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

案例060:基于微信小程序考试系统

文末获取源码 开发语言:Java 框架:SSM JDK版本:JDK1.8 数据库:mysql 5.7 开发软件:eclipse/myeclipse/idea Maven包:Maven3.5.4 小程序框架:uniapp 小程序开发软件:HBuilder X 小程序…

01-SDV软件定义汽车思考

前言: 随着汽车产业“新四化”(电动化、网联化、智能化、共享化)的加速推动,智能汽车已成为各国科技发展战略重点,在社会数字化转型的浪潮下逐渐形成跨领域协作、多技术融合的汽车产业新赛道。 软件定义汽车已成为行业趋势与共识&#xff…

class037 二叉树高频题目-下-不含树型dp【算法】

class037 二叉树高频题目-下-不含树型dp【算法】 code1 236. 二叉树的最近公共祖先 // 普通二叉树上寻找两个节点的最近公共祖先 // 测试链接 : https://leetcode.cn/problems/lowest-common-ancestor-of-a-binary-tree/ package class037;// 普通二叉树上寻找两个节点的最近…

HashMap系列-resize

1.resize public class HashMap<K,V> extends AbstractMap<K,V>implements Map<K,V>, Cloneable, Serializable {final Node<K,V>[] resize() {Node<K,V>[] oldTab table;int oldCap (oldTab null) ? 0 : oldTab.length; //老的数组容量in…

RabbitMQ学习二

RabbitMQ学习二 发送者的可靠性生产者连接重试机制生产者确认机制开启生产者确认定义ReturnCallback定义confirmCallback MQ的可靠性交换机和队列持久化消息持久化LazyQueue控制台配置Lazy模式代码配置Lazy模式 消费者的可靠性失败重试机制失败处理策略业务幂等性唯一消息ID业务…

AI人工智能在电子商务领域的运用

电子商务领域和个性化新时代的 AI 随着整个社会追求便利性&#xff0c;并且逐渐从传统的实体零售模式转向网购模式&#xff0c;在线零售商必须改变与客户的互动方式。为每个客户提供个性化购物体验的理念一直都存在&#xff0c;但是现在我们正式进入了个性化新时代。这是一个包…

Docker网络原理及Cgroup硬件资源占用控制

docker的网络模式 获取容器的进程号 docker inspect -f {{.State.Pid}} 容器id/容器名 docker初始状态下有三种默认的网络模式 &#xff0c;bridg&#xff08;桥接&#xff09;&#xff0c;host&#xff08;主机&#xff09;&#xff0c;none&#xff08;无网络设置&#xff…

【flink番外篇】1、flink的23种常用算子介绍及详细示例(2)- keyby、reduce和Aggregations

Flink 系列文章 1、Flink 专栏等系列综合文章链接 文章目录 Flink 系列文章一、Flink的23种算子说明及示例6、KeyBy7、Reduce8、Aggregations 本文主要介绍Flink 的3种常用的operator&#xff08;keyby、reduce和Aggregations&#xff09;及以具体可运行示例进行说明. 如果需要…

【vtkWidgetRepresentation】第五期 vtkLineRepresentation

很高兴在雪易的CSDN遇见你 内容同步更新在公众号“VTK忠粉” 【vtkWidgetRepresentation】第五期 一条直线的交互 前言 本文分享vtkLineRepresentation&#xff0c;希望对各位小伙伴有所帮助&#xff01; 感谢各位小伙伴的点赞关注&#xff0c;小易会继续努力分享&#xf…

Presto:基于内存的OLAP查询引擎

PrestoSQL查询引擎 1、Presto概述1.1、Presto背景1.2、什么是Presto1.3、Presto的特性2、Presto架构2.1、Presto的两类服务器2.2、Presto基本概念2.3、Presto数据模型3、Presto查询过程3.1、Presto执行原理3.2、Presto与Hive3.3、Presto与Impala3.4、PrestoDB与PrestoSQL4、Pre…

Libavutil详解:理论与实战

文章目录 前言一、Libavutil 简介二、AVLog 测试1、示例源码2、运行结果 三、AVDictionary 测试1、示例源码2、运行结果 四、ParseUtil 测试1、示例源码2、运行结果 前言 libavutil 是一个实用库&#xff0c;用于辅助多媒体编程&#xff0c;本文记录 libavutil 库学习及 demo 例…

智能优化算法应用:基于战争策略算法无线传感器网络(WSN)覆盖优化 - 附代码

智能优化算法应用&#xff1a;基于战争策略算法无线传感器网络(WSN)覆盖优化 - 附代码 文章目录 智能优化算法应用&#xff1a;基于战争策略算法无线传感器网络(WSN)覆盖优化 - 附代码1.无线传感网络节点模型2.覆盖数学模型及分析3.战争策略算法4.实验参数设定5.算法结果6.参考…

对比两阶段提交,三阶段协议有哪些改进?

本文我们来讨论两阶段提交和三阶段提交协议的过程以及应用。 在分布式系统中&#xff0c;各个节点之间在物理上相互独立&#xff0c;通过网络进行沟通和协调。在关系型数据库中&#xff0c;由于存在事务机制&#xff0c;可以保证每个独立节点上的数据操作满足 ACID。但是&…

WMMSE方法的使用笔记

标题很帅 原论文的描述WMMSE的简单应用 无线蜂窝通信系统的预编码设计问题中&#xff0c;经常提到用WMMSE方法设计多用户和速率最大化的预编码&#xff0c;其中最为关键的一步是将原和速率最大化问题转化为均方误差最小化问题&#xff0c;从而将问题由非凸变为关于三个新变量的…

Zabbix“专家坐诊”第214期问答汇总

问题一 Q&#xff1a;Zabbix 6.4版本&#xff0c;如图&#xff0c;95th percentable这个值是否会存到zabbix的数据库里&#xff1f;如果存了是存到了哪里&#xff1f; A&#xff1a;这个值是不会保存到数据库里的&#xff0c;它会根据所选的时间段而变化。 问题二 Q&#xff1…

5分钟搞懂ECN

ECN是通过在IP和TCP头中携带拥塞信息&#xff0c;通知发送方网络拥塞状态&#xff0c;从而采取相应拥塞控制措施。原文: What is ECN(Explicit Congestion Notification)?[1] ECN是Explicit Congestion Notification的缩写&#xff0c;意思是显式拥塞通知算法&#xff0c;和慢…

黑苹果之主板篇

一、什么是主板 主板&#xff0c;又叫主机板&#xff08;mainboard&#xff09;、系统板&#xff08;systemboard&#xff09;、或母板&#xff08;motherboard&#xff09;&#xff0c;是计算机最基本的同时也是最重要的部件之一。主板一般为矩形电路板&#xff0c;上面安装了…

Zabbix自动发现机制

Zabbix的自动发现机制 Zabbix客户端主动的和服务端联系&#xff0c;将自己的地址和端口发送服务端&#xff0c;实现自动添加监控主机&#xff0c;客户端是主动的一方缺点自定义网段中主机数量太多&#xff0c;等级耗时会很久&#xff0c;而且这个自动发现机制不是很稳定 Zabb…

06 硬件知识入门(MOSS管)

1 简介 MOS管和三极管的驱动方式完全不一样&#xff0c;以NPN型三极管为例&#xff0c;base极以小电流打开三极管&#xff0c;此时三极管的集电极被打开&#xff0c;发射极的高电压会导入&#xff0c;此时电流&#xff1a;Ic IbIe &#xff1b;电压&#xff1a;Ue>Uc>Ub…

看好美国跨境电商平台Etsy的三个理由

来源&#xff1a;猛兽财经 作者&#xff1a;猛兽财经 不可否认&#xff0c;最近的经济低迷给美国跨境电商平台Etsy(ETSY)的增长带来了一些麻烦。虽然Etsy第三季度营收同比增长了7%&#xff0c;但其商品总量仅增长了1%。如果没有有利的汇率&#xff0c;Etsy的销售额基本上会与前…