How RPN Generates Proposals: Understanding RPN and ROI Align in One Article

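RPN and ROI Align are two key operations in a two-stage detector. They connect the two stages into a single end-to-end network and also improve the overall detection performance. RPN supplies ROI Align with high-quality candidate boxes, called proposals. The relationship is shown in Figure 1: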

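The sections below walk through each step alongside the corresponding code. The code comes from Facebook's detectron2, and the related configuration parameters can be found in detectron2/config/defaults.py.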

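I. RPN

RPN stands for region proposal network. Its role is to provide high-quality object candidate boxes to the second stage. As shown in Figure 1, it consists of four key steps: anchor generator, anchor target generator, RPN loss, and proposal generator. Each is described in detail below.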

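1. Anchor generator

Anchors are predefined from a set of scales and aspect ratios, and these parameters are shared across every position of the feature map, as shown in the figure below. The left side shows a feature map with stride=16 used for detection: for an 800*600 input image, this feature map is 50*38. The anchor scales are (8, 16, 32) and the ratios are (0.5, 1, 2), which gives 9 anchor types. The right side shows that a 50*38 feature map yields 50*38*9 = 17100 anchors in total. When they are tiled onto the original image, some of the boxes extend beyond the image boundary.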

Steps and code:

detectron2/modeling/anchor_generator.py:

grid_anchors(self, grid_sizes)

1) Pre-generate n anchor types from the scales and ratios. These n anchors live in an image coordinate system whose origin is the anchor center.

generate_cell_anchors(self, sizes=(32, 64, 128, 256, 512), aspect_ratios=(0.5, 1, 2))

2) For every coordinate of the feature map, compute the offset of the anchor center at that position.

_create_grid_offsets(size, stride, offset, device)

3) Combine the base anchors with the offsets to obtain all anchors on each feature map.

anchors.append((shifts.view(-1, 1, 4) + base_anchors.view(1, -1, 4)).reshape(-1, 4))
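The three steps above can be sketched in plain Python to check the anchor count from the 800*600 example. This is a minimal illustration, not the detectron2 implementation. It assumes that the scales (8, 16, 32) are multiples of the stride 16, giving anchor sizes of 128, 256, and 512 pixels, and that each ratio means height/width. The function names mirror the ones quoted above, but the bodies are written for this example.

```python
import math


def generate_cell_anchors(sizes, aspect_ratios):
    """Return base anchors (x1, y1, x2, y2) centered at the origin."""
    anchors = []
    for size in sizes:
        area = size ** 2.0
        for ratio in aspect_ratios:
            # Assumption: ratio = height / width, and width * height = area.
            w = math.sqrt(area / ratio)
            h = ratio * w
            anchors.append((-w / 2.0, -h / 2.0, w / 2.0, h / 2.0))
    return anchors


def grid_anchors(grid_w, grid_h, stride, cell_anchors):
    """Shift every base anchor to every feature-map position."""
    anchors = []
    for y in range(grid_h):
        for x in range(grid_w):
            shift_x, shift_y = x * stride, y * stride
            for x1, y1, x2, y2 in cell_anchors:
                anchors.append((x1 + shift_x, y1 + shift_y, x2 + shift_x, y2 + shift_y))
    return anchors


cell = generate_cell_anchors(sizes=(128, 256, 512), aspect_ratios=(0.5, 1, 2))
all_anchors = grid_anchors(grid_w=50, grid_h=38, stride=16, cell_anchors=cell)
print(len(cell), len(all_anchors))  # 9 17100
```

Many of the resulting boxes have negative coordinates or exceed 800*600, which matches the observation that some anchors fall outside the image.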

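2. Anchor target generator

The anchors are tiled uniformly over every pixel of the feature map, but we do not yet know which of them contain a real object. The anchor target layer separates positive anchors (those that contain a real object) from negative anchors (those that contain only background). It does this by computing the IoU between each anchor and each ground truth box and applying three criteria:

1) For each ground truth box, the anchor with the highest IoU with it is a positive sample.

2) Any anchor whose IoU with a ground truth box exceeds a threshold t1 is a positive sample.

3) The remaining anchors are not all negative. Only anchors whose IoU with the ground truth is below a threshold t2 are negative samples. Anchors with IoU in [t2, t1] are "don't care" samples: they are neither positive nor negative and do not take part in optimization, that is, they contribute nothing to the RPN loss.

The first criterion guarantees that every ground truth box has at least one matching anchor. The second selects a reasonable number of positive anchors out of the large anchor pool, which helps keep positive and negative samples balanced.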
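The three criteria can be sketched in plain Python as follows. This is an illustrative sketch, not the detectron2 Matcher. The thresholds t1=0.7 and t2=0.3 are assumed example values, the helper names `iou` and `label_anchors` are hypothetical, and the guard that skips ground truth boxes with no overlapping anchor is a choice made for this example.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt_boxes, t1=0.7, t2=0.3):
    """Return (matched_gt, labels): 1 = positive, 0 = negative, -1 = don't care."""
    quality = [[iou(gt, a) for a in anchors] for gt in gt_boxes]  # G x A matrix
    matched, labels = [], []
    for j in range(len(anchors)):
        column = [row[j] for row in quality]
        best_gt = max(range(len(gt_boxes)), key=lambda i: column[i])
        matched.append(best_gt)
        if column[best_gt] >= t1:      # criterion 2
            labels.append(1)
        elif column[best_gt] < t2:     # criterion 3
            labels.append(0)
        else:                          # IoU in [t2, t1): ignored by the loss
            labels.append(-1)
    # Criterion 1: the best anchor for each ground truth box is positive.
    # Only the label changes here; the match index keeps the per-anchor argmax.
    for row in quality:
        best = max(row)
        if best > 0:
            for j, value in enumerate(row):
                if value == best:
                    labels[j] = 1
    return matched, labels


gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
anchors = [(0, 0, 10, 10), (0, 0, 10, 20), (50, 50, 60, 60), (20, 20, 30, 26)]
matched, labels = label_anchors(anchors, gt_boxes)
print(matched, labels)  # [0, 0, 0, 1] [1, -1, 0, 1]
```

The last anchor has an IoU of only 0.6 with the second ground truth box, yet it is labeled positive because no other anchor matches that box better. This is the effect of the first criterion.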

Steps and code:

detectron2/modeling/proposal_generator/rpn_outputs.py

detectron2/modeling/matcher.py

_get_ground_truth(self)

1) Build the IoU matrix between the anchors and the ground truth boxes.

match_quality_matrix = retry_if_cuda_oom(pairwise_iou)(gt_boxes_i, anchors_i)

2) Apply the criteria to produce two vectors: the ID of the ground truth box matched to each anchor, used for bounding box (bbox) regression, and the label of each anchor (positive, negative, or don't care).

matched_idxs, gt_objectness_logits_i = retry_if_cuda_oom(self.anchor_matcher)(match_quality_matrix)

3) Optionally discard anchors that extend beyond the image.

if self.boundary_threshold >= 0:
    # Discard anchors that go out of the boundaries of the image
    # NOTE: This is legacy functionality that is turned off by default in Detectron2
    anchors_inside_image = anchors_i.inside_box(image_size_i, self.boundary_threshold)
    gt_objectness_logits_i[~anchors_inside_image] = -1

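3. RPN Loss

RPN has two tasks. The first is classification: deciding which anchors are positive and which are negative. The second is regression: for positive anchors, regressing toward the true object box. The loss is therefore the sum of a classification term and a regression term.

The classification task uses a cross-entropy loss.

The regression task uses a smooth_l1 loss.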
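For reference, one common way to write these terms follows the notation of the Faster R-CNN paper. That notation is an assumption here, since the article's own formulas are not available in the text. In this form, p_i is the predicted objectness probability of anchor i, p_i^* is its label (1 for positive, 0 for negative), t_i and t_i^* are the predicted and target box deltas, N_cls and N_reg are normalizers, \lambda balances the two terms, and \beta corresponds to smooth_l1_beta in the code.

```latex
\begin{aligned}
L(\{p_i\}, \{t_i\}) &= \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)
  + \lambda \frac{1}{N_{reg}} \sum_i p_i^* \, L_{reg}(t_i, t_i^*) \\
L_{cls}(p_i, p_i^*) &= -\left[ p_i^* \log p_i + (1 - p_i^*) \log (1 - p_i) \right] \\
L_{reg}(t_i, t_i^*) &= \sum_{k \in \{x, y, w, h\}} \mathrm{smooth}_{L1}\left(t_{i,k} - t_{i,k}^*\right) \\
\mathrm{smooth}_{L1}(d) &=
  \begin{cases}
    0.5 \, d^2 / \beta & \text{if } |d| < \beta \\
    |d| - 0.5 \, \beta & \text{otherwise}
  \end{cases}
\end{aligned}
```

The factor p_i^* in the regression term means that only positive anchors contribute to it, and "don't care" anchors are left out of both sums.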

Code:

detectron2/modeling/proposal_generator/rpn_outputs.py

detectron2/modeling/box_regression.py

losses(self)

1) Classification loss

objectness_loss = F.binary_cross_entropy_with_logits(
    pred_objectness_logits[valid_masks],
    gt_objectness_logits[valid_masks].to(torch.float32),
    reduction="sum",
)

2) Regression loss

localization_loss = smooth_l1_loss(
    pred_anchor_deltas[pos_masks], gt_anchor_deltas[pos_masks], smooth_l1_beta, reduction="sum"
)
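To make the masking explicit, the two losses above can be sketched in plain Python on scalars. This is a simplified illustration, not the detectron2 code. It assumes the label convention 1 = positive, 0 = negative, -1 = don't care, and the function names `bce_with_logits`, `smooth_l1`, and `rpn_losses` are hypothetical helpers for this example.

```python
import math


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy on a raw logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def smooth_l1(pred, target, beta):
    """Quadratic near zero, linear elsewhere; plain L1 when beta is ~0."""
    diff = abs(pred - target)
    if beta < 1e-5:
        return diff
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


def rpn_losses(logits, labels, pred_deltas, gt_deltas, beta):
    """Sum-reduced losses: classification over valid anchors, regression over positives."""
    cls_loss = sum(
        bce_with_logits(x, float(y)) for x, y in zip(logits, labels) if y >= 0
    )
    loc_loss = sum(
        smooth_l1(p, t, beta)
        for label, pred, tgt in zip(labels, pred_deltas, gt_deltas)
        if label == 1
        for p, t in zip(pred, tgt)
    )
    return cls_loss, loc_loss


logits = [0.0, 0.0, 5.0]
labels = [1, 0, -1]  # the third anchor is "don't care" and is skipped
pred_deltas = [[0.5, 0.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0], [1.0, 1.0, 1.0, 1.0]]
gt_deltas = [[0.0] * 4, [0.0] * 4, [0.0] * 4]
cls_loss, loc_loss = rpn_losses(logits, labels, pred_deltas, gt_deltas, beta=1.0)
print(round(cls_loss, 4), loc_loss)  # 1.3863 0.125
```

Only the first anchor contributes to the regression loss, and the third anchor contributes to neither term.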

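4. Proposal generator

The purpose of this step is to give the second stage high-quality ROI boxes. It first uses rpn_cls_prob to select the topk_rpn_pre_nms highest-scoring boxes, then applies NMS to keep topk_rpn_post_nms boxes, and finally passes them to ROI Align. The main flow is shown in the figure below.

The main steps and the corresponding code are as follows: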

detectron2/modeling/proposal_generator/rpn_outputs.py

find_top_rpn_proposals(
    proposals,
    pred_objectness_logits,
    images,
    nms_thresh,
    pre_nms_topk,
    post_nms_topk,
    min_box_side_len,
    training,
)

1) Take the top_k_pre_nms candidate boxes.

logits_i, idx = logits_i.sort(descending=True, dim=1)
topk_scores_i = logits_i[batch_idx, :num_proposals_i]
topk_idx = idx[batch_idx, :num_proposals_i]
# each is N x topk
topk_proposals_i = proposals_i[batch_idx[:, None], topk_idx]  # N x topk x 4

2) Post-process the candidate boxes, for example by clipping boxes that extend beyond the image and removing very small boxes.

boxes.clip(image_size)
# filter empty boxes
keep = boxes.nonempty(threshold=min_box_side_len)

3) Run NMS to obtain the more reliable top_k_post_nms candidate boxes, which become the ROIs.

keep = batched_nms(boxes.tensor, scores_per_img, lvl, nms_thresh)
keep = keep[:post_nms_topk]
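The whole selection pipeline (pre-NMS top-k, clipping, small-box filtering, NMS, post-NMS top-k) can be sketched for a single image and a single feature level in plain Python. This is a simplified illustration with a greedy NMS, not the batched, multi-level detectron2 code. The function `select_proposals` and its helpers are hypothetical names, and the threshold values in the usage example are assumptions.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def clip_box(box, width, height):
    """Clip a box to the image rectangle [0, width] x [0, height]."""
    x1, y1, x2, y2 = box
    return (
        min(max(x1, 0.0), width), min(max(y1, 0.0), height),
        min(max(x2, 0.0), width), min(max(y2, 0.0), height),
    )


def select_proposals(boxes, scores, image_size, pre_nms_topk, post_nms_topk,
                     nms_thresh=0.7, min_box_side_len=0.0):
    width, height = image_size
    # 1) Keep the pre_nms_topk highest-scoring boxes.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    order = order[:pre_nms_topk]
    # 2) Clip to the image and drop boxes that are too small.
    candidates = []
    for i in order:
        box = clip_box(boxes[i], width, height)
        if box[2] - box[0] > min_box_side_len and box[3] - box[1] > min_box_side_len:
            candidates.append((box, scores[i]))
    # 3) Greedy NMS in score order, then keep at most post_nms_topk boxes.
    kept = []
    for box, score in candidates:
        if all(iou(box, kept_box) <= nms_thresh for kept_box, _ in kept):
            kept.append((box, score))
    return kept[:post_nms_topk]


boxes = [(0, 0, 10, 10), (0.5, 0.5, 10.5, 10.5), (50, 50, 60, 60),
         (-5, -5, 0.5, 0.5), (100, 100, 200, 200)]
scores = [0.9, 0.8, 0.7, 0.6, 0.1]
rois = select_proposals(boxes, scores, image_size=(100, 100),
                        pre_nms_topk=4, post_nms_topk=2, min_box_side_len=1.0)
print([box for box, _ in rois])
```

In this example the lowest-scoring box is dropped by the pre-NMS top-k, the fourth box becomes too small after clipping, and the second box is suppressed by NMS because it overlaps the first with an IoU of about 0.82.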

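II. ROI Align

Starting from the proposals provided by RPN, this stage selects the training samples for the second stage and extracts their features, which are used to build the second-stage training network. It has two main parts: proposal target generator, and feature crop and pooling.

1. Proposal target generator

The main purpose of this operation is to select, from the proposals produced by RPN, a fixed number of ROIs as the training samples for the second stage (a mini-batch, typically 256 or 512 proposals per image), and to fix the ratio of positive to negative samples in that mini-batch, for example positive:negative = 1:3.

The main steps and code are as follows: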

detectron2/modeling/roi_heads/roi_heads.py

detectron2/modeling/sampling.py

label_and_sample_proposals(
    self, proposals: List[Instances], targets: List[Instances]
) -> List[Instances]:

1) Decide which proposals are positive samples and which are negative.

match_quality_matrix = pairwise_iou(
    targets_per_image.gt_boxes, proposals_per_image.proposal_boxes
)
matched_idxs, matched_labels = self.proposal_matcher(match_quality_matrix)

2) Sample the mini-batch and assign the correct object class to the positive samples.

sampled_idxs, gt_classes = self._sample_proposals(
    matched_idxs, matched_labels, targets_per_image.gt_classes
)
# Set target attributes of the sampled proposals:
proposals_per_image = proposals_per_image[sampled_idxs]
proposals_per_image.gt_classes = gt_classes
gt_boxes = Boxes(
    targets_per_image.gt_boxes.tensor.new_zeros((len(sampled_idxs), 4))
)
proposals_per_image.gt_boxes = gt_boxes
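The mini-batch sampling with a fixed positive fraction can be sketched as below. This is a minimal illustration, assuming labels of 1 for foreground and 0 for background, and a 1:3 ratio expressed as positive_fraction=0.25. The function `sample_proposals` is a hypothetical helper, not the detectron2 `_sample_proposals` method.

```python
import random


def sample_proposals(labels, batch_size=512, positive_fraction=0.25, rng=None):
    """Pick indices of positives and negatives for one image's mini-batch."""
    rng = rng or random.Random(0)  # fixed seed only to make the example repeatable
    positives = [i for i, label in enumerate(labels) if label == 1]
    negatives = [i for i, label in enumerate(labels) if label == 0]
    # Cap positives at the target fraction, then fill the rest with negatives.
    num_pos = min(len(positives), int(batch_size * positive_fraction))
    num_neg = min(len(negatives), batch_size - num_pos)
    return rng.sample(positives, num_pos), rng.sample(negatives, num_neg)


# Plenty of positives: the batch splits 128 / 384, that is 1:3.
pos_a, neg_a = sample_proposals([1] * 300 + [0] * 1000)
# Few positives: all 10 are used and negatives fill the remaining 502 slots.
pos_b, neg_b = sample_proposals([1] * 10 + [0] * 1000)
print(len(pos_a), len(neg_a), len(pos_b), len(neg_b))  # 128 384 10 502
```

When an image has fewer positives than the target, the ratio is no longer 1:3. In this sketch the batch is still filled up to its size with negatives.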

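2. Feature crop and pooling

Once the ROIs are available, a suitable feature level is chosen according to each ROI's size, and the ROI is cropped and pooled into a fixed-size feature map. This process is called ROI Align. The original ROI Pooling can do the same job, but it rounds coordinates during the computation, which makes the features coarse and hurts detection of small objects. The rounding happens in two places:

1) The ROI boundaries are quantized to integer coordinates before the region is cropped from the chosen feature map.

2) The cropped region is divided evenly into k x k cells (bins), and the boundaries of each bin are rounded to integers again.

The ROI Align steps and the corresponding code are as follows; every operation is carried out in floating point: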
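A small numeric example shows how much the first rounding step can shift a box. The ROI coordinates and the stride of 16 are made-up example values, and the helper names are hypothetical.

```python
import math


def to_feature_coords(box, stride):
    """Exact (floating point) position of an image-space box on the feature map."""
    return tuple(c / stride for c in box)


def quantize(box_on_feature):
    """ROI Pooling style rounding of the box boundaries to integer cells."""
    return tuple(math.floor(c + 0.5) for c in box_on_feature)


roi = (100, 60, 260, 190)
exact = to_feature_coords(roi, stride=16)    # (6.25, 3.75, 16.25, 11.875)
rounded = quantize(exact)                    # (6, 4, 16, 12)
# Express the rounding error back in input-image pixels.
shift_in_pixels = tuple((r - e) * 16 for r, e in zip(rounded, exact))
print(exact, rounded, shift_in_pixels)
```

Here the rounding moves the box edges by up to 4 pixels in the input image. For a large object that is negligible, but for an object only a few dozen pixels wide it is a noticeable fraction of its size.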

detectron2/modeling/roi_heads/roi_heads.py

detectron2/modeling/poolers.py

detectron2/layers/roi_align.py

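1) Map each ROI from RPN ([x1, y1, x2, y2]) to a box on the corresponding feature level, that is, [x1, y1, x2, y2]/stride, where stride is the factor by which the feature map is downsampled relative to the input image. In detectron2, convert_boxes_to_pooler_format packs the boxes of all images into rows of (batch index, x1, y1, x2, y2), and the division by stride is applied inside the pooler through its spatial scale.

pooler_fmt_boxes = convert_boxes_to_pooler_format(box_lists)

2) Crop and pool the features to produce an output feature map of the target size.

output[inds] = pooler(x_level, pooler_fmt_boxes_level)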
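The floating-point crop and pooling can be sketched on a single-channel feature map stored as a nested list. This is a simplified illustration, not the detectron2 ROIAlign layer. It assumes one bilinear sample at the center of each bin and treats feature[y][x] as the value at continuous coordinate (x, y), whereas real implementations usually average several samples per bin and may apply a half-pixel offset. The names `bilinear` and `roi_align` are hypothetical helpers.

```python
import math


def bilinear(feature, x, y):
    """Bilinearly interpolate a 2D list at the continuous point (x, y)."""
    height, width = len(feature), len(feature[0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    lx, ly = x - x0, y - y0
    top = feature[y0][x0] * (1 - lx) + feature[y0][x1] * lx
    bottom = feature[y1][x0] * (1 - lx) + feature[y1][x1] * lx
    return top * (1 - ly) + bottom * ly


def roi_align(feature, box, stride, output_size):
    """Pool an image-space box into an output_size x output_size grid, without rounding."""
    x1, y1, x2, y2 = (c / stride for c in box)  # step 1: box on the feature map
    bin_w = (x2 - x1) / output_size
    bin_h = (y2 - y1) / output_size
    output = []
    for i in range(output_size):
        row = []
        for j in range(output_size):
            # step 2: sample the center of each bin with bilinear interpolation
            center_x = x1 + (j + 0.5) * bin_w
            center_y = y1 + (i + 0.5) * bin_h
            row.append(bilinear(feature, center_x, center_y))
        output.append(row)
    return output


# A feature map whose value is x + 10 * y, so interpolated values are easy to check.
feature = [[x + 10 * y for x in range(4)] for y in range(4)]
pooled = roi_align(feature, box=(8, 8, 40, 40), stride=16, output_size=2)
print(pooled)  # [[11.0, 12.0], [21.0, 22.0]]
```

Because no coordinate is rounded, a box that starts between two feature cells still produces values that follow the underlying feature map smoothly.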
