Mask R-CNN
Mask R-CNN is a conceptually simple, flexible, and general framework for object instance segmentation. It detects objects in an image while simultaneously generating a high-quality mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting object masks in parallel with the existing bounding-box detection branch. Mask R-CNN is simple to train, runs at 5 fps, and adds only a small overhead compared with Faster R-CNN. Moreover, Mask R-CNN is easy to generalize to other tasks, for example, predicting human pose within the same framework. Mask R-CNN performs well on all three key tracks of the COCO challenge, including instance segmentation, bounding-box object detection, and person keypoint detection. Without bells and whistles, it outperforms all existing single-model entries on every task, including the winning model of the COCO 2016 challenge.
Model Introduction
Mask R-CNN is a two-stage object detection network. As an extension of Faster R-CNN, it adds a branch that predicts object masks on top of the existing bounding-box detection branch. The network uses a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, so region proposals can be computed at almost no extra cost. By sharing convolutional features, the RPN and the mask branch are merged into a single network. The lightweight MobileNet can also be selected as the model backbone.
If you are interested in MindSpore, you can follow the MindSpore community.
I. Environment Preparation
1. Go to the ModelArts official website
The cloud platform helps users quickly create and deploy models and manage full-cycle AI workflows. To start using MindSpore, choose the cloud platform below; to get the installation command for the MindSpore 2.0.0-alpha version, you can enter the ModelArts official website from the MindSpore tutorials.
Select CodeLab below to try it out immediately.
Wait for the environment setup to finish.
2. Use CodeLab to experience a Notebook instance
Download the sample NoteBook code for this case (a .ipynb file).
Choose ModelArts Upload Files and upload the .ipynb file.
Select the Kernel environment.
Switch to the GPU environment, choosing the first time-limited free option.
Go to the MindSpore official website and click Install at the top.
Get the installation command.
Return to the Notebook and add the following commands before the first code cell.
conda update -n base -c defaults conda
Install the MindSpore 2.0 GPU version:
conda install mindspore=2.0.0a0 -c mindspore -c conda-forge
Install mindvision:
pip install mindvision
Install the download package:
pip install download
II. Case Implementation
Importing official and third-party libraries
First, we import the official and third-party libraries that this case depends on.
import time
import os
import numpy as np
import mindspore.nn as nn
import mindspore.common.dtype as mstype
from mindspore.ops import operations as P
from mindspore.ops import functional as F
from mindspore.ops import composite as C
from mindspore.nn import layer as L
from mindspore.common.initializer import initializer
from mindspore import context, Tensor, Parameter
from mindspore import ParameterTuple
from mindspore.train.callback import Callback
from mindspore.nn.wrap.grad_reducer import DistributedGradReducer
from mindspore.train.callback import CheckpointConfig, ModelCheckpoint, TimeMonitor
from mindspore.train import Model
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from mindspore.nn import Momentum
from mindspore.common import set_seed
from src.utils.config import config
Data Processing
Before starting the experiment, make sure you have a local Python environment with the MindSpore Vision suite installed.
Data Preparation
COCO 2017 is a widely used dataset with bounding-box and pixel-level stuff annotations. These annotations can be used for scene understanding tasks such as semantic segmentation, object detection, and image captioning. It contains 118K training images and 5K evaluation images.
Dataset size: 19 GB
Training: 18 GB, 118,000 images
Evaluation: 1 GB, 5,000 images
Annotations: 241 MB, including instances, captions, person keypoints, etc.
Data format: images and JSON files
Note: data is processed in dataset.py.
First, you need to download the coco2017 dataset.
After downloading, make sure your dataset is laid out as follows.
!cat datasets.md
.
└─cocodataset
  ├─annotations
  │ ├─instance_train2017.json
  │ └─instance_val2017.json
  ├─val2017
  └─train2017
Data Preprocessing
The images in the raw dataset vary in size, which is inconvenient for unified reading and detection, so we first resize them to a uniform size. The annotation information is stored in JSON files, and we need to read it in order to attach labels to the image data.
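dataset.py itself is not shown in this case. As a rough sketch of the labeling step (the helper name load_coco_labels and its return layout are illustrative assumptions, not the project's actual API), reading a COCO instances JSON and grouping the boxes per image might look like this:

import json
import os

def load_coco_labels(anno_path):
    """Map image file name -> list of (bbox, category_id) from a COCO instances JSON."""
    with open(anno_path, 'r') as f:
        anno = json.load(f)
    id_to_name = {img['id']: img['file_name'] for img in anno['images']}
    labels = {}
    for obj in anno['annotations']:
        name = id_to_name[obj['image_id']]
        # COCO boxes are [x, y, width, height] in original-image pixels; when the
        # image is resized, the boxes must be rescaled by the same factor.
        labels.setdefault(name, []).append((obj['bbox'], obj['category_id']))
    return labels

labels = load_coco_labels(os.path.join('cocodataset', 'annotations', 'instance_train2017.json'))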
Data Augmentation
Before you start training the model, data augmentation is necessary for your dataset and for creating the training and test data. For the coco dataset, you can use dataset.py to add labels to the images and convert them to MindRecord. MindRecord is a data format specified by MindSpore that can optimize MindSpore's performance in some scenarios.
First, we create the directory where the MindRecord dataset is saved and read.
from dataset.dataset import create_coco_dataset, data_to_mindrecord_byte_image
def create_mindrecord_dir(prefix, mindrecord_dir):
    """Create the MindRecord directory and convert the dataset to MindRecord format."""
    if not os.path.isdir(mindrecord_dir):
        os.makedirs(mindrecord_dir)
    if config.dataset == "coco":
        if os.path.isdir(config.data_root):
            print("Create Mindrecord.")
            data_to_mindrecord_byte_image("coco", True, prefix)
            print("Create Mindrecord Done, at {}".format(mindrecord_dir))
        else:
            raise Exception("coco_root not exist.")
    else:
        if os.path.isdir(config.IMAGE_DIR) and os.path.exists(config.ANNO_PATH):
            print("Create Mindrecord.")
            data_to_mindrecord_byte_image("other", True, prefix)
            print("Create Mindrecord Done, at {}".format(mindrecord_dir))
        else:
            raise Exception("IMAGE_DIR or ANNO_PATH not exist.")
    # mindrecord_file is defined at notebook level before this function is called.
    while not os.path.exists(mindrecord_file + ".db"):
        time.sleep(5)
Then, load the dataset and call the create_coco_dataset function from dataset.py to complete data preprocessing and augmentation.
# Allocate the memory environment
device_target = config.device_target
rank = 0
device_num = 1
context.set_context(mode=context.GRAPH_MODE, device_target=device_target)

print("Start create dataset!")

# Call the interface for data processing.
# It will generate MindRecord files in config.mindrecord_dir,
# and the file names are MaskRcnn.mindrecord0, 1, ..., file_num.
prefix = "MaskRcnn.mindrecord"
mindrecord_dir = config.mindrecord_dir
mindrecord_file = os.path.join(mindrecord_dir, prefix + "0")
if rank == 0 and not os.path.exists(mindrecord_file):
    create_mindrecord_dir(prefix, mindrecord_dir)

# When creating the MindDataset, use the first mindrecord file,
# such as MaskRcnn.mindrecord0.
dataset = create_coco_dataset(mindrecord_file, batch_size=config.batch_size,
                              device_num=device_num, rank_id=rank)
dataset_size = dataset.get_dataset_size()
print("total images num: ", dataset_size)
print("Create dataset done!")
Start create dataset!
total images num: 51790
Create dataset done!
Dataset Visualization
Run the following code to look at the augmented images. You can see that the images have been rotated, and that their shape has been converted to the (N, C, H, W) format expected by the network, where N is the number of samples, C the number of image channels, and H and W the image height and width.
import numpy as np
import matplotlib.pyplot as plt

show_data = next(dataset.create_dict_iterator())
show_images = show_data["image"].asnumpy()
print(f'Image shape: {show_images.shape}')

plt.figure()
# Show 2 images for reference
for i in range(1, 3):
    plt.subplot(1, 2, i)
    # Convert the image to HWC format
    image_trans = np.transpose(show_images[i - 1], (1, 2, 0))
    image_trans = np.clip(image_trans, 0, 1)
    plt.imshow(image_trans[:, :], cmap=None)
    plt.xticks(rotation=180)
    plt.axis("off")
Image shape: (2, 3, 768, 1280)
Building the Network
image1
As mentioned above, Mask R-CNN uses ResNet-50 as its backbone (as in the original paper) and extends Faster R-CNN by adding a mask prediction branch in parallel with the existing bounding-box detection branch to accomplish object detection.
Backbone Network
The backbone of Mask R-CNN can be chosen from ResNet, VGG, MobileNet, and so on. In this project, the Mask R-CNN framework with a ResNet backbone was migrated, and the lightweight MobileNet was added as an alternative backbone.
Backbone networks:
ResNet (deep residual network) is a landmark architecture in the history of convolutional neural networks. Unlike AlexNet and VGG, its structure changed substantially. While researchers kept increasing network depth to improve CNN performance, they found that beyond a certain depth the results actually got worse, a degradation problem in which, for example, an 80-layer network performs worse than a 30-layer one. The vanishing and exploding gradient problems of deep networks grow increasingly severe with depth, making it harder to train an excellent deep model. In this situation, residual modules can effectively alleviate the vanishing and exploding gradient problems.
image2
MobileNetV1 is a lightweight deep convolutional network. Its basic unit is the depthwise separable convolution, which splits a standard convolution into two steps. The first step is a depthwise convolution (DW), that is, a per-channel convolution: each kernel is responsible for one channel and each channel is "filtered" by exactly one kernel, so the number of kernels equals the number of channels. The second step, the pointwise convolution (PW), applies a 1x1 convolution to the depthwise output to "string" the channels back together. The overall effect is roughly the same as a standard convolution, but the computation and the number of parameters are greatly reduced. Its network structure is shown below.
image3
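As a minimal sketch of this idea (the cell below is illustrative and not part of this project's source; DepthwiseSeparableConv is a hypothetical name), a MobileNet-style depthwise separable convolution can be built in MindSpore from a grouped 3x3 convolution followed by a 1x1 convolution:

import mindspore.nn as nn

class DepthwiseSeparableConv(nn.Cell):
    """Depthwise 3x3 convolution followed by a pointwise 1x1 convolution."""
    def __init__(self, in_channels, out_channels, stride=1):
        super(DepthwiseSeparableConv, self).__init__()
        # Depthwise step: group=in_channels gives one 3x3 filter per channel.
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, pad_mode='same', group=in_channels)
        self.bn1 = nn.BatchNorm2d(in_channels)
        # Pointwise step: the 1x1 convolution recombines the channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU()

    def construct(self, x):
        x = self.relu(self.bn1(self.depthwise(x)))
        return self.relu(self.bn2(self.pointwise(x)))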
The original paper uses ResNet as the backbone. Here, we also choose ResNet-50 as the backbone for this case.
import numpy as np
import mindspore.nn as nn
import mindspore.common.dtype as mstype
from mindspore.ops import operations as P
from mindspore.common.tensor import Tensor
from mindspore.ops import functional as F
ms_cast_type = mstype.float32
def weight_init_ones(shape):
    """
    Weight init.

    Args:
        shape (List): weights shape.

    Returns:
        Tensor, weights, default float32.
    """
    return Tensor(np.array(np.ones(shape).astype(np.float32) * 0.01).astype(np.float32))


def _conv(in_channels, out_channels, kernel_size=3, stride=1, padding=0, pad_mode='pad'):
    """
    Conv2D wrapper.

    Args:
        in_channels (int): The channel number of the input tensor of the Conv2d layer.
        out_channels (int): The channel number of the output tensor of the Conv2d layer.
        kernel_size (Union[int, tuple[int]]): Specifies the height and width of the 2D convolution kernel.
            An integer represents the height and width of the kernel; a tuple of two integers represents
            the height and width respectively. Default: 3.
        stride (Union[int, tuple[int]]): The movement stride of the 2D convolution kernel. An integer
            represents the step size in both the height and width directions; a tuple of two integers
            represents the step size in the height and width directions respectively. Default: 1.
        padding (Union[int, tuple[int]]): The amount of padding on the height and width directions of
            the input. If `padding` is an integer, the top, bottom, left, and right padding are all equal
            to `padding`. If `padding` is a tuple of four integers, the top, bottom, left, and right
            padding are `padding[0]`, `padding[1]`, `padding[2]`, and `padding[3]` respectively.
            The value should be greater than or equal to 0. Default: 0.
        pad_mode (str): Specifies the padding mode, one of "same", "valid", "pad". Default: "pad".

    Outputs:
        Tensor of shape :math:`(N, C_{out}, H_{out}, W_{out})` or :math:`(N, H_{out}, W_{out}, C_{out})`.
    """
    shape = (out_channels, in_channels, kernel_size, kernel_size)
    weights = weight_init_ones(shape)
    return nn.Conv2d(in_channels, out_channels,
                     kernel_size=kernel_size, stride=stride, padding=padding,
                     pad_mode=pad_mode, weight_init=weights, has_bias=False).to_float(ms_cast_type)


def _batch_norm2d_init(out_chls, momentum=0.1, affine=True, use_batch_statistics=True):
    """
    BatchNorm2d wrapper.

    Args:
        out_chls (int): The number of channels of the input tensor. Expected input size is (N, C, H, W),
            where `C` is the number of channels.
        momentum (float): A floating hyperparameter of the momentum for the running_mean and
            running_var computation. Default: 0.1.
        affine (bool): When set to True, gamma and beta can be learned. Default: True.
        use_batch_statistics (bool):
            - If True, use the mean and variance of the current batch and track the running mean
              and running variance. Default: True.
            - If False, use the specified mean and variance values without tracking statistics.
            - If None, use_batch_statistics is set automatically according to the training or
              evaluation mode: True during training, False during evaluation.

    Outputs:
        Tensor, the normalized, scaled, offset tensor of shape :math:`(N, C_{out}, H_{out}, W_{out})`.
    """
    gamma_init = Tensor(np.array(np.ones(out_chls)).astype(np.float32))
    beta_init = Tensor(np.array(np.ones(out_chls) * 0).astype(np.float32))
    moving_mean_init = Tensor(np.array(np.ones(out_chls) * 0).astype(np.float32))
    moving_var_init = Tensor(np.array(np.ones(out_chls)).astype(np.float32))
    return nn.BatchNorm2d(out_chls, momentum=momentum, affine=affine, gamma_init=gamma_init,
                          beta_init=beta_init, moving_mean_init=moving_mean_init,
                          moving_var_init=moving_var_init,
                          use_batch_statistics=use_batch_statistics)
class ResNetFea(nn.Cell):
    """
    ResNet architecture.

    Args:
        block (Cell): Block for network.
        layer_nums (list): Numbers of blocks in different layers.
        in_channels (list): Input channel in each layer.
        out_channels (list): Output channel in each layer.
        weights_update (bool): Weight update flag.

    Inputs:
        - **x** (Cell) - Input block.

    Outputs:
        Cell, output block.

    Supported Platforms:
        ``Ascend`` ``CPU`` ``GPU``

    Examples:
        >>> ResNetFea(ResidualBlockUsing, [3, 4, 6, 3], [64, 256, 512, 1024], [256, 512, 1024, 2048], False)
    """

    def __init__(self, block, layer_nums, in_channels, out_channels, weights_update=False):
        super(ResNetFea, self).__init__()
        if not len(layer_nums) == len(in_channels) == len(out_channels) == 4:
            raise ValueError("the length of "
                             "layer_num, inchannel, outchannel list must be 4!")
        bn_training = False
        self.conv1 = _conv(3, 64, kernel_size=7, stride=2, padding=3, pad_mode='pad')
        self.bn1 = _batch_norm2d_init(64, affine=bn_training, use_batch_statistics=bn_training)
        self.relu = P.ReLU()
        self.maxpool = P.MaxPool(kernel_size=3, strides=2, pad_mode="SAME")
        self.weights_update = weights_update

        if not self.weights_update:
            self.conv1.weight.requires_grad = False

        self.layer1 = self._make_layer(block, layer_nums[0], in_channel=in_channels[0],
                                       out_channel=out_channels[0], stride=1, training=bn_training,
                                       weights_update=self.weights_update)
        self.layer2 = self._make_layer(block, layer_nums[1], in_channel=in_channels[1],
                                       out_channel=out_channels[1], stride=2,
                                       training=bn_training, weights_update=True)
        self.layer3 = self._make_layer(block, layer_nums[2], in_channel=in_channels[2],
                                       out_channel=out_channels[2], stride=2,
                                       training=bn_training, weights_update=True)
        self.layer4 = self._make_layer(block, layer_nums[3], in_channel=in_channels[3],
                                       out_channel=out_channels[3], stride=2,
                                       training=bn_training, weights_update=True)

    def _make_layer(self, block, layer_num, in_channel, out_channel, stride, training=False, weights_update=False):
        """
        Make layer for resnet backbone.

        Args:
            block (Cell): ResNet block.
            layer_num (int): Layer number.
            in_channel (int): Input channel.
            out_channel (int): Output channel.
            stride (int): Stride size for convolutional layer.
            training (bool): Whether to do training. Default: False.
            weights_update (bool): Whether to update weights. Default: False.

        Returns:
            SequentialCell, several layers combined together.

        Examples:
            >>> _make_layer(InvertedResidual, 4, 64, 64, 1)
        """
        layers = []
        down_sample = False
        if stride != 1 or in_channel != out_channel:
            down_sample = True
        resblk = block(in_channel, out_channel, stride=stride, down_sample=down_sample,
                       training=training, weights_update=weights_update)
        layers.append(resblk)
        for _ in range(1, layer_num):
            resblk = block(out_channel, out_channel, stride=1, training=training, weights_update=weights_update)
            layers.append(resblk)
        return nn.SequentialCell(layers)

    def construct(self, x):
        """Construct ResNet architecture."""
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        c1 = self.maxpool(x)

        c2 = self.layer1(c1)
        identity = c2
        if not self.weights_update:
            identity = F.stop_gradient(c2)
        c3 = self.layer2(identity)
        c4 = self.layer3(c3)
        c5 = self.layer4(c4)
        return identity, c3, c4, c5
class ResidualBlockUsing(nn.Cell):
    """
    ResNet V1 residual block definition.

    Args:
        in_channels (int): Input channel.
        out_channels (int): Output channel.
        stride (int): Stride size for the initial convolutional layer. Default: 1.
        down_sample (bool): Whether to do the downsample in block. Default: False.
        momentum (float): Momentum for batchnorm layer. Default: 0.1.
        training (bool): Training flag. Default: False.
        weights_update (bool): Weights update flag. Default: False.

    Inputs:
        - **x** (Cell) - Input block.

    Outputs:
        Cell, output block.

    Supported Platforms:
        ``Ascend`` ``CPU`` ``GPU``

    Examples:
        ResidualBlockUsing(3, 256, stride=2, down_sample=True)
    """
    expansion = 4

    def __init__(self, in_channels, out_channels, stride=1, down_sample=False,
                 momentum=0.1, training=False, weights_update=False):
        super(ResidualBlockUsing, self).__init__()
        self.affine = weights_update

        out_chls = out_channels // self.expansion
        self.conv1 = _conv(in_channels, out_chls, kernel_size=1, stride=1, padding=0)
        self.bn1 = _batch_norm2d_init(out_chls, momentum=momentum, affine=self.affine, use_batch_statistics=training)
        self.conv2 = _conv(out_chls, out_chls, kernel_size=3, stride=stride, padding=1)
        self.bn2 = _batch_norm2d_init(out_chls, momentum=momentum, affine=self.affine, use_batch_statistics=training)
        self.conv3 = _conv(out_chls, out_channels, kernel_size=1, stride=1, padding=0)
        self.bn3 = _batch_norm2d_init(out_channels, momentum=momentum, affine=self.affine,
                                      use_batch_statistics=training)

        if training:
            self.bn1 = self.bn1.set_train()
            self.bn2 = self.bn2.set_train()
            self.bn3 = self.bn3.set_train()

        if not weights_update:
            self.conv1.weight.requires_grad = False
            self.conv2.weight.requires_grad = False
            self.conv3.weight.requires_grad = False

        self.relu = P.ReLU()
        self.downsample = down_sample
        if self.downsample:
            self.conv_down_sample = _conv(in_channels, out_channels, kernel_size=1, stride=stride, padding=0)
            self.bn_down_sample = _batch_norm2d_init(out_channels, momentum=momentum, affine=self.affine,
                                                     use_batch_statistics=training)
            if training:
                self.bn_down_sample = self.bn_down_sample.set_train()
            if not weights_update:
                self.conv_down_sample.weight.requires_grad = False
        self.add = P.Add()

    def construct(self, x):
        """Construct ResNet V1 residual block."""
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample:
            identity = self.conv_down_sample(identity)
            identity = self.bn_down_sample(identity)

        out = self.add(out, identity)
        out = self.relu(out)
        return out
FPN Network
The FPN (Feature Pyramid Network) exploits both the high resolution of low-level features and the rich semantic information of high-level features, and achieves its predictions by fusing features from these different levels. The predictions are made independently on each fused feature level, which differs from conventional feature-fusion approaches.
The backbone and the FPN together form the convolutional layers of the Mask R-CNN network.
def bias_init_zeros(shape):
    """Bias init method."""
    return Tensor(np.array(np.zeros(shape).astype(np.float32)), dtype=mstype.float32)


def _conv(in_channels, out_channels, kernel_size=3, stride=1, padding=0, pad_mode='pad'):
    """
    Conv2D wrapper.

    Args:
        in_channels (int): Input channel num.
        out_channels (int): Output channel num.
        kernel_size (int): Kernel size. Default: 3.
        stride (int): Stride. Default: 1.
        padding (int): Padding range. Default: 0.
        pad_mode (str): Padding mode. Default: 'pad'.

    Returns:
        Tensor, convolution result.
    """
    shape = (out_channels, in_channels, kernel_size, kernel_size)
    weights = initializer("XavierUniform", shape=shape, dtype=mstype.float32)
    shape_bias = (out_channels,)
    biass = bias_init_zeros(shape_bias)
    return nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding,
                     pad_mode=pad_mode, weight_init=weights, has_bias=True, bias_init=biass)
class FeatPyramidNeck(nn.Cell):
    """
    Feature pyramid network cell, usually used as a network neck.

    Applies convolutions to multiple input feature maps and outputs feature maps
    of the same channel size. If the required number of outputs is larger than the
    number of inputs, extra max pooling is added for further downsampling.

    Args:
        in_channels (tuple): Channel size of input feature maps.
        out_channels (int): Channel size of output.
        num_outs (int): Number of output features.

    Inputs:
        - **x** (Tensor) - Input variant.

    Outputs:
        Tuple, with tensors of same channel size.

    Supported Platforms:
        ``Ascend`` ``CPU`` ``GPU``

    Examples:
        >>> neck = FeatPyramidNeck([100, 200, 300], 50, 4)
        >>> input_data = (normal(0, 0.1, (1, c, 1280 // (4 * 2 ** i), 768 // (4 * 2 ** i)),
        ...               dtype=np.float32) for i, c in enumerate(config.fpn_in_channels))
        >>> out = neck(input_data)
    """

    def __init__(self, in_channels, out_channels, num_outs):
        super(FeatPyramidNeck, self).__init__()
        self.cast_type = mstype.float32
        self.num_outs = num_outs
        self.in_channels = in_channels
        self.fpn_layer = len(self.in_channels)
        assert not self.num_outs < len(in_channels)

        self.lateral_convs_list_ = []
        self.fpn_convs_ = []
        for _, channel in enumerate(in_channels):
            l_conv = _conv(channel, out_channels, kernel_size=1, stride=1, padding=0,
                           pad_mode='valid').to_float(self.cast_type)
            fpn_conv = _conv(out_channels, out_channels, kernel_size=3, stride=1, padding=0,
                             pad_mode='same').to_float(self.cast_type)
            self.lateral_convs_list_.append(l_conv)
            self.fpn_convs_.append(fpn_conv)
        self.lateral_convs_list = nn.layer.CellList(self.lateral_convs_list_)
        self.fpn_convs_list = nn.layer.CellList(self.fpn_convs_)

        self.interpolate1 = P.ResizeBilinear((48, 80))
        self.interpolate2 = P.ResizeBilinear((96, 160))
        self.interpolate3 = P.ResizeBilinear((192, 320))
        self.cast = P.Cast()
        self.maxpool = P.MaxPool(kernel_size=1, strides=2, pad_mode="same")

    def construct(self, inputs):
        """Construction of Feature Pyramid Neck."""
        # Lateral 1x1 convolutions on each input level.
        layers = ()
        for i in range(self.fpn_layer):
            layers += (self.lateral_convs_list[i](inputs[i]),)

        # Top-down pathway: upsample the coarser map and add the lateral connection.
        cast_layers = (layers[3],)
        cast_layers = cast_layers + (layers[2] + self.cast(
            self.interpolate1(cast_layers[self.fpn_layer - 4]), self.cast_type),)
        cast_layers = cast_layers + (layers[1] + self.cast(
            self.interpolate2(cast_layers[self.fpn_layer - 3]), self.cast_type),)
        cast_layers = cast_layers + (layers[0] + self.cast(
            self.interpolate3(cast_layers[self.fpn_layer - 2]), self.cast_type),)

        layers_arranged = ()
        for i in range(self.fpn_layer - 1, -1, -1):
            layers_arranged = layers_arranged + (cast_layers[i],)

        outs = ()
        for i in range(self.fpn_layer):
            outs = outs + (self.fpn_convs_list[i](layers_arranged[i]),)
        for i in range(self.num_outs - self.fpn_layer):
            outs = outs + (self.maxpool(outs[3]),)
        return outs
RPN Network
The RPN first appeared in the Faster R-CNN architecture, where it is dedicated to extracting region proposals. In earlier detection architectures such as R-CNN and Fast R-CNN, proposals were usually extracted with Selective Search, a rather traditional and time-consuming method that takes about 2 s per image on a CPU. The authors therefore proposed the RPN: it is fast, and it can easily be combined with Fast R-CNN to form a single network.
Main outputs of the RPN:
ROI: at every location of a feature map, the regression branch produces 4k values, where 4 corresponds to the four box translation/scaling offsets [dy, dx, dh, dw] and k is the number of anchors (k = 3 in this implementation).
scores: at every location of a feature map, the classification branch produces 2k values, where 2 corresponds to the foreground and background probabilities and k is again the number of anchors (k = 3).
from src.model.bbox_assign_sample import BboxAssignSample
class RpnRegClsBlock(nn.Cell):
    """
    RPN reg cls block for the RPN layer.

    Args:
        in_channels (int): Input channels of shared convolution.
        feat_channels (int): Output channels of shared convolution.
        num_anchors (int): The anchor number.
        cls_out_channels (int): Output channels of classification convolution.
        weight_conv (Tensor): Weight init for rpn conv.
        bias_conv (Tensor): Bias init for rpn conv.
        weight_cls (Tensor): Weight init for rpn cls conv.
        bias_cls (Tensor): Bias init for rpn cls conv.
        weight_reg (Tensor): Weight init for rpn reg conv.
        bias_reg (Tensor): Bias init for rpn reg conv.

    Inputs:
        - **x** (Tensor) - Input variant.

    Outputs:
        Tensor, output tensor.

    Supported Platforms:
        ``Ascend`` ``CPU`` ``GPU``

    Examples:
        >>> x = Tensor(np.array([[[[1., 2.], [3., 4.]]]]), mindspore.float32)
        >>> weight_conv = Tensor(np.array([[[[0.2, 0.3], [0.4, 0.1]]]]), mindspore.float32)
        >>> bias_conv = Tensor(np.array([[[[0., 0.], [0., 0.]]]]), mindspore.float32)
        >>> weight_cls = Tensor(np.array([[[[0.2, 0.3], [0.4, 0.1]]]]), mindspore.float32)
        >>> bias_cls = Tensor(np.array([[[[0., 0.], [0., 0.]]]]), mindspore.float32)
        >>> weight_reg = Tensor(np.array([[[[0.2, 0.3], [0.4, 0.1]]]]), mindspore.float32)
        >>> bias_reg = Tensor(np.array([[[[0., 0.], [0., 0.]]]]), mindspore.float32)
        >>> rpn = RpnRegClsBlock(2, 2, 4, 4, weight_conv, bias_conv,
        ...                      weight_cls, bias_cls, weight_reg, bias_reg)
        >>> output = rpn(x)
    """

    def __init__(self, in_channels, feat_channels, num_anchors, cls_out_channels, weight_conv,
                 bias_conv, weight_cls, bias_cls, weight_reg, bias_reg):
        super(RpnRegClsBlock, self).__init__()
        # Shared 3x3 convolution followed by two sibling 1x1 convolutions.
        self.rpn_conv = nn.Conv2d(in_channels, feat_channels, kernel_size=3, stride=1, pad_mode='same',
                                  has_bias=True, weight_init=weight_conv, bias_init=bias_conv)
        self.relu = nn.ReLU()
        self.rpn_cls = nn.Conv2d(feat_channels, num_anchors * cls_out_channels, kernel_size=1, pad_mode='valid',
                                 has_bias=True, weight_init=weight_cls, bias_init=bias_cls)
        self.rpn_reg = nn.Conv2d(feat_channels, num_anchors * 4, kernel_size=1, pad_mode='valid',
                                 has_bias=True, weight_init=weight_reg, bias_init=bias_reg)

    def construct(self, x):
        """Construct RPN reg cls block for the RPN layer."""
        x = self.relu(self.rpn_conv(x))
        x1 = self.rpn_cls(x)
        x2 = self.rpn_reg(x)
        return x1, x2
class RPN(nn.Cell):
    """
    Region proposal network.

    Args:
        config (dict): Config.
        batch_size (int): Batchsize.
        in_channels (int): Input channels of shared convolution.
        feat_channels (int): Output channels of shared convolution.
        num_anchors (int): The anchor number.
        cls_out_channels (int): Output channels of classification convolution.

    Inputs:
        - **inputs** (Tensor) - Input variant.
        - **img_metas** (Tensor) - Image shape.
        - **anchor_list** (Tensor) - A list of anchors.
        - **gt_bboxes** (Tensor) - Ground truth bounding boxes.
        - **gt_labels** (Tensor) - Ground truth labels.
        - **gt_valids** (Tensor) - Ground truth validations.

    Outputs:
        Tuple, tuple of output tensor.

    Supported Platforms:
        ``Ascend`` ``CPU`` ``GPU``

    Examples:
        >>> RPN(config=config, batch_size=2, in_channels=256, feat_channels=1024,
        ...     num_anchors=3, cls_out_channels=512)
    """

    def __init__(self, config, batch_size, in_channels, feat_channels, num_anchors, cls_out_channels):
        super(RPN, self).__init__()
        cfg_rpn = config
        self.cast_type = mstype.float32
        self.np_cast_type = np.float32

        self.num_bboxes = cfg_rpn.num_bboxes
        self.slice_index = ()
        self.feature_anchor_shape = ()
        self.slice_index += (0,)
        index = 0
        for shape in cfg_rpn.feature_shapes:
            self.slice_index += (self.slice_index[index] + shape[0] * shape[1] * num_anchors,)
            self.feature_anchor_shape += (shape[0] * shape[1] * num_anchors * batch_size,)
            index += 1

        self.num_anchors = num_anchors
        self.batch_size = batch_size
        self.test_batch_size = cfg_rpn.test_batch_size
        self.num_layers = 5
        self.real_ratio = Tensor(np.ones((1, 1)).astype(self.np_cast_type))

        self.rpn_convs_list = nn.layer.CellList(self._make_rpn_layer(self.num_layers, in_channels, feat_channels,
                                                                     num_anchors, cls_out_channels))

        self.transpose = P.Transpose()
        self.reshape = P.Reshape()
        self.concat = P.Concat(axis=0)
        self.fill = P.Fill()
        self.placeh1 = Tensor(np.ones((1,)).astype(self.np_cast_type))

        self.trans_shape = (0, 2, 3, 1)
        self.reshape_shape_reg = (-1, 4)
        self.reshape_shape_cls = (-1,)
        self.rpn_loss_reg_weight = Tensor(np.array(cfg_rpn.rpn_loss_reg_weight).astype(self.np_cast_type))
        self.rpn_loss_cls_weight = Tensor(np.array(cfg_rpn.rpn_loss_cls_weight).astype(self.np_cast_type))
        expected_total_size = cfg_rpn.num_expected_neg * self.batch_size
        self.num_expected_total = Tensor(np.array(expected_total_size).astype(self.np_cast_type))
        self.num_bboxes = cfg_rpn.num_bboxes
        self.get_targets = BboxAssignSample(cfg_rpn, self.batch_size, self.num_bboxes, False)
        self.check_valid = P.CheckValid()
        self.sum_loss = P.ReduceSum()
        self.loss_cls = P.SigmoidCrossEntropyWithLogits()
        self.loss_bbox = P.SmoothL1Loss(beta=1.0 / 9.0)
        self.squeeze = P.Squeeze()
        self.cast = P.Cast()
        self.tile = P.Tile()
        self.zeros_like = P.ZerosLike()
        self.loss = Tensor(np.zeros((1,)).astype(self.np_cast_type))
        self.clsloss = Tensor(np.zeros((1,)).astype(self.np_cast_type))
        self.regloss = Tensor(np.zeros((1,)).astype(self.np_cast_type))

    def _make_rpn_layer(self, num_layers, in_channels, feat_channels, num_anchors, cls_out_channels):
        """
        Make rpn layer for rpn proposal network.

        Args:
            num_layers (int): Layer num.
            in_channels (int): Input channels of shared convolution.
            feat_channels (int): Output channels of shared convolution.
            num_anchors (int): The anchor number.
            cls_out_channels (int): Output channels of classification convolution.

        Returns:
            List, list of RpnRegClsBlock cells.
        """
        rpn_layer = []

        shp_weight_conv = (feat_channels, in_channels, 3, 3)
        shp_bias_conv = (feat_channels,)
        weight_conv = initializer('Normal', shape=shp_weight_conv, dtype=mstype.float32)
        bias_conv = initializer(0, shape=shp_bias_conv, dtype=mstype.float32)

        shp_weight_cls = (num_anchors * cls_out_channels, feat_channels, 1, 1)
        shp_bias_cls = (num_anchors * cls_out_channels,)
        weight_cls = initializer('Normal', shape=shp_weight_cls, dtype=mstype.float32)
        bias_cls = initializer(0, shape=shp_bias_cls, dtype=mstype.float32)

        shp_weight_reg = (num_anchors * 4, feat_channels, 1, 1)
        shp_bias_reg = (num_anchors * 4,)
        weight_reg = initializer('Normal', shape=shp_weight_reg, dtype=mstype.float32)
        bias_reg = initializer(0, shape=shp_bias_reg, dtype=mstype.float32)

        for i in range(num_layers):
            rpn_layer.append(RpnRegClsBlock(in_channels, feat_channels, num_anchors, cls_out_channels, weight_conv,
                                            bias_conv, weight_cls, bias_cls, weight_reg,
                                            bias_reg).to_float(self.cast_type))

        # Share the parameters across the heads of all pyramid levels.
        for i in range(1, num_layers):
            rpn_layer[i].rpn_conv.weight = rpn_layer[0].rpn_conv.weight
            rpn_layer[i].rpn_cls.weight = rpn_layer[0].rpn_cls.weight
            rpn_layer[i].rpn_reg.weight = rpn_layer[0].rpn_reg.weight
            rpn_layer[i].rpn_conv.bias = rpn_layer[0].rpn_conv.bias
            rpn_layer[i].rpn_cls.bias = rpn_layer[0].rpn_cls.bias
            rpn_layer[i].rpn_reg.bias = rpn_layer[0].rpn_reg.bias
        return rpn_layer

    def construct(self, inputs, img_metas, anchor_list, gt_bboxes, gt_labels, gt_valids):
        """Construct region proposal network."""
        loss_print = ()
        rpn_cls_score = ()
        rpn_bbox_pred = ()
        rpn_cls_score_total = ()
        rpn_bbox_pred_total = ()

        for i in range(self.num_layers):
            x1, x2 = self.rpn_convs_list[i](inputs[i])
            rpn_cls_score_total = rpn_cls_score_total + (x1,)
            rpn_bbox_pred_total = rpn_bbox_pred_total + (x2,)
            x1 = self.transpose(x1, self.trans_shape)
            x1 = self.reshape(x1, self.reshape_shape_cls)
            x2 = self.transpose(x2, self.trans_shape)
            x2 = self.reshape(x2, self.reshape_shape_reg)
            rpn_cls_score = rpn_cls_score + (x1,)
            rpn_bbox_pred = rpn_bbox_pred + (x2,)

        loss = self.loss
        clsloss = self.clsloss
        regloss = self.regloss
        bbox_targets = ()
        bbox_weights = ()
        labels = ()
        label_weights = ()
        output = ()

        if self.training:
            for i in range(self.batch_size):
                multi_level_flags = ()
                anchor_list_tuple = ()
                for j in range(self.num_layers):
                    res = self.cast(self.check_valid(anchor_list[j], self.squeeze(img_metas[i:i + 1:1, ::])),
                                    mstype.int32)
                    multi_level_flags = multi_level_flags + (res,)
                    anchor_list_tuple = anchor_list_tuple + (anchor_list[j],)
                valid_flag_list = self.concat(multi_level_flags)
                anchor_using_list = self.concat(anchor_list_tuple)

                gt_bboxes_i = self.squeeze(gt_bboxes[i:i + 1:1, ::])
                gt_labels_i = self.squeeze(gt_labels[i:i + 1:1, ::])
                gt_valids_i = self.squeeze(gt_valids[i:i + 1:1, ::])

                bbox_target, bbox_weight, label, label_weight = \
                    self.get_targets(gt_bboxes_i, gt_labels_i, self.cast(valid_flag_list, mstype.bool_),
                                     anchor_using_list, gt_valids_i)

                bbox_weight = self.cast(bbox_weight, self.cast_type)
                label = self.cast(label, self.cast_type)
                label_weight = self.cast(label_weight, self.cast_type)

                for j in range(self.num_layers):
                    begin = self.slice_index[j]
                    end = self.slice_index[j + 1]
                    stride = 1
                    bbox_targets += (bbox_target[begin:end:stride, ::],)
                    bbox_weights += (bbox_weight[begin:end:stride],)
                    labels += (label[begin:end:stride],)
                    label_weights += (label_weight[begin:end:stride],)

            for i in range(self.num_layers):
                bbox_target_using = ()
                bbox_weight_using = ()
                label_using = ()
                label_weight_using = ()

                for j in range(self.batch_size):
                    bbox_target_using += (bbox_targets[i + (self.num_layers * j)],)
                    bbox_weight_using += (bbox_weights[i + (self.num_layers * j)],)
                    label_using += (labels[i + (self.num_layers * j)],)
                    label_weight_using += (label_weights[i + (self.num_layers * j)],)

                bbox_target_with_batchsize = self.concat(bbox_target_using)
                bbox_weight_with_batchsize = self.concat(bbox_weight_using)
                label_with_batchsize = self.concat(label_using)
                label_weight_with_batchsize = self.concat(label_weight_using)

                # stop gradient
                bbox_target_ = F.stop_gradient(bbox_target_with_batchsize)
                bbox_weight_ = F.stop_gradient(bbox_weight_with_batchsize)
                label_ = F.stop_gradient(label_with_batchsize)
                label_weight_ = F.stop_gradient(label_weight_with_batchsize)

                cls_score_i = rpn_cls_score[i]
                reg_score_i = rpn_bbox_pred[i]

                loss_cls = self.loss_cls(cls_score_i, label_)
                loss_cls_item = loss_cls * label_weight_
                loss_cls_item = self.sum_loss(loss_cls_item, (0,)) / self.num_expected_total

                loss_reg = self.loss_bbox(reg_score_i, bbox_target_)
                bbox_weight_ = self.tile(self.reshape(bbox_weight_, (self.feature_anchor_shape[i], 1)), (1, 4))
                loss_reg = loss_reg * bbox_weight_
                loss_reg_item = self.sum_loss(loss_reg, (1,))
                loss_reg_item = self.sum_loss(loss_reg_item, (0,)) / self.num_expected_total

                loss_total = self.rpn_loss_cls_weight * loss_cls_item + self.rpn_loss_reg_weight * loss_reg_item

                loss += loss_total
                loss_print += (loss_total, loss_cls_item, loss_reg_item)
                clsloss += loss_cls_item
                regloss += loss_reg_item

            output = (loss, rpn_cls_score_total, rpn_bbox_pred_total,
                      clsloss, regloss, loss_print)
        else:
            output = (self.placeh1, rpn_cls_score_total, rpn_bbox_pred_total,
                      self.placeh1, self.placeh1, self.placeh1)

        return output
ROI Align
ROI Align computes the features that each proposal corresponds to at the appropriate scale, and extracts those features by cropping, resizing, and pooling that feature map according to the proposal.
The ROI level calibration used in Mask R-CNN follows the FPN level-assignment formula:

$k = \lfloor k_0 + \log_2(\sqrt{wh} / 224) \rfloor$

Explanation: because the boxes and anchors of the Mask R-CNN training data have been adjusted, the ROI level computation also needs to be adjusted accordingly; here the 224 term should be half of the input image size. The computed k is the level the ROI maps to, and there are 4 levels in total:

k = 2 maps back to feature map C2, whose size is 1/4 of the original input image.
k = 3 maps back to feature map C3, whose size is 1/8 of the original input image.
k = 4 maps back to feature map C4, whose size is 1/16 of the original input image.
k = 5 maps back to feature map C5, whose size is 1/32 of the original input image.
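The assignment above can be checked numerically. The sketch below is illustrative only (roi_to_fpn_level is a hypothetical helper, and canonical_size stands in for the 224 term); it assigns each ROI to a level based on its area:

import numpy as np

def roi_to_fpn_level(rois, canonical_size=224, k0=4, k_min=2, k_max=5):
    """rois: (N, 4) array of [x1, y1, x2, y2]; returns the FPN level k of each ROI."""
    w = rois[:, 2] - rois[:, 0]
    h = rois[:, 3] - rois[:, 1]
    k = np.floor(k0 + np.log2(np.sqrt(w * h) / canonical_size))
    return np.clip(k, k_min, k_max).astype(np.int32)

# A 112x112 ROI lands one level below the canonical size; a 448x448 ROI one above.
print(roi_to_fpn_level(np.array([[0, 0, 112, 112], [0, 0, 448, 448]], dtype=np.float32)))
# [3 5]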
image4
The dashed grid represents a feature map, the solid lines an RoI (with 2×2 bins in this example), and the dots the 4 sampling points in each bin. RoIAlign computes the value of each sampling point by bilinear interpolation from the nearby grid points (the nearest 4) on the feature map. No quantization is performed on any coordinate involved in the RoI, its bins, or the sampling points.
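As a small numerical illustration of that bilinear step (a stand-alone hypothetical helper, not the operator the network actually uses), interpolating one sampling point on a single-channel feature map looks like this:

import numpy as np

def bilinear_sample(feat, y, x):
    """Interpolate feat at the fractional location (y, x) from its 4 nearest grid points."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, feat.shape[0] - 1), min(x0 + 1, feat.shape[1] - 1)
    ly, lx = y - y0, x - x0
    # Weighted sum of the four neighbors; no coordinate is quantized.
    return ((1 - ly) * (1 - lx) * feat[y0, x0] + (1 - ly) * lx * feat[y0, x1]
            + ly * (1 - lx) * feat[y1, x0] + ly * lx * feat[y1, x1])

feat = np.arange(16, dtype=np.float32).reshape(4, 4)
print(bilinear_sample(feat, 1.5, 2.5))  # 8.5, the average of feat[1:3, 2:4]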