Huawei Open-Source AI Framework MindSpore Application Case: Mask R-CNN Instance Segmentation

Mask R-CNN
Mask R-CNN is a conceptually simple, flexible, and general framework for object instance segmentation. It detects objects in an image while simultaneously generating a high-quality mask for each instance. The method extends Faster R-CNN by adding a mask-prediction branch in parallel with the existing bounding-box detection branch. Mask R-CNN is simple to train, runs at about 5 fps, and adds only a small overhead compared with Faster R-CNN. It also generalizes easily to other tasks; for example, it allows human pose estimation within the same framework. Mask R-CNN performs strongly on all three key tracks of the COCO challenge: instance segmentation, bounding-box object detection, and person keypoint detection. Without bells and whistles, it outperforms all existing single-model entries on every task, including the winners of the COCO 2016 challenge.

模型简介
Mask R-CNN is a two-stage object detection network. As an extension of Faster R-CNN, it adds a mask-prediction branch on top of the existing bounding-box detection branch. The network uses a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, so region proposals come at almost no extra cost. By sharing convolutional features, the RPN and the mask branch are merged into a single network. The backbone can also be swapped for the lightweight MobileNet.

If you are interested in MindSpore, you can follow the MindSpore community.


I. Environment Setup

1. Go to the ModelArts portal

Cloud platforms help users quickly create and deploy models and manage the full-lifecycle AI workflow. Choose the platform below to get started with MindSpore: open ModelArts from the MindSpore tutorials, get the installation command, and install the MindSpore 2.0.0-alpha version.


Select CodeLab below to try it immediately.


Wait for the environment setup to finish.


2. Use CodeLab to run a Notebook instance

Download the sample notebook SSD目标检测.ipynb as the sample code.


Choose ModelArts Upload Files to upload the .ipynb file.


Select the Kernel environment.


Switch to the GPU environment, choosing the first option (time-limited free).


Go to the MindSpore official site and click Install at the top.


Get the installation command.


Back in the Notebook, add the following commands before the first code cell.

conda update -n base -c defaults conda


Install the MindSpore 2.0 GPU version.

conda install mindspore=2.0.0a0 -c mindspore -c conda-forge


Install mindvision.

pip install mindvision


Install the download package.

pip install download


II. Case Implementation

Importing official and third-party libraries
We first import the official and third-party libraries that this case depends on.

import time
import os

import numpy as np
import mindspore.nn as nn
import mindspore.common.dtype as mstype
from mindspore.ops import operations as P
from mindspore.ops import functional as F
from mindspore.ops import composite as C
from mindspore.nn import layer as L
from mindspore.common.initializer import initializer
from mindspore import context, Tensor, Parameter
from mindspore import ParameterTuple
from mindspore.train.callback import Callback
from mindspore.nn.wrap.grad_reducer import DistributedGradReducer
from mindspore.train.callback import CheckpointConfig, ModelCheckpoint, TimeMonitor
from mindspore.train import Model
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from mindspore.nn import Momentum
from mindspore.common import set_seed

from src.utils.config import config
Data Processing
Before starting the experiment, make sure a Python environment and the MindSpore Vision suite are installed locally.

Data preparation
COCO 2017 is a widely used dataset with bounding-box and pixel-level stuff (background) annotations. These annotations can be used for scene-understanding tasks such as semantic segmentation, object detection, and image captioning. The training and validation splits contain 118K and 5K images respectively.

Dataset size: 19 GB

Training: 18 GB, 118,000 images

Validation: 1 GB, 5,000 images

Annotations: 241 MB; includes instances, captions, person keypoints, etc.

Data format: images and JSON files

Note: the data is processed in dataset.py.

First, you need to download the coco2017 dataset.

After downloading, make sure your dataset is organized as follows.

!cat datasets.md
.
└─cocodataset
  ├─annotations
  │  ├─instance_train2017.json
  │  └─instance_val2017.json
  ├─val2017
  └─train2017
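
You can sanity-check this layout with a few lines of Python before generating MindRecord files. A minimal sketch, assuming the dataset root is ./cocodataset (adjust the path to your environment):

import os

# Verify the expected COCO layout before building MindRecord files.
coco_root = "./cocodataset"  # assumed path, adjust as needed
expected = [
    "annotations/instance_train2017.json",
    "annotations/instance_val2017.json",
    "train2017",
    "val2017",
]
for rel in expected:
    path = os.path.join(coco_root, rel)
    print(f"{path}: {'found' if os.path.exists(path) else 'MISSING'}")
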
Data preprocessing
Images in the raw dataset vary in size, which makes uniform reading and detection inconvenient, so we first resize them to a common size. The annotation information is stored in JSON files, which we read in order to attach labels to the image data, as the sketch below illustrates.
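
As a rough illustration of how the JSON annotations attach labels to images, here is a minimal sketch using only the standard library; the field names follow the COCO format, and the file path is an assumption matching the layout above:

import json
from collections import defaultdict

# Group COCO annotations by image id (illustrative sketch only).
with open("./cocodataset/annotations/instance_val2017.json") as f:  # assumed path
    coco = json.load(f)

boxes_by_image = defaultdict(list)
for ann in coco["annotations"]:
    # COCO boxes are [x, y, width, height] in pixel coordinates.
    boxes_by_image[ann["image_id"]].append((ann["bbox"], ann["category_id"]))

first_image = coco["images"][0]
print(first_image["file_name"], len(boxes_by_image[first_image["id"]]), "boxes")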

Data augmentation
Before you start training the model, data augmentation is necessary for your dataset and for creating the training and test data. For the COCO dataset, you can use dataset.py to attach labels to the images and convert them to MindRecord. MindRecord is a MindSpore-specific data format that can improve MindSpore's performance in certain scenarios.

First, we create the directory where the MindRecord dataset will be saved and read.

from dataset.dataset import create_coco_dataset, data_to_mindrecord_byte_image

def create_mindrecord_dir(prefix, mindrecord_dir):
    """Create the MindRecord directory and generate MindRecord files."""
    if not os.path.isdir(mindrecord_dir):
        os.makedirs(mindrecord_dir)
    if config.dataset == "coco":
        if os.path.isdir(config.data_root):
            print("Create Mindrecord.")
            data_to_mindrecord_byte_image("coco", True, prefix)
            print("Create Mindrecord Done, at {}".format(mindrecord_dir))
        else:
            raise Exception("coco_root not exists.")
    else:
        if os.path.isdir(config.IMAGE_DIR) and os.path.exists(config.ANNO_PATH):
            print("Create Mindrecord.")
            data_to_mindrecord_byte_image("other", True, prefix)
            print("Create Mindrecord Done, at {}".format(mindrecord_dir))
        else:
            raise Exception("IMAGE_DIR or ANNO_PATH not exists.")
    # `mindrecord_file` is defined in the driver code below (notebook global scope).
    while not os.path.exists(mindrecord_file + ".db"):
        time.sleep(5)
Then load the dataset by calling the create_coco_dataset function in dataset.py, which completes data preprocessing and augmentation.

# Allocate the execution environment
device_target = config.device_target
rank = 0
device_num = 1
context.set_context(mode=context.GRAPH_MODE, device_target=device_target)

print("Start create dataset!")

# Call the interface for data processing.
# It will generate mindrecord files in config.mindrecord_dir,
# and the file names are MaskRcnn.mindrecord0, 1, ... file_num.
prefix = "MaskRcnn.mindrecord"
mindrecord_dir = config.mindrecord_dir
mindrecord_file = os.path.join(mindrecord_dir, prefix + "0")
if rank == 0 and not os.path.exists(mindrecord_file):
    create_mindrecord_dir(prefix, mindrecord_dir)

# When creating the MindDataset, use the first mindrecord file,
# such as MaskRcnn.mindrecord0.
dataset = create_coco_dataset(mindrecord_file, batch_size=config.batch_size, device_num=device_num, rank_id=rank)
dataset_size = dataset.get_dataset_size()
print("total images num: ", dataset_size)
print("Create dataset done!")
Start create dataset!
total images num: 51790
Create dataset done!
Dataset visualization
Run the following code to view the augmented images. You can see that a rotation has been applied, and that the image shape has been converted to the (N, C, H, W) format expected by the network, where N is the number of samples, C the number of channels, and H and W the image height and width.

import numpy as np
import matplotlib.pyplot as plt

show_data = next(dataset.create_dict_iterator())

show_images = show_data["image"].asnumpy()
print(f'Image shape: {show_images.shape}')

plt.figure()

# Show 2 images for reference
for i in range(1, 3):
    plt.subplot(1, 2, i)

    # Convert the image to HWC format
    image_trans = np.transpose(show_images[i - 1], (1, 2, 0))
    image_trans = np.clip(image_trans, 0, 1)
    plt.imshow(image_trans[:, :], cmap=None)
    plt.xticks(rotation=180)
    plt.axis("off")

Image shape: (2, 3, 768, 1280)

Building the Network
(figure: Mask R-CNN architecture)

As mentioned earlier, Mask R-CNN uses a ResNet-50 backbone (as in the original paper) and extends Faster R-CNN for object detection by adding a mask-prediction branch in parallel with the existing bounding-box detection branch.

Backbone network
Mask R-CNN's backbone can be chosen from ResNet, VGG, MobileNet, and others. In this project, a Mask R-CNN with a ResNet backbone was ported to this framework, and the lightweight MobileNet was added as an alternative backbone.

Backbones:

ResNet (Deep Residual Network) is a landmark architecture in the history of convolutional neural networks. Unlike AlexNet and VGG, it makes a substantial change to the network structure. As researchers kept increasing network depth to improve performance, they found that deeper networks could actually perform worse, even exhibiting a degradation problem where, for example, an 80-layer network does worse than a 30-layer one; meanwhile vanishing and exploding gradients made training a good deep model ever harder. Residual modules effectively counter these problems: each block learns a residual mapping and computes y = F(x) + x, so gradients can flow through the identity shortcut.
(figure: residual block structure)

MobileNetV1 is a lightweight deep convolutional network whose basic unit is the depthwise separable convolution, which splits a standard convolution into two steps. The first step, depthwise convolution (DW), convolves each channel separately: one kernel handles exactly one channel and each channel is filtered by exactly one kernel, so the number of kernels equals the number of channels. The second step, pointwise convolution (PW), passes the depthwise output through a 1x1 convolution to mix the channels back together. The overall effect is close to a standard convolution, but the computation and parameter count drop substantially. Its network structure is shown below.
(figure: MobileNetV1 network structure)
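
The DW + PW split is easy to express in code. Below is a minimal MindSpore sketch, not part of the project source; the class name and shapes are illustrative:

import numpy as np
import mindspore.nn as nn
from mindspore import Tensor
import mindspore.common.dtype as mstype

class DepthwiseSeparableConv(nn.Cell):
    """Depthwise (per-channel) conv followed by a 1x1 pointwise conv."""
    def __init__(self, in_channels, out_channels, stride=1):
        super(DepthwiseSeparableConv, self).__init__()
        # group=in_channels makes each kernel see exactly one input channel.
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, pad_mode='same',
                                   group=in_channels)
        # The 1x1 conv mixes the per-channel outputs across channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.relu = nn.ReLU()

    def construct(self, x):
        return self.relu(self.pointwise(self.relu(self.depthwise(x))))

block = DepthwiseSeparableConv(32, 64)
x = Tensor(np.zeros((1, 32, 56, 56)), mstype.float32)
print(block(x).shape)  # (1, 64, 56, 56)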

The original paper uses ResNet as the backbone; here we also choose ResNet-50 as the backbone to run this case.

import numpy as np
import mindspore.nn as nn
import mindspore.common.dtype as mstype
from mindspore.ops import operations as P
from mindspore.common.tensor import Tensor
from mindspore.ops import functional as F

ms_cast_type = mstype.float32

def weight_init_ones(shape):
    """
    Weight init.

    Args:
        shape (List): weights shape.

    Returns:
        Tensor, weights, default float32.
    """
    return Tensor(np.array(np.ones(shape).astype(np.float32) * 0.01).astype(np.float32))

def _conv(in_channels, out_channels, kernel_size=3, stride=1, padding=0, pad_mode='pad'):
    """
    Conv2D wrapper.

    Args:
        in_channels (int): The channel number of the input tensor of the Conv2d layer.
        out_channels (int): The channel number of the output tensor of the Conv2d layer.
        kernel_size (Union[int, tuple[int]]): Specifies the height and width of the 2D convolution kernel.
            An integer represents both the height and width of the kernel; a tuple of two integers
            represents the height and width respectively. Default: 3.
        stride (Union[int, tuple[int]]): The movement stride of the 2D convolution kernel.
            An integer applies to both the height and width directions; a tuple of two integers
            gives the height and width strides respectively. Default: 1.
        padding (Union[int, tuple[int]]): The number of padding on the height and width directions of
            the input. An integer pads top, bottom, left, and right equally; a tuple of 4 integers gives
            the top, bottom, left, and right padding respectively. The value should be greater than or
            equal to 0. Default: 0.
        pad_mode (str): Specifies padding mode. The optional values are "same", "valid", "pad".
            Default: "pad".

    Outputs:
        Tensor, of shape :math:`(N, C_{out}, H_{out}, W_{out})`.
    """
    shape = (out_channels, in_channels, kernel_size, kernel_size)
    weights = weight_init_ones(shape)
    return nn.Conv2d(in_channels, out_channels,
                     kernel_size=kernel_size, stride=stride, padding=padding,
                     pad_mode=pad_mode, weight_init=weights, has_bias=False).to_float(ms_cast_type)

def _batch_norm2d_init(out_chls, momentum=0.1, affine=True, use_batch_statistics=True):
    """
    Batchnorm2D wrapper.

    Args:
        out_chls (int): The number of channels of the input tensor. Expected input is (N, C, H, W),
            where `C` represents the number of channels.
        momentum (float): Momentum for the running_mean and running_var computation. Default: 0.1.
        affine (bool): When set to True, gamma and beta can be learned. Default: True.
        use_batch_statistics (bool):
            - If true, use the mean and variance of the current batch and track the running mean
              and running variance. Default: True.
            - If false, use the specified mean and variance values and do not track statistics.
            - If None, it is set automatically according to the mode: true during training,
              false during evaluation.

    Outputs:
        Tensor, the normalized, scaled, offset tensor, of shape :math:`(N, C_{out}, H_{out}, W_{out})`.
    """
    gamma_init = Tensor(np.array(np.ones(out_chls)).astype(np.float32))
    beta_init = Tensor(np.array(np.ones(out_chls) * 0).astype(np.float32))
    moving_mean_init = Tensor(np.array(np.ones(out_chls) * 0).astype(np.float32))
    moving_var_init = Tensor(np.array(np.ones(out_chls)).astype(np.float32))
    return nn.BatchNorm2d(out_chls, momentum=momentum, affine=affine, gamma_init=gamma_init,
                          beta_init=beta_init, moving_mean_init=moving_mean_init,
                          moving_var_init=moving_var_init,
                          use_batch_statistics=use_batch_statistics)

class ResNetFea(nn.Cell):
    """
    ResNet architecture.

    Args:
        block (Cell): Block for network.
        layer_nums (list): Numbers of blocks in different layers.
        in_channels (list): Input channel in each layer.
        out_channels (list): Output channel in each layer.
        weights_update (bool): Weight update flag.

    Inputs:
        - **x** (Cell) - Input block.

    Outputs:
        Cell, output block.

    Supported Platforms:
        ``Ascend`` ``CPU`` ``GPU``

    Examples:
        >>> ResNetFea(ResidualBlockUsing, [3, 4, 6, 3], [64, 256, 512, 1024], [256, 512, 1024, 2048], False)
    """
    def __init__(self, block, layer_nums, in_channels, out_channels, weights_update=False):
        super(ResNetFea, self).__init__()
        if not len(layer_nums) == len(in_channels) == len(out_channels) == 4:
            raise ValueError("the length of layer_num, inchannel, outchannel list must be 4!")
        bn_training = False
        self.conv1 = _conv(3, 64, kernel_size=7, stride=2, padding=3, pad_mode='pad')
        self.bn1 = _batch_norm2d_init(64, affine=bn_training, use_batch_statistics=bn_training)
        self.relu = P.ReLU()
        self.maxpool = P.MaxPool(kernel_size=3, strides=2, pad_mode="SAME")
        self.weights_update = weights_update
        if not self.weights_update:
            self.conv1.weight.requires_grad = False
        self.layer1 = self._make_layer(block, layer_nums[0], in_channel=in_channels[0],
                                       out_channel=out_channels[0], stride=1, training=bn_training,
                                       weights_update=self.weights_update)
        self.layer2 = self._make_layer(block, layer_nums[1], in_channel=in_channels[1],
                                       out_channel=out_channels[1], stride=2,
                                       training=bn_training, weights_update=True)
        self.layer3 = self._make_layer(block, layer_nums[2], in_channel=in_channels[2],
                                       out_channel=out_channels[2], stride=2,
                                       training=bn_training, weights_update=True)
        self.layer4 = self._make_layer(block, layer_nums[3], in_channel=in_channels[3],
                                       out_channel=out_channels[3], stride=2,
                                       training=bn_training, weights_update=True)

    def _make_layer(self, block, layer_num, in_channel, out_channel, stride, training=False, weights_update=False):
        """
        Make layer for resnet backbone.

        Args:
            block (Cell): ResNet block.
            layer_num (int): Layer number.
            in_channel (int): Input channel.
            out_channel (int): Output channel.
            stride (int): Stride size for convolutional layer.
            training (bool): Whether to do training. Default: False.
            weights_update (bool): Whether to update weights. Default: False.

        Returns:
            SequentialCell, several layers combined together.

        Examples:
            >>> _make_layer(InvertedResidual, 4, 64, 64, 1)
        """
        layers = []
        down_sample = False
        if stride != 1 or in_channel != out_channel:
            down_sample = True
        resblk = block(in_channel, out_channel, stride=stride, down_sample=down_sample,
                       training=training, weights_update=weights_update)
        layers.append(resblk)
        for _ in range(1, layer_num):
            resblk = block(out_channel, out_channel, stride=1, training=training, weights_update=weights_update)
            layers.append(resblk)
        return nn.SequentialCell(layers)

    def construct(self, x):
        """Construct ResNet architecture."""
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        c1 = self.maxpool(x)
        c2 = self.layer1(c1)
        identity = c2
        if not self.weights_update:
            identity = F.stop_gradient(c2)
        c3 = self.layer2(identity)
        c4 = self.layer3(c3)
        c5 = self.layer4(c4)
        return identity, c3, c4, c5

class ResidualBlockUsing(nn.Cell):
    """
    ResNet V1 residual block definition.

    Args:
        in_channels (int): Input channel.
        out_channels (int): Output channel.
        stride (int): Stride size for the initial convolutional layer. Default: 1.
        down_sample (bool): If to do the downsample in block. Default: False.
        momentum (float): Momentum for batchnorm layer. Default: 0.1.
        training (bool): Training flag. Default: False.
        weights_update (bool): Weights update flag. Default: False.

    Inputs:
        - **x** (Cell) - Input block.

    Outputs:
        Cell, output block.

    Supported Platforms:
        ``Ascend`` ``CPU`` ``GPU``

    Examples:
        ResidualBlockUsing(3, 256, stride=2, down_sample=True)
    """
    expansion = 4

    def __init__(self, in_channels, out_channels, stride=1, down_sample=False,
                 momentum=0.1, training=False, weights_update=False):
        super(ResidualBlockUsing, self).__init__()
        self.affine = weights_update
        out_chls = out_channels // self.expansion
        self.conv1 = _conv(in_channels, out_chls, kernel_size=1, stride=1, padding=0)
        self.bn1 = _batch_norm2d_init(out_chls, momentum=momentum, affine=self.affine, use_batch_statistics=training)
        self.conv2 = _conv(out_chls, out_chls, kernel_size=3, stride=stride, padding=1)
        self.bn2 = _batch_norm2d_init(out_chls, momentum=momentum, affine=self.affine, use_batch_statistics=training)
        self.conv3 = _conv(out_chls, out_channels, kernel_size=1, stride=1, padding=0)
        self.bn3 = _batch_norm2d_init(out_channels, momentum=momentum, affine=self.affine,
                                      use_batch_statistics=training)
        if training:
            self.bn1 = self.bn1.set_train()
            self.bn2 = self.bn2.set_train()
            self.bn3 = self.bn3.set_train()
        if not weights_update:
            self.conv1.weight.requires_grad = False
            self.conv2.weight.requires_grad = False
            self.conv3.weight.requires_grad = False
        self.relu = P.ReLU()
        self.downsample = down_sample
        if self.downsample:
            self.conv_down_sample = _conv(in_channels, out_channels, kernel_size=1, stride=stride, padding=0)
            self.bn_down_sample = _batch_norm2d_init(out_channels, momentum=momentum, affine=self.affine,
                                                     use_batch_statistics=training)
            if training:
                self.bn_down_sample = self.bn_down_sample.set_train()
            if not weights_update:
                self.conv_down_sample.weight.requires_grad = False
        self.add = P.Add()

    def construct(self, x):
        """Construct ResNet V1 residual block."""
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        if self.downsample:
            identity = self.conv_down_sample(identity)
            identity = self.bn_down_sample(identity)
        out = self.add(out, identity)
        out = self.relu(out)
        return out
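
With both cells defined, a quick smoke test of the backbone can be run. This is an illustrative sketch, not project code; the constructor arguments follow the docstring example above, and the input size matches the (2, 3, 768, 1280) batches shown earlier:

import numpy as np
from mindspore import Tensor
import mindspore.common.dtype as mstype

# Build the ResNet-50-style backbone and run a dummy input through it.
backbone = ResNetFea(ResidualBlockUsing, [3, 4, 6, 3],
                     [64, 256, 512, 1024], [256, 512, 1024, 2048],
                     weights_update=False)
x = Tensor(np.zeros((1, 3, 768, 1280)), mstype.float32)
c2, c3, c4, c5 = backbone(x)
# Feature strides 4/8/16/32 of the input:
# (1, 256, 192, 320), (1, 512, 96, 160), (1, 1024, 48, 80), (1, 2048, 24, 40)
print(c2.shape, c3.shape, c4.shape, c5.shape)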

FPN network
The Feature Pyramid Network (FPN) exploits both the high resolution of low-level features and the rich semantics of high-level features, and achieves its predictions by fusing features from these different levels. Predictions are made independently on each fused feature level, which differs from conventional feature-fusion approaches.
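
In formula form, writing $C_l$ for the backbone feature at level $l$ and $P_l$ for the fused pyramid feature, the top-down pathway in the code below computes roughly

$$P_l = \mathrm{Conv}_{3\times 3}\big(\mathrm{Conv}_{1\times 1}(C_l) + \mathrm{Upsample}(P_{l+1})\big)$$

where the $1\times 1$ lateral convolution aligns channel counts and the upsampling (a bilinear resize in this implementation) aligns spatial sizes.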

Together, the backbone and the FPN form the convolutional stage of the Mask R-CNN network.

def bias_init_zeros(shape):
    """Bias init method."""
    result = Tensor(np.array(np.zeros(shape).astype(np.float32)), dtype=mstype.float32)
    return result

def _conv(in_channels, out_channels, kernel_size=3, stride=1, padding=0, pad_mode='pad'):
    """
    Conv2D wrapper.

    Args:
        in_channels (int): Input channel num.
        out_channels (int): Output channel num.
        kernel_size (int): Kernel size. Default: 3.
        stride (int): Stride. Default: 1.
        padding (int): Padding range. Default: 0.
        pad_mode (str): Padding mode. Default: 'pad'.

    Returns:
        Tensor, convolution result.
    """
    shape = (out_channels, in_channels, kernel_size, kernel_size)
    weights = initializer("XavierUniform", shape=shape, dtype=mstype.float32)
    shape_bias = (out_channels,)
    biass = bias_init_zeros(shape_bias)
    return nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding,
                     pad_mode=pad_mode, weight_init=weights, has_bias=True, bias_init=biass)

class FeatPyramidNeck(nn.Cell):
    """
    Feature pyramid network cell, usually used as a network neck.

    Applies a convolution on multiple input feature maps
    and outputs feature maps with the same channel size. If the required number
    of outputs is larger than the number of inputs, extra maxpooling is added
    for further downsampling.

    Args:
        in_channels (tuple): Channel size of input feature maps.
        out_channels (int): Channel size of output.
        num_outs (int): Number of output features.

    Inputs:
        - **x** (Tensor) - Input variant.

    Outputs:
        Tuple, with tensors of same channel size.

    Supported Platforms:
        ``Ascend`` ``CPU`` ``GPU``

    Examples:
        >>> neck = FeatPyramidNeck([100, 200, 300], 50, 4)
        >>> input_data = (normal(0, 0.1, (1, c, 1280 // (4 * 2 ** i), 768 // (4 * 2 ** i)),
        ...               dtype=np.float32) for i, c in enumerate(config.fpn_in_channels))
        >>> out = neck(input_data)
    """

    def __init__(self, in_channels, out_channels, num_outs):
        super(FeatPyramidNeck, self).__init__()
        self.cast_type = mstype.float32
        self.num_outs = num_outs
        self.in_channels = in_channels
        self.fpn_layer = len(self.in_channels)
        assert not self.num_outs < len(in_channels)
        self.lateral_convs_list_ = []
        self.fpn_convs_ = []
        for _, channel in enumerate(in_channels):
            l_conv = _conv(channel, out_channels, kernel_size=1, stride=1, padding=0,
                           pad_mode='valid').to_float(self.cast_type)
            fpn_conv = _conv(out_channels, out_channels, kernel_size=3, stride=1, padding=0,
                             pad_mode='same').to_float(self.cast_type)
            self.lateral_convs_list_.append(l_conv)
            self.fpn_convs_.append(fpn_conv)
        self.lateral_convs_list = nn.layer.CellList(self.lateral_convs_list_)
        self.fpn_convs_list = nn.layer.CellList(self.fpn_convs_)
        self.interpolate1 = P.ResizeBilinear((48, 80))
        self.interpolate2 = P.ResizeBilinear((96, 160))
        self.interpolate3 = P.ResizeBilinear((192, 320))
        self.cast = P.Cast()
        self.maxpool = P.MaxPool(kernel_size=1, strides=2, pad_mode="same")

    def construct(self, inputs):
        """Construction of Feature Pyramid Neck."""
        layers = ()
        for i in range(self.fpn_layer):
            layers += (self.lateral_convs_list[i](inputs[i]),)
        cast_layers = (layers[3],)
        cast_layers = \
            cast_layers + (layers[2] + self.cast(self.interpolate1(cast_layers[self.fpn_layer - 4]), self.cast_type),)
        cast_layers = \
            cast_layers + (layers[1] + self.cast(self.interpolate2(cast_layers[self.fpn_layer - 3]), self.cast_type),)
        cast_layers = \
            cast_layers + (layers[0] + self.cast(self.interpolate3(cast_layers[self.fpn_layer - 2]), self.cast_type),)
        layers_arranged = ()
        for i in range(self.fpn_layer - 1, -1, -1):
            layers_arranged = layers_arranged + (cast_layers[i],)
        outs = ()
        for i in range(self.fpn_layer):
            outs = outs + (self.fpn_convs_list[i](layers_arranged[i]),)
        for i in range(self.num_outs - self.fpn_layer):
            outs = outs + (self.maxpool(outs[3]),)
        return outs

RPN network
The RPN first appeared as part of Faster R-CNN, built specifically to generate candidate boxes. In earlier detection architectures such as R-CNN and Fast R-CNN, candidates usually came from Selective Search, a traditional and slow method that takes about 2 s per image on a CPU. The authors therefore proposed the RPN: it is fast, and it integrates easily with Fast R-CNN so the two become a single network.

The RPN's main outputs:

ROI: each feature-map location produces 4k regression values, where the 4 values [dy, dx, dh, dw] are the box translation and scaling offsets, and k is the number of anchors per location.

scores: each feature-map location produces 2k values, where the 2 values are the foreground and background probabilities for each of the k anchors.
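
To make those output shapes concrete, here is a minimal hedged sketch; the channel count, feature size, and k are illustrative, while the project's real values come from config:

import numpy as np
import mindspore.nn as nn
from mindspore import Tensor
import mindspore.common.dtype as mstype

k = 3                                                       # anchors per location
feat = Tensor(np.zeros((1, 256, 48, 80)), mstype.float32)   # one FPN level
rpn_cls = nn.Conv2d(256, 2 * k, kernel_size=1, pad_mode='valid', has_bias=True)
rpn_reg = nn.Conv2d(256, 4 * k, kernel_size=1, pad_mode='valid', has_bias=True)
print(rpn_cls(feat).shape)  # (1, 6, 48, 80): 2k foreground/background scores
print(rpn_reg(feat).shape)  # (1, 12, 48, 80): 4k offsets [dy, dx, dh, dw]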

from src.model.bbox_assign_sample import BboxAssignSample

class RpnRegClsBlock(nn.Cell):
    """
    Rpn reg cls block for rpn layer.

    Args:
        in_channels (int): Input channels of shared convolution.
        feat_channels (int): Output channels of shared convolution.
        num_anchors (int): The anchor number.
        cls_out_channels (int): Output channels of classification convolution.
        weight_conv (Tensor): Weight init for rpn conv.
        bias_conv (Tensor): Bias init for rpn conv.
        weight_cls (Tensor): Weight init for rpn cls conv.
        bias_cls (Tensor): Bias init for rpn cls conv.
        weight_reg (Tensor): Weight init for rpn reg conv.
        bias_reg (Tensor): Bias init for rpn reg conv.

    Inputs:
        - **x** (Tensor) - Input variant.

    Outputs:
        Tensor, output tensor.

    Supported Platforms:
        ``Ascend`` ``CPU`` ``GPU``

    Examples:
        >>> x = Tensor(np.array([[[[1., 2.], [3., 4.]]]]), mindspore.float32)
        >>> weight_conv = Tensor(np.array([[[[0.2, 0.3], [0.4, 0.1]]]]), mindspore.float32)
        >>> bias_conv = Tensor(np.array([[[[0., 0.], [0., 0.]]]]), mindspore.float32)
        >>> weight_cls = Tensor(np.array([[[[0.2, 0.3], [0.4, 0.1]]]]), mindspore.float32)
        >>> bias_cls = Tensor(np.array([[[[0., 0.], [0., 0.]]]]), mindspore.float32)
        >>> weight_reg = Tensor(np.array([[[[0.2, 0.3], [0.4, 0.1]]]]), mindspore.float32)
        >>> bias_reg = Tensor(np.array([[[[0., 0.], [0., 0.]]]]), mindspore.float32)
        >>> rpn = RpnRegClsBlock(2, 2, 4, 4, weight_conv, bias_conv,
        ...                      weight_cls, bias_cls, weight_reg, bias_reg)
        >>> output = rpn(x)
    """

    def __init__(self, in_channels, feat_channels, num_anchors, cls_out_channels, weight_conv,
                 bias_conv, weight_cls, bias_cls, weight_reg, bias_reg):
        super(RpnRegClsBlock, self).__init__()
        self.rpn_conv = nn.Conv2d(in_channels, feat_channels, kernel_size=3,
                                  stride=1, pad_mode='same',
                                  has_bias=True, weight_init=weight_conv, bias_init=bias_conv)
        self.relu = nn.ReLU()
        self.rpn_cls = nn.Conv2d(feat_channels, num_anchors * cls_out_channels,
                                 kernel_size=1, pad_mode='valid',
                                 has_bias=True, weight_init=weight_cls, bias_init=bias_cls)
        self.rpn_reg = nn.Conv2d(feat_channels, num_anchors * 4,
                                 kernel_size=1, pad_mode='valid',
                                 has_bias=True, weight_init=weight_reg, bias_init=bias_reg)

    def construct(self, x):
        """Construct Rpn reg cls block for rpn layer."""
        x = self.relu(self.rpn_conv(x))
        x1 = self.rpn_cls(x)
        x2 = self.rpn_reg(x)
        return x1, x2

class RPN(nn.Cell):
    """
    Region proposal network.

    Args:
        config (dict): Config.
        batch_size (int): Batchsize.
        in_channels (int): Input channels of shared convolution.
        feat_channels (int): Output channels of shared convolution.
        num_anchors (int): The anchor number.
        cls_out_channels (int): Output channels of classification convolution.

    Inputs:
        - **inputs** (Tensor) - Input variant.
        - **img_metas** (Tensor) - Img shape.
        - **anchor_list** (Tensor) - A list of anchors.
        - **gt_bboxes** (Tensor) - Ground truth bounding boxes.
        - **gt_labels** (Tensor) - Ground truth labels.
        - **gt_valids** (Tensor) - Ground truth validations.

    Outputs:
        Tuple, tuple of output tensor.

    Supported Platforms:
        ``Ascend`` ``CPU`` ``GPU``

    Examples:
        >>> RPN(config=config, batch_size=2, in_channels=256, feat_channels=1024,
        ...     num_anchors=3, cls_out_channels=512)
    """

    def __init__(self, config, batch_size, in_channels, feat_channels, num_anchors, cls_out_channels):
        super(RPN, self).__init__()
        cfg_rpn = config
        self.cast_type = mstype.float32
        self.np_cast_type = np.float32
        self.num_bboxes = cfg_rpn.num_bboxes
        self.slice_index = ()
        self.feature_anchor_shape = ()
        self.slice_index += (0,)
        index = 0
        for shape in cfg_rpn.feature_shapes:
            self.slice_index += (self.slice_index[index] + shape[0] * shape[1] * num_anchors,)
            self.feature_anchor_shape += (shape[0] * shape[1] * num_anchors * batch_size,)
            index += 1
        self.num_anchors = num_anchors
        self.batch_size = batch_size
        self.test_batch_size = cfg_rpn.test_batch_size
        self.num_layers = 5
        self.real_ratio = Tensor(np.ones((1, 1)).astype(self.np_cast_type))
        self.rpn_convs_list = nn.layer.CellList(self._make_rpn_layer(self.num_layers, in_channels, feat_channels,
                                                                     num_anchors, cls_out_channels))
        self.transpose = P.Transpose()
        self.reshape = P.Reshape()
        self.concat = P.Concat(axis=0)
        self.fill = P.Fill()
        self.placeh1 = Tensor(np.ones((1,)).astype(self.np_cast_type))
        self.trans_shape = (0, 2, 3, 1)
        self.reshape_shape_reg = (-1, 4)
        self.reshape_shape_cls = (-1,)
        self.rpn_loss_reg_weight = Tensor(np.array(cfg_rpn.rpn_loss_reg_weight).astype(self.np_cast_type))
        self.rpn_loss_cls_weight = Tensor(np.array(cfg_rpn.rpn_loss_cls_weight).astype(self.np_cast_type))
        expected_total_size = cfg_rpn.num_expected_neg * self.batch_size
        self.num_expected_total = Tensor(np.array(expected_total_size).astype(self.np_cast_type))
        self.num_bboxes = cfg_rpn.num_bboxes
        self.get_targets = BboxAssignSample(cfg_rpn, self.batch_size, self.num_bboxes, False)
        self.check_valid = P.CheckValid()
        self.sum_loss = P.ReduceSum()
        self.loss_cls = P.SigmoidCrossEntropyWithLogits()
        self.loss_bbox = P.SmoothL1Loss(beta=1.0/9.0)
        self.squeeze = P.Squeeze()
        self.cast = P.Cast()
        self.tile = P.Tile()
        self.zeros_like = P.ZerosLike()
        self.loss = Tensor(np.zeros((1,)).astype(self.np_cast_type))
        self.clsloss = Tensor(np.zeros((1,)).astype(self.np_cast_type))
        self.regloss = Tensor(np.zeros((1,)).astype(self.np_cast_type))

    def _make_rpn_layer(self, num_layers, in_channels, feat_channels, num_anchors, cls_out_channels):
        """
        Make rpn layer for rpn proposal network.

        Args:
            num_layers (int): layer num.
            in_channels (int): Input channels of shared convolution.
            feat_channels (int): Output channels of shared convolution.
            num_anchors (int): The anchor number.
            cls_out_channels (int): Output channels of classification convolution.

        Returns:
            List, list of RpnRegClsBlock cells.
        """
        rpn_layer = []
        shp_weight_conv = (feat_channels, in_channels, 3, 3)
        shp_bias_conv = (feat_channels,)
        weight_conv = initializer('Normal', shape=shp_weight_conv, dtype=mstype.float32)
        bias_conv = initializer(0, shape=shp_bias_conv, dtype=mstype.float32)
        shp_weight_cls = (num_anchors * cls_out_channels, feat_channels, 1, 1)
        shp_bias_cls = (num_anchors * cls_out_channels,)
        weight_cls = initializer('Normal', shape=shp_weight_cls, dtype=mstype.float32)
        bias_cls = initializer(0, shape=shp_bias_cls, dtype=mstype.float32)
        shp_weight_reg = (num_anchors * 4, feat_channels, 1, 1)
        shp_bias_reg = (num_anchors * 4,)
        weight_reg = initializer('Normal', shape=shp_weight_reg, dtype=mstype.float32)
        bias_reg = initializer(0, shape=shp_bias_reg, dtype=mstype.float32)
        for i in range(num_layers):
            rpn_layer.append(RpnRegClsBlock(in_channels, feat_channels, num_anchors, cls_out_channels, weight_conv,
                                            bias_conv, weight_cls, bias_cls, weight_reg,
                                            bias_reg).to_float(self.cast_type))
        # Share weights and biases across all pyramid levels.
        for i in range(1, num_layers):
            rpn_layer[i].rpn_conv.weight = rpn_layer[0].rpn_conv.weight
            rpn_layer[i].rpn_cls.weight = rpn_layer[0].rpn_cls.weight
            rpn_layer[i].rpn_reg.weight = rpn_layer[0].rpn_reg.weight
            rpn_layer[i].rpn_conv.bias = rpn_layer[0].rpn_conv.bias
            rpn_layer[i].rpn_cls.bias = rpn_layer[0].rpn_cls.bias
            rpn_layer[i].rpn_reg.bias = rpn_layer[0].rpn_reg.bias
        return rpn_layer

    def construct(self, inputs, img_metas, anchor_list, gt_bboxes, gt_labels, gt_valids):
        """Construct region proposal network."""
        loss_print = ()
        rpn_cls_score = ()
        rpn_bbox_pred = ()
        rpn_cls_score_total = ()
        rpn_bbox_pred_total = ()
        for i in range(self.num_layers):
            x1, x2 = self.rpn_convs_list[i](inputs[i])
            rpn_cls_score_total = rpn_cls_score_total + (x1,)
            rpn_bbox_pred_total = rpn_bbox_pred_total + (x2,)
            x1 = self.transpose(x1, self.trans_shape)
            x1 = self.reshape(x1, self.reshape_shape_cls)
            x2 = self.transpose(x2, self.trans_shape)
            x2 = self.reshape(x2, self.reshape_shape_reg)
            rpn_cls_score = rpn_cls_score + (x1,)
            rpn_bbox_pred = rpn_bbox_pred + (x2,)
        loss = self.loss
        clsloss = self.clsloss
        regloss = self.regloss
        bbox_targets = ()
        bbox_weights = ()
        labels = ()
        label_weights = ()
        output = ()
        if self.training:
            for i in range(self.batch_size):
                multi_level_flags = ()
                anchor_list_tuple = ()
                for j in range(self.num_layers):
                    res = self.cast(self.check_valid(anchor_list[j], self.squeeze(img_metas[i:i + 1:1, ::])),
                                    mstype.int32)
                    multi_level_flags = multi_level_flags + (res,)
                    anchor_list_tuple = anchor_list_tuple + (anchor_list[j],)
                valid_flag_list = self.concat(multi_level_flags)
                anchor_using_list = self.concat(anchor_list_tuple)
                gt_bboxes_i = self.squeeze(gt_bboxes[i:i + 1:1, ::])
                gt_labels_i = self.squeeze(gt_labels[i:i + 1:1, ::])
                gt_valids_i = self.squeeze(gt_valids[i:i + 1:1, ::])
                bbox_target, bbox_weight, label, label_weight = \
                    self.get_targets(gt_bboxes_i, gt_labels_i, self.cast(valid_flag_list, mstype.bool_),
                                     anchor_using_list, gt_valids_i)
                bbox_weight = self.cast(bbox_weight, self.cast_type)
                label = self.cast(label, self.cast_type)
                label_weight = self.cast(label_weight, self.cast_type)
                for j in range(self.num_layers):
                    begin = self.slice_index[j]
                    end = self.slice_index[j + 1]
                    stride = 1
                    bbox_targets += (bbox_target[begin:end:stride, ::],)
                    bbox_weights += (bbox_weight[begin:end:stride],)
                    labels += (label[begin:end:stride],)
                    label_weights += (label_weight[begin:end:stride],)
            for i in range(self.num_layers):
                bbox_target_using = ()
                bbox_weight_using = ()
                label_using = ()
                label_weight_using = ()
                for j in range(self.batch_size):
                    bbox_target_using += (bbox_targets[i + (self.num_layers * j)],)
                    bbox_weight_using += (bbox_weights[i + (self.num_layers * j)],)
                    label_using += (labels[i + (self.num_layers * j)],)
                    label_weight_using += (label_weights[i + (self.num_layers * j)],)
                bbox_target_with_batchsize = self.concat(bbox_target_using)
                bbox_weight_with_batchsize = self.concat(bbox_weight_using)
                label_with_batchsize = self.concat(label_using)
                label_weight_with_batchsize = self.concat(label_weight_using)
                # Stop gradients through the assembled targets.
                bbox_target_ = F.stop_gradient(bbox_target_with_batchsize)
                bbox_weight_ = F.stop_gradient(bbox_weight_with_batchsize)
                label_ = F.stop_gradient(label_with_batchsize)
                label_weight_ = F.stop_gradient(label_weight_with_batchsize)
                cls_score_i = rpn_cls_score[i]
                reg_score_i = rpn_bbox_pred[i]
                loss_cls = self.loss_cls(cls_score_i, label_)
                loss_cls_item = loss_cls * label_weight_
                loss_cls_item = self.sum_loss(loss_cls_item, (0,)) / self.num_expected_total
                loss_reg = self.loss_bbox(reg_score_i, bbox_target_)
                bbox_weight_ = self.tile(self.reshape(bbox_weight_, (self.feature_anchor_shape[i], 1)), (1, 4))
                loss_reg = loss_reg * bbox_weight_
                loss_reg_item = self.sum_loss(loss_reg, (1,))
                loss_reg_item = self.sum_loss(loss_reg_item, (0,)) / self.num_expected_total
                loss_total = self.rpn_loss_cls_weight * loss_cls_item + self.rpn_loss_reg_weight * loss_reg_item
                loss += loss_total
                loss_print += (loss_total, loss_cls_item, loss_reg_item)
                clsloss += loss_cls_item
                regloss += loss_reg_item
            output = (loss, rpn_cls_score_total, rpn_bbox_pred_total, clsloss, regloss, loss_print)
        else:
            output = (self.placeh1, rpn_cls_score_total, rpn_bbox_pred_total,
                      self.placeh1, self.placeh1, self.placeh1)
        return output

ROI Align
ROI Align computes the features that each proposal corresponds to at different scales, using the proposal to crop, resize, and pool those features.

The ROI level calibration used in Mask R-CNN follows the standard FPN assignment, where $w$ and $h$ are the ROI's width and height and $k_0$ is the base level:

$$k = \left\lfloor k_0 + \log_2\!\left(\frac{\sqrt{wh}}{224}\right) \right\rfloor$$

Explanation
Because both the boxes and the anchors of Mask R-CNN's training data have been adjusted, the ROI level computation needs adjusting as well; here, 224 should be taken as half the input image size.

The computed k is the level the ROI is assigned to; there are 4 levels in total:

k = 2 maps the ROI back to feature map P2, 1/4 the size of the original input image;

k = 3 maps it back to P3, 1/8 the size of the input;

k = 4 maps it back to P4, 1/16 the size of the input;

k = 5 maps it back to P5, 1/32 the size of the input.
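
A small sketch of this level assignment in plain Python; the clamping bounds and k0 = 4 follow the standard FPN formulation rather than this project's config:

import math

def roi_level(w, h, k0=4, scale=224):
    """Standard FPN level assignment: larger ROIs map to coarser levels."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / scale))
    return max(2, min(5, k))  # clamp to the available levels P2..P5

print(roi_level(224, 224))  # 4: an ROI at the canonical scale stays at k0
print(roi_level(112, 112))  # 3: half the size in each dimension -> one level finer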

(figure: RoIAlign bilinear sampling)

The dashed grid represents a feature map, the solid lines an RoI (2×2 bins in this example), and the dots the 4 sampling points within each bin. RoIAlign computes the value of each sampling point by bilinear interpolation from the nearby grid points (the nearest 4) on the feature map. No quantization is performed on any coordinate involved in the RoI, its bins, or the sampling points.
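
The bilinear sampling RoIAlign performs at each point can be sketched in a few lines of NumPy; this is a single-channel illustration without boundary handling:

import numpy as np

def bilinear_sample(feature, y, x):
    """Bilinearly interpolate feature at a real-valued point (no quantization)."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = y0 + 1, x0 + 1
    wy, wx = y - y0, x - x0
    # Weighted sum of the 4 nearest grid points.
    return (feature[y0, x0] * (1 - wy) * (1 - wx) +
            feature[y0, x1] * (1 - wy) * wx +
            feature[y1, x0] * wy * (1 - wx) +
            feature[y1, x1] * wy * wx)

feat = np.arange(16, dtype=np.float32).reshape(4, 4)
print(bilinear_sample(feat, 1.5, 2.5))  # midpoint of feat[1:3, 2:4] -> 8.5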

