Huawei's Open-Source, Self-Developed AI Framework MindSpore Application Case: Mask R-CNN Instance Segmentation

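Mask R-CNN
Mask R-CNN is a conceptually simple, flexible, and general framework for object instance segmentation. It detects the objects in an image while generating a high-quality mask for each instance. The method extends Faster R-CNN by adding a branch that predicts an object mask in parallel with the existing bounding-box branch. Mask R-CNN is simple to train, runs at about 5 fps, and adds only a small overhead to Faster R-CNN. It also generalizes easily to other tasks; for example, it can estimate human poses within the same framework. Mask R-CNN performs well on all three tracks of the COCO challenge: instance segmentation, bounding-box object detection, and person keypoint detection. Without bells and whistles, it outperforms all existing single-model entries on every task, including the COCO 2016 challenge winners.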

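Model Introduction
Mask R-CNN is a two-stage object detection network. As an extension of Faster R-CNN, it adds a branch that predicts object masks alongside the existing bounding-box detection branch. The network uses a region proposal network (RPN) that shares the full-image convolutional features with the detection network, so region proposals are nearly cost-free to compute. By sharing these convolutional features, the RPN and the mask branch are merged into a single network. The lightweight MobileNet can also be chosen as the backbone.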

If you are interested in MindSpore, you can follow the MindSpore (昇思) community.

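I. Environment Preparation

1. Open the ModelArts website

The cloud platform helps users quickly create and deploy models and manage the full AI workflow lifecycle. Select the cloud platform below to start using MindSpore, get the installation command, and install MindSpore 2.0.0-alpha. You can reach the ModelArts website from the MindSpore tutorials.

Select "CodeLab" at the bottom to try it right away.

Wait for the environment to finish building.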

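2. Try a Notebook instance in CodeLab

Download the notebook sample code; SSD目标检测.ipynb is the sample file used here.

Select ModelArts Upload Files to upload the .ipynb file.

Select the kernel environment.

Switch to the GPU environment and choose the first option (free for a limited time).

Go to the MindSpore website and click "Install" at the top.

Get the installation command.

Back in the notebook, add the following commands before the first code cell.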

conda update -n base -c defaults conda

Install the GPU build of MindSpore 2.0

conda install mindspore=2.0.0a0 -c mindspore -c conda-forge

Install mindvision

pip install mindvision

Install the download package

pip install download


II. Case Implementation

Importing Official and Third-Party Libraries
We first import the official and third-party libraries that this case depends on.

import time
import os

import numpy as np
import mindspore.nn as nn
import mindspore.common.dtype as mstype
from mindspore.ops import operations as P
from mindspore.ops import functional as F
from mindspore.ops import composite as C
from mindspore.nn import layer as L
from mindspore.common.initializer import initializer
from mindspore import context, Tensor, Parameter
from mindspore import ParameterTuple
from mindspore.train.callback import Callback
from mindspore.nn.wrap.grad_reducer import DistributedGradReducer
from mindspore.train.callback import CheckpointConfig, ModelCheckpoint, TimeMonitor
from mindspore.train import Model
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from mindspore.nn import Momentum
from mindspore.common import set_seed

from src.utils.config import config

Data Processing
Before starting the experiment, make sure that a Python environment and the MindSpore Vision suite are installed locally.

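Data Preparation
COCO2017 is a widely used dataset with bounding-box and pixel-level stuff annotations. These annotations support scene-understanding tasks such as semantic segmentation, object detection, and image captioning. The training and evaluation sets contain about 118K and 5K images, respectively.

Dataset size: 19G

Training: 18G, 118,000 images

Evaluation: 1G, 5,000 images

Annotations: 241M, including instances, captions, person keypoints, and so on

Data format: images and JSON files

Note: the data is processed in dataset.py.

First, download the coco2017 dataset.

After the download finishes, make sure the dataset is stored under the following directory layout.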

!cat datasets.md

.
└─cocodataset
  ├─annotations
    ├─instance_train2017.json
    └─instance_val2017.json
  ├─val2017
  └─train2017

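
Before converting anything, it can help to confirm that the layout above is actually in place. The following is a minimal sketch using only the standard library; the helper name `missing_coco_paths` and the default root `./cocodataset` are assumptions made for illustration and are not part of the original project.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED_PATHS = (
    "annotations/instance_train2017.json",
    "annotations/instance_val2017.json",
    "train2017",
    "val2017",
)


def missing_coco_paths(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


# Hypothetical location; an empty list means the layout is complete.
print(missing_coco_paths("./cocodataset"))
```

An empty result means every annotation file and image folder was found; any listed entry points at a piece that still needs to be downloaded or moved.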
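Data Preprocessing
Images in the raw dataset come in different sizes, which makes uniform loading and detection inconvenient, so we first resize them to a common size. The annotations are stored in JSON files; we read them and attach them to the images as labels.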
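
To make the resizing step concrete, here is a small sketch of one common approach: scale each image to fit inside a fixed canvas while keeping its aspect ratio, then pad the remainder. The 768 x 1280 target is taken from the image shape printed later in this article; the function name `fit_within` and the resize-then-pad strategy are assumptions, and the project's dataset.py may behave differently.

```python
def fit_within(height, width, target_h=768, target_w=1280):
    """Scale (height, width) to fit inside the target while keeping the aspect ratio.

    Returns the resized height and width plus the padding needed on the
    bottom and right to reach exactly (target_h, target_w).
    """
    scale = min(target_h / height, target_w / width)
    new_h = min(target_h, round(height * scale))
    new_w = min(target_w, round(width * scale))
    return new_h, new_w, target_h - new_h, target_w - new_w


# A 480 x 640 image is scaled by 1.6 and padded on the right.
print(fit_within(480, 640))
```

Keeping the aspect ratio avoids distorting objects, at the cost of some padded area that carries no information.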

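Data Augmentation
Before training the model, the dataset needs to be augmented and split into training and test data. For the COCO dataset, you can use dataset.py to attach labels to the images and convert them to MindRecord. MindRecord is a data format defined by MindSpore that can improve MindSpore's performance in some scenarios.

First, we create the directory where the MindRecord dataset is saved and read.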

from dataset.dataset import create_coco_dataset, data_to_mindrecord_byte_image

def create_mindrecord_dir(prefix, mindrecord_dir):
    """Create MindRecord directory."""
    if not os.path.isdir(mindrecord_dir):
        os.makedirs(mindrecord_dir)
    if config.dataset == "coco":
        if os.path.isdir(config.data_root):
            print("Create Mindrecord.")
            data_to_mindrecord_byte_image("coco", True, prefix)
            print("Create Mindrecord Done, at {}".format(mindrecord_dir))
        else:
            raise Exception("coco_root does not exist.")
    else:
        if os.path.isdir(config.IMAGE_DIR) and os.path.exists(config.ANNO_PATH):
            print("Create Mindrecord.")
            data_to_mindrecord_byte_image("other", True, prefix)
            print("Create Mindrecord Done, at {}".format(mindrecord_dir))
        else:
            raise Exception("IMAGE_DIR or ANNO_PATH does not exist.")
    # Wait until the index file has been written. mindrecord_file is the
    # module-level variable defined before this function is called.
    while not os.path.exists(mindrecord_file + ".db"):
        time.sleep(5)

Then load the dataset by calling the create_coco_dataset function in dataset.py, which performs the data preprocessing and augmentation.

# Allocating memory Environment
device_target = config.device_target
rank = 0
device_num = 1
context.set_context(mode=context.GRAPH_MODE, device_target=device_target)

print("Start create dataset!")

# Call the interface for data processing.
# It will generate mindrecord files in config.mindrecord_dir,
# and the file names are MaskRcnn.mindrecord0, 1, ... file_num.
prefix = "MaskRcnn.mindrecord"
mindrecord_dir = config.mindrecord_dir
mindrecord_file = os.path.join(mindrecord_dir, prefix + "0")
if rank == 0 and not os.path.exists(mindrecord_file):
    create_mindrecord_dir(prefix, mindrecord_dir)

# When creating the MindDataset, use the first mindrecord file,
# such as MaskRcnn.mindrecord0.
dataset = create_coco_dataset(mindrecord_file, batch_size=config.batch_size, device_num=device_num, rank_id=rank)
dataset_size = dataset.get_dataset_size()
print("total images num: ", dataset_size)
print("Create dataset done!")

Start create dataset!
total images num: 51790
Create dataset done!

Dataset Visualization
Run the following code to view the augmented images. The images have been rotated, and their shape has been converted to the (N, C, H, W) format expected by the network, where N is the number of samples, C is the number of channels, and H and W are the image height and width.

import numpy as np
import matplotlib.pyplot as plt

show_data = next(dataset.create_dict_iterator())

show_images = show_data["image"].asnumpy()
print(f'Image shape: {show_images.shape}')

plt.figure()

# Show 2 images for reference
for i in range(1, 3):
    plt.subplot(1, 2, i)

    # Convert the image to HWC format
    image_trans = np.transpose(show_images[i - 1], (1, 2, 0))
    image_trans = np.clip(image_trans, 0, 1)
    plt.imshow(image_trans[:, :], cmap=None)
    plt.xticks(rotation=180)
    plt.axis("off")

Image shape: (2, 3, 768, 1280)

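Building the Network

As mentioned above, Mask R-CNN uses ResNet50 as its backbone (as in the original paper). It extends Faster R-CNN by adding a mask-prediction branch in parallel with the existing bounding-box detection branch to complete the detection task.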

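Backbone Network
Options for the Mask R-CNN backbone include ResNet, VGG, and MobileNet. This project ports the ResNet-based Mask R-CNN to MindSpore and additionally supports the lightweight MobileNet.

Backbone networks:

ResNet (deep residual network) is a landmark in the history of convolutional neural networks. Unlike AlexNet and VGG, it changes the network structure substantially. While researchers kept increasing depth to improve CNN performance, they found that results got worse as networks grew deeper, a phenomenon known as degradation: for example, an 80-layer network can perform worse than a 30-layer one. Vanishing and exploding gradients also become more severe with depth, which makes training a good deep model harder. Residual blocks effectively mitigate vanishing and exploding gradients.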
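
The effect of the shortcut on gradients can be illustrated with a toy scalar calculation. This is only a sketch under strong simplifying assumptions (every layer has the same constant local derivative, and the function names are made up for this example); it is not a description of how real networks behave in detail.

```python
def plain_gradient(local_grad, depth):
    """Gradient through `depth` stacked layers y = f(x), each with derivative `local_grad`."""
    return local_grad ** depth


def residual_gradient(local_grad, depth):
    """Same stack with identity shortcuts y = x + f(x): each layer contributes 1 + local_grad."""
    return (1.0 + local_grad) ** depth


# With a small local derivative, the plain stack's gradient shrinks towards zero,
# while the shortcut keeps a direct path for the gradient.
print(plain_gradient(0.1, 20))
print(residual_gradient(0.1, 20))
```

The identity term in `1 + local_grad` is what prevents the product from collapsing when the learned branch has a small derivative.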

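MobileNetV1 is a lightweight deep convolutional network. Its basic unit is the depthwise separable convolution, which splits a standard convolution into two steps. The first step is the depthwise convolution (DW), a per-channel convolution: each kernel handles one channel, and each channel is filtered by exactly one kernel, so the number of kernels equals the number of channels. The second step is the pointwise convolution (PW), which combines the depthwise outputs with a 1x1 convolution. The overall effect is close to that of a standard convolution, but with far fewer computations and parameters.
[Figure: MobileNetV1 network structure]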
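
The parameter saving of a depthwise separable convolution can be checked with simple arithmetic. The sketch below counts weights only (biases and batch-norm parameters are ignored), and the helper names and the 256-channel example are chosen purely for illustration.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights of a depthwise k x k convolution followed by a 1x1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k kernel per input channel
    pointwise = c_in * c_out   # 1x1 convolution that mixes channels
    return depthwise + pointwise


c_in, c_out, k = 256, 256, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
print(std, sep, round(sep / std, 3))
```

The ratio equals 1/c_out + 1/k^2, so for 3x3 kernels and wide layers the separable form needs roughly one ninth of the weights.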

The original paper uses ResNet as the backbone. Here we also use ResNet50 as the backbone for this case.

import numpy as np
import mindspore.nn as nn
import mindspore.common.dtype as mstype
from mindspore.ops import operations as P
from mindspore.common.tensor import Tensor
from mindspore.ops import functional as F

ms_cast_type = mstype.float32

def weight_init_ones(shape):
    """
    Weight init.

    Args:
        shape(List): weights shape.

    Returns:
        Tensor, weights, default float32.
    """
    return Tensor(np.array(np.ones(shape).astype(np.float32) * 0.01).astype(np.float32))

def _conv(in_channels, out_channels, kernel_size=3, stride=1, padding=0, pad_mode='pad'):
    """
    Conv2D wrapper.

    Args:
        in_channels (int): The channel number of the input tensor of the Conv2d layer.
        out_channels (int): The channel number of the output tensor of the Conv2d layer.
        kernel_size (Union[int, tuple[int]]): Specifies the height and width of the 2D convolution kernel.
            The data type is an integer or a tuple of two integers. An integer represents the height
            and width of the convolution kernel. A tuple of two integers represents the height
            and width of the convolution kernel respectively. Default: 3.
        stride (Union[int, tuple[int]]): The movement stride of the 2D convolution kernel.
            The data type is an integer or a tuple of two integers. An integer represents the movement step size
            in both height and width directions. A tuple of two integers represents the movement step size
            in the height and width directions respectively. Default: 1.
        padding (Union[int, tuple[int]]): The number of padding on the height and width directions of the input.
            The data type is an integer or a tuple of four integers. If `padding` is an integer,
            then the top, bottom, left, and right padding are all equal to `padding`.
            If `padding` is a tuple of 4 integers, then the top, bottom, left, and right padding
            is equal to `padding[0]`, `padding[1]`, `padding[2]`, and `padding[3]` respectively.
            The value should be greater than or equal to 0. Default: 0.
        pad_mode (str): Specifies padding mode. The optional values are
            "same", "valid", "pad". Default: "pad".

    Outputs:
        Tensor, math '(N, C_{out}, H_{out}, W_{out})' or math '(N, H_{out}, W_{out}, C_{out})'.
    """
    shape = (out_channels, in_channels, kernel_size, kernel_size)
    weights = weight_init_ones(shape)
    return nn.Conv2d(in_channels, out_channels,
                     kernel_size=kernel_size, stride=stride, padding=padding,
                     pad_mode=pad_mode, weight_init=weights, has_bias=False).to_float(ms_cast_type)

def _batch_norm2d_init(out_chls, momentum=0.1, affine=True, use_batch_statistics=True):
    """
    Batchnorm2D wrapper.

    Args:
        out_chls (int): The number of channels of the input tensor. Expected input size is (N, C, H, W),
            `C` represents the number of channels.
        momentum (float): A floating hyperparameter of the momentum for the
            running_mean and running_var computation. Default: 0.1.
        affine (bool): A bool value. When set to True, gamma and beta can be learned. Default: True.
        use_batch_statistics (bool):
            - If true, use the mean value and variance value of current batch data and track running mean
              and running variance. Default: True.
            - If false, use the mean value and variance value of specified value, and not track statistical value.
            - If None, the use_batch_statistics is automatically set to true or false according to the training
              and evaluation mode. During training, the parameter is set to true, and during evaluation, the
              parameter is set to false.

    Outputs:
        Tensor, the normalized, scaled, offset tensor, of shape :math:'(N, C_{out}, H_{out}, W_{out})'.
    """
    gamma_init = Tensor(np.array(np.ones(out_chls)).astype(np.float32))
    beta_init = Tensor(np.array(np.ones(out_chls) * 0).astype(np.float32))
    moving_mean_init = Tensor(np.array(np.ones(out_chls) * 0).astype(np.float32))
    moving_var_init = Tensor(np.array(np.ones(out_chls)).astype(np.float32))

    return nn.BatchNorm2d(out_chls, momentum=momentum, affine=affine, gamma_init=gamma_init,
                          beta_init=beta_init, moving_mean_init=moving_mean_init,
                          moving_var_init=moving_var_init,
                          use_batch_statistics=use_batch_statistics)

class ResNetFea(nn.Cell):
    """
    ResNet architecture.

    Args:
        block (Cell): Block for network.
        layer_nums (list): Numbers of block in different layers.
        in_channels (list): Input channel in each layer.
        out_channels (list): Output channel in each layer.
        weights_update (bool): Weight update flag.

    Inputs:
        - **x** (Cell) - Input block.

    Outputs:
        Cell, output block.

    Support Platforms:
        ``Ascend`` ``CPU`` ``GPU``

    Examples:
        >>> ResNetFea(ResidualBlockUsing, [3, 4, 6, 3], [64, 256, 512, 1024], [256, 512, 1024, 2048], False)
    """
    def __init__(self, block, layer_nums, in_channels, out_channels, weights_update=False):
        super(ResNetFea, self).__init__()
        if not len(layer_nums) == len(in_channels) == len(out_channels) == 4:
            raise ValueError("the length of "
                             "layer_num, inchannel, outchannel list must be 4!")

        bn_training = False
        self.conv1 = _conv(3, 64, kernel_size=7, stride=2, padding=3, pad_mode='pad')
        self.bn1 = _batch_norm2d_init(64, affine=bn_training, use_batch_statistics=bn_training)
        self.relu = P.ReLU()
        self.maxpool = P.MaxPool(kernel_size=3, strides=2, pad_mode="SAME")
        self.weights_update = weights_update

        if not self.weights_update:
            self.conv1.weight.requires_grad = False

        self.layer1 = self._make_layer(block, layer_nums[0], in_channel=in_channels[0],
                                       out_channel=out_channels[0], stride=1, training=bn_training,
                                       weights_update=self.weights_update)
        self.layer2 = self._make_layer(block, layer_nums[1], in_channel=in_channels[1],
                                       out_channel=out_channels[1], stride=2,
                                       training=bn_training, weights_update=True)
        self.layer3 = self._make_layer(block, layer_nums[2], in_channel=in_channels[2],
                                       out_channel=out_channels[2], stride=2,
                                       training=bn_training, weights_update=True)
        self.layer4 = self._make_layer(block, layer_nums[3], in_channel=in_channels[3],
                                       out_channel=out_channels[3], stride=2,
                                       training=bn_training, weights_update=True)

    def _make_layer(self, block, layer_num, in_channel, out_channel, stride, training=False, weights_update=False):
        """
        Make layer for resnet backbone.

        Args:
            block (Cell): ResNet block.
            layer_num (int): Layer number.
            in_channel (int): Input channel.
            out_channel (int): Output channel.
            stride (int): Stride size for convolutional layer.
            training(bool): Whether to do training. Default: False.
            weights_update(bool): Whether to update weights. Default: False.

        Returns:
            SequentialCell, Combine several layers together.

        Examples:
            >>> _make_layer(InvertedResidual, 4, 64, 64, 1)
        """
        layers = []
        down_sample = False
        if stride != 1 or in_channel != out_channel:
            down_sample = True
        resblk = block(in_channel, out_channel, stride=stride, down_sample=down_sample,
                       training=training, weights_update=weights_update)
        layers.append(resblk)

        for _ in range(1, layer_num):
            resblk = block(out_channel, out_channel, stride=1, training=training, weights_update=weights_update)
            layers.append(resblk)

        return nn.SequentialCell(layers)

    def construct(self, x):
        """Construct ResNet architecture."""
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        c1 = self.maxpool(x)

        c2 = self.layer1(c1)
        identity = c2
        if not self.weights_update:
            identity = F.stop_gradient(c2)
        c3 = self.layer2(identity)
        c4 = self.layer3(c3)
        c5 = self.layer4(c4)

        return identity, c3, c4, c5

class ResidualBlockUsing(nn.Cell):
    """
    ResNet V1 residual block definition.

    Args:
        in_channels (int): Input channel.
        out_channels (int): Output channel.
        stride (int): Stride size for the initial convolutional layer. Default: 1.
        down_sample (bool): If to do the downsample in block. Default: False.
        momentum (float): Momentum for batchnorm layer. Default: 0.1.
        training (bool): Training flag. Default: False.
        weights_update (bool): Weights update flag. Default: False.

    Inputs:
        - **x** (Cell) - Input block.

    Outputs:
        Cell, output block.

    Support Platforms:
        ``Ascend`` ``CPU`` ``GPU``

    Examples:
        ResidualBlockUsing(3, 256, stride=2, down_sample=True)
    """
    expansion = 4

    def __init__(self, in_channels, out_channels, stride=1, down_sample=False,
                 momentum=0.1, training=False, weights_update=False):
        super(ResidualBlockUsing, self).__init__()

        self.affine = weights_update

        out_chls = out_channels // self.expansion
        self.conv1 = _conv(in_channels, out_chls, kernel_size=1, stride=1, padding=0)
        self.bn1 = _batch_norm2d_init(out_chls, momentum=momentum, affine=self.affine,
                                      use_batch_statistics=training)

        self.conv2 = _conv(out_chls, out_chls, kernel_size=3, stride=stride, padding=1)
        self.bn2 = _batch_norm2d_init(out_chls, momentum=momentum, affine=self.affine,
                                      use_batch_statistics=training)

        self.conv3 = _conv(out_chls, out_channels, kernel_size=1, stride=1, padding=0)
        self.bn3 = _batch_norm2d_init(out_channels, momentum=momentum, affine=self.affine,
                                      use_batch_statistics=training)

        if training:
            self.bn1 = self.bn1.set_train()
            self.bn2 = self.bn2.set_train()
            self.bn3 = self.bn3.set_train()

        if not weights_update:
            self.conv1.weight.requires_grad = False
            self.conv2.weight.requires_grad = False
            self.conv3.weight.requires_grad = False

        self.relu = P.ReLU()
        self.downsample = down_sample
        if self.downsample:
            self.conv_down_sample = _conv(in_channels, out_channels, kernel_size=1, stride=stride, padding=0)
            self.bn_down_sample = _batch_norm2d_init(out_channels, momentum=momentum, affine=self.affine,
                                                     use_batch_statistics=training)
            if training:
                self.bn_down_sample = self.bn_down_sample.set_train()
            if not weights_update:
                self.conv_down_sample.weight.requires_grad = False
        self.add = P.Add()

    def construct(self, x):
        """Construct ResNet V1 residual block."""
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample:
            identity = self.conv_down_sample(identity)
            identity = self.bn_down_sample(identity)

        out = self.add(out, identity)
        out = self.relu(out)

        return out

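FPN Network
The Feature Pyramid Network (FPN) combines the high resolution of low-level features with the rich semantic information of high-level features, and makes predictions from the fused features of these layers. Unlike conventional feature fusion, prediction is carried out separately on each fused feature level.

Together, the backbone and the FPN form the convolutional layers of the Mask R-CNN network.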
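
The top-down fusion described above can be sketched on tiny single-channel maps. This is an illustrative toy, not the project's implementation: it uses nearest-neighbour upsampling instead of the bilinear resize used in the MindSpore code, operates on plain nested lists, and all function names are made up for the example.

```python
def upsample2x(feature):
    """Nearest-neighbour 2x upsampling of a 2-D list."""
    out = []
    for row in feature:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Element-wise sum of two equally sized 2-D lists."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def top_down(laterals):
    """Fuse lateral maps given finest first; returns the merged maps, finest first."""
    merged = [laterals[-1]]  # start from the coarsest level
    for lateral in reversed(laterals[:-1]):
        merged.append(add_maps(lateral, upsample2x(merged[-1])))
    return merged[::-1]


fine = [[0.0, 0.0], [0.0, 0.0]]
coarse = [[1.0]]
print(top_down([fine, coarse]))
```

Each finer level receives the upsampled, already-fused coarser map, so semantic information flows from the top of the pyramid down to the high-resolution levels.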

def bias_init_zeros(shape):
    """Bias init method."""
    result = Tensor(np.array(np.zeros(shape).astype(np.float32)), dtype=mstype.float32)
    return result

def _conv(in_channels, out_channels, kernel_size=3, stride=1, padding=0, pad_mode='pad'):
    """
    Conv2D wrapper.

    Args:
        in_channels(int): Input channel num.
        out_channels(int): Output channel num.
        kernel_size(int): Kernel size. Default: 3.
        stride(int): Stride. Default: 1.
        padding(int): Padding range. Default: 0.
        pad_mode(str): Padding mode. Default: 'pad'.

    Returns:
        Cell, the configured Conv2d layer.
    """
    shape = (out_channels, in_channels, kernel_size, kernel_size)
    weights = initializer("XavierUniform", shape=shape, dtype=mstype.float32)
    shape_bias = (out_channels,)
    biass = bias_init_zeros(shape_bias)
    return nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding,
                     pad_mode=pad_mode, weight_init=weights, has_bias=True, bias_init=biass)

class FeatPyramidNeck(nn.Cell):
    """
    Feature pyramid network cell, usually used as the network neck.

    Applies the convolution on multiple input feature maps
    and outputs feature maps with the same channel size. If the required number of
    outputs is larger than the number of inputs, add extra maxpooling for further
    downsampling.

    Args:
        in_channels (tuple): Channel size of input feature maps.
        out_channels (int): Channel size output.
        num_outs (int): Num of output features.

    Inputs:
        - **x** (Tensor) - Input variant

    Outputs:
        Tuple, with tensors of same channel size.

    Support Platform:
        ``Ascend`` ``CPU`` ``GPU``

    Examples:
        >>> neck = FeatPyramidNeck([100,200,300], 50, 4)
        >>> input_data = (normal(0,0.1,(1,c,1280//(4*2**i), 768//(4*2**i)),
        ...               dtype=np.float32) for i, c in enumerate(config.fpn_in_channels))
        >>> out = neck(input_data)
    """
    def __init__(self,
                 in_channels,
                 out_channels,
                 num_outs):
        super(FeatPyramidNeck, self).__init__()
        self.cast_type = mstype.float32
        self.num_outs = num_outs
        self.in_channels = in_channels
        self.fpn_layer = len(self.in_channels)

        assert not self.num_outs < len(in_channels)

        self.lateral_convs_list_ = []
        self.fpn_convs_ = []

        for _, channel in enumerate(in_channels):
            l_conv = _conv(channel, out_channels, kernel_size=1, stride=1, padding=0,
                           pad_mode='valid').to_float(self.cast_type)
            fpn_conv = _conv(out_channels, out_channels, kernel_size=3, stride=1, padding=0,
                             pad_mode='same').to_float(self.cast_type)
            self.lateral_convs_list_.append(l_conv)
            self.fpn_convs_.append(fpn_conv)
        self.lateral_convs_list = nn.layer.CellList(self.lateral_convs_list_)
        self.fpn_convs_list = nn.layer.CellList(self.fpn_convs_)
        self.interpolate1 = P.ResizeBilinear((48, 80))
        self.interpolate2 = P.ResizeBilinear((96, 160))
        self.interpolate3 = P.ResizeBilinear((192, 320))
        self.cast = P.Cast()
        self.maxpool = P.MaxPool(kernel_size=1, strides=2, pad_mode="same")

    def construct(self, inputs):
        """Construction of Feature Pyramid Neck."""
        layers = ()
        for i in range(self.fpn_layer):
            layers += (self.lateral_convs_list[i](inputs[i]),)

        cast_layers = (layers[3],)
        cast_layers = \
            cast_layers + (layers[2] + self.cast(self.interpolate1(cast_layers[self.fpn_layer - 4]),
                                                 self.cast_type),)
        cast_layers = \
            cast_layers + (layers[1] + self.cast(self.interpolate2(cast_layers[self.fpn_layer - 3]),
                                                 self.cast_type),)
        cast_layers = \
            cast_layers + (layers[0] + self.cast(self.interpolate3(cast_layers[self.fpn_layer - 2]),
                                                 self.cast_type),)

        layers_arranged = ()
        for i in range(self.fpn_layer - 1, -1, -1):
            layers_arranged = layers_arranged + (cast_layers[i],)

        outs = ()
        for i in range(self.fpn_layer):
            outs = outs + (self.fpn_convs_list[i](layers_arranged[i]),)

        for i in range(self.num_outs - self.fpn_layer):
            outs = outs + (self.maxpool(outs[3]),)
        return outs

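RPN Network
The RPN first appeared in Faster R-CNN, where it is dedicated to extracting region proposals. Earlier detection architectures such as R-CNN and Fast R-CNN usually extracted proposals with Selective Search, a more traditional and time-consuming method that takes about 2 s per image on a CPU. The authors therefore proposed the RPN for proposal extraction: it is fast, and it integrates easily into Fast R-CNN so that the two form a single network.

Main outputs of the RPN:

ROI: 4k values for each location on the feature map, where 4 stands for the four box translation and scaling offsets [dy, dx, dh, dw], and k is the number of anchor boxes per location (k = 3 here).

scores: 2k values for each location on the feature map, where 2 stands for the foreground and background probabilities, and k is again the number of anchor boxes per location (k = 3).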
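
To show what the four offsets [dy, dx, dh, dw] do, the sketch below applies them to a single anchor using the parameterization commonly used in Faster R-CNN style detectors. This is an assumption for illustration: the project's own decoding may additionally normalize the deltas with means and standard deviations, and the function name `apply_deltas` is hypothetical.

```python
import math


def apply_deltas(anchor, deltas):
    """Apply (dy, dx, dh, dw) offsets to an anchor given as (y1, x1, y2, x2)."""
    y1, x1, y2, x2 = anchor
    dy, dx, dh, dw = deltas
    height, width = y2 - y1, x2 - x1
    # Shift the centre by a fraction of the anchor size.
    center_y = y1 + 0.5 * height + dy * height
    center_x = x1 + 0.5 * width + dx * width
    # Scale the size in log space.
    height *= math.exp(dh)
    width *= math.exp(dw)
    return (center_y - 0.5 * height, center_x - 0.5 * width,
            center_y + 0.5 * height, center_x + 0.5 * width)


K = 3  # anchors per feature-map location, as in the text above
print(4 * K, 2 * K)  # regression and score values per location
print(apply_deltas((0.0, 0.0, 10.0, 20.0), (0.1, 0.0, 0.0, 0.0)))
```

Zero deltas leave the anchor unchanged; a positive dy moves the box down by that fraction of its height, and dh or dw of log(2) would double the corresponding side.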

from src.model.bbox_assign_sample import BboxAssignSample

class RpnRegClsBlock(nn.Cell):
    """
    Rpn reg cls block for rpn layer

    Args:
        in_channels (int): Input channels of shared convolution.
        feat_channels (int): Output channels of shared convolution.
        num_anchors (int): The anchor number.
        cls_out_channels (int): Output channels of classification convolution.
        weight_conv (Tensor): Weight init for rpn conv.
        bias_conv (Tensor): Bias init for rpn conv.
        weight_cls (Tensor): Weight init for rpn cls conv.
        bias_cls (Tensor): Bias init for rpn cls conv.
        weight_reg (Tensor): Weight init for rpn reg conv.
        bias_reg (Tensor): Bias init for rpn reg conv.

    Inputs:
        - **x** (Tensor) - input variant

    Outputs:
        Tensor, output tensor.

    Support Platform:
        ``Ascend`` ``CPU`` ``GPU``

    Examples:
        >>> x = Tensor(np.array([[[[1., 2.], [3., 4.]]]]), mindspore.float32)
        >>> weight_conv = Tensor(np.array([[[[0.2, 0.3], [0.4, 0.1]]]]), mindspore.float32)
        >>> bias_conv = Tensor(np.array([[[[0., 0.], [0., 0.]]]]), mindspore.float32)
        >>> weight_cls = Tensor(np.array([[[[0.2, 0.3], [0.4, 0.1]]]]), mindspore.float32)
        >>> bias_cls = Tensor(np.array([[[[0., 0.], [0., 0.]]]]), mindspore.float32)
        >>> weight_reg = Tensor(np.array([[[[0.2, 0.3], [0.4, 0.1]]]]), mindspore.float32)
        >>> bias_reg = Tensor(np.array([[[[0., 0.], [0., 0.]]]]), mindspore.float32)
        >>> rpn = RpnRegClsBlock(2, 2, 4, 4, )
        >>> rpn = ops.SingleRoIExtractor(2, 2, 0.5, 2, weight_conv, bias_conv,
        ...                              weight_cls, bias_cls, weight_reg, bias_reg)
        >>> output = rpn(x)
    """
    def __init__(self, in_channels, feat_channels, num_anchors, cls_out_channels, weight_conv,
                 bias_conv, weight_cls, bias_cls, weight_reg, bias_reg):
        super(RpnRegClsBlock, self).__init__()
        self.rpn_conv = nn.Conv2d(in_channels, feat_channels, kernel_size=3,
                                  stride=1, pad_mode='same',
                                  has_bias=True, weight_init=weight_conv,
                                  bias_init=bias_conv)
        self.relu = nn.ReLU()

        self.rpn_cls = nn.Conv2d(feat_channels, num_anchors * cls_out_channels,
                                 kernel_size=1, pad_mode='valid',
                                 has_bias=True, weight_init=weight_cls,
                                 bias_init=bias_cls)
        self.rpn_reg = nn.Conv2d(feat_channels, num_anchors * 4,
                                 kernel_size=1, pad_mode='valid',
                                 has_bias=True, weight_init=weight_reg,
                                 bias_init=bias_reg)

    def construct(self, x):
        """Construct Rpn reg cls block for rpn layer."""
        x = self.relu(self.rpn_conv(x))

        x1 = self.rpn_cls(x)
        x2 = self.rpn_reg(x)

        return x1, x2

class RPN(nn.Cell):
    """
    ROI proposal network.

    Args:
        config (dict): Config.
        batch_size (int): Batchsize.
        in_channels (int): Input channels of shared convolution.
        feat_channels (int): Output channels of shared convolution.
        num_anchors (int): The anchor number.
        cls_out_channels (int): Output channels of classification convolution.

    Inputs:
        - **inputs** (Tensor) - Input variant.
        - **img_metas** (Tensor) - Img shape.
        - **anchor_list** (Tensor) - A list of anchors.
        - **gt_bboxes** (Tensor) - Ground truth bounding boxes.
        - **gt_labels** (Tensor) - Ground truth labels.
        - **gt_valids** (Tensor) - Ground truth validations.

    Outputs:
        Tuple, tuple of output tensor.

    Support Platform:
        ``Ascend`` ``CPU`` ``GPU``

    Examples:
        >>> RPN(config=config, batch_size=2, in_channels=256, feat_channels=1024,
        ...     num_anchors=3, cls_out_channels=512)
    """
    def __init__(self, config, batch_size, in_channels, feat_channels, num_anchors, cls_out_channels):
        super(RPN, self).__init__()
        cfg_rpn = config
        self.cast_type = mstype.float32
        self.np_cast_type = np.float32
        self.num_bboxes = cfg_rpn.num_bboxes
        self.slice_index = ()
        self.feature_anchor_shape = ()
        self.slice_index += (0,)
        index = 0
        for shape in cfg_rpn.feature_shapes:
            self.slice_index += (self.slice_index[index] + shape[0] * shape[1] * num_anchors,)
            self.feature_anchor_shape += (shape[0] * shape[1] * num_anchors * batch_size,)
            index += 1

        self.num_anchors = num_anchors
        self.batch_size = batch_size
        self.test_batch_size = cfg_rpn.test_batch_size
        self.num_layers = 5
        self.real_ratio = Tensor(np.ones((1, 1)).astype(self.np_cast_type))

        self.rpn_convs_list = nn.layer.CellList(self._make_rpn_layer(self.num_layers, in_channels, feat_channels,
                                                                     num_anchors, cls_out_channels))

        self.transpose = P.Transpose()
        self.reshape = P.Reshape()
        self.concat = P.Concat(axis=0)
        self.fill = P.Fill()
        self.placeh1 = Tensor(np.ones((1,)).astype(self.np_cast_type))

        self.trans_shape = (0, 2, 3, 1)

        self.reshape_shape_reg = (-1, 4)
        self.reshape_shape_cls = (-1,)
        self.rpn_loss_reg_weight = Tensor(np.array(cfg_rpn.rpn_loss_reg_weight).astype(self.np_cast_type))
        self.rpn_loss_cls_weight = Tensor(np.array(cfg_rpn.rpn_loss_cls_weight).astype(self.np_cast_type))
        expected_total_size = cfg_rpn.num_expected_neg * self.batch_size
        self.num_expected_total = Tensor(np.array(expected_total_size).astype(self.np_cast_type))
        self.num_bboxes = cfg_rpn.num_bboxes
        self.get_targets = BboxAssignSample(cfg_rpn, self.batch_size, self.num_bboxes, False)
        self.check_valid = P.CheckValid()
        self.sum_loss = P.ReduceSum()
        self.loss_cls = P.SigmoidCrossEntropyWithLogits()
        self.loss_bbox = P.SmoothL1Loss(beta=1.0/9.0)
        self.squeeze = P.Squeeze()
        self.cast = P.Cast()
        self.tile = P.Tile()
        self.zeros_like = P.ZerosLike()
        self.loss = Tensor(np.zeros((1,)).astype(self.np_cast_type))
        self.clsloss = Tensor(np.zeros((1,)).astype(self.np_cast_type))
        self.regloss = Tensor(np.zeros((1,)).astype(self.np_cast_type))

    def _make_rpn_layer(self, num_layers, in_channels,
                        feat_channels, num_anchors, cls_out_channels):
        """
        Make rpn layer for rpn proposal network

        Args:
            num_layers (int): layer num.
            in_channels (int): Input channels of shared convolution.
            feat_channels (int): Output channels of shared convolution.
            num_anchors (int): The anchor number.
            cls_out_channels (int): Output channels of classification convolution.

        Returns:
            List, list of RpnRegClsBlock cells.
        """
        rpn_layer = []

        shp_weight_conv = (feat_channels, in_channels, 3, 3)
        shp_bias_conv = (feat_channels,)
        weight_conv = initializer('Normal', shape=shp_weight_conv, dtype=mstype.float32)
        bias_conv = initializer(0, shape=shp_bias_conv, dtype=mstype.float32)

        shp_weight_cls = (num_anchors * cls_out_channels, feat_channels, 1, 1)
        shp_bias_cls = (num_anchors * cls_out_channels,)
        weight_cls = initializer('Normal', shape=shp_weight_cls, dtype=mstype.float32)
        bias_cls = initializer(0, shape=shp_bias_cls, dtype=mstype.float32)

        shp_weight_reg = (num_anchors * 4, feat_channels, 1, 1)
        shp_bias_reg = (num_anchors * 4,)
        weight_reg = initializer('Normal', shape=shp_weight_reg, dtype=mstype.float32)
        bias_reg = initializer(0, shape=shp_bias_reg, dtype=mstype.float32)

        for i in range(num_layers):
            rpn_layer.append(RpnRegClsBlock(in_channels, feat_channels, num_anchors, cls_out_channels,
                                            weight_conv, bias_conv, weight_cls, bias_cls, weight_reg,
                                            bias_reg).to_float(self.cast_type))

        # Share the parameters of the first block with all other levels.
        for i in range(1, num_layers):
            rpn_layer[i].rpn_conv.weight = rpn_layer[0].rpn_conv.weight
            rpn_layer[i].rpn_cls.weight = rpn_layer[0].rpn_cls.weight
            rpn_layer[i].rpn_reg.weight = rpn_layer[0].rpn_reg.weight

            rpn_layer[i].rpn_conv.bias = rpn_layer[0].rpn_conv.bias
            rpn_layer[i].rpn_cls.bias = rpn_layer[0].rpn_cls.bias
            rpn_layer[i].rpn_reg.bias = rpn_layer[0].rpn_reg.bias

        return rpn_layer

    def construct(self, inputs, img_metas, anchor_list, gt_bboxes, gt_labels, gt_valids):
        """Construct ROI Proposal Network."""
        loss_print = ()
        rpn_cls_score = ()
        rpn_bbox_pred = ()
        rpn_cls_score_total = ()
        rpn_bbox_pred_total = ()

        for i in range(self.num_layers):
            x1, x2 = self.rpn_convs_list[i](inputs[i])

            rpn_cls_score_total = rpn_cls_score_total + (x1,)
            rpn_bbox_pred_total = rpn_bbox_pred_total + (x2,)

            x1 = self.transpose(x1, self.trans_shape)
            x1 = self.reshape(x1, self.reshape_shape_cls)

            x2 = self.transpose(x2, self.trans_shape)
            x2 = self.reshape(x2, self.reshape_shape_reg)

            rpn_cls_score = rpn_cls_score + (x1,)
            rpn_bbox_pred = rpn_bbox_pred + (x2,)

        loss = self.loss
        clsloss = self.clsloss
        regloss = self.regloss
        bbox_targets = ()
        bbox_weights = ()
        labels = ()
        label_weights = ()

        output = ()
        if self.training:
            for i in range(self.batch_size):
                multi_level_flags = ()
                anchor_list_tuple = ()

                for j in range(self.num_layers):
                    res = self.cast(self.check_valid(anchor_list[j], self.squeeze(img_metas[i:i + 1:1, ::])),
                                    mstype.int32)
                    multi_level_flags = multi_level_flags + (res,)
                    anchor_list_tuple = anchor_list_tuple + (anchor_list[j],)

                valid_flag_list = self.concat(multi_level_flags)
                anchor_using_list = self.concat(anchor_list_tuple)

                gt_bboxes_i = self.squeeze(gt_bboxes[i:i + 1:1, ::])
                gt_labels_i = self.squeeze(gt_labels[i:i + 1:1, ::])
                gt_valids_i = self.squeeze(gt_valids[i:i + 1:1, ::])

                bbox_target, bbox_weight, label, label_weight = \
                    self.get_targets(gt_bboxes_i, gt_labels_i, self.cast(valid_flag_list, mstype.bool_),
                                     anchor_using_list, gt_valids_i)

                bbox_weight = self.cast(bbox_weight, self.cast_type)
                label = self.cast(label, self.cast_type)
                label_weight = self.cast(label_weight, self.cast_type)

                for j in range(self.num_layers):
                    begin = self.slice_index[j]
                    end = self.slice_index[j + 1]
                    stride = 1
                    bbox_targets += (bbox_target[begin:end:stride, ::],)
                    bbox_weights += (bbox_weight[begin:end:stride],)
                    labels += (label[begin:end:stride],)
                    label_weights += (label_weight[begin:end:stride],)

            for i in range(self.num_layers):
                bbox_target_using = ()
                bbox_weight_using = ()
                label_using = ()
                label_weight_using = ()

                for j in range(self.batch_size):
                    bbox_target_using += (bbox_targets[i + (self.num_layers * j)],)
                    bbox_weight_using += (bbox_weights[i + (self.num_layers * j)],)
                    label_using += (labels[i + (self.num_layers * j)],)
                    label_weight_using += (label_weights[i + (self.num_layers * j)],)

                bbox_target_with_batchsize = self.concat(bbox_target_using)
                bbox_weight_with_batchsize = self.concat(bbox_weight_using)
                label_with_batchsize = self.concat(label_using)
                label_weight_with_batchsize = self.concat(label_weight_using)

                # stop
                bbox_target_ = F.stop_gradient(bbox_target_with_batchsize)
                bbox_weight_ = F.stop_gradient(bbox_weight_with_batchsize)
                label_ = F.stop_gradient(label_with_batchsize)
                label_weight_ = F.stop_gradient(label_weight_with_batchsize)

                cls_score_i = rpn_cls_score[i]
                reg_score_i = rpn_bbox_pred[i]

                loss_cls = self.loss_cls(cls_score_i, label_)
                loss_cls_item = loss_cls * label_weight_
                loss_cls_item = self.sum_loss(loss_cls_item, (0,)) / self.num_expected_total

                loss_reg = self.loss_bbox(reg_score_i, bbox_target_)
                bbox_weight_ = self.tile(self.reshape(bbox_weight_, (self.feature_anchor_shape[i], 1)), (1, 4))
                loss_reg = loss_reg * bbox_weight_
                loss_reg_item = self.sum_loss(loss_reg, (1,))
                loss_reg_item = self.sum_loss(loss_reg_item, (0,)) / self.num_expected_total

                loss_total = self.rpn_loss_cls_weight * loss_cls_item + self.rpn_loss_reg_weight * loss_reg_item

                loss += loss_total
                loss_print += (loss_total, loss_cls_item, loss_reg_item)
                clsloss += loss_cls_item
                regloss += loss_reg_item

                output = (loss, rpn_cls_score_total, rpn_bbox_pred_total,
                          clsloss, regloss, loss_print)
        else:
            output = (self.placeh1, rpn_cls_score_total, rpn_bbox_pred_total,
                      self.placeh1, self.placeh1, self.placeh1)

        return output
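
The `slice_index` bookkeeping in `RPN.__init__` decides which part of the concatenated anchor targets belongs to each pyramid level. The sketch below reproduces that calculation in plain Python. The feature map sizes are an assumption: they correspond to a 768 x 1280 input at strides 4, 8, 16, 32, and 64, which matches the resize targets in the FPN code, but the real values come from `config.feature_shapes`.

```python
def anchor_slices(feature_shapes, num_anchors):
    """Cumulative anchor offsets per pyramid level, mirroring slice_index in RPN.__init__."""
    slice_index = [0]
    for height, width in feature_shapes:
        slice_index.append(slice_index[-1] + height * width * num_anchors)
    return slice_index


# Assumed feature map sizes for a 768 x 1280 input (strides 4 to 64).
FEATURE_SHAPES = [(192, 320), (96, 160), (48, 80), (24, 40), (12, 20)]

slices = anchor_slices(FEATURE_SHAPES, num_anchors=3)
print(slices)
```

Level j then owns the anchors in the half-open range `slices[j]:slices[j + 1]`, and the last entry is the total number of anchors per image.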

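ROI Align
ROI Align computes the features that each proposal corresponds to at the appropriate scale: the proposal is used to crop, resize, and pool the feature map in order to extract its features.

ROI level assignment used in Mask R-CNN (following the FPN formulation):

$$k = \left\lfloor k_0 + \log_2\left(\frac{\sqrt{wh}}{224}\right) \right\rfloor, \quad k_0 = 4$$

Explanation
Because the boxes and anchors in the Mask R-CNN training data have been adjusted, the ROI level computation needs to be adjusted accordingly. Here $w$ and $h$ are the width and height of the ROI, and 224 should be half of the input image size.

The resulting $k$ is the level assigned to the ROI. There are four levels:

$k = 2$ maps back to feature $P_2$, whose size is 1/4 of the original input image

$k = 3$ maps back to feature $P_3$, whose size is 1/8 of the original input image

$k = 4$ maps back to feature $P_4$, whose size is 1/16 of the original input image

$k = 5$ maps back to feature $P_5$, whose size is 1/32 of the original input image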
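[Figure: RoIAlign sampling]

The dashed grid represents the feature map, the solid lines represent an RoI (with 2×2 bins in this example), and the dots are the 4 sampling points in each bin. RoIAlign computes the value of each sampling point by bilinear interpolation from the nearby grid points on the feature map (the nearest 4). No quantization is performed on any coordinate involved in the RoI, its bins, or the sampling points.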
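
The sampling described above can be sketched with plain Python lists. This is a simplified single-channel illustration: the coordinate convention (grid points at integer coordinates), the average pooling over samples, and both function names are assumptions, and production operators handle batches, channels, and scale factors as well.

```python
import math


def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2-D list at continuous coordinates (y, x)."""
    rows, cols = len(feature), len(feature[0])
    y = min(max(y, 0.0), rows - 1.0)
    x = min(max(x, 0.0), cols - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, rows - 1), min(x0 + 1, cols - 1)
    wy, wx = y - y0, x - x0
    top = feature[y0][x0] * (1 - wx) + feature[y0][x1] * wx
    bottom = feature[y1][x0] * (1 - wx) + feature[y1][x1] * wx
    return top * (1 - wy) + bottom * wy


def roi_align_bin(feature, y_start, x_start, y_end, x_end, samples=2):
    """Average samples x samples regularly spaced points inside one bin, without rounding coordinates."""
    total = 0.0
    for iy in range(samples):
        for ix in range(samples):
            y = y_start + (iy + 0.5) * (y_end - y_start) / samples
            x = x_start + (ix + 0.5) * (x_end - x_start) / samples
            total += bilinear_sample(feature, y, x)
    return total / (samples * samples)


grid = [[0.0, 1.0], [2.0, 3.0]]
print(bilinear_sample(grid, 0.5, 0.5))
print(roi_align_bin(grid, 0.0, 0.0, 1.0, 1.0))
```

With `samples=2` each bin uses 4 sampling points, as in the figure; because no coordinate is rounded, small shifts of the RoI change the pooled value smoothly.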
