DB Algorithm: Principles and Model Construction

References:
https://aistudio.baidu.com/projectdetail/4483048

Real-Time Scene Text Detection with Differentiable Binarization

How to Read a Paper, by Mu Li (李沐)

DB (Real-Time Scene Text Detection with Differentiable Binarization)

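Principles

DB is a segmentation-based text detection algorithm. Its key contribution is differentiable binarization, which separates text regions from the background using a threshold that varies across the image rather than a single fixed value.

[Figure: pipeline of a conventional segmentation-based text detector (blue arrows) compared with DB (red arrows)]

A conventional segmentation-based text detector follows the blue arrows in the figure above: once the segmentation map is obtained, a fixed threshold turns it into a binary map, and a heuristic such as pixel clustering then groups the pixels into text regions. Because standard binarization is not differentiable, this step cannot be trained end to end with the network.

DB follows the red arrows. The main difference is the threshold map: the network predicts a threshold for every position in the image instead of using one fixed value, which separates the text foreground from the background more reliably.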
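To make the contrast concrete, here is a minimal pure-Python sketch of fixed versus per-pixel thresholding. The probability values and the threshold map are made-up toy numbers, and the function names `binarize_fixed` and `binarize_adaptive` are hypothetical helpers, not part of PaddleOCR.

```python
def binarize_fixed(prob_map, t=0.5):
    """Standard binarization: one global threshold for every pixel."""
    return [[1 if p >= t else 0 for p in row] for row in prob_map]


def binarize_adaptive(prob_map, thresh_map):
    """Per-pixel thresholding: each pixel is compared with its own threshold."""
    return [
        [1 if p >= t else 0 for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


# Toy 2x3 probability map: the middle column is an ambiguous text border.
prob_map = [[0.9, 0.6, 0.2],
            [0.8, 0.55, 0.1]]
# Toy threshold map (assumed values): higher threshold along the border column.
thresh_map = [[0.3, 0.7, 0.3],
              [0.3, 0.7, 0.3]]

fixed = binarize_fixed(prob_map)
adaptive = binarize_adaptive(prob_map, thresh_map)
print(fixed)     # [[1, 1, 0], [1, 1, 0]]
print(adaptive)  # [[1, 0, 0], [1, 0, 0]]
```

With the fixed threshold the ambiguous border pixels are kept as text, while the per-pixel threshold rejects them.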

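Advantages:
1. The architecture is simple and needs no elaborate post-processing.
2. It achieves good accuracy and speed on public datasets.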

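DB introduces differentiable binarization, which approximates the step function used in standard binarization. Standard binarization applies a fixed threshold $t$ to the probability map $P$:

$$B_{i,j} = \begin{cases} 1, & P_{i,j} \ge t \\ 0, & \text{otherwise} \end{cases}$$

Differentiable binarization replaces it with:

$$\hat{B}_{i,j} = \frac{1}{1 + e^{-k\,(P_{i,j} - T_{i,j})}}$$

where $\hat{B}$ is the approximate binary map, $T$ is the threshold map predicted by the network, and $k$ is an amplification factor ($k = 50$ in the code in this post).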
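The sketch below compares the hard step function with the sigmoid approximation used in `step_function` of `DBHead` in this post, `1 / (1 + exp(-k * (P - T)))` with `k = 50`. The function names `step`, `db`, and `db_grad` are hypothetical helpers written for this illustration, and the sample values are arbitrary.

```python
import math


def step(p, t):
    """Standard binarization: 1 if the probability reaches the threshold, else 0."""
    return 1.0 if p >= t else 0.0


def db(p, t, k=50):
    """Differentiable binarization: a steep sigmoid of (p - t)."""
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


def db_grad(p, t, k=50):
    """Derivative of db with respect to p: k * B * (1 - B)."""
    b = db(p, t, k)
    return k * b * (1.0 - b)


for p in (0.2, 0.45, 0.5, 0.55, 0.8):
    print(p, step(p, 0.5), round(db(p, 0.5), 4), round(db_grad(p, 0.5), 4))
```

The step function has zero gradient everywhere except at the threshold, where it is undefined. The sigmoid version stays close to 0 or 1 away from the threshold but has a large, finite gradient near it, which is what lets the binarization step take part in training.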
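Overall structure of DB:

[Figure: overall architecture of the DB network]

The input image passes through the backbone and the FPN to extract features. The extracted features are concatenated into a feature map one quarter the size of the original image. Convolutional layers then produce the text-region probability map and the threshold map, and DB post-processing turns these into the curves that enclose the text.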
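As a rough illustration of the last step, the sketch below binarizes a probability map and collects one axis-aligned box per connected region. It is a simplified stand-in, not the DB post-processing itself: the real pipeline works with polygons and expands the shrunk regions, which is omitted here. The function name `boxes_from_prob_map`, the threshold 0.3, and the toy map are assumptions made for this example.

```python
from collections import deque


def boxes_from_prob_map(prob_map, thresh=0.3):
    """Return (x0, y0, x1, y1) boxes of 4-connected regions with prob >= thresh."""
    h, w = len(prob_map), len(prob_map[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or prob_map[y][x] < thresh:
                continue
            # Breadth-first search over one connected region.
            queue = deque([(y, x)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and prob_map[ny][nx] >= thresh):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes


# Toy probability map with two separate text-like regions.
toy_map = [
    [0.9, 0.8, 0.1, 0.0, 0.0],
    [0.7, 0.9, 0.0, 0.0, 0.6],
    [0.0, 0.1, 0.0, 0.0, 0.8],
]
boxes = boxes_from_prob_map(toy_map)
print(boxes)  # [(0, 0, 1, 1), (4, 1, 4, 2)]
```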

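Building the DB Text Detection Model

The DB text detection model has three parts:

Backbone: extracts features from the image
FPN: a feature pyramid network that enhances the features
Head: computes the text-region probability map

Backbone: the paper uses ResNet50. To speed up training, the experiments in this section use the MobileNetV3 large structure as the backbone.

The DB backbone extracts multi-scale features from the image. As the code below shows, for an input of shape [640, 640] the backbone outputs four features with shapes [1, 16, 160, 160], [1, 24, 80, 80], [1, 56, 40, 40], and [1, 480, 20, 20]. These features are passed to the feature pyramid network (FPN) for further enhancement.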

import paddle
from ppocr.modeling.backbones.det_mobilenet_v3 import MobileNetV3

fake_inputs = paddle.randn([1, 3, 640, 640], dtype="float32")

# 1. Declare the backbone
model_backbone = MobileNetV3()
model_backbone.eval()

# 2. Run a forward pass
outs = model_backbone(fake_inputs)

# 3. Print the network structure
# print(model_backbone)

# 4. Print the shapes of the output features
for idx, out in enumerate(outs):
    print("The index is ", idx, "and the shape of output is ", out.shape)
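The four output shapes can be checked by hand without running the network. The sketch below reuses `make_divisible` from the complete code later in this post, with the default `scale=0.5`. The channel list holds the output channels `c` of the last block in each stage of the `large` configuration plus `cls_ch_squeeze` (960), and the strides 4, 8, 16, 32 are read off the stride column of that configuration. Variable names such as `last_channels` are introduced here for illustration.

```python
def make_divisible(v, divisor=8, min_value=None):
    """Round v to the nearest multiple of divisor, never dropping below 90% of v."""
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


scale = 0.5
input_size = 640
last_channels = [24, 40, 112, 960]  # unscaled channels at the end of each stage
strides = [4, 8, 16, 32]            # cumulative downsampling of each stage

shapes = [
    [1, make_divisible(scale * c), input_size // s, input_size // s]
    for c, s in zip(last_channels, strides)
]
for shape in shapes:
    print(shape)
```

For example, the first stage ends with 24 channels, which the 0.5 scale turns into 12 and `make_divisible` rounds up to 16.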

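FPN

The feature pyramid network (FPN) is a common convolutional structure for efficiently extracting features at multiple scales from an image.
The FPN takes the backbone outputs as input, and its output feature map has one quarter of the height and width of the original image. For an input image of shape [1, 3, 640, 640], the FPN output has height and width [160, 160].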

import paddle
from paddle import nn
import paddle.nn.functional as F
from paddle import ParamAttr


class DBFPN(nn.Layer):
    def __init__(self, in_channels, out_channels, **kwargs):
        super(DBFPN, self).__init__()
        self.out_channels = out_channels

        # The convolution layers are omitted here. For the full DBFPN implementation, see:
        # https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.4/ppocr/modeling/necks/db_fpn.py

    def forward(self, x):
        c2, c3, c4, c5 = x

        in5 = self.in5_conv(c5)
        in4 = self.in4_conv(c4)
        in3 = self.in3_conv(c3)
        in2 = self.in2_conv(c2)

        # Top-down pathway: upsample the coarser level and add it to the finer one
        out4 = in4 + F.upsample(
            in5, scale_factor=2, mode="nearest", align_mode=1)  # 1/16
        out3 = in3 + F.upsample(
            out4, scale_factor=2, mode="nearest", align_mode=1)  # 1/8
        out2 = in2 + F.upsample(
            out3, scale_factor=2, mode="nearest", align_mode=1)  # 1/4

        p5 = self.p5_conv(in5)
        p4 = self.p4_conv(out4)
        p3 = self.p3_conv(out3)
        p2 = self.p2_conv(out2)

        # Upsample every level to 1/4 scale before concatenation
        p5 = F.upsample(p5, scale_factor=8, mode="nearest", align_mode=1)
        p4 = F.upsample(p4, scale_factor=4, mode="nearest", align_mode=1)
        p3 = F.upsample(p3, scale_factor=2, mode="nearest", align_mode=1)

        fuse = paddle.concat([p5, p4, p3, p2], axis=1)
        return fuse
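The shape bookkeeping in `forward` can be traced without any framework. The sketch below follows (C, H, W) shapes through the same steps, assuming `out_channels=256` and the backbone shapes listed earlier. `fpn_output_shape` is a hypothetical helper written for this illustration.

```python
def fpn_output_shape(in_shapes, out_channels=256):
    """Trace (C, H, W) shapes through DBFPN without running any network."""
    c2, c3, c4, c5 = in_shapes

    # The 1x1 convs map every level to out_channels, so the top-down additions
    # only need the x2-upsampled coarse level to match the finer level in size.
    for coarse, fine in ((c5, c4), (c4, c3), (c3, c2)):
        assert coarse[1] * 2 == fine[1] and coarse[2] * 2 == fine[2]

    # The 3x3 convs reduce each level to out_channels // 4 channels, then
    # p5, p4, p3 are upsampled by 8, 4, 2 to the size of p2.
    per_level = out_channels // 4
    for (_, h, w), factor in zip((c5, c4, c3, c2), (8, 4, 2, 1)):
        assert (h * factor, w * factor) == (c2[1], c2[2])

    # Concatenating the four levels along the channel axis.
    return (per_level * 4, c2[1], c2[2])


backbone_shapes = [(16, 160, 160), (24, 80, 80), (56, 40, 40), (480, 20, 20)]
fused_shape = fpn_output_shape(backbone_shapes)
print(fused_shape)  # (256, 160, 160)
```

Each level contributes 64 channels, so the concatenated feature has 256 channels at one quarter of the 640 x 640 input.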

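Head

The head computes the text-region probability map, the threshold map, and the binary map.
The DB head upsamples the FPN features, mapping them from one quarter of the original image size back to the full image size.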
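The upsampling arithmetic can be sketched as follows. In the `Head` class of the complete code in this post, the 4x upsampling comes from two `Conv2DTranspose` layers with `kernel_size=2` and `stride=2`. The size formula below is the standard one for a transposed convolution with dilation 1 and no output padding. The channel counts follow `DBHead.forward`: three maps in training mode, one in evaluation mode. Both helper names are hypothetical.

```python
def conv_transpose_out(size, kernel=2, stride=2, padding=0):
    """Output size of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def head_output_shape(fpn_hw=160, training=True):
    """(C, H, W) of the DB head output for a square FPN feature of size fpn_hw."""
    hw = conv_transpose_out(conv_transpose_out(fpn_hw))  # 160 -> 320 -> 640
    # Training: probability, threshold and binary maps; evaluation: probability map only.
    channels = 3 if training else 1
    return (channels, hw, hw)


print(head_output_shape(160, training=True))   # (3, 640, 640)
print(head_output_shape(160, training=False))  # (1, 640, 640)
```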


import math
import paddle
from paddle import nn
import paddle.nn.functional as F
from paddle import ParamAttr


class DBHead(nn.Layer):
    """
    Differentiable Binarization (DB) for text detection:
        see https://arxiv.org/abs/1911.08947
    args:
        params(dict): super parameters for build DB network
    """

    def __init__(self, in_channels, k=50, **kwargs):
        super(DBHead, self).__init__()
        self.k = k

        # The binarize and thresh branches are omitted here. For the full DBHead implementation, see:
        # https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.4/ppocr/modeling/heads/det_db_head.py

    def step_function(self, x, y):
        # Differentiable binarization: computes the binary text segmentation map
        # from the probability map (x) and the threshold map (y)
        return paddle.reciprocal(1 + paddle.exp(-self.k * (x - y)))

    def forward(self, x, targets=None):
        shrink_maps = self.binarize(x)
        if not self.training:
            return {'maps': shrink_maps}

        threshold_maps = self.thresh(x)
        binary_maps = self.step_function(shrink_maps, threshold_maps)
        y = paddle.concat([shrink_maps, threshold_maps, binary_maps], axis=1)
        return {'maps': y}
# 1. Import DBHead from PaddleOCR
from ppocr.modeling.heads.det_db_head import DBHead
import paddle

# 2. Compute the DBFPN output
#    (MobileNetV3 and DBFPN are the ones from the snippets above)
fake_inputs = paddle.randn([1, 3, 640, 640], dtype="float32")
model_backbone = MobileNetV3()
in_channels = model_backbone.out_channels
model_fpn = DBFPN(in_channels=in_channels, out_channels=256)
outs = model_backbone(fake_inputs)
fpn_outs = model_fpn(outs)

# 3. Declare the head
model_db_head = DBHead(in_channels=256)

# 4. Print the DB head
print(model_db_head)

# 5. Compute the head output
db_head_outs = model_db_head(fpn_outs)
print(f"The shape of fpn outs {fpn_outs.shape}")
print(f"The shape of DB head outs {db_head_outs['maps'].shape}")


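Running the code raised an error:
the classes above are incomplete, so I downloaded the corresponding files again from the PaddleOCR repository on GitHub:
db_fpn.py
det_db_head.py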
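The post does not show the error message. A plausible cause, assuming the abbreviated `DBFPN` above was the class in use, is that `forward` reads `self.in5_conv`, which `__init__` never creates. The plain-Python stand-in below reproduces that failure mode; `IncompleteFPN` is a hypothetical name, not a PaddleOCR class, and the exact message raised by Paddle may differ.

```python
class IncompleteFPN:
    """Stand-in for the abbreviated DBFPN: __init__ never creates the conv layers."""

    def __init__(self, out_channels):
        self.out_channels = out_channels

    def forward(self, x):
        c2, c3, c4, c5 = x
        # in5_conv was never assigned in __init__, so this lookup fails.
        return self.in5_conv(c5)


try:
    IncompleteFPN(256).forward([1, 2, 3, 4])
    error = None
except AttributeError as exc:
    error = str(exc)

print(error)
```

Defining every layer in `__init__`, as the complete code does, removes the missing attribute.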

Complete code:

import math

import paddle
from paddle import nn
import paddle.nn.functional as F
from paddle import ParamAttr


def make_divisible(v, divisor=8, min_value=None):
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


class MobileNetV3(nn.Layer):
    def __init__(self,
                 in_channels=3,
                 model_name='large',
                 scale=0.5,
                 disable_se=False,
                 **kwargs):
        """
        the MobilenetV3 backbone network for detection module.
        Args:
            params(dict): the super parameters for build network
        """
        super(MobileNetV3, self).__init__()

        self.disable_se = disable_se

        if model_name == "large":
            cfg = [
                # k, exp, c,  se,     nl,  s,
                [3, 16, 16, False, 'relu', 1],
                [3, 64, 24, False, 'relu', 2],
                [3, 72, 24, False, 'relu', 1],
                [5, 72, 40, True, 'relu', 2],
                [5, 120, 40, True, 'relu', 1],
                [5, 120, 40, True, 'relu', 1],
                [3, 240, 80, False, 'hardswish', 2],
                [3, 200, 80, False, 'hardswish', 1],
                [3, 184, 80, False, 'hardswish', 1],
                [3, 184, 80, False, 'hardswish', 1],
                [3, 480, 112, True, 'hardswish', 1],
                [3, 672, 112, True, 'hardswish', 1],
                [5, 672, 160, True, 'hardswish', 2],
                [5, 960, 160, True, 'hardswish', 1],
                [5, 960, 160, True, 'hardswish', 1],
            ]
            cls_ch_squeeze = 960
        elif model_name == "small":
            cfg = [
                # k, exp, c,  se,     nl,  s,
                [3, 16, 16, True, 'relu', 2],
                [3, 72, 24, False, 'relu', 2],
                [3, 88, 24, False, 'relu', 1],
                [5, 96, 40, True, 'hardswish', 2],
                [5, 240, 40, True, 'hardswish', 1],
                [5, 240, 40, True, 'hardswish', 1],
                [5, 120, 48, True, 'hardswish', 1],
                [5, 144, 48, True, 'hardswish', 1],
                [5, 288, 96, True, 'hardswish', 2],
                [5, 576, 96, True, 'hardswish', 1],
                [5, 576, 96, True, 'hardswish', 1],
            ]
            cls_ch_squeeze = 576
        else:
            raise NotImplementedError("mode[" + model_name +
                                      "_model] is not implemented!")

        supported_scale = [0.35, 0.5, 0.75, 1.0, 1.25]
        assert scale in supported_scale, \
            "supported scale are {} but input scale is {}".format(supported_scale, scale)
        inplanes = 16
        # conv1
        self.conv = ConvBNLayer(
            in_channels=in_channels,
            out_channels=make_divisible(inplanes * scale),
            kernel_size=3,
            stride=2,
            padding=1,
            groups=1,
            if_act=True,
            act='hardswish')

        self.stages = []
        self.out_channels = []
        block_list = []
        i = 0
        inplanes = make_divisible(inplanes * scale)
        for (k, exp, c, se, nl, s) in cfg:
            se = se and not self.disable_se
            start_idx = 2 if model_name == 'large' else 0
            # A stride-2 block starts a new stage
            if s == 2 and i > start_idx:
                self.out_channels.append(inplanes)
                self.stages.append(nn.Sequential(*block_list))
                block_list = []
            block_list.append(
                ResidualUnit(
                    in_channels=inplanes,
                    mid_channels=make_divisible(scale * exp),
                    out_channels=make_divisible(scale * c),
                    kernel_size=k,
                    stride=s,
                    use_se=se,
                    act=nl))
            inplanes = make_divisible(scale * c)
            i += 1
        block_list.append(
            ConvBNLayer(
                in_channels=inplanes,
                out_channels=make_divisible(scale * cls_ch_squeeze),
                kernel_size=1,
                stride=1,
                padding=0,
                groups=1,
                if_act=True,
                act='hardswish'))
        self.stages.append(nn.Sequential(*block_list))
        self.out_channels.append(make_divisible(scale * cls_ch_squeeze))
        for i, stage in enumerate(self.stages):
            self.add_sublayer(sublayer=stage, name="stage{}".format(i))

    def forward(self, x):
        x = self.conv(x)
        out_list = []
        for stage in self.stages:
            x = stage(x)
            out_list.append(x)
        return out_list


class ConvBNLayer(nn.Layer):
    def __init__(self,
                 in_channels,
                 out_channels,
                 kernel_size,
                 stride,
                 padding,
                 groups=1,
                 if_act=True,
                 act=None):
        super(ConvBNLayer, self).__init__()
        self.if_act = if_act
        self.act = act
        self.conv = nn.Conv2D(
            in_channels=in_channels,
            out_channels=out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            groups=groups,
            bias_attr=False)

        self.bn = nn.BatchNorm(num_channels=out_channels, act=None)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        if self.if_act:
            if self.act == "relu":
                x = F.relu(x)
            elif self.act == "hardswish":
                x = F.hardswish(x)
            else:
                print("The activation function({}) is selected incorrectly.".
                      format(self.act))
                exit()
        return x


class ResidualUnit(nn.Layer):
    def __init__(self,
                 in_channels,
                 mid_channels,
                 out_channels,
                 kernel_size,
                 stride,
                 use_se,
                 act=None):
        super(ResidualUnit, self).__init__()
        self.if_shortcut = stride == 1 and in_channels == out_channels
        self.if_se = use_se

        self.expand_conv = ConvBNLayer(
            in_channels=in_channels,
            out_channels=mid_channels,
            kernel_size=1,
            stride=1,
            padding=0,
            if_act=True,
            act=act)
        self.bottleneck_conv = ConvBNLayer(
            in_channels=mid_channels,
            out_channels=mid_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=int((kernel_size - 1) // 2),
            groups=mid_channels,
            if_act=True,
            act=act)
        if self.if_se:
            self.mid_se = SEModule(mid_channels)
        self.linear_conv = ConvBNLayer(
            in_channels=mid_channels,
            out_channels=out_channels,
            kernel_size=1,
            stride=1,
            padding=0,
            if_act=False,
            act=None)

    def forward(self, inputs):
        x = self.expand_conv(inputs)
        x = self.bottleneck_conv(x)
        if self.if_se:
            x = self.mid_se(x)
        x = self.linear_conv(x)
        if self.if_shortcut:
            x = paddle.add(inputs, x)
        return x


class SEModule(nn.Layer):
    def __init__(self, in_channels, reduction=4):
        super(SEModule, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2D(1)
        self.conv1 = nn.Conv2D(
            in_channels=in_channels,
            out_channels=in_channels // reduction,
            kernel_size=1,
            stride=1,
            padding=0)
        self.conv2 = nn.Conv2D(
            in_channels=in_channels // reduction,
            out_channels=in_channels,
            kernel_size=1,
            stride=1,
            padding=0)

    def forward(self, inputs):
        outputs = self.avg_pool(inputs)
        outputs = self.conv1(outputs)
        outputs = F.relu(outputs)
        outputs = self.conv2(outputs)
        outputs = F.hardsigmoid(outputs, slope=0.2, offset=0.5)
        return inputs * outputs


class DBFPN(nn.Layer):
    def __init__(self, in_channels, out_channels, **kwargs):
        super(DBFPN, self).__init__()
        self.out_channels = out_channels
        weight_attr = paddle.nn.initializer.KaimingUniform()

        # 1x1 convs that map every backbone level to out_channels
        self.in2_conv = nn.Conv2D(
            in_channels=in_channels[0],
            out_channels=self.out_channels,
            kernel_size=1,
            weight_attr=ParamAttr(initializer=weight_attr),
            bias_attr=False)
        self.in3_conv = nn.Conv2D(
            in_channels=in_channels[1],
            out_channels=self.out_channels,
            kernel_size=1,
            weight_attr=ParamAttr(initializer=weight_attr),
            bias_attr=False)
        self.in4_conv = nn.Conv2D(
            in_channels=in_channels[2],
            out_channels=self.out_channels,
            kernel_size=1,
            weight_attr=ParamAttr(initializer=weight_attr),
            bias_attr=False)
        self.in5_conv = nn.Conv2D(
            in_channels=in_channels[3],
            out_channels=self.out_channels,
            kernel_size=1,
            weight_attr=ParamAttr(initializer=weight_attr),
            bias_attr=False)
        # 3x3 convs that reduce every level to out_channels // 4
        self.p5_conv = nn.Conv2D(
            in_channels=self.out_channels,
            out_channels=self.out_channels // 4,
            kernel_size=3,
            padding=1,
            weight_attr=ParamAttr(initializer=weight_attr),
            bias_attr=False)
        self.p4_conv = nn.Conv2D(
            in_channels=self.out_channels,
            out_channels=self.out_channels // 4,
            kernel_size=3,
            padding=1,
            weight_attr=ParamAttr(initializer=weight_attr),
            bias_attr=False)
        self.p3_conv = nn.Conv2D(
            in_channels=self.out_channels,
            out_channels=self.out_channels // 4,
            kernel_size=3,
            padding=1,
            weight_attr=ParamAttr(initializer=weight_attr),
            bias_attr=False)
        self.p2_conv = nn.Conv2D(
            in_channels=self.out_channels,
            out_channels=self.out_channels // 4,
            kernel_size=3,
            padding=1,
            weight_attr=ParamAttr(initializer=weight_attr),
            bias_attr=False)

    def forward(self, x):
        c2, c3, c4, c5 = x

        in5 = self.in5_conv(c5)
        in4 = self.in4_conv(c4)
        in3 = self.in3_conv(c3)
        in2 = self.in2_conv(c2)

        out4 = in4 + F.upsample(
            in5, scale_factor=2, mode="nearest", align_mode=1)  # 1/16
        out3 = in3 + F.upsample(
            out4, scale_factor=2, mode="nearest", align_mode=1)  # 1/8
        out2 = in2 + F.upsample(
            out3, scale_factor=2, mode="nearest", align_mode=1)  # 1/4

        p5 = self.p5_conv(in5)
        p4 = self.p4_conv(out4)
        p3 = self.p3_conv(out3)
        p2 = self.p2_conv(out2)
        p5 = F.upsample(p5, scale_factor=8, mode="nearest", align_mode=1)
        p4 = F.upsample(p4, scale_factor=4, mode="nearest", align_mode=1)
        p3 = F.upsample(p3, scale_factor=2, mode="nearest", align_mode=1)

        fuse = paddle.concat([p5, p4, p3, p2], axis=1)
        return fuse


def get_bias_attr(k):
    stdv = 1.0 / math.sqrt(k * 1.0)
    initializer = paddle.nn.initializer.Uniform(-stdv, stdv)
    bias_attr = ParamAttr(initializer=initializer)
    return bias_attr


class Head(nn.Layer):
    def __init__(self, in_channels, name_list):
        super(Head, self).__init__()
        self.conv1 = nn.Conv2D(
            in_channels=in_channels,
            out_channels=in_channels // 4,
            kernel_size=3,
            padding=1,
            weight_attr=ParamAttr(),
            bias_attr=False)
        self.conv_bn1 = nn.BatchNorm(
            num_channels=in_channels // 4,
            param_attr=ParamAttr(
                initializer=paddle.nn.initializer.Constant(value=1.0)),
            bias_attr=ParamAttr(
                initializer=paddle.nn.initializer.Constant(value=1e-4)),
            act='relu')
        self.conv2 = nn.Conv2DTranspose(
            in_channels=in_channels // 4,
            out_channels=in_channels // 4,
            kernel_size=2,
            stride=2,
            weight_attr=ParamAttr(
                initializer=paddle.nn.initializer.KaimingUniform()),
            bias_attr=get_bias_attr(in_channels // 4))
        self.conv_bn2 = nn.BatchNorm(
            num_channels=in_channels // 4,
            param_attr=ParamAttr(
                initializer=paddle.nn.initializer.Constant(value=1.0)),
            bias_attr=ParamAttr(
                initializer=paddle.nn.initializer.Constant(value=1e-4)),
            act="relu")
        self.conv3 = nn.Conv2DTranspose(
            in_channels=in_channels // 4,
            out_channels=1,
            kernel_size=2,
            stride=2,
            weight_attr=ParamAttr(
                initializer=paddle.nn.initializer.KaimingUniform()),
            bias_attr=get_bias_attr(in_channels // 4), )

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv_bn1(x)
        x = self.conv2(x)
        x = self.conv_bn2(x)
        x = self.conv3(x)
        x = F.sigmoid(x)
        return x


class DBHead(nn.Layer):
    """
    Differentiable Binarization (DB) for text detection:
        see https://arxiv.org/abs/1911.08947
    args:
        params(dict): super parameters for build DB network
    """

    def __init__(self, in_channels, k=50, **kwargs):
        super(DBHead, self).__init__()
        self.k = k
        binarize_name_list = [
            'conv2d_56', 'batch_norm_47', 'conv2d_transpose_0', 'batch_norm_48',
            'conv2d_transpose_1', 'binarize'
        ]
        thresh_name_list = [
            'conv2d_57', 'batch_norm_49', 'conv2d_transpose_2', 'batch_norm_50',
            'conv2d_transpose_3', 'thresh'
        ]
        self.binarize = Head(in_channels, binarize_name_list)
        self.thresh = Head(in_channels, thresh_name_list)

    def step_function(self, x, y):
        return paddle.reciprocal(1 + paddle.exp(-self.k * (x - y)))

    def forward(self, x, targets=None):
        shrink_maps = self.binarize(x)
        if not self.training:
            return {'maps': shrink_maps}

        threshold_maps = self.thresh(x)
        binary_maps = self.step_function(shrink_maps, threshold_maps)
        y = paddle.concat([shrink_maps, threshold_maps, binary_maps], axis=1)
        return {'maps': y}


if __name__ == '__main__':
    fake_inputs = paddle.randn([1, 3, 640, 640], dtype="float32")

    # 1. Declare the backbone
    model_backbone = MobileNetV3()
    # Backbone output shapes for this input:
    # The index is  0 and the shape of output is  [1, 16, 160, 160]
    # The index is  1 and the shape of output is  [1, 24, 80, 80]
    # The index is  2 and the shape of output is  [1, 56, 40, 40]
    # The index is  3 and the shape of output is  [1, 480, 20, 20]

    in_channels = model_backbone.out_channels

    # 2. Declare the FPN
    model_fpn = DBFPN(in_channels=in_channels, out_channels=256)

    # 3. Print the FPN
    print(model_fpn)

    # 4. Compute the FPN output
    outs = model_backbone(fake_inputs)
    fpn_outs = model_fpn(outs)
    # The shape of fpn outs [1, 256, 160, 160]

    # 5. Declare the head
    model_db_head = DBHead(in_channels=256)

    # 6. Print the DB head
    print(model_db_head)

    # 7. Compute the head output. The model is still in training mode here,
    #    so the head returns the probability, threshold and binary maps (3 channels).
    db_head_outs = model_db_head(fpn_outs)
    print(f"The shape of fpn outs {fpn_outs.shape}")
    # The shape of fpn outs [1, 256, 160, 160]
    print(f"The shape of DB head outs {db_head_outs['maps'].shape}")
    # The shape of DB head outs [1, 3, 640, 640]

Result:

DBFPN(
  (in2_conv): Conv2D(16, 256, kernel_size=[1, 1], data_format=NCHW)
  (in3_conv): Conv2D(24, 256, kernel_size=[1, 1], data_format=NCHW)
  (in4_conv): Conv2D(56, 256, kernel_size=[1, 1], data_format=NCHW)
  (in5_conv): Conv2D(480, 256, kernel_size=[1, 1], data_format=NCHW)
  (p5_conv): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
  (p4_conv): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
  (p3_conv): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
  (p2_conv): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
)
DBHead(
  (binarize): Head(
    (conv1): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
    (conv_bn1): BatchNorm()
    (conv2): Conv2DTranspose(64, 64, kernel_size=[2, 2], stride=[2, 2], data_format=NCHW)
    (conv_bn2): BatchNorm()
    (conv3): Conv2DTranspose(64, 1, kernel_size=[2, 2], stride=[2, 2], data_format=NCHW)
  )
  (thresh): Head(
    (conv1): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
    (conv_bn1): BatchNorm()
    (conv2): Conv2DTranspose(64, 64, kernel_size=[2, 2], stride=[2, 2], data_format=NCHW)
    (conv_bn2): BatchNorm()
    (conv3): Conv2DTranspose(64, 1, kernel_size=[2, 2], stride=[2, 2], data_format=NCHW)
  )
)
The shape of fpn outs [1, 256, 160, 160]
The shape of DB head outs [1, 3, 640, 640]

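Advantages of DB (it is trained with supervision, and a ResNet50 backbone gives better results):

  • Higher accuracy and faster inference
  • Handles curved text
  • Handles multi-oriented text
  • Handles multilingual text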
