SSD (Single Shot MultiBox Detector): Key Source Code Walkthrough

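SSD (Single Shot MultiBox Detector) performs object detection and recognition with a single deep neural network. As shown in Figure 0-1, it combines the anchor boxes of Faster R-CNN with YOLO's single-network detection approach (YOLOv2 adopts a similar idea; see the article "YOLO Upgraded: An Analysis of YOLOv2 and YOLO9000"), so it offers accuracy comparable to Faster R-CNN at a speed comparable to YOLO, which makes accurate real-time detection possible. At 300×300 input resolution, SSD reaches 74.3% mAP on the VOC2007 dataset at 59 FPS; at 512×512 it surpasses Faster R-CNN, reaching 80% mAP at 19 FPS, as shown in Figure 0-2. The key points of SSD fall into two groups: model structure and training method. Model structure covers the multi-scale feature map detection network and anchor box generation; the training method covers ground truth preprocessing and the loss function. This article analyzes the TensorFlow implementation of SSD from balancap/SSD-Tensorflow. It is organized as follows: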

1. Multi-scale feature map detection network structure;

2. Anchor box generation;

3. Ground truth preprocessing;

4. Objective function;

5. Summary

<img src="https://pic2.zhimg.com/v2-d0252b7d1408105470b88ceb45054725_b.png" data-rawwidth="1031" data-rawheight="686" class="origin_image zh-lightbox-thumb" width="1031" data-original="https://pic2.zhimg.com/v2-d0252b7d1408105470b88ceb45054725_r.png">

Figure 0-1 How SSD relates to MultiBox, Faster R-CNN, and YOLO (figure from the authors' ECCV 2016 slides)

<img src="https://pic2.zhimg.com/v2-0213e22e8b0d96f8854e82d796c83a71_b.png" class="content_image">

Figure 0-2 SSD detection speed and accuracy (figure from the authors' ECCV 2016 slides)


1 Multi-scale feature map detection network structure

The SSD network model is shown in Figure 1-1.<img src="https://pic1.zhimg.com/v2-7f7f3c99d20df97455e8bcfce7876d30_b.png" data-rawwidth="1152" data-rawheight="553" class="origin_image zh-lightbox-thumb" width="1152" data-original="https://pic1.zhimg.com/v2-7f7f3c99d20df97455e8bcfce7876d30_r.png">

Figure 1-1 SSD model structure (figure from the original paper)

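The model is built in ssd_vgg_300.py. Figure 1-2 illustrates detection on multi-scale feature maps. The selected feature maps are 38×38 (block4), 19×19 (block7), 10×10 (block8), 5×5 (block9), 3×3 (block10), and 1×1 (block11). On each feature map, 3×3 convolutions predict, for every default box, four offsets and 21 class confidences. Take block7 as an example: it has 6 default boxes per location, and each default box carries 4 offsets and 21 class confidences (4+21), so the final output of block7 has (19*19)*6*(4+21) values.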
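
The per-layer arithmetic can be checked with a few lines of plain Python. This is a sketch, not repository code: the three lists are copied from the default parameters quoted in this article, the helper names are made up for the example, and the rule "boxes per location = len(sizes) + len(ratios)" is the one used by ssd_anchor_one_layer.

```python
# Per-layer settings copied from default_params in this article.
feat_shapes = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
anchor_sizes = [(21., 45.), (45., 99.), (99., 153.),
                (153., 207.), (207., 261.), (261., 315.)]
anchor_ratios = [[2, .5], [2, .5, 3, 1. / 3], [2, .5, 3, 1. / 3],
                 [2, .5, 3, 1. / 3], [2, .5], [2, .5]]
num_classes = 21


def boxes_per_location(sizes, ratios):
    # Hypothetical helper: same counting rule as ssd_anchor_one_layer.
    return len(sizes) + len(ratios)


def layer_output_size(feat_shape, sizes, ratios, num_classes=21):
    """Return (number of default boxes, number of predicted values) for one layer."""
    k = boxes_per_location(sizes, ratios)
    n_boxes = feat_shape[0] * feat_shape[1] * k
    return n_boxes, n_boxes * (4 + num_classes)


per_layer = [layer_output_size(f, s, r, num_classes)
             for f, s, r in zip(feat_shapes, anchor_sizes, anchor_ratios)]
total_boxes = sum(n for n, _ in per_layer)
```

With these settings block7 (index 1) has 19*19*6 default boxes, and summing over the six layers gives the total number of default boxes the network predicts for one image.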

<img src="https://pic1.zhimg.com/v2-5964f6dff6dbbd435336cde9e5dfc988_b.png" class="content_image">

Figure 1-2 Multi-scale feature sampling (figure from a Zhihu column)


The initialization parameters are as follows:

    # Excerpt: docstring and default parameters of the SSD 300 network.
    """
    Implementation of the SSD VGG-based 300 network.

    The default features layers with 300x300 image input are:
      conv4 ==> 38 x 38
      conv7 ==> 19 x 19
      conv8 ==> 10 x 10
      conv9 ==> 5 x 5
      conv10 ==> 3 x 3
      conv11 ==> 1 x 1
    The default image size used to train this network is 300x300.
    """
    default_params = SSDParams(
        img_shape=(300, 300),   # input size
        num_classes=21,         # 20 classes + background
        # Feature map layers used for detection.
        feat_layers=['block4', 'block7', 'block8', 'block9', 'block10', 'block11'],
        feat_shapes=[(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)],
        anchor_size_bounds=[0.15, 0.90],
        # Anchor box sizes.
        anchor_sizes=[(21., 45.),
                      (45., 99.),
                      (99., 153.),
                      (153., 207.),
                      (207., 261.),
                      (261., 315.)],
        # Anchor box aspect ratios.
        anchor_ratios=[[2, .5],
                       [2, .5, 3, 1./3],
                       [2, .5, 3, 1./3],
                       [2, .5, 3, 1./3],
                       [2, .5],
                       [2, .5]],
        anchor_steps=[8, 16, 32, 64, 100, 300],   # anchor step (stride) of each layer
        anchor_offset=0.5,                        # center offset within a cell
        # Whether to normalize a feature layer: > 0 yes, < 0 no.
        normalizations=[20, -1, -1, -1, -1, -1],
        prior_scaling=[0.1, 0.1, 0.2, 0.2])

The model-building code follows. The author builds the network with TensorFlow-Slim (a high-level library similar to Keras); see the TensorFlow-Slim page for details.

    # Build the SSD network.
    def ssd_net(inputs,
                num_classes=21,
                feat_layers=SSDNet.default_params.feat_layers,
                anchor_sizes=SSDNet.default_params.anchor_sizes,
                anchor_ratios=SSDNet.default_params.anchor_ratios,
                normalizations=SSDNet.default_params.normalizations,
                is_training=True,
                dropout_keep_prob=0.5,
                prediction_fn=slim.softmax,
                reuse=None,
                scope='ssd_300_vgg'):
        """SSD net definition.
        """
        # End_points collect relevant activations for external use
        # (the output of every block).
        end_points = {}
        # Build the VGG network with slim; see the structure diagram above.
        with tf.variable_scope(scope, 'ssd_300_vgg', [inputs], reuse=reuse):
            # Original VGG-16 blocks.
            net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
            end_points['block1'] = net
            net = slim.max_pool2d(net, [2, 2], scope='pool1')
            # Block 2.
            net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
            end_points['block2'] = net
            net = slim.max_pool2d(net, [2, 2], scope='pool2')
            # Block 3.
            net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
            end_points['block3'] = net
            net = slim.max_pool2d(net, [2, 2], scope='pool3')
            # Block 4.
            net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4')
            end_points['block4'] = net
            net = slim.max_pool2d(net, [2, 2], scope='pool4')
            # Block 5.
            net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5')
            end_points['block5'] = net
            net = slim.max_pool2d(net, [3, 3], 1, scope='pool5')

            # Additional SSD blocks.
            # Block 6: dilated 3x3 convolution; output shape is 19x19x1024.
            net = slim.conv2d(net, 1024, [3, 3], rate=6, scope='conv6')
            end_points['block6'] = net
            # Block 7: 1x1 convolution.
            net = slim.conv2d(net, 1024, [1, 1], scope='conv7')
            end_points['block7'] = net

            # Block 8/9/10/11: 1x1 and 3x3 convolutions stride 2 (except lasts).
            end_point = 'block8'
            with tf.variable_scope(end_point):
                net = slim.conv2d(net, 256, [1, 1], scope='conv1x1')
                net = slim.conv2d(net, 512, [3, 3], stride=2, scope='conv3x3')
            end_points[end_point] = net
            end_point = 'block9'
            with tf.variable_scope(end_point):
                net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
                net = slim.conv2d(net, 256, [3, 3], stride=2, scope='conv3x3')
            end_points[end_point] = net
            end_point = 'block10'
            with tf.variable_scope(end_point):
                net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
                net = slim.conv2d(net, 256, [3, 3], scope='conv3x3', padding='VALID')
            end_points[end_point] = net
            end_point = 'block11'
            with tf.variable_scope(end_point):
                net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
                net = slim.conv2d(net, 256, [3, 3], scope='conv3x3', padding='VALID')
            end_points[end_point] = net

            # Prediction and localisations layers.
            predictions = []
            logits = []
            localisations = []
            for i, layer in enumerate(feat_layers):
                with tf.variable_scope(layer + '_box'):
                    # Take the output of a feature layer and produce
                    # class and location predictions.
                    p, l = ssd_multibox_layer(end_points[layer],
                                              num_classes,
                                              anchor_sizes[i],
                                              anchor_ratios[i],
                                              normalizations[i])
                # Collect the predictions of every layer.
                predictions.append(prediction_fn(p))  # softmax class probabilities
                logits.append(p)                      # raw class scores (logits)
                localisations.append(l)               # predicted location offsets
            return predictions, localisations, logits, end_points
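
As a rough check of the feature map sizes quoted earlier, the sketch below follows the spatial size through the layers of ssd_net. It assumes SAME padding for every pooling and strided layer except the two 3×3 convolutions marked padding='VALID' in block10 and block11; the default padding mode is not visible in the excerpt, so this is an assumption. The helper names are made up for the example.

```python
import math


def same_out(size, stride):
    # Output size of a SAME-padded layer: ceil(size / stride).
    return int(math.ceil(size / float(stride)))


def valid_out(size, kernel, stride=1):
    # Output size of a VALID (unpadded) layer.
    return (size - kernel) // stride + 1


def ssd300_feature_sizes(input_size=300):
    """Spatial size of each detection layer for a square input (assumed padding)."""
    sizes = {}
    s = input_size
    for _ in range(3):           # pool1, pool2, pool3 (stride 2)
        s = same_out(s, 2)
    sizes['block4'] = s          # conv4 keeps the size
    s = same_out(s, 2)           # pool4 (stride 2)
    s = same_out(s, 1)           # pool5 (stride 1) keeps the size
    sizes['block7'] = s          # conv6 and conv7 keep the size
    s = same_out(s, 2)           # block8: 3x3 conv, stride 2
    sizes['block8'] = s
    s = same_out(s, 2)           # block9: 3x3 conv, stride 2
    sizes['block9'] = s
    s = valid_out(s, 3)          # block10: 3x3 VALID conv
    sizes['block10'] = s
    s = valid_out(s, 3)          # block11: 3x3 VALID conv
    sizes['block11'] = s
    return sizes


feature_sizes = ssd300_feature_sizes()
```

Under these assumptions the sizes come out as 38, 19, 10, 5, 3, and 1, matching feat_shapes in the default parameters.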

2 Anchor box generation

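For every feature map, k default boxes are generated at each location from different sizes (scales) and aspect ratios, as illustrated in Figure 2-1 (in that figure k = 6 and the 5×5 grid of red points represents the feature map, giving 5*5*6 = 150 boxes).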

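The scale of the default boxes on each feature map is $s_{k}=s_{min}+\frac{s_{max}-s_{min}}{m-1}(k-1),\ k\in[1,m]$, where $m$ is the number of feature maps, $s_{min}$ is the scale of the default boxes on the lowest feature map (0.2 in the original paper, 0.15 in the code), and $s_{max}$ is the scale of the default boxes on the highest feature map (0.9 in both the paper and the code).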
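
The scale formula is easy to evaluate directly. The sketch below uses the paper's values (s_min = 0.2, s_max = 0.9) and m = 6 feature maps; the function name is made up for the example, and the repository stores its sizes as a fixed anchor_sizes list rather than calling a function like this.

```python
def default_box_scales(s_min=0.2, s_max=0.9, m=6):
    """Scales s_1..s_m, linearly spaced between s_min and s_max."""
    return [s_min + (s_max - s_min) / (m - 1) * (k - 1) for k in range(1, m + 1)]


scales = default_box_scales()
```

For the paper's settings this yields scales of roughly 0.2, 0.34, 0.48, 0.62, 0.76, and 0.9, each expressed as a fraction of the input image size.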

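The aspect ratios of the default boxes are taken from a set of ratio values, $\left\{1,2,3,1/2,1/3\right\}$ in the original paper. Each default box therefore has width $w_{k}^{a}=s_{k}\sqrt{a_{r}}$ and height $h_{k}^{a}=s_{k}/\sqrt{a_{r}}$. For aspect ratio 1, one extra default box with scale $s_{k}^{'}=\sqrt{s_{k}s_{k+1}}$ is added, so each location of a feature map yields 6 default boxes. The center of each default box is set to $(\frac{i+0.5}{|f_{k}|},\frac{j+0.5}{|f_{k}|})$, where $\left|f_{k}\right|$ is the size of the k-th feature map and $i,j\in[0,\left|f_{k}\right|)$.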
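
A minimal sketch of these formulas, using the paper's ratio set and a 5×5 feature map as in Figure 2-1. Both helper names are hypothetical and the scale values 0.2 and 0.34 are simply the first two scales from the formula above.

```python
import math


def default_box_shapes(s_k, s_k_next, ratios=(1.0, 2.0, 3.0, 1.0 / 2, 1.0 / 3)):
    """Return (width, height) pairs, relative to the image, for one location."""
    shapes = [(s_k * math.sqrt(a), s_k / math.sqrt(a)) for a in ratios]
    # Extra box for aspect ratio 1 with scale sqrt(s_k * s_{k+1}).
    s_extra = math.sqrt(s_k * s_k_next)
    shapes.append((s_extra, s_extra))
    return shapes


def default_box_centers(f_k):
    """Centers ((i + 0.5) / f_k, (j + 0.5) / f_k) for an f_k x f_k feature map."""
    return [((i + 0.5) / f_k, (j + 0.5) / f_k)
            for i in range(f_k) for j in range(f_k)]


shapes = default_box_shapes(0.2, 0.34)
centers = default_box_centers(5)
num_boxes = len(shapes) * len(centers)
```

Every box produced from a ratio keeps the area s_k squared; only the extra ratio-1 box is larger. Six shapes at each of the 25 centers gives the 150 boxes mentioned for the 5×5 example.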


<img src="https://pic4.zhimg.com/v2-e128c01e26456fa24502e2c05bf46e1b_b.png" class="content_image">
<img src="https://pic3.zhimg.com/v2-e6f0dd799661fff724853435b976a82e_b.png" class="content_image">
<img src="https://pic3.zhimg.com/v2-64a521f37e62fe79c9b5d11746eb6686_b.png" class="content_image">

Figure 2-1 Anchor box generation (figure from a Zhihu column)

In the source code, default boxes are generated by ssd_anchor_one_layer():

    import math

    import numpy as np


    # Generate the anchor boxes of one layer.
    def ssd_anchor_one_layer(img_shape,    # shape of the original image
                             feat_shape,   # shape of the feature map
                             sizes,        # preset box sizes
                             ratios,       # aspect ratios
                             step,         # anchor step (stride) of this layer
                             offset=0.5,
                             dtype=np.float32):
        """Compute SSD default anchor boxes for one feature layer.

        Determine the relative position grid of the centers, and the relative
        width and height.

        Arguments:
          feat_shape: Feature shape, used for computing relative position grids;
          sizes: Absolute reference sizes;
          ratios: Ratios to use on these features;
          img_shape: Image shape, used for computing height, width relatively to the
            former;
          offset: Grid offset.

        Return:
          y, x, h, w: Relative x and y grids, and height and width.
        """
        # Compute the position grid: simple way.
        # y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
        # y = (y.astype(dtype) + offset) / feat_shape[0]
        # x = (x.astype(dtype) + offset) / feat_shape[1]
        # Weird SSD-Caffe computation using steps values...

        # The inline notes below use the following test values:
        #   feat_shapes = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
        #   anchor_sizes = [(21., 45.), (45., 99.), (99., 153.),
        #                   (153., 207.), (207., 261.), (261., 315.)]
        #   anchor_ratios = [[2, .5], [2, .5, 3, 1./3], [2, .5, 3, 1./3],
        #                    [2, .5, 3, 1./3], [2, .5], [2, .5]]
        #   anchor_steps = [8, 16, 32, 64, 100, 300]
        #   offset = 0.5, dtype = np.float32
        #   feat_shape = feat_shapes[0], step = anchor_steps[0]

        # In the test, y and x both have shape (38, 38); y is
        #   array([[ 0,  0,  0, ...,  0,  0,  0],
        #          [ 1,  1,  1, ...,  1,  1,  1],
        #          [ 2,  2,  2, ...,  2,  2,  2],
        #          ...,
        #          [35, 35, 35, ..., 35, 35, 35],
        #          [36, 36, 36, ..., 36, 36, 36],
        #          [37, 37, 37, ..., 37, 37, 37]])
        y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
        # In the test: y = (y + 0.5) * 8 / 300, x = (x + 0.5) * 8 / 300.
        y = (y.astype(dtype) + offset) * step / img_shape[0]
        x = (x.astype(dtype) + offset) * step / img_shape[1]

        # Expand dims to support easy broadcasting; the shape becomes (38, 38, 1).
        y = np.expand_dims(y, axis=-1)
        x = np.expand_dims(x, axis=-1)

        # Compute relative height and width.
        # Tries to follow the original implementation of SSD for the order.
        num_anchors = len(sizes) + len(ratios)       # 2 + 2 in the test
        h = np.zeros((num_anchors, ), dtype=dtype)   # shape (4,)
        w = np.zeros((num_anchors, ), dtype=dtype)
        # Add first anchor boxes with ratio=1.
        # In the test: h[0] = 21 / 300, w[0] = 21 / 300.
        h[0] = sizes[0] / img_shape[0]
        w[0] = sizes[0] / img_shape[1]
        di = 1
        if len(sizes) > 1:
            # h[1] = sqrt(21 * 45) / 300
            h[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[0]
            w[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[1]
            di += 1
        for i, r in enumerate(ratios):
            h[i+di] = sizes[0] / img_shape[0] / math.sqrt(r)
            w[i+di] = sizes[0] / img_shape[1] * math.sqrt(r)
        # In the test, y and x have shape (38, 38, 1); h and w have shape (4,).
        return y, x, h, w

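To see how the four returned arrays describe every box of a layer, the NumPy sketch below redoes the first-layer computation inline and broadcasts the result into corner coordinates. It is an illustration that repeats the arithmetic of ssd_anchor_one_layer with the test values quoted in the code; it does not call the repository function, and the variable names are chosen for this example.

```python
import math

import numpy as np

img_shape = (300, 300)
feat_shape = (38, 38)
sizes = (21., 45.)
ratios = [2, .5]
step, offset = 8, 0.5

# Center grid, relative to the image, with a trailing axis for broadcasting.
y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
y = np.expand_dims((y.astype(np.float32) + offset) * step / img_shape[0], -1)
x = np.expand_dims((x.astype(np.float32) + offset) * step / img_shape[1], -1)

# Relative heights and widths: two ratio-1 boxes, then one box per ratio.
h = [sizes[0] / img_shape[0], math.sqrt(sizes[0] * sizes[1]) / img_shape[0]]
w = [sizes[0] / img_shape[1], math.sqrt(sizes[0] * sizes[1]) / img_shape[1]]
for r in ratios:
    h.append(sizes[0] / img_shape[0] / math.sqrt(r))
    w.append(sizes[0] / img_shape[1] * math.sqrt(r))
h = np.array(h, dtype=np.float32)
w = np.array(w, dtype=np.float32)

# Shapes (38, 38, 1) and (4,) broadcast to (38, 38, 4): one box per cell and anchor.
ymin, xmin = y - h / 2., x - w / 2.
ymax, xmax = y + h / 2., x + w / 2.
```

The same broadcasting is what the ground truth encoding code relies on when it computes ymin, xmin, ymax, and xmax from the anchors of a layer.
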
3 Ground truth preprocessing

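During training, the label information (ground truth boxes and ground truth categories) must first be preprocessed and assigned to the corresponding default boxes. Default boxes are matched to ground truth boxes by their jaccard overlap. The paper treats default boxes whose jaccard overlap with a ground truth box exceeds 0.5 as positive samples and all others as negative samples. (The paper also matches every ground truth box to its best-overlapping default box first; the code in this section applies only the threshold rule.)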
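
The jaccard overlap (intersection over union) and the 0.5 threshold can be sketched in NumPy as follows. This is a simplified stand-in, not the TensorFlow code from the repository: it works on a flat (N, 4) array of anchors in (ymin, xmin, ymax, xmax) order, and the example boxes are invented for illustration.

```python
import numpy as np


def jaccard(box, anchors):
    """Jaccard overlap between one box and an (N, 4) array of anchors.

    Boxes are (ymin, xmin, ymax, xmax) in relative coordinates.
    """
    anchors = np.asarray(anchors, dtype=np.float64)
    int_ymin = np.maximum(anchors[:, 0], box[0])
    int_xmin = np.maximum(anchors[:, 1], box[1])
    int_ymax = np.minimum(anchors[:, 2], box[2])
    int_xmax = np.minimum(anchors[:, 3], box[3])
    inter = np.maximum(int_ymax - int_ymin, 0.) * np.maximum(int_xmax - int_xmin, 0.)
    area_anchors = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area_anchors + area_box - inter)


anchors = np.array([[0.0, 0.0, 0.5, 0.5],     # identical to the ground truth box
                    [0.0, 0.25, 0.5, 0.75],   # shifted by half a width
                    [0.5, 0.5, 1.0, 1.0]])    # no overlap
gt_box = (0.0, 0.0, 0.5, 0.5)
overlaps = jaccard(gt_box, anchors)
positives = overlaps > 0.5
```

Here the shifted anchor overlaps the ground truth by one third, which is below the threshold, so only the first anchor counts as a positive sample.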

The ground truth preprocessing code is in ssd_common.py; the key parts are:

    # Encode the labels and bboxes for one layer.
    def tf_ssd_bboxes_encode_layer(labels,                  # ground truth labels, 1D tensor
                                   bboxes,                  # Nx4 Tensor(float)
                                   anchors_layer,           # anchors of one layer: (y, x, h, w)
                                   matching_threshold=0.5,  # matching threshold
                                   prior_scaling=[0.1, 0.1, 0.2, 0.2],  # scaling
                                   dtype=tf.float32):
        """Encode groundtruth labels and bounding boxes using SSD anchors from
        one layer.

        Arguments:
          labels: 1D Tensor(int64) containing groundtruth labels;
          bboxes: Nx4 Tensor(float) with bboxes relative coordinates;
          anchors_layer: Numpy array with layer anchors;
          matching_threshold: Threshold for positive match with groundtruth bboxes;
          prior_scaling: Scaling of encoded coordinates.

        Return:
          (target_labels, target_localizations, target_scores): Target Tensors.
        """
        # Anchors coordinates and volume.
        yref, xref, href, wref = anchors_layer
        ymin = yref - href / 2.
        xmin = xref - wref / 2.
        ymax = yref + href / 2.
        xmax = xref + wref / 2.
        # For the first layer, anchors_layer has shapes
        # ((38, 38, 1), (38, 38, 1), (4,), (4,)), so xmax has shape (38, 38, 4).
        vol_anchors = (xmax - xmin) * (ymax - ymin)  # anchor areas

        # Initialize tensors...
        shape = (yref.shape[0], yref.shape[1], href.size)  # (38, 38, 4)
        feat_labels = tf.zeros(shape, dtype=tf.int64)
        feat_scores = tf.zeros(shape, dtype=dtype)
        feat_ymin = tf.zeros(shape, dtype=dtype)
        feat_xmin = tf.zeros(shape, dtype=dtype)
        feat_ymax = tf.ones(shape, dtype=dtype)
        feat_xmax = tf.ones(shape, dtype=dtype)

        # Compute the jaccard overlap.
        def jaccard_with_anchors(bbox):
            """Compute jaccard score between a box and the anchors.
            """
            # Intersection bbox and volume.
            int_ymin = tf.maximum(ymin, bbox[0])
            int_xmin = tf.maximum(xmin, bbox[1])
            int_ymax = tf.minimum(ymax, bbox[2])
            int_xmax = tf.minimum(xmax, bbox[3])
            h = tf.maximum(int_ymax - int_ymin, 0.)
            w = tf.maximum(int_xmax - int_xmin, 0.)
            # Volumes.
            inter_vol = h * w
            union_vol = vol_anchors - inter_vol \
                + (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
            jaccard = tf.div(inter_vol, union_vol)
            return jaccard

        # Loop condition.
        def condition(i, feat_labels, feat_scores,
                      feat_ymin, feat_xmin, feat_ymax, feat_xmax):
            """Condition: check label index.
            """
            # tf.less returns the truth value of (x < y) element-wise.
            r = tf.less(i, tf.shape(labels))
            return r[0]

        # Loop body.
        def body(i, feat_labels, feat_scores,
                 feat_ymin, feat_xmin, feat_ymax, feat_xmax):
            """Body: update feature labels, scores and bboxes.
            Follow the original SSD paper for that purpose:
              - assign values when jaccard > 0.5;
              - only update if beat the score of other bboxes.
            """
            # Jaccard score.
            label = labels[i]
            bbox = bboxes[i]
            scores = jaccard_with_anchors(bbox)
            # 'Boolean' mask: above the threshold and better than the current match.
            mask = tf.logical_and(tf.greater(scores, matching_threshold),
                                  tf.greater(scores, feat_scores))
            imask = tf.cast(mask, tf.int64)
            fmask = tf.cast(mask, dtype)
            # Update values using mask.
            feat_labels = imask * label + (1 - imask) * feat_labels
            # tf.where replaces tf.select, which was removed in TensorFlow 1.0.
            feat_scores = tf.where(mask, scores, feat_scores)
            feat_ymin = fmask * bbox[0] + (1 - fmask) * feat_ymin
            feat_xmin = fmask * bbox[1] + (1 - fmask) * feat_xmin
            feat_ymax = fmask * bbox[2] + (1 - fmask) * feat_ymax
            feat_xmax = fmask * bbox[3] + (1 - fmask) * feat_xmax
            return [i+1, feat_labels, feat_scores,
                    feat_ymin, feat_xmin, feat_ymax, feat_xmax]

        # Main loop definition.
        i = 0
        [i, feat_labels, feat_scores,
         feat_ymin, feat_xmin,
         feat_ymax, feat_xmax] = tf.while_loop(condition, body,
                                               [i, feat_labels, feat_scores,
                                                feat_ymin, feat_xmin,
                                                feat_ymax, feat_xmax])

        # Transform to center / size.
        feat_cy = (feat_ymax + feat_ymin) / 2.
        feat_cx = (feat_xmax + feat_xmin) / 2.
        feat_h = feat_ymax - feat_ymin
        feat_w = feat_xmax - feat_xmin
        # Encode features.
        feat_cy = (feat_cy - yref) / href / prior_scaling[0]
        feat_cx = (feat_cx - xref) / wref / prior_scaling[1]
        feat_h = tf.log(feat_h / href) / prior_scaling[2]
        feat_w = tf.log(feat_w / wref) / prior_scaling[3]
        # Use SSD ordering: x / y / w / h instead of ours.
        feat_localizations = tf.stack([feat_cx, feat_cy, feat_w, feat_h], axis=-1)
        return feat_labels, feat_localizations, feat_scores


    # Encode the ground truth for all layers.
    def tf_ssd_bboxes_encode(labels,                  # ground truth labels, 1D tensor
                             bboxes,                  # Nx4 Tensor(float)
                             anchors,                 # list of layer anchors
                             matching_threshold=0.5,  # matching threshold
                             prior_scaling=[0.1, 0.1, 0.2, 0.2],  # scaling
                             dtype=tf.float32,
                             scope='ssd_bboxes_encode'):
        """Encode groundtruth labels and bounding boxes using SSD net anchors.
        Encoding boxes for all feature layers.

        Arguments:
          labels: 1D Tensor(int64) containing groundtruth labels;
          bboxes: Nx4 Tensor(float) with bboxes relative coordinates;
          anchors: List of Numpy array with layer anchors;
          matching_threshold: Threshold for positive match with groundtruth bboxes;
          prior_scaling: Scaling of encoded coordinates.

        Return:
          (target_labels, target_localizations, target_scores):
            Each element is a list of target Tensors.
        """
        with tf.name_scope(scope):
            target_labels = []
            target_localizations = []
            target_scores = []
            for i, anchors_layer in enumerate(anchors):
                with tf.name_scope('bboxes_encode_block_%i' % i):
                    # Encode the labels and bboxes of this layer.
                    t_labels, t_loc, t_scores = \
                        tf_ssd_bboxes_encode_layer(labels, bboxes, anchors_layer,
                                                   matching_threshold, prior_scaling, dtype)
                    target_labels.append(t_labels)
                    target_localizations.append(t_loc)
                    target_scores.append(t_scores)
            return target_labels, target_localizations, target_scores


    # Encode the ground truth labels and bboxes (a method that takes self).
    def bboxes_encode(self, labels, bboxes, anchors,
                      scope='ssd_bboxes_encode'):
        """Encode labels and bounding boxes.
        """
        return ssd_common.tf_ssd_bboxes_encode(labels, bboxes, anchors,
                                               matching_threshold=0.5,
                                               prior_scaling=self.params.prior_scaling,
                                               scope=scope)

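The "Encode features" step at the end of tf_ssd_bboxes_encode_layer can be tried on a single box in NumPy. The sketch below mirrors that arithmetic for one ground truth box and one anchor; decode_box is a hypothetical inverse written only for this example (it is not taken from the repository), and the box and anchor values are invented.

```python
import numpy as np

PRIOR_SCALING = (0.1, 0.1, 0.2, 0.2)


def encode_box(box, anchor, prior_scaling=PRIOR_SCALING):
    """box: (ymin, xmin, ymax, xmax); anchor: (cy, cx, h, w). Returns (tx, ty, tw, th)."""
    ymin, xmin, ymax, xmax = box
    yref, xref, href, wref = anchor
    cy, cx = (ymax + ymin) / 2., (xmax + xmin) / 2.
    h, w = ymax - ymin, xmax - xmin
    ty = (cy - yref) / href / prior_scaling[0]
    tx = (cx - xref) / wref / prior_scaling[1]
    th = np.log(h / href) / prior_scaling[2]
    tw = np.log(w / wref) / prior_scaling[3]
    return tx, ty, tw, th   # SSD ordering: x / y / w / h


def decode_box(target, anchor, prior_scaling=PRIOR_SCALING):
    """Hypothetical inverse of encode_box, returning (ymin, xmin, ymax, xmax)."""
    tx, ty, tw, th = target
    yref, xref, href, wref = anchor
    cy = ty * prior_scaling[0] * href + yref
    cx = tx * prior_scaling[1] * wref + xref
    h = href * np.exp(th * prior_scaling[2])
    w = wref * np.exp(tw * prior_scaling[3])
    return cy - h / 2., cx - w / 2., cy + h / 2., cx + w / 2.


anchor = (0.5, 0.5, 0.2, 0.2)
gt_box = (0.42, 0.40, 0.66, 0.60)
target = encode_box(gt_box, anchor)
restored = decode_box(target, anchor)
```

Dividing by prior_scaling enlarges the regression targets: a center shift of 0.04 on an anchor of height 0.2 becomes a target of 2.0 instead of 0.2.
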
4 Objective function

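The SSD objective has two parts: the localization loss (loc) and the class confidence loss (conf) of the matched default boxes. Let $x_{ij}^{p}\in\left\{1,0\right\}$ indicate whether the i-th default box is matched to the j-th ground truth box of category p. The objective function is defined as: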

$$L(x,c,l,g)=\frac{1}{N}\left(L_{conf}(x,c)+\alpha L_{loc}(x,l,g)\right)$$

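Here N is the number of matched default boxes; if N = 0, the loss is set to zero. $L_{loc}$ is the Smooth L1 loss between the predicted box l and the ground truth box g, and $\alpha$ is set to 1 by cross validation.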

<img src="https://pic2.zhimg.com/v2-f7f9cd187a7e4cf8fb2c430a844bdc5d_b.png" data-rawwidth="441" data-rawheight="93" class="origin_image zh-lightbox-thumb" width="441" data-original="https://pic2.zhimg.com/v2-f7f9cd187a7e4cf8fb2c430a844bdc5d_r.png">

$L_{loc}$ is defined as follows:<img src="https://pic1.zhimg.com/v2-c59028fcd350680c60002216cac34434_b.png" data-rawwidth="539" data-rawheight="184" class="origin_image zh-lightbox-thumb" width="539" data-original="https://pic1.zhimg.com/v2-c59028fcd350680c60002216cac34434_r.png">
where l is the predicted box and g is the ground truth box. The loss regresses to offsets for the center (cx, cy) of the default box (d) and for its width (w) and height (h).
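
The Smooth L1 term can be sketched in NumPy as below. The function names, the flat (num_boxes, 4) layout, and the explicit division by N are choices made for this example; they illustrate the definition rather than reproduce the repository's TensorFlow code.

```python
import numpy as np


def smooth_l1(x):
    """Smooth L1: 0.5 * x^2 where |x| < 1, |x| - 0.5 elsewhere."""
    x = np.asarray(x, dtype=np.float64)
    absx = np.abs(x)
    return np.where(absx < 1., 0.5 * x ** 2, absx - 0.5)


def localization_loss(pred, target, positive_mask):
    """Smooth L1 summed over the 4 offsets of matched boxes, divided by N."""
    positive_mask = np.asarray(positive_mask, dtype=np.float64)
    n = positive_mask.sum()
    if n == 0:
        return 0.0   # the loss is defined as zero when nothing is matched
    diff = np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64)
    per_box = smooth_l1(diff).sum(axis=1)
    return float((per_box * positive_mask).sum() / n)


pred = np.array([[0.5, 0., 0., 0.], [2., 0., 0., 0.]])
target = np.zeros((2, 4))
loss_one_positive = localization_loss(pred, target, [1, 0])
loss_no_positive = localization_loss(pred, target, [0, 0])
```

Only matched (positive) default boxes contribute, which is why the second box, despite its larger error, does not affect loss_one_positive.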

$L_{conf}$ is defined as the softmax loss over the multiple class confidences:

<img src="https://pic3.zhimg.com/v2-b5772e77cfe447103133b90c05a807ee_b.png" data-rawwidth="739" data-rawheight="75" class="origin_image zh-lightbox-thumb" width="739" data-original="https://pic3.zhimg.com/v2-b5772e77cfe447103133b90c05a807ee_r.png">

The objective function is implemented in ssd_vgg_300.py; the annotated source follows:

    # =========================================================================== #
    # SSD loss function.
    # =========================================================================== #
    def ssd_losses(logits,           # predicted class scores
                   localisations,    # predicted locations
                   gclasses,         # ground truth classes
                   glocalisations,   # ground truth locations
                   gscores,          # ground truth scores
                   match_threshold=0.5,
                   negative_ratio=3.,
                   alpha=1.,
                   label_smoothing=0.,
                   scope='ssd_losses'):
        """Loss functions for training the SSD 300 VGG network.

        This function defines the different loss components of the SSD, and
        adds them to the TF loss collection.

        Arguments:
          logits: (list of) predictions logits Tensors;
          localisations: (list of) localisations Tensors;
          gclasses: (list of) groundtruth labels Tensors;
          glocalisations: (list of) groundtruth localisations Tensors;
          gscores: (list of) groundtruth score Tensors;
        """
        # Some debugging...
        # for i in range(len(gclasses)):
        #     print(localisations[i].get_shape())
        #     print(logits[i].get_shape())
        #     print(gclasses[i].get_shape())
        #     print(glocalisations[i].get_shape())
        #     print()
        with tf.name_scope(scope):
            l_cross = []
            l_loc = []
            for i in range(len(logits)):
                with tf.name_scope('block_%i' % i):
                    # Determine weights Tensor.
                    pmask = tf.cast(gclasses[i] > 0, logits[i].dtype)
                    n_positives = tf.reduce_sum(pmask)  # number of positive samples
                    # np.prod returns the product of the array elements.
                    n_entries = np.prod(gclasses[i].get_shape().as_list())
                    # r_positive = n_positives / n_entries
                    # Select some random negative entries.
                    # r_negative is the fraction of negatives to keep.
                    r_negative = negative_ratio * n_positives / (n_entries - n_positives)
                    nmask = tf.random_uniform(gclasses[i].get_shape(),
                                              dtype=logits[i].dtype)
                    nmask = nmask * (1. - pmask)
                    nmask = tf.cast(nmask > 1. - r_negative, logits[i].dtype)

                    # Add cross-entropy loss.
                    with tf.name_scope('cross_entropy'):
                        # Weights Tensor: positive mask + random negative.
                        weights = pmask + nmask
                        loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
                            labels=gclasses[i], logits=logits[i])
                        loss = tf.contrib.losses.compute_weighted_loss(loss, weights)
                        l_cross.append(loss)

                    # Add localization loss: smooth L1, L2, ...
                    with tf.name_scope('localization'):
                        # Weights Tensor: positive mask only.
                        weights = alpha * pmask
                        loss = custom_layers.abs_smooth(localisations[i] - glocalisations[i])
                        loss = tf.contrib.losses.compute_weighted_loss(loss, weights)
                        l_loc.append(loss)

            # Total losses in summaries...
            with tf.name_scope('total'):
                tf.summary.scalar('cross_entropy', tf.add_n(l_cross))
                tf.summary.scalar('localization', tf.add_n(l_loc))

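The confidence part of ssd_losses keeps every positive box and a random subset of the negatives. The NumPy sketch below imitates that weighting on toy data. It is a simplified stand-in with invented helper names and labels, and it follows the random selection shown in the code above, whereas the paper selects negatives by hard negative mining (highest confidence loss first, at most 3 negatives per positive).

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Per-box softmax cross-entropy for integer class labels."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]


def sample_weights(labels, negative_ratio=3., rng=None):
    """Weight 1 for positives and for a random subset of negatives, 0 otherwise."""
    rng = np.random.RandomState(0) if rng is None else rng
    labels = np.asarray(labels)
    pmask = (labels > 0).astype(np.float64)
    n_positives = pmask.sum()
    n_entries = labels.size
    # Fraction of negatives to keep (about negative_ratio negatives per positive).
    r_negative = negative_ratio * n_positives / max(n_entries - n_positives, 1.)
    nmask = rng.uniform(size=labels.shape) * (1. - pmask)
    nmask = (nmask > 1. - r_negative).astype(np.float64)
    return pmask + nmask


labels = np.array([0] * 96 + [3, 7, 1, 15])   # 96 background boxes, 4 positives
logits = np.zeros((100, 21))                  # untrained network: uniform scores
weights = sample_weights(labels)
per_box = softmax_cross_entropy(logits, labels)
loss = float((per_box * weights).sum())
```

With all-zero logits every box has a cross-entropy of ln(21), so the weighted sum depends only on how many boxes were selected.
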
5 Summary

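This article analyzed the key source code of the TensorFlow implementation of SSD: Single Shot MultiBox Detector, taken from balancap/SSD-Tensorflow. The repository is thoroughly written and covers much more (image preprocessing, multi-GPU parallel training, and so on), so only the key code was selected here. Reading the key code alongside the paper makes the structure clear. The key points of the SSD implementation are: 1. the multi-scale feature map detection network structure; 2. anchor box generation; 3. ground truth preprocessing; 4. the objective function. Like YOLOv2, SSD achieves real-time object detection with high accuracy, and it is well worth studying and improving.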
