04. Convolutional Neural Networks W3. Object Detection (Assignment: Autonomous Driving - Car Detection)

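Table of contents

    • 1. Problem background
    • 2. The YOLO model
      • 2.1 Model details
      • 2.2 Filtering with a class-score threshold
      • 2.3 Non-max suppression
      • 2.4 Putting the filtering together
    • 3. Testing the pretrained YOLO model on images
      • 3.1 Defining classes, anchors, and image shape
      • 3.2 Loading the pretrained model
      • 3.3 Converting the model output to usable bounding-box tensors
      • 3.4 Filtering the boxes
      • 3.5 Running on an image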

Quiz: see the reference blog post

Notes: 04. Convolutional Neural Networks W3. Object Detection

Reference papers:
Redmon et al., 2016 (https://arxiv.org/abs/1506.02640)
Redmon and Farhadi, 2016 (https://arxiv.org/abs/1612.08242)


Import the required packages:

import argparse
import os
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
import scipy.io
import scipy.misc
import numpy as np
import pandas as pd
import PIL
import tensorflow as tf
from keras import backend as K
from keras.layers import Input, Lambda, Conv2D
from keras.models import load_model, Model
from yolo_utils import read_classes, read_anchors, generate_colors, preprocess_image, draw_boxes, scale_boxes
from yad2k.models.keras_yolo import yolo_head, yolo_boxes_to_corners, preprocess_true_boxes, yolo_loss, yolo_body

%matplotlib inline
  • With from keras import backend as K, Keras backend functions can be called as K.function(...)

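1. Problem background

A camera mounted on a car captures images of the road while driving. All images have been labeled: every car in an image is marked with a bounding box.


Because training a YOLO model is computationally expensive, we load pretrained weights instead.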

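2. The YOLO model

YOLO (you only look once) is a popular algorithm because it achieves high accuracy while still running in real time.

The algorithm "looks only once" at the image in the sense that a single forward pass through the network is enough to make predictions.
After non-max suppression, it outputs the recognized objects together with their bounding boxes.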

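2.1 Model details

  • Input: a batch of images, shape (m, 608, 608, 3)
  • Output: $(p_c, b_x, b_y, b_h, b_w, c)$ for each box. The class label $c$ can be expanded into one value per class, so with 80 classes each box is described by 85 numbers.

We use 5 anchor boxes. The model structure is as follows:
(Figure: model structure)
If the center of an object falls inside a grid cell, that cell is responsible for detecting the object.

Flattening the last two dimensions turns (19, 19, 5, 85) into (19, 19, 425).
Each cell of the 19x19 grid outputs 5 anchor boxes, and each anchor box carries its own 85 numbers.
(Figure: prediction)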
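The relationship between the flattened and the unflattened encoding can be checked with a small NumPy sketch. The random array only stands in for a real network output, the names `flat` and `encoding` are illustrative, and the layout (5 anchor boxes, each with 85 consecutive values) is the one assumed in this post.

```python
import numpy as np

# Stand-in for the network output of one image: 19 x 19 cells, 425 numbers each.
rng = np.random.default_rng(0)
flat = rng.random((19, 19, 425))

# Un-flatten the last axis: 425 = 5 anchor boxes x 85 numbers per box.
encoding = flat.reshape(19, 19, 5, 85)

# Box number 2 of cell (3, 7) occupies positions 2*85 .. 3*85 - 1 of the flat vector.
same = np.array_equal(encoding[3, 7, 2], flat[3, 7, 170:255])

print(encoding.shape, same)
```

Reshaping only changes how the same 425 numbers per cell are indexed; no values are moved or recomputed.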

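Visualizing the predictions:

  • For each cell of the 19x19 grid, find the class with the highest probability among its 5 boxes
  • Color the cell according to that class

Note that this visualization is not a core part of how YOLO makes predictions;
it is simply a convenient way to inspect the algorithm's intermediate results.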
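The per-cell class used for coloring could be computed roughly as follows. This is a NumPy sketch on random stand-in data, not the plotting code behind the figure; the score of a box for a class is taken to be the object confidence times the class probability, as in the filtering step described later in this post.

```python
import numpy as np

rng = np.random.default_rng(1)
box_confidence = rng.random((19, 19, 5, 1))    # p_c for each box (stand-in values)
box_class_probs = rng.random((19, 19, 5, 80))  # class probabilities for each box

# Score of every (box, class) pair in every cell.
box_scores = box_confidence * box_class_probs   # (19, 19, 5, 80)

# Best score over the 5 boxes and 80 classes of each cell.
per_cell = box_scores.reshape(19, 19, 5 * 80)   # flat index = box * 80 + class
cell_score = per_cell.max(axis=-1)              # (19, 19)
cell_class = per_cell.argmax(axis=-1) % 80      # class index used to color the cell

print(cell_class.shape, cell_score.shape)
```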

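(Figure: visualizing the predictions)
Another visualization:

  • Draw the bounding boxes

This produces far too many bounding boxes, so we apply non-max suppression:

  • Discard boxes with a low score
  • When several boxes overlap and detect the same object, keep only one of them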

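2.2 Filtering with a class-score threshold

Build a filter that removes every box whose score is below the chosen threshold.

The model outputs 19x19x5x85 numbers, 85 for each box. Splitting them up makes the later steps easier:

  • box_confidence: tensor of shape (19×19, 5, 1), the confidence $p_c$ that each of the 5 boxes in each cell contains an object
  • boxes: tensor of shape (19×19, 5, 4), the position $(b_x, b_y, b_h, b_w)$ of each of the 5 boxes in each cell
  • box_class_probs: tensor of shape (19×19, 5, 80), the detection probabilities $(c_1, c_2, \dots, c_{80})$ of the 80 classes for each of the 5 boxes in each cell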
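The split can be pictured as slicing the last axis. The sketch below assumes the ordering listed in this post, $(p_c, b_x, b_y, b_h, b_w, c_1, \dots, c_{80})$; the layout actually used inside `yolo_head` may differ, and the random array is only a stand-in for a model output.

```python
import numpy as np

rng = np.random.default_rng(2)
encoding = rng.random((19, 19, 5, 85))  # stand-in for the output of one image

# Assumed layout of the last axis: [p_c, b_x, b_y, b_h, b_w, c_1 ... c_80]
box_confidence = encoding[..., 0:1]     # (19, 19, 5, 1)
boxes = encoding[..., 1:5]              # (19, 19, 5, 4)
box_class_probs = encoding[..., 5:]     # (19, 19, 5, 80)

print(box_confidence.shape, boxes.shape, box_class_probs.shape)
```

Slicing with `0:1` rather than `0` keeps the trailing axis of size 1, so the confidence broadcasts against the 80 class probabilities when the two are multiplied.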

boolean_mask reference: https://www.tensorflow.org/api_docs/python/tf/boolean_mask

tf.boolean_mask(tensor, mask, axis=None, name='boolean_mask')

# GRADED FUNCTION: yolo_filter_boxes

def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold = .6):
    """Filters YOLO boxes by thresholding on object and class confidence.

    Arguments:
    box_confidence -- tensor of shape (19, 19, 5, 1)
    boxes -- tensor of shape (19, 19, 5, 4)
    box_class_probs -- tensor of shape (19, 19, 5, 80)
    threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box

    Returns:
    scores -- tensor of shape (None,), containing the class probability score for selected boxes
    boxes -- tensor of shape (None, 4), containing (b_x, b_y, b_h, b_w) coordinates of selected boxes
    classes -- tensor of shape (None,), containing the index of the class detected by the selected boxes

    Note: "None" is here because you don't know the exact number of selected boxes, as it depends on the threshold.
    For example, the actual output size of scores would be (10,) if there are 10 boxes.
    """

    # Step 1: Compute box scores
    ### START CODE HERE ### (≈ 1 line)
    box_scores = box_confidence * box_class_probs
    ### END CODE HERE ###

    # Step 2: Find the box_classes thanks to the max box_scores, keep track of the corresponding score
    ### START CODE HERE ### (≈ 2 lines)
    box_classes = K.argmax(box_scores, axis=-1)
    box_class_scores = K.max(box_scores, axis=-1)
    ### END CODE HERE ###

    # Step 3: Create a filtering mask based on "box_class_scores" by using "threshold". The mask should have the
    # same dimension as box_class_scores, and be True for the boxes you want to keep (with probability >= threshold)
    ### START CODE HERE ### (≈ 1 line)
    filtering_mask = box_class_scores >= threshold
    ### END CODE HERE ###

    # Step 4: Apply the mask to scores, boxes and classes
    ### START CODE HERE ### (≈ 3 lines)
    scores = tf.boolean_mask(box_class_scores, filtering_mask)
    boxes = tf.boolean_mask(boxes, filtering_mask)
    classes = tf.boolean_mask(box_classes, filtering_mask)
    ### END CODE HERE ###

    return scores, boxes, classes
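To check the four steps by hand, the same logic can be mirrored in NumPy on a tiny input. The function name `filter_boxes_np` and the toy shapes (one cell, three boxes, two classes) are made up for this illustration; NumPy boolean indexing plays the role of `tf.boolean_mask`.

```python
import numpy as np

def filter_boxes_np(box_confidence, boxes, box_class_probs, threshold=0.6):
    """NumPy analogue of the thresholding steps, for checking the logic by hand."""
    box_scores = box_confidence * box_class_probs   # step 1
    box_classes = box_scores.argmax(axis=-1)        # step 2
    box_class_scores = box_scores.max(axis=-1)
    mask = box_class_scores >= threshold            # step 3
    # step 4: boolean indexing flattens the leading axes, like tf.boolean_mask
    return box_class_scores[mask], boxes[mask], box_classes[mask]

# One cell, three boxes, two classes instead of 19 x 19 x 5 x 80.
conf = np.array([[[[0.9], [0.5], [0.8]]]])                    # (1, 1, 3, 1)
probs = np.array([[[[0.1, 0.9], [0.7, 0.3], [0.5, 0.5]]]])    # (1, 1, 3, 2)
coords = np.arange(12, dtype=float).reshape(1, 1, 3, 4)       # (1, 1, 3, 4)

scores, kept, classes = filter_boxes_np(conf, coords, probs)
print(scores, kept, classes)
```

The best scores of the three boxes are about 0.81, 0.35 and 0.40, so only the first box (class index 1) passes the 0.6 threshold.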

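2.3 Non-max suppression

After thresholding, many overlapping bounding boxes still remain, so we apply non-max suppression (NMS).

NMS uses intersection over union (IoU) to measure how much two boxes overlap: it keeps the highest-scoring box and discards the boxes that overlap it too much.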

# GRADED FUNCTION: iou

def iou(box1, box2):
    """Implement the intersection over union (IoU) between box1 and box2

    Arguments:
    box1 -- first box, list object with coordinates (x1, y1, x2, y2)
    box2 -- second box, list object with coordinates (x1, y1, x2, y2)
    """

    # Calculate the (x1, y1, x2, y2) coordinates of the intersection of box1 and box2. Calculate its Area.
    ### START CODE HERE ### (≈ 5 lines)
    xi1 = np.maximum(box1[0], box2[0])
    yi1 = np.maximum(box1[1], box2[1])
    xi2 = np.minimum(box1[2], box2[2])
    yi2 = np.minimum(box1[3], box2[3])
    # Clamp at 0 so that boxes that do not overlap get an intersection area of 0
    inter_area = np.maximum(xi2 - xi1, 0) * np.maximum(yi2 - yi1, 0)
    ### END CODE HERE ###

    # Calculate the Union area by using Formula: Union(A,B) = A + B - Inter(A,B)
    ### START CODE HERE ### (≈ 3 lines)
    box1_area = (box1[2] - box1[0]) * (box1[3] - box1[1])
    box2_area = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union_area = box1_area + box2_area - inter_area
    ### END CODE HERE ###

    # compute the IoU
    ### START CODE HERE ### (≈ 1 line)
    iou = inter_area / union_area
    ### END CODE HERE ###

    return iou
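A worked example makes the formula concrete. The helper below is a compact pure-Python restatement written for this illustration (the name `iou_xyxy` is made up); it clamps the intersection width and height at zero, so two boxes that do not touch get an IoU of 0 rather than a spurious positive value.

```python
def iou_xyxy(box1, box2):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    xi1, yi1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    xi2, yi2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(xi2 - xi1, 0) * max(yi2 - yi1, 0)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter / (area1 + area2 - inter)

# Two 2 x 2 boxes sharing a 1 x 1 square: intersection 1, union 4 + 4 - 1 = 7.
overlapping = iou_xyxy((2, 1, 4, 3), (1, 2, 3, 4))
# Two unit boxes far apart: no overlap at all.
disjoint = iou_xyxy((0, 0, 1, 1), (2, 2, 3, 3))
print(overlapping, disjoint)
```

Here `overlapping` equals 1/7 (about 0.143) and `disjoint` equals 0.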

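Steps of non-max suppression:

  1. Select the box with the highest score
  2. Compute its overlap with all other boxes and remove those whose overlap exceeds the threshold
  3. Go back to step 1 and repeat until no box with a lower score than the currently selected one remains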
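The three steps can be sketched in plain Python as a greedy loop. This only illustrates the idea and is not how TensorFlow implements it; the names `greedy_nms` and `iou_xyxy` are made up here, and boxes are assumed to be given as corner coordinates (x1, y1, x2, y2).

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]), 0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]), 0)
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def greedy_nms(boxes, scores, iou_threshold=0.5, max_boxes=10):
    """Return the indices of the boxes that survive, highest score first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order and len(keep) < max_boxes:
        best = order.pop(0)          # step 1: highest-scoring remaining box
        keep.append(best)
        # step 2: drop the boxes that overlap the selected one too much
        order = [i for i in order if iou_xyxy(boxes[best], boxes[i]) <= iou_threshold]
    return keep                      # step 3 is the loop itself

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of 81/119 (about 0.68) and is suppressed; the third box does not overlap anything and is kept.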

TensorFlow has a built-in NMS: https://www.tensorflow.org/api_docs/python/tf/image/non_max_suppression

tf.gather reference: https://www.tensorflow.org/api_docs/python/tf/gather

# GRADED FUNCTION: yolo_non_max_suppression

def yolo_non_max_suppression(scores, boxes, classes, max_boxes = 10, iou_threshold = 0.5):
    """Applies Non-max suppression (NMS) to set of boxes

    Arguments:
    scores -- tensor of shape (None,), output of yolo_filter_boxes()
    boxes -- tensor of shape (None, 4), output of yolo_filter_boxes() that have been scaled to the image size (see later)
    classes -- tensor of shape (None,), output of yolo_filter_boxes()
    max_boxes -- integer, maximum number of predicted boxes you'd like
    iou_threshold -- real value, "intersection over union" threshold used for NMS filtering

    Returns:
    scores -- tensor of shape (, None), predicted score for each box
    boxes -- tensor of shape (4, None), predicted box coordinates
    classes -- tensor of shape (, None), predicted class for each box

    Note: The "None" dimension of the output tensors has obviously to be less than max_boxes. Note also that this
    function will transpose the shapes of scores, boxes, classes. This is made for convenience.
    """

    max_boxes_tensor = K.variable(max_boxes, dtype='int32')     # tensor to be used in tf.image.non_max_suppression()
    K.get_session().run(tf.variables_initializer([max_boxes_tensor])) # initialize variable max_boxes_tensor

    # Use tf.image.non_max_suppression() to get the list of indices corresponding to boxes you keep
    ### START CODE HERE ### (≈ 1 line)
    nms_indices = tf.image.non_max_suppression(boxes, scores, max_boxes_tensor, iou_threshold)
    ### END CODE HERE ###

    # Use K.gather() to select only nms_indices from scores, boxes and classes
    ### START CODE HERE ### (≈ 3 lines)
    scores = K.gather(scores, nms_indices)
    boxes = K.gather(boxes, nms_indices)
    classes = K.gather(classes, nms_indices)
    ### END CODE HERE ###

    return scores, boxes, classes

2.4 Putting the filtering together

Two helper functions:

  • boxes = yolo_boxes_to_corners(box_xy, box_wh) converts boxes to a representation based on two corner points
  • boxes = scale_boxes(boxes, image_shape) rescales the boxes so they can be drawn on images of a different size

# GRADED FUNCTION: yolo_eval

def yolo_eval(yolo_outputs, image_shape = (720., 1280.), max_boxes=10, score_threshold=.6, iou_threshold=.5):
    """Converts the output of YOLO encoding (a lot of boxes) to your predicted boxes along with their scores, box coordinates and classes.

    Arguments:
    yolo_outputs -- output of the encoding model (for image_shape of (608, 608, 3)), contains 4 tensors:
                    box_confidence: tensor of shape (None, 19, 19, 5, 1)
                    box_xy: tensor of shape (None, 19, 19, 5, 2)
                    box_wh: tensor of shape (None, 19, 19, 5, 2)
                    box_class_probs: tensor of shape (None, 19, 19, 5, 80)
    image_shape -- tensor of shape (2,) containing the input shape, in this notebook we use (608., 608.) (has to be float32 dtype)
    max_boxes -- integer, maximum number of predicted boxes you'd like
    score_threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box
    iou_threshold -- real value, "intersection over union" threshold used for NMS filtering

    Returns:
    scores -- tensor of shape (None, ), predicted score for each box
    boxes -- tensor of shape (None, 4), predicted box coordinates
    classes -- tensor of shape (None,), predicted class for each box
    """

    ### START CODE HERE ###
    # Retrieve outputs of the YOLO model (≈1 line)
    box_confidence, box_xy, box_wh, box_class_probs = yolo_outputs

    # Convert boxes to be ready for filtering functions
    boxes = yolo_boxes_to_corners(box_xy, box_wh)

    # Use one of the functions you've implemented to perform Score-filtering with a threshold of score_threshold (≈1 line)
    scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold=score_threshold)

    # Scale boxes back to original image shape.
    boxes = scale_boxes(boxes, image_shape)

    # Use one of the functions you've implemented to perform Non-max suppression with a threshold of iou_threshold (≈1 line)
    scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes, max_boxes, iou_threshold)
    ### END CODE HERE ###

    return scores, boxes, classes
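What the two helper functions do can be sketched in NumPy. These are illustrative re-implementations, not the code from yad2k or yolo_utils: the `_np` names are made up, and both the corner order (y_min, x_min, y_max, x_max) and the assumption that coordinates are normalized to [0, 1] should be checked against the real helpers.

```python
import numpy as np

def boxes_to_corners_np(box_xy, box_wh):
    """Center/size -> corners, in the assumed order (y_min, x_min, y_max, x_max)."""
    mins = box_xy - box_wh / 2.0
    maxes = box_xy + box_wh / 2.0
    return np.concatenate([mins[..., 1:2], mins[..., 0:1],
                           maxes[..., 1:2], maxes[..., 0:1]], axis=-1)

def scale_boxes_np(boxes, image_shape):
    """Scale normalized corner boxes to pixel units of an image (height, width)."""
    height, width = image_shape
    return boxes * np.array([height, width, height, width])

xy = np.array([[0.5, 0.5]])   # center (x, y), normalized
wh = np.array([[0.2, 0.4]])   # width and height, normalized
corners = boxes_to_corners_np(xy, wh)
pixels = scale_boxes_np(corners, (720.0, 1280.0))
print(corners, pixels)
```

For this box the corners come out as roughly (0.3, 0.4, 0.7, 0.6), which scale to about (216, 512, 504, 768) pixels on a 720 x 1280 image.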

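Summary of the YOLO model:

  • The input is a 608*608*3 image; after the convolutional network, the output has shape 19*19*5*85
  • Flattening the last two dimensions gives 19*19*425, so each cell of the 19x19 grid holds 425 numbers
  • The 5 comes from the 5 anchor boxes; 85 = 80 classes + 5 parameters $(p_c, b_x, b_y, b_h, b_w)$
  • Only some of the boxes are then kept (score thresholding, non-max suppression)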

3. Testing the pretrained YOLO model on images

  • Create a session
sess = K.get_session()

3.1 Defining classes, anchors, and image shape

class_names = read_classes("model_data/coco_classes.txt")
anchors = read_anchors("model_data/yolo_anchors.txt")
image_shape = (720., 1280.)    

The coco_classes file lists the names of the 80 object classes.
The yolo_anchors file contains 10 floating-point numbers that define the shapes of the 5 anchor boxes.
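Ten numbers describe five shapes because each anchor is a pair of values. The parser below is a sketch, not the `read_anchors` helper from yolo_utils; it assumes the numbers are comma-separated on a single line, and the values in `example_line` are made up for illustration.

```python
def parse_anchors(line):
    """Turn 'a1,b1,a2,b2,...' into a list of 2-tuples, one per anchor box."""
    values = [float(v) for v in line.split(",")]
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of values")
    return list(zip(values[0::2], values[1::2]))

# Illustrative values only; the real file ships with the assignment.
example_line = "0.5,0.7, 1.9,2.1, 3.3,5.5, 7.9,3.5, 9.8,9.2"
example_anchors = parse_anchors(example_line)
print(len(example_anchors), example_anchors[0])  # 5 (0.5, 0.7)
```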

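3.2 Loading the pretrained model

Error: module 'tensorflow' has no attribute 'space_to_depth'

Version mismatches are a real nuisance. This error typically shows up under TensorFlow 2.x, where the function is no longer exposed as tf.space_to_depth. Installing the following versions avoids it (Python 3.7 environment):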

pip uninstall tensorflow
pip uninstall keras
pip install tensorflow==1.14.0
pip install keras==2.3.1
yolo_model = load_model("model_data/yolo.h5")
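The operation named in the error message moves each 2x2 spatial block into the channel axis, which is how the model turns a 38x38x64 feature map into a 19x19x256 one before concatenation. A NumPy sketch of that rearrangement for a single image, written for illustration and not taken from TensorFlow:

```python
import numpy as np

def space_to_depth_x2(x):
    """Move every 2 x 2 spatial block of an (H, W, C) array into the channel axis."""
    h, w, c = x.shape
    x = x.reshape(h // 2, 2, w // 2, 2, c)   # split rows and columns into blocks of 2
    x = x.transpose(0, 2, 1, 3, 4)           # (H/2, W/2, 2, 2, C)
    return x.reshape(h // 2, w // 2, 4 * c)  # stack the 4 block positions as channels

features = np.arange(38 * 38 * 64, dtype=np.float32).reshape(38, 38, 64)
out = space_to_depth_x2(features)
print(out.shape)  # (19, 19, 256)
```

No values are lost: the spatial resolution is halved in each direction while the number of channels grows by a factor of 4.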

Model preview:

yolo_model.summary()
Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 608, 608, 3)  0                                            
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 608, 608, 32) 864         input_1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 608, 608, 32) 128         conv2d_1[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_1 (LeakyReLU)       (None, 608, 608, 32) 0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 304, 304, 32) 0           leaky_re_lu_1[0][0]              
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 304, 304, 64) 18432       max_pooling2d_1[0][0]            
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 304, 304, 64) 256         conv2d_2[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_2 (LeakyReLU)       (None, 304, 304, 64) 0           batch_normalization_2[0][0]      
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)  (None, 152, 152, 64) 0           leaky_re_lu_2[0][0]              
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 152, 152, 128 73728       max_pooling2d_2[0][0]            
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 152, 152, 128 512         conv2d_3[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_3 (LeakyReLU)       (None, 152, 152, 128 0           batch_normalization_3[0][0]      
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 152, 152, 64) 8192        leaky_re_lu_3[0][0]              
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 152, 152, 64) 256         conv2d_4[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_4 (LeakyReLU)       (None, 152, 152, 64) 0           batch_normalization_4[0][0]      
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 152, 152, 128 73728       leaky_re_lu_4[0][0]              
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 152, 152, 128 512         conv2d_5[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_5 (LeakyReLU)       (None, 152, 152, 128 0           batch_normalization_5[0][0]      
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D)  (None, 76, 76, 128)  0           leaky_re_lu_5[0][0]              
__________________________________________________________________________________________________
conv2d_6 (Conv2D)               (None, 76, 76, 256)  294912      max_pooling2d_3[0][0]            
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 76, 76, 256)  1024        conv2d_6[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_6 (LeakyReLU)       (None, 76, 76, 256)  0           batch_normalization_6[0][0]      
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, 76, 76, 128)  32768       leaky_re_lu_6[0][0]              
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 76, 76, 128)  512         conv2d_7[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_7 (LeakyReLU)       (None, 76, 76, 128)  0           batch_normalization_7[0][0]      
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 76, 76, 256)  294912      leaky_re_lu_7[0][0]              
__________________________________________________________________________________________________
batch_normalization_8 (BatchNor (None, 76, 76, 256)  1024        conv2d_8[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_8 (LeakyReLU)       (None, 76, 76, 256)  0           batch_normalization_8[0][0]      
__________________________________________________________________________________________________
max_pooling2d_4 (MaxPooling2D)  (None, 38, 38, 256)  0           leaky_re_lu_8[0][0]              
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 38, 38, 512)  1179648     max_pooling2d_4[0][0]            
__________________________________________________________________________________________________
batch_normalization_9 (BatchNor (None, 38, 38, 512)  2048        conv2d_9[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_9 (LeakyReLU)       (None, 38, 38, 512)  0           batch_normalization_9[0][0]      
__________________________________________________________________________________________________
conv2d_10 (Conv2D)              (None, 38, 38, 256)  131072      leaky_re_lu_9[0][0]              
__________________________________________________________________________________________________
batch_normalization_10 (BatchNo (None, 38, 38, 256)  1024        conv2d_10[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_10 (LeakyReLU)      (None, 38, 38, 256)  0           batch_normalization_10[0][0]     
__________________________________________________________________________________________________
conv2d_11 (Conv2D)              (None, 38, 38, 512)  1179648     leaky_re_lu_10[0][0]             
__________________________________________________________________________________________________
batch_normalization_11 (BatchNo (None, 38, 38, 512)  2048        conv2d_11[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_11 (LeakyReLU)      (None, 38, 38, 512)  0           batch_normalization_11[0][0]     
__________________________________________________________________________________________________
conv2d_12 (Conv2D)              (None, 38, 38, 256)  131072      leaky_re_lu_11[0][0]             
__________________________________________________________________________________________________
batch_normalization_12 (BatchNo (None, 38, 38, 256)  1024        conv2d_12[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_12 (LeakyReLU)      (None, 38, 38, 256)  0           batch_normalization_12[0][0]     
__________________________________________________________________________________________________
conv2d_13 (Conv2D)              (None, 38, 38, 512)  1179648     leaky_re_lu_12[0][0]             
__________________________________________________________________________________________________
batch_normalization_13 (BatchNo (None, 38, 38, 512)  2048        conv2d_13[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_13 (LeakyReLU)      (None, 38, 38, 512)  0           batch_normalization_13[0][0]     
__________________________________________________________________________________________________
max_pooling2d_5 (MaxPooling2D)  (None, 19, 19, 512)  0           leaky_re_lu_13[0][0]             
__________________________________________________________________________________________________
conv2d_14 (Conv2D)              (None, 19, 19, 1024) 4718592     max_pooling2d_5[0][0]            
__________________________________________________________________________________________________
batch_normalization_14 (BatchNo (None, 19, 19, 1024) 4096        conv2d_14[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_14 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_14[0][0]     
__________________________________________________________________________________________________
conv2d_15 (Conv2D)              (None, 19, 19, 512)  524288      leaky_re_lu_14[0][0]             
__________________________________________________________________________________________________
batch_normalization_15 (BatchNo (None, 19, 19, 512)  2048        conv2d_15[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_15 (LeakyReLU)      (None, 19, 19, 512)  0           batch_normalization_15[0][0]     
__________________________________________________________________________________________________
conv2d_16 (Conv2D)              (None, 19, 19, 1024) 4718592     leaky_re_lu_15[0][0]             
__________________________________________________________________________________________________
batch_normalization_16 (BatchNo (None, 19, 19, 1024) 4096        conv2d_16[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_16 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_16[0][0]     
__________________________________________________________________________________________________
conv2d_17 (Conv2D)              (None, 19, 19, 512)  524288      leaky_re_lu_16[0][0]             
__________________________________________________________________________________________________
batch_normalization_17 (BatchNo (None, 19, 19, 512)  2048        conv2d_17[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_17 (LeakyReLU)      (None, 19, 19, 512)  0           batch_normalization_17[0][0]     
__________________________________________________________________________________________________
conv2d_18 (Conv2D)              (None, 19, 19, 1024) 4718592     leaky_re_lu_17[0][0]             
__________________________________________________________________________________________________
batch_normalization_18 (BatchNo (None, 19, 19, 1024) 4096        conv2d_18[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_18 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_18[0][0]     
__________________________________________________________________________________________________
conv2d_19 (Conv2D)              (None, 19, 19, 1024) 9437184     leaky_re_lu_18[0][0]             
__________________________________________________________________________________________________
batch_normalization_19 (BatchNo (None, 19, 19, 1024) 4096        conv2d_19[0][0]                  
__________________________________________________________________________________________________
conv2d_21 (Conv2D)              (None, 38, 38, 64)   32768       leaky_re_lu_13[0][0]             
__________________________________________________________________________________________________
leaky_re_lu_19 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_19[0][0]     
__________________________________________________________________________________________________
batch_normalization_21 (BatchNo (None, 38, 38, 64)   256         conv2d_21[0][0]                  
__________________________________________________________________________________________________
conv2d_20 (Conv2D)              (None, 19, 19, 1024) 9437184     leaky_re_lu_19[0][0]             
__________________________________________________________________________________________________
leaky_re_lu_21 (LeakyReLU)      (None, 38, 38, 64)   0           batch_normalization_21[0][0]     
__________________________________________________________________________________________________
batch_normalization_20 (BatchNo (None, 19, 19, 1024) 4096        conv2d_20[0][0]                  
__________________________________________________________________________________________________
space_to_depth_x2 (Lambda)      (None, 19, 19, 256)  0           leaky_re_lu_21[0][0]             
__________________________________________________________________________________________________
leaky_re_lu_20 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_20[0][0]     
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 19, 19, 1280) 0           space_to_depth_x2[0][0]          
                                                                 leaky_re_lu_20[0][0]             
__________________________________________________________________________________________________
conv2d_22 (Conv2D)              (None, 19, 19, 1024) 11796480    concatenate_1[0][0]              
__________________________________________________________________________________________________
batch_normalization_22 (BatchNo (None, 19, 19, 1024) 4096        conv2d_22[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_22 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_22[0][0]     
__________________________________________________________________________________________________
conv2d_23 (Conv2D)              (None, 19, 19, 425)  435625      leaky_re_lu_22[0][0]             
==================================================================================================
Total params: 50,983,561
Trainable params: 50,962,889
Non-trainable params: 20,672

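The model maps a batch of images of shape (m, 608, 608, 3) to a tensor of shape (m, 19, 19, 425), which corresponds to (m, 19, 19, 5, 85) once the last axis is split per anchor box.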
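A few numbers in the summary can be reproduced with simple arithmetic. The sketch below assumes what the summary suggests: five 2x2 max-pooling layers, a first 3x3 convolution without bias terms, and a final 1x1 convolution with bias terms.

```python
# Five 2x2 max-pooling layers halve the spatial size five times: 608 -> 19.
grid = 608 // 2 ** 5

# First convolution: 3x3 kernel, 3 input channels, 32 filters, no bias.
first_conv_params = 3 * 3 * 3 * 32

# Last convolution: 1x1 kernel, 1024 input channels, 425 filters, plus 425 biases.
last_conv_params = 1 * 1 * 1024 * 425 + 425

# Every cell predicts 5 boxes, so one image yields this many candidate boxes.
boxes_per_image = grid * grid * 5

print(grid, first_conv_params, last_conv_params, boxes_per_image)
# 19 864 435625 1805
```

The values 864 and 435625 match the Param # column for conv2d_1 and conv2d_23, and 1805 is the number of boxes that thresholding and non-max suppression have to reduce.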

3.3 Converting the model output to usable bounding-box tensors

yolo_outputs = yolo_head(yolo_model.output, anchors, len(class_names))

3.4 Filtering the boxes

  • Keep only a subset of the bounding boxes as the result
scores, boxes, classes = yolo_eval(yolo_outputs, image_shape)

3.5 Running on an image

  1. yolo_model.input is given to yolo_model. The model is used to compute the output yolo_model.output
  2. yolo_model.output is processed by yolo_head. It gives you yolo_outputs
  3. yolo_outputs goes through a filtering function, yolo_eval. It outputs your predictions: scores, boxes, classes
import imageio

def predict(sess, image_file):
    """Runs the graph stored in "sess" to predict boxes for "image_file". Prints and plots the predictions.

    Arguments:
    sess -- your tensorflow/Keras session containing the YOLO graph
    image_file -- name of an image stored in the "images" folder.

    Returns:
    out_scores -- tensor of shape (None, ), scores of the predicted boxes
    out_boxes -- tensor of shape (None, 4), coordinates of the predicted boxes
    out_classes -- tensor of shape (None, ), class index of the predicted boxes

    Note: "None" actually represents the number of predicted boxes, it varies between 0 and max_boxes.
    """

    # Preprocess your image
    image, image_data = preprocess_image("images/" + image_file, model_image_size = (608, 608))

    # Run the session with the correct tensors and choose the correct placeholders in the feed_dict.
    # You'll need to use feed_dict={yolo_model.input: ... , K.learning_phase(): 0})
    ### START CODE HERE ### (≈ 1 line)
    out_scores, out_boxes, out_classes = sess.run([scores, boxes, classes],
                                                  feed_dict = {yolo_model.input: image_data, K.learning_phase(): 0})
    ### END CODE HERE ###

    # Print predictions info
    print('Found {} boxes for {}'.format(len(out_boxes), image_file))
    # Generate colors for drawing bounding boxes.
    colors = generate_colors(class_names)
    # Draw bounding boxes on the image file
    draw_boxes(image, out_scores, out_boxes, out_classes, class_names, colors)
    # Save the predicted bounding box on the image
    image.save(os.path.join("out", image_file), quality=90)
    # Display the results in the notebook
    output_image = imageio.imread(os.path.join("out", image_file))
    imshow(output_image)

    return out_scores, out_boxes, out_classes

Note: when a model uses BatchNorm (as YOLO does), an extra placeholder must be passed in the feed_dict: K.learning_phase(): 0
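The learning phase matters because batch normalization behaves differently in training and at inference time. The sketch below is a simplified NumPy illustration, not Keras code (a real layer also updates its moving averages during training): with the phase set to 0, the layer normalizes with the statistics stored during training rather than those of the current batch.

```python
import numpy as np

def batch_norm(x, gamma, beta, moving_mean, moving_var, training, eps=1e-3):
    """Normalize x of shape (batch, features) in training or inference mode."""
    if training:
        mean, var = x.mean(axis=0), x.var(axis=0)   # statistics of this batch
    else:
        mean, var = moving_mean, moving_var         # statistics stored during training
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.array([[10.0], [10.0]])   # a "batch" of two identical inputs
params = dict(gamma=1.0, beta=0.0,
              moving_mean=np.array([4.0]), moving_var=np.array([9.0]))

train_out = batch_norm(x, training=True, **params)    # batch statistics: everything becomes 0
infer_out = batch_norm(x, training=False, **params)   # stored statistics: (10 - 4) / 3, roughly 2
print(train_out.ravel(), infer_out.ravel())
```

With a single test image, batch statistics would be meaningless, which is why prediction runs with the learning phase set to 0.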

out_scores, out_boxes, out_classes = predict(sess, "test.jpg")
Found 7 boxes for test.jpg
car 0.60 (925, 285) (1045, 374)
bus 0.67 (5, 267) (220, 407)
car 0.68 (705, 279) (786, 351)
car 0.70 (947, 324) (1280, 704)
car 0.75 (159, 303) (346, 440)
car 0.80 (762, 282) (942, 412)
car 0.89 (366, 299) (745, 648)

Found 2 boxes for 1.jpg
car 0.61 (253, 466) (367, 513)
car 0.73 (179, 473) (284, 522)

  • Run predictions on a batch of images and build an animated GIF
out_puts_img = []
for idx in range(1, 121):  # 120 images, named 0001.jpg to 0120.jpg
    pic_name = str(idx).zfill(4) + '.jpg'
    # predict() has to be modified to return the output image as a fourth value
    out_scores, out_boxes, out_classes, out_put_img = predict(sess, pic_name)
    out_puts_img.append(out_put_img)

def create_gif(image_list, gif_name, duration=0.3):
    # Collect the frames and write them into a single animated GIF
    frames = list(image_list)
    imageio.mimsave(gif_name, frames, 'GIF', duration=duration)

create_gif(out_puts_img, 'out.gif', 0.5)

(Animation: prediction results)


My CSDN blog: https://michael.blog.csdn.net/
