04. Convolutional Neural Networks W3. Object Detection (Assignment: Autonomous Driving - Car Detection)

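Contents

    • 1. Problem background
    • 2. YOLO model
      • 2.1 Model details
      • 2.2 Filtering with a class-score threshold
      • 2.3 Non-max suppression
      • 2.4 Wrapping up the filtering
    • 3. Testing the pre-trained YOLO model on images
      • 3.1 Defining classes, anchors, and image shape
      • 3.2 Loading the pre-trained model
      • 3.3 Converting the model output to usable bounding box variables
      • 3.4 Filtering boxes
      • 3.5 Running on an image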

Quiz: see the reference blog post

Notes: 04. Convolutional Neural Networks W3. Object Detection

Reference papers:
Redmon et al., 2016 (https://arxiv.org/abs/1506.02640)
Redmon and Farhadi, 2016 (https://arxiv.org/abs/1612.08242)


Import the required packages:

import argparse
import os
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
import scipy.io
import scipy.misc
import numpy as np
import pandas as pd
import PIL
import tensorflow as tf
from keras import backend as K
from keras.layers import Input, Lambda, Conv2D
from keras.models import load_model, Model
from yolo_utils import read_classes, read_anchors, generate_colors, preprocess_image, draw_boxes, scale_boxes
from yad2k.models.keras_yolo import yolo_head, yolo_boxes_to_corners, preprocess_true_boxes, yolo_loss, yolo_body

%matplotlib inline
  • With from keras import backend as K, Keras backend functions can be called as K.function(...)

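1. Problem background

A camera mounted on the car captured photos of the road while driving. All photos have been labeled: a bounding box is drawn around every car in each photo.


Because the YOLO model is very expensive to train, we will load pre-trained weights.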

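2. YOLO model

YOLO (you only look once) is a popular algorithm because it achieves high accuracy while still running in real time.

The algorithm "only looks once" at the image in the sense that it needs just one forward propagation pass through the network to make its predictions.
After non-max suppression, it outputs the recognized objects together with their bounding boxes.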

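2.1 Model details

  • Input: a batch of images, shape (m, 608, 608, 3)
  • Output: (p_c, b_x, b_y, b_h, b_w, c) for each bounding box. c can be expanded into one entry per class, so if you need to recognize 80 classes, each box is described by 85 numbers

We will use 5 anchor boxes. The model structure is as follows:
Figure: model structure
If the center of an object falls inside a grid cell, that cell is responsible for detecting the object.

Flatten the last two dimensions: (19, 19, 5, 85) becomes (19, 19, 425).
In the 19x19 grid, each cell outputs 5 anchor boxes, and each anchor box carries its own 85 numbers.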
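The flattening described above can be checked on a dummy array. This is only a NumPy sketch with made-up data (the constant names are illustrative, not from the assignment); it shows that box k of a cell ends up in entries k*85 to (k+1)*85 of that cell's 425 numbers.

```python
import numpy as np

# Dummy encoding for one image: 19x19 grid, 5 anchor boxes, 85 numbers per box.
GRID, NUM_ANCHORS, NUM_CLASSES = 19, 5, 80
BOX_SIZE = 5 + NUM_CLASSES  # p_c, b_x, b_y, b_h, b_w plus 80 class entries

encoding = np.arange(GRID * GRID * NUM_ANCHORS * BOX_SIZE, dtype=np.float32)
encoding = encoding.reshape(GRID, GRID, NUM_ANCHORS, BOX_SIZE)

# Flatten the last two dimensions: (19, 19, 5, 85) -> (19, 19, 425).
flat = encoding.reshape(GRID, GRID, NUM_ANCHORS * BOX_SIZE)

# Box k of a cell occupies entries [k*85, (k+1)*85) of that cell's 425 numbers.
k = 3
same = np.array_equal(flat[7, 2, k * BOX_SIZE:(k + 1) * BOX_SIZE], encoding[7, 2, k])
print(flat.shape, same)
```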
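Figure: prediction

Visualizing the predictions:

  • For each of the 19x19 grid cells, find the class with the highest probability among its 5 boxes
  • Color the cell according to that most likely class

Note that this visualization is not a core part of how the YOLO algorithm makes predictions;
it is just a convenient way to inspect an intermediate result of the algorithm.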
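As a rough illustration of that visualization step, the following NumPy sketch picks, for every grid cell, the most likely class over its 5 boxes. The random inputs stand in for real model outputs, and the variable names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
box_confidence = rng.random((19, 19, 5, 1))    # stand-in for p_c
box_class_probs = rng.random((19, 19, 5, 80))  # stand-in for c_1 ... c_80

# Score of every (box, class) pair in every cell.
box_scores = box_confidence * box_class_probs  # (19, 19, 5, 80)

# Merge the box and class axes, then take the best entry per cell.
cell_scores = box_scores.reshape(19, 19, 5 * 80)
best_flat = cell_scores.argmax(axis=-1)
best_anchor = best_flat // 80   # which of the 5 boxes won
best_class = best_flat % 80     # class used to color the cell
best_score = cell_scores.max(axis=-1)

print(best_class.shape, best_class[0, 0], best_score[0, 0])
```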

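Figure: visualizing the predictions
Another way to visualize the output:

  • Draw the bounding boxes

There are too many bounding boxes, so apply non-max suppression:

  • Discard the boxes with a low score
  • When several boxes overlap and detect the same object, keep only one of them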

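2.2 Filtering with a class-score threshold

Build a filter that removes every box whose score is below the chosen threshold.

The model gives you 19x19x5x85 numbers, with each box described by 85 of them. Split the data into the following tensors to make later operations easier:

  • box_confidence: tensor of shape (19×19, 5, 1), the confidence p_c that an object is present, for each of the 5 boxes in every cell
  • boxes: tensor of shape (19×19, 5, 4), the position (b_x, b_y, b_h, b_w) of each of the 5 boxes in every cell
  • box_class_probs: tensor of shape (19×19, 5, 80), the detection probabilities (c_1, c_2, ..., c_80) of the 80 classes, for each of the 5 boxes in every cell

For boolean_mask, see: https://www.tensorflow.org/api_docs/python/tf/boolean_mask

tf.boolean_mask(tensor, mask, axis=None, name='boolean_mask')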

# GRADED FUNCTION: yolo_filter_boxes

def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold = .6):
    """Filters YOLO boxes by thresholding on object and class confidence.

    Arguments:
    box_confidence -- tensor of shape (19, 19, 5, 1)
    boxes -- tensor of shape (19, 19, 5, 4)
    box_class_probs -- tensor of shape (19, 19, 5, 80)
    threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box

    Returns:
    scores -- tensor of shape (None,), containing the class probability score for selected boxes
    boxes -- tensor of shape (None, 4), containing (b_x, b_y, b_h, b_w) coordinates of selected boxes
    classes -- tensor of shape (None,), containing the index of the class detected by the selected boxes

    Note: "None" is here because you don't know the exact number of selected boxes, as it depends on the threshold.
    For example, the actual output size of scores would be (10,) if there are 10 boxes.
    """

    # Step 1: Compute box scores
    ### START CODE HERE ### (≈ 1 line)
    box_scores = box_confidence*box_class_probs
    ### END CODE HERE ###

    # Step 2: Find the box_classes thanks to the max box_scores, keep track of the corresponding score
    ### START CODE HERE ### (≈ 2 lines)
    box_classes = K.argmax(box_scores, axis=-1)
    box_class_scores = K.max(box_scores, axis=-1)
    ### END CODE HERE ###

    # Step 3: Create a filtering mask based on "box_class_scores" by using "threshold". The mask should have the
    # same dimension as box_class_scores, and be True for the boxes you want to keep (with probability >= threshold)
    ### START CODE HERE ### (≈ 1 line)
    filtering_mask = box_class_scores >= threshold
    ### END CODE HERE ###

    # Step 4: Apply the mask to scores, boxes and classes
    ### START CODE HERE ### (≈ 3 lines)
    scores = tf.boolean_mask(box_class_scores, filtering_mask)
    boxes = tf.boolean_mask(boxes, filtering_mask)
    classes = tf.boolean_mask(box_classes, filtering_mask)
    ### END CODE HERE ###

    return scores, boxes, classes
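To see the thresholding on concrete numbers, here is a NumPy sketch that mirrors the four steps. The helper name filter_boxes_np and the toy tensors are made up for this example and are not part of the assignment; NumPy boolean indexing plays the role that tf.boolean_mask plays in the TensorFlow version.

```python
import numpy as np

def filter_boxes_np(box_confidence, boxes, box_class_probs, threshold=0.6):
    """NumPy analogue of the score filter: keep boxes whose best class score >= threshold."""
    box_scores = box_confidence * box_class_probs   # step 1: p_c * class probabilities
    box_classes = box_scores.argmax(axis=-1)        # step 2: best class per box
    box_class_scores = box_scores.max(axis=-1)      #         and its score
    mask = box_class_scores >= threshold            # step 3: boolean mask
    # Step 4: boolean indexing flattens the grid/anchor axes, like tf.boolean_mask.
    return box_class_scores[mask], boxes[mask], box_classes[mask]

# Toy input: a 1x2 grid, 1 anchor per cell, 3 classes.
box_confidence = np.array([[[[0.9]], [[0.5]]]])                          # (1, 2, 1, 1)
boxes = np.array([[[[0.1, 0.1, 0.4, 0.4]], [[0.5, 0.5, 0.9, 0.9]]]])     # (1, 2, 1, 4)
box_class_probs = np.array([[[[0.1, 0.8, 0.1]], [[0.6, 0.2, 0.2]]]])     # (1, 2, 1, 3)

scores, kept_boxes, classes = filter_boxes_np(box_confidence, boxes, box_class_probs)
print(scores, kept_boxes, classes)
```

The first box scores 0.9 * 0.8 = 0.72 and is kept with class index 1; the second scores at most 0.5 * 0.6 = 0.30 and is dropped.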

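2.3 Non-max suppression

Even after filtering, many overlapping bounding boxes remain, so we apply non-maximum suppression (NMS).

NMS uses intersection over union (IoU) to measure how much two boxes overlap, and among boxes that overlap heavily it keeps only the one with the highest score.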

# GRADED FUNCTION: iou

def iou(box1, box2):
    """Implement the intersection over union (IoU) between box1 and box2

    Arguments:
    box1 -- first box, list object with coordinates (x1, y1, x2, y2)
    box2 -- second box, list object with coordinates (x1, y1, x2, y2)
    """

    # Calculate the (x1, y1, x2, y2) coordinates of the intersection of box1 and box2. Calculate its Area.
    ### START CODE HERE ### (≈ 5 lines)
    xi1 = np.maximum(box1[0], box2[0])
    yi1 = np.maximum(box1[1], box2[1])
    xi2 = np.minimum(box1[2], box2[2])
    yi2 = np.minimum(box1[3], box2[3])
    # Clamp at zero so that boxes that do not overlap get an intersection area of 0
    inter_area = np.maximum(xi2 - xi1, 0) * np.maximum(yi2 - yi1, 0)
    ### END CODE HERE ###

    # Calculate the Union area by using Formula: Union(A,B) = A + B - Inter(A,B)
    ### START CODE HERE ### (≈ 3 lines)
    box1_area = (box1[2] - box1[0]) * (box1[3] - box1[1])
    box2_area = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union_area = box1_area + box2_area - inter_area
    ### END CODE HERE ###

    # compute the IoU
    ### START CODE HERE ### (≈ 1 line)
    iou = inter_area / union_area
    ### END CODE HERE ###

    return iou
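A small worked example may help. The pure-Python sketch below computes IoU for boxes given as (x1, y1, x2, y2) and clamps the intersection width and height at zero; without that clamp, two disjoint boxes would multiply two negative side lengths into a positive "intersection". The box coordinates are chosen for illustration only.

```python
def iou(box1, box2):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    xi1, yi1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    xi2, yi2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    # Clamp at zero: disjoint boxes have no intersection.
    inter_area = max(xi2 - xi1, 0) * max(yi2 - yi1, 0)
    box1_area = (box1[2] - box1[0]) * (box1[3] - box1[1])
    box2_area = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter_area / (box1_area + box2_area - inter_area)

overlapping = iou((2, 1, 4, 3), (1, 2, 3, 4))  # intersection 1, union 4 + 4 - 1 = 7
disjoint = iou((0, 0, 1, 1), (2, 2, 3, 3))     # boxes do not touch
identical = iou((0, 0, 2, 2), (0, 0, 2, 2))    # same box twice
print(overlapping, disjoint, identical)
```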

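Non-max suppression steps:

  1. Select the box with the highest score
  2. Compute its overlap with all the other boxes and remove those whose overlap (IoU) exceeds the threshold
  3. Go back to step 1 and repeat until no box with a lower score than the currently selected one remains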
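The steps above can be sketched as a greedy loop in plain Python. This is an illustrative re-implementation, not the code used in the assignment (which relies on TensorFlow's built-in op); the function names and the sample boxes are assumptions made for the example.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(min(a[2], b[2]) - max(a[0], b[0]), 0)
    ih = max(min(a[3], b[3]) - max(a[1], b[1]), 0)
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def non_max_suppression(boxes, scores, max_boxes=10, iou_threshold=0.5):
    """Return the indices of the boxes kept by greedy NMS, best score first."""
    # Candidate indices sorted by descending score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order and len(keep) < max_boxes:
        best = order.pop(0)          # step 1: highest remaining score
        keep.append(best)
        # Step 2: drop every remaining box that overlaps the selected one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep                      # step 3 is the loop itself

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 9, 9)]
scores = [0.9, 0.8, 0.7, 0.6]
keep = non_max_suppression(boxes, scores)
print(keep)
```

Here boxes 1 and 3 overlap box 0 with IoU of about 0.68 and 0.81, so only boxes 0 and 2 survive.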

TensorFlow has a built-in NMS: https://www.tensorflow.org/api_docs/python/tf/image/non_max_suppression

https://www.tensorflow.org/api_docs/python/tf/gather

# GRADED FUNCTION: yolo_non_max_suppression

def yolo_non_max_suppression(scores, boxes, classes, max_boxes = 10, iou_threshold = 0.5):
    """Applies Non-max suppression (NMS) to set of boxes

    Arguments:
    scores -- tensor of shape (None,), output of yolo_filter_boxes()
    boxes -- tensor of shape (None, 4), output of yolo_filter_boxes() that have been scaled to the image size (see later)
    classes -- tensor of shape (None,), output of yolo_filter_boxes()
    max_boxes -- integer, maximum number of predicted boxes you'd like
    iou_threshold -- real value, "intersection over union" threshold used for NMS filtering

    Returns:
    scores -- tensor of shape (, None), predicted score for each box
    boxes -- tensor of shape (4, None), predicted box coordinates
    classes -- tensor of shape (, None), predicted class for each box

    Note: The "None" dimension of the output tensors has obviously to be less than max_boxes. Note also that this
    function will transpose the shapes of scores, boxes, classes. This is made for convenience.
    """

    max_boxes_tensor = K.variable(max_boxes, dtype='int32')     # tensor to be used in tf.image.non_max_suppression()
    K.get_session().run(tf.variables_initializer([max_boxes_tensor])) # initialize variable max_boxes_tensor

    # Use tf.image.non_max_suppression() to get the list of indices corresponding to boxes you keep
    ### START CODE HERE ### (≈ 1 line)
    nms_indices = tf.image.non_max_suppression(boxes, scores, max_boxes_tensor, iou_threshold)
    ### END CODE HERE ###

    # Use K.gather() to select only nms_indices from scores, boxes and classes
    ### START CODE HERE ### (≈ 3 lines)
    scores = K.gather(scores, nms_indices)
    boxes = K.gather(boxes, nms_indices)
    classes = K.gather(classes, nms_indices)
    ### END CODE HERE ###

    return scores, boxes, classes

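2.4 Wrapping up the filtering

Two helper functions:

  • boxes = yolo_boxes_to_corners(box_xy, box_wh) converts boxes to the two-corner representation
  • boxes = scale_boxes(boxes, image_shape) rescales the boxes so they can be drawn on an image of a different size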
# GRADED FUNCTION: yolo_eval

def yolo_eval(yolo_outputs, image_shape = (720., 1280.), max_boxes=10, score_threshold=.6, iou_threshold=.5):
    """Converts the output of YOLO encoding (a lot of boxes) to your predicted boxes along with their scores, box coordinates and classes.

    Arguments:
    yolo_outputs -- output of the encoding model (for image_shape of (608, 608, 3)), contains 4 tensors:
                    box_confidence: tensor of shape (None, 19, 19, 5, 1)
                    box_xy: tensor of shape (None, 19, 19, 5, 2)
                    box_wh: tensor of shape (None, 19, 19, 5, 2)
                    box_class_probs: tensor of shape (None, 19, 19, 5, 80)
    image_shape -- tensor of shape (2,) containing the input shape, in this notebook we use (608., 608.) (has to be float32 dtype)
    max_boxes -- integer, maximum number of predicted boxes you'd like
    score_threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box
    iou_threshold -- real value, "intersection over union" threshold used for NMS filtering

    Returns:
    scores -- tensor of shape (None, ), predicted score for each box
    boxes -- tensor of shape (None, 4), predicted box coordinates
    classes -- tensor of shape (None,), predicted class for each box
    """

    ### START CODE HERE ###

    # Retrieve outputs of the YOLO model (≈1 line)
    box_confidence, box_xy, box_wh, box_class_probs = yolo_outputs

    # Convert boxes to be ready for filtering functions
    boxes = yolo_boxes_to_corners(box_xy, box_wh)

    # Use one of the functions you've implemented to perform Score-filtering with a threshold of score_threshold (≈1 line)
    scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold=score_threshold)

    # Scale boxes back to original image shape.
    boxes = scale_boxes(boxes, image_shape)

    # Use one of the functions you've implemented to perform Non-max suppression with a threshold of iou_threshold (≈1 line)
    scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes, max_boxes, iou_threshold)

    ### END CODE HERE ###

    return scores, boxes, classes
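The two helpers come from yad2k and yolo_utils and are not shown in this post. The NumPy sketch below shows what they are assumed to do: turn centers and sizes into corner coordinates, then multiply the normalized corners by the image height and width. The function names and the (y_min, x_min, y_max, x_max) ordering are assumptions for illustration; check the library source before relying on them.

```python
import numpy as np

def boxes_to_corners(box_xy, box_wh):
    """Convert (x, y) centers and (w, h) sizes to (y_min, x_min, y_max, x_max)."""
    box_mins = box_xy - box_wh / 2.0
    box_maxes = box_xy + box_wh / 2.0
    return np.concatenate([box_mins[..., 1:2], box_mins[..., 0:1],
                           box_maxes[..., 1:2], box_maxes[..., 0:1]], axis=-1)

def scale_to_image(boxes, image_shape):
    """Scale normalized corner boxes to pixel coordinates of an image of (height, width)."""
    height, width = image_shape
    return boxes * np.array([height, width, height, width])

# One box centered in the image, 20% of the width wide and 40% of the height tall.
box_xy = np.array([[0.5, 0.5]])
box_wh = np.array([[0.2, 0.4]])

corners = boxes_to_corners(box_xy, box_wh)
pixels = scale_to_image(corners, (720.0, 1280.0))
print(corners, pixels)
```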

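YOLO model summary:

  • An input image of 608*608*3 goes through the convolutional network, producing an output of 19*19*5*85
  • Flattening the last two dimensions gives 19*19*425: each of the 19x19 grid cells holds 425 numbers
  • 5 because 5 anchor boxes are used; 85 = 80 classes + 5 parameters (p_c, b_x, b_y, b_h, b_w)
  • Only a few boxes are then selected (score thresholding, non-max suppression)

3. Testing the pre-trained YOLO model on images

  • Create a session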
sess = K.get_session()

3.1 Defining classes, anchors, and image shape

class_names = read_classes("model_data/coco_classes.txt")
anchors = read_anchors("model_data/yolo_anchors.txt")
image_shape = (720., 1280.)    

The coco_classes file defines the names of the 80 object classes.
The yolo_anchors file contains 10 floating-point numbers that define the shapes of the 5 anchor boxes.
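As a sketch of how 10 numbers become 5 anchor shapes, the snippet below parses one comma-separated line and reshapes it into 5 rows of 2 values. The numbers are hypothetical placeholders rather than the real contents of yolo_anchors.txt, and reading each row as a (width, height) pair is an assumption.

```python
import numpy as np

# Hypothetical file contents: 10 comma-separated floats on a single line.
anchors_line = "0.5, 0.7, 1.9, 2.0, 3.3, 5.5, 7.9, 3.5, 9.8, 9.2"

# Parse the floats and group them in pairs: one row per anchor box.
anchors = np.array([float(x) for x in anchors_line.split(",")]).reshape(-1, 2)

print(anchors.shape)
print(anchors)
```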

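3.2 Loading the pre-trained model

Error: module 'tensorflow' has no attribute 'space_to_depth'

Version issues are a real pain. Installing the following versions makes the error go away (Python 3.7 environment):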

pip uninstall tensorflow
pip uninstall keras
pip install tensorflow==1.14.0
pip install keras==2.3.1
yolo_model = load_model("model_data/yolo.h5")

Model summary:

yolo_model.summary()
Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 608, 608, 3)  0                                            
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 608, 608, 32) 864         input_1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 608, 608, 32) 128         conv2d_1[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_1 (LeakyReLU)       (None, 608, 608, 32) 0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 304, 304, 32) 0           leaky_re_lu_1[0][0]              
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 304, 304, 64) 18432       max_pooling2d_1[0][0]            
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 304, 304, 64) 256         conv2d_2[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_2 (LeakyReLU)       (None, 304, 304, 64) 0           batch_normalization_2[0][0]      
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)  (None, 152, 152, 64) 0           leaky_re_lu_2[0][0]              
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 152, 152, 128 73728       max_pooling2d_2[0][0]            
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 152, 152, 128 512         conv2d_3[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_3 (LeakyReLU)       (None, 152, 152, 128 0           batch_normalization_3[0][0]      
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 152, 152, 64) 8192        leaky_re_lu_3[0][0]              
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 152, 152, 64) 256         conv2d_4[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_4 (LeakyReLU)       (None, 152, 152, 64) 0           batch_normalization_4[0][0]      
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 152, 152, 128 73728       leaky_re_lu_4[0][0]              
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 152, 152, 128 512         conv2d_5[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_5 (LeakyReLU)       (None, 152, 152, 128 0           batch_normalization_5[0][0]      
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D)  (None, 76, 76, 128)  0           leaky_re_lu_5[0][0]              
__________________________________________________________________________________________________
conv2d_6 (Conv2D)               (None, 76, 76, 256)  294912      max_pooling2d_3[0][0]            
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 76, 76, 256)  1024        conv2d_6[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_6 (LeakyReLU)       (None, 76, 76, 256)  0           batch_normalization_6[0][0]      
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, 76, 76, 128)  32768       leaky_re_lu_6[0][0]              
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 76, 76, 128)  512         conv2d_7[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_7 (LeakyReLU)       (None, 76, 76, 128)  0           batch_normalization_7[0][0]      
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 76, 76, 256)  294912      leaky_re_lu_7[0][0]              
__________________________________________________________________________________________________
batch_normalization_8 (BatchNor (None, 76, 76, 256)  1024        conv2d_8[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_8 (LeakyReLU)       (None, 76, 76, 256)  0           batch_normalization_8[0][0]      
__________________________________________________________________________________________________
max_pooling2d_4 (MaxPooling2D)  (None, 38, 38, 256)  0           leaky_re_lu_8[0][0]              
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 38, 38, 512)  1179648     max_pooling2d_4[0][0]            
__________________________________________________________________________________________________
batch_normalization_9 (BatchNor (None, 38, 38, 512)  2048        conv2d_9[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_9 (LeakyReLU)       (None, 38, 38, 512)  0           batch_normalization_9[0][0]      
__________________________________________________________________________________________________
conv2d_10 (Conv2D)              (None, 38, 38, 256)  131072      leaky_re_lu_9[0][0]              
__________________________________________________________________________________________________
batch_normalization_10 (BatchNo (None, 38, 38, 256)  1024        conv2d_10[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_10 (LeakyReLU)      (None, 38, 38, 256)  0           batch_normalization_10[0][0]     
__________________________________________________________________________________________________
conv2d_11 (Conv2D)              (None, 38, 38, 512)  1179648     leaky_re_lu_10[0][0]             
__________________________________________________________________________________________________
batch_normalization_11 (BatchNo (None, 38, 38, 512)  2048        conv2d_11[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_11 (LeakyReLU)      (None, 38, 38, 512)  0           batch_normalization_11[0][0]     
__________________________________________________________________________________________________
conv2d_12 (Conv2D)              (None, 38, 38, 256)  131072      leaky_re_lu_11[0][0]             
__________________________________________________________________________________________________
batch_normalization_12 (BatchNo (None, 38, 38, 256)  1024        conv2d_12[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_12 (LeakyReLU)      (None, 38, 38, 256)  0           batch_normalization_12[0][0]     
__________________________________________________________________________________________________
conv2d_13 (Conv2D)              (None, 38, 38, 512)  1179648     leaky_re_lu_12[0][0]             
__________________________________________________________________________________________________
batch_normalization_13 (BatchNo (None, 38, 38, 512)  2048        conv2d_13[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_13 (LeakyReLU)      (None, 38, 38, 512)  0           batch_normalization_13[0][0]     
__________________________________________________________________________________________________
max_pooling2d_5 (MaxPooling2D)  (None, 19, 19, 512)  0           leaky_re_lu_13[0][0]             
__________________________________________________________________________________________________
conv2d_14 (Conv2D)              (None, 19, 19, 1024) 4718592     max_pooling2d_5[0][0]            
__________________________________________________________________________________________________
batch_normalization_14 (BatchNo (None, 19, 19, 1024) 4096        conv2d_14[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_14 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_14[0][0]     
__________________________________________________________________________________________________
conv2d_15 (Conv2D)              (None, 19, 19, 512)  524288      leaky_re_lu_14[0][0]             
__________________________________________________________________________________________________
batch_normalization_15 (BatchNo (None, 19, 19, 512)  2048        conv2d_15[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_15 (LeakyReLU)      (None, 19, 19, 512)  0           batch_normalization_15[0][0]     
__________________________________________________________________________________________________
conv2d_16 (Conv2D)              (None, 19, 19, 1024) 4718592     leaky_re_lu_15[0][0]             
__________________________________________________________________________________________________
batch_normalization_16 (BatchNo (None, 19, 19, 1024) 4096        conv2d_16[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_16 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_16[0][0]     
__________________________________________________________________________________________________
conv2d_17 (Conv2D)              (None, 19, 19, 512)  524288      leaky_re_lu_16[0][0]             
__________________________________________________________________________________________________
batch_normalization_17 (BatchNo (None, 19, 19, 512)  2048        conv2d_17[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_17 (LeakyReLU)      (None, 19, 19, 512)  0           batch_normalization_17[0][0]     
__________________________________________________________________________________________________
conv2d_18 (Conv2D)              (None, 19, 19, 1024) 4718592     leaky_re_lu_17[0][0]             
__________________________________________________________________________________________________
batch_normalization_18 (BatchNo (None, 19, 19, 1024) 4096        conv2d_18[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_18 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_18[0][0]     
__________________________________________________________________________________________________
conv2d_19 (Conv2D)              (None, 19, 19, 1024) 9437184     leaky_re_lu_18[0][0]             
__________________________________________________________________________________________________
batch_normalization_19 (BatchNo (None, 19, 19, 1024) 4096        conv2d_19[0][0]                  
__________________________________________________________________________________________________
conv2d_21 (Conv2D)              (None, 38, 38, 64)   32768       leaky_re_lu_13[0][0]             
__________________________________________________________________________________________________
leaky_re_lu_19 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_19[0][0]     
__________________________________________________________________________________________________
batch_normalization_21 (BatchNo (None, 38, 38, 64)   256         conv2d_21[0][0]                  
__________________________________________________________________________________________________
conv2d_20 (Conv2D)              (None, 19, 19, 1024) 9437184     leaky_re_lu_19[0][0]             
__________________________________________________________________________________________________
leaky_re_lu_21 (LeakyReLU)      (None, 38, 38, 64)   0           batch_normalization_21[0][0]     
__________________________________________________________________________________________________
batch_normalization_20 (BatchNo (None, 19, 19, 1024) 4096        conv2d_20[0][0]                  
__________________________________________________________________________________________________
space_to_depth_x2 (Lambda)      (None, 19, 19, 256)  0           leaky_re_lu_21[0][0]             
__________________________________________________________________________________________________
leaky_re_lu_20 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_20[0][0]     
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 19, 19, 1280) 0           space_to_depth_x2[0][0]          
                                                                 leaky_re_lu_20[0][0]             
__________________________________________________________________________________________________
conv2d_22 (Conv2D)              (None, 19, 19, 1024) 11796480    concatenate_1[0][0]              
__________________________________________________________________________________________________
batch_normalization_22 (BatchNo (None, 19, 19, 1024) 4096        conv2d_22[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_22 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_22[0][0]     
__________________________________________________________________________________________________
conv2d_23 (Conv2D)              (None, 19, 19, 425)  435625      leaky_re_lu_22[0][0]             
==================================================================================================
Total params: 50,983,561
Trainable params: 50,962,889
Non-trainable params: 20,672

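The model maps a batch of images of shape m * 608 * 608 * 3 to a tensor of shape m * 19 * 19 * 425, which is m * 19 * 19 * 5 * 85 once the last dimension is split per anchor box.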
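The space_to_depth_x2 layer in the summary turns a (38, 38, 64) feature map into (19, 19, 256), which is then concatenated with the (19, 19, 1024) branch to give 1280 channels. A NumPy sketch of that rearrangement for a single image follows; it is meant to match the block ordering of TensorFlow's space_to_depth for channels-last data, but treat that as an assumption rather than a verified equivalence.

```python
import numpy as np

def space_to_depth_x2(x):
    """Move each 2x2 spatial block of an (H, W, C) array into the channel axis."""
    h, w, c = x.shape
    x = x.reshape(h // 2, 2, w // 2, 2, c)   # split rows and columns into blocks of 2
    x = x.transpose(0, 2, 1, 3, 4)           # (block_row, block_col, dy, dx, channel)
    return x.reshape(h // 2, w // 2, 4 * c)  # stack the 4 block positions along channels

features = np.arange(38 * 38 * 64, dtype=np.float32).reshape(38, 38, 64)
stacked = space_to_depth_x2(features)
print(stacked.shape)
```

No value is lost or changed: the spatial resolution is halved while the channel count is multiplied by 4.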

3.3 Converting the model output to usable bounding box variables

yolo_outputs = yolo_head(yolo_model.output, anchors, len(class_names))
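yolo_head is imported from yad2k and its body is not shown here. The NumPy sketch below illustrates the kind of decoding described for YOLOv2 in the referenced paper: a sigmoid for the box center and confidence, an exponential scaled by the anchors for width and height, and a softmax over the classes. The function decode_head, its argument layout, and the single-image shapes are assumptions for illustration, not the library's actual code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def decode_head(feats, anchors, num_classes):
    """feats: (H, W, A*(5+C)) raw output for one image; anchors: (A, 2) in grid-cell units."""
    h, w = feats.shape[:2]
    num_anchors = len(anchors)
    feats = feats.reshape(h, w, num_anchors, 5 + num_classes)

    # (column, row) index of every grid cell, shaped (H, W, 1, 2) for broadcasting.
    cols, rows = np.meshgrid(np.arange(w), np.arange(h))
    cell_xy = np.stack([cols, rows], axis=-1).reshape(h, w, 1, 2)
    dims = np.array([w, h], dtype=float)

    box_xy = (sigmoid(feats[..., 0:2]) + cell_xy) / dims  # centers, normalized to [0, 1]
    box_wh = np.exp(feats[..., 2:4]) * anchors / dims     # sizes, normalized to [0, 1]
    box_confidence = sigmoid(feats[..., 4:5])
    box_class_probs = softmax(feats[..., 5:])
    return box_confidence, box_xy, box_wh, box_class_probs

# All-zero features with unit anchors make the result easy to check by hand.
feats = np.zeros((19, 19, 425))
anchors = np.ones((5, 2))
box_confidence, box_xy, box_wh, box_class_probs = decode_head(feats, anchors, 80)
print(box_confidence.shape, box_xy.shape, box_wh.shape, box_class_probs.shape)
```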

3.4 Filtering boxes

  • Select only a few bounding boxes as the result
scores, boxes, classes = yolo_eval(yolo_outputs, image_shape)

3.5 Running on an image

  1. yolo_model.input is given to yolo_model. The model is used to compute the output yolo_model.output
  2. yolo_model.output is processed by yolo_head. It gives you yolo_outputs
  3. yolo_outputs goes through a filtering function, yolo_eval. It outputs your predictions: scores, boxes, classes
import imageio

def predict(sess, image_file):
    """Runs the graph stored in "sess" to predict boxes for "image_file". Prints and plots the predictions.

    Arguments:
    sess -- your tensorflow/Keras session containing the YOLO graph
    image_file -- name of an image stored in the "images" folder.

    Returns:
    out_scores -- tensor of shape (None, ), scores of the predicted boxes
    out_boxes -- tensor of shape (None, 4), coordinates of the predicted boxes
    out_classes -- tensor of shape (None, ), class index of the predicted boxes

    Note: "None" actually represents the number of predicted boxes, it varies between 0 and max_boxes.
    """

    # Preprocess your image
    image, image_data = preprocess_image("images/" + image_file, model_image_size = (608, 608))

    # Run the session with the correct tensors and choose the correct placeholders in the feed_dict.
    # You'll need to use feed_dict={yolo_model.input: ... , K.learning_phase(): 0})
    ### START CODE HERE ### (≈ 1 line)
    out_scores, out_boxes, out_classes = sess.run([scores, boxes, classes],
                                                  feed_dict = {yolo_model.input: image_data, K.learning_phase(): 0})
    ### END CODE HERE ###

    # Print predictions info
    print('Found {} boxes for {}'.format(len(out_boxes), image_file))
    # Generate colors for drawing bounding boxes.
    colors = generate_colors(class_names)
    # Draw bounding boxes on the image file
    draw_boxes(image, out_scores, out_boxes, out_classes, class_names, colors)
    # Save the predicted bounding box on the image
    image.save(os.path.join("out", image_file), quality=90)
    # Display the results in the notebook
    output_image = imageio.imread(os.path.join("out", image_file))
    imshow(output_image)

    return out_scores, out_boxes, out_classes

Note: when a model uses BatchNorm (as YOLO does), you need to pass an extra placeholder in feed_dict: K.learning_phase(): 0

out_scores, out_boxes, out_classes = predict(sess, "test.jpg")
Found 7 boxes for test.jpg
car 0.60 (925, 285) (1045, 374)
bus 0.67 (5, 267) (220, 407)
car 0.68 (705, 279) (786, 351)
car 0.70 (947, 324) (1280, 704)
car 0.75 (159, 303) (346, 440)
car 0.80 (762, 282) (942, 412)
car 0.89 (366, 299) (745, 648)

Found 2 boxes for 1.jpg
car 0.61 (253, 466) (367, 513)
car 0.73 (179, 473) (284, 522)

  • Run predictions on a batch of images and generate an animated GIF
out_puts_img = []
for idx in range(1, 121): # 120 images
    pic_name = str(idx).zfill(4) + '.jpg' # zero-pad the number to 4 digits
    # predict was modified to return one extra output: the annotated image
    out_scores, out_boxes, out_classes, out_put_img = predict(sess, pic_name)
    out_puts_img.append(out_put_img)

def create_gif(image_list, gif_name, duration=0.3):
    frames = []
    for img in image_list:
        frames.append(img)
    imageio.mimsave(gif_name, frames, 'GIF', duration=duration)

create_gif(out_puts_img, 'out.gif', 0.5)

Figure: animated GIF showing the prediction results


My CSDN blog: https://michael.blog.csdn.net/
