YOLOv5 Object Detection Inference on the Huawei Ascend Platform: A Tutorial

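1. Background

Object detection plays an important role in intelligent security, autonomous driving, industrial inspection, and similar fields. YOLOv5 is an efficient object detection model that is popular for its balance of speed and accuracy.

Huawei's Ascend inference framework, ACL (Ascend Computing Language), is a core component of the Ascend CANN software stack. It is designed for Ascend AI accelerator hardware (such as the Atlas 300I) and enables high-performance deep learning inference. This article shows how to run a YOLOv5 model on top of the Ascend ACL inference framework, covering pre-processing, the inference core, and post-processing.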

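2. YOLOv5 Inference Pipeline

The YOLOv5 inference pipeline has three stages:

Pre-processing: convert the input image into the format the model expects.
Inference: call the Ascend ACL inference framework to run the model.
Post-processing: parse the inference output and extract the detection boxes.

The following sections walk through each stage together with its core code.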

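2.1 Pre-processing

The YOLOv5 model takes images of a fixed size (for example 1280x1280). To meet this requirement, we use the letterbox method to scale and pad the image so that its aspect ratio is preserved, and then convert it to a NumPy array.

Code example: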

import numpy as np
from yolov5_utils import letterbox

def preprocess(self, img: np.ndarray) -> np.ndarray:
    # Resize and pad the image to the model input size, keeping the aspect ratio
    img, ratio, (pad_w, pad_h) = letterbox(img, (self.model_width, self.model_height))
    # Make sure the data is a contiguous uint8 buffer
    img = np.ascontiguousarray(img, dtype=np.uint8)
    # Add the batch dimension: [height, width, channels] -> [1, height, width, channels]
    tensor = np.expand_dims(img, axis=0)
    return tensor

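Key points:

letterbox: scales and pads the image so that it matches the model input size without changing the aspect ratio.
np.expand_dims: adds a batch dimension in front of the image. Because the code above does not transpose the image, the resulting layout is [batch, height, width, channels].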
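
To make the letterbox geometry concrete, here is a minimal pure-Python sketch of the arithmetic behind it. The function names letterbox_params and to_original are hypothetical and are not part of yolov5_utils; the real letterbox helper may round or pad slightly differently. The second function shows the inverse mapping that post-processing relies on when boxes are mapped back to the original image.

```python
from typing import Tuple


def letterbox_params(
    src_w: int, src_h: int, dst_w: int, dst_h: int
) -> Tuple[float, Tuple[int, int], Tuple[float, float]]:
    """Return (ratio, (new_w, new_h), (pad_w, pad_h)) for a letterbox resize."""
    # Use the smaller scale factor so the whole image fits inside the target.
    ratio = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * ratio), round(src_h * ratio)
    # Split the leftover space evenly between the two sides.
    pad_w = (dst_w - new_w) / 2
    pad_h = (dst_h - new_h) / 2
    return ratio, (new_w, new_h), (pad_w, pad_h)


def to_original(
    x: float, y: float, ratio: float, pad_w: float, pad_h: float
) -> Tuple[float, float]:
    """Map a point from letterboxed coordinates back to the source image."""
    return (x - pad_w) / ratio, (y - pad_h) / ratio


# A 1920x1080 frame placed into a 1280x1280 input:
# it is resized to 1280x720 and padded by 280 px at the top and bottom.
ratio, new_size, (pad_w, pad_h) = letterbox_params(1920, 1080, 1280, 1280)
center = to_original(640, 640, ratio, pad_w, pad_h)
print(new_size, (pad_w, pad_h), center)
```

The key design choice is taking the minimum of the two scale factors: it guarantees that neither side overflows the target, at the cost of padding along the other axis.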

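2.2 Inference Core (the process method)

The inference core is the key part of the detection pipeline. It consists of the following steps:

Set the inference context;
Transfer the input data from host memory to device (NPU) memory;
Call the inference engine to run the model;
Transfer the inference results from device memory back to host memory;
Return the inference results.

Full code: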

import acl
import aclruntime
import numpy as np
from typing import List

def process(self, tensor: np.ndarray) -> List[np.ndarray]:
    # 1. Set the inference context (ret holds the ACL return code)
    ret = acl.rt.set_context(self.context)
    # 2. Wrap the input data in a Tensor and transfer it to the device
    tensor = aclruntime.Tensor(tensor)
    tensor.to_device(self.device)
    # 3. Run inference, requesting every output node of the model
    output_names = [node.name for node in self.session.get_outputs()]
    output_tensors = self.session.run(output_names, [tensor])
    # 4. Transfer the results from the device back to the host
    preds = []
    for output in output_tensors:
        output.to_host()
        preds.append(np.array(output))
    return preds

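Key points explained:

Context setup:
- acl.rt.set_context(self.context) binds the inference session to the compute context of the Ascend NPU.
- If the context is not set correctly, the inference call fails.
Device memory transfer:
- aclruntime.Tensor wraps the NumPy input data.
- to_device(self.device) loads the data onto the specified device (NPU).
Inference execution:
- self.session.run executes the inference.
- Its arguments are the names of the model's output nodes and the input data.
Returning results to the host:
- to_host() transfers the inference results from device memory back to host memory.
- The results are converted to NumPy arrays for further processing.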
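
Since a wrongly set context makes inference fail, it can help to surface a bad return code right away instead of failing later inside the run call. The following is a minimal sketch of such a check. The helper name check_ret is hypothetical, and the sketch assumes that ACL calls report success with a return code of 0.

```python
ACL_SUCCESS = 0  # assumption: ACL calls return 0 on success


def check_ret(api_name: str, ret: int) -> None:
    """Raise a RuntimeError if an ACL call reported a non-zero return code."""
    if ret != ACL_SUCCESS:
        raise RuntimeError(f"{api_name} failed with return code {ret}")


# Intended use inside process (not executed here, since it needs an NPU):
#     check_ret("acl.rt.set_context", acl.rt.set_context(self.context))
check_ret("acl.rt.set_context", 0)  # a successful call passes silently
```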

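2.3 Post-processing

The YOLOv5 inference output is typically a multi-dimensional tensor that holds the bounding box coordinates, confidence, and class information for every candidate box. To obtain the final detections, two steps are required:

Non-maximum suppression (NMS): filter out overlapping boxes and keep the best detections.
Coordinate mapping: map the predicted coordinates back to the original image size.

Code example: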

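from yolov5_utils import non_max_suppression, scale_coords

def postprocess(self, preds, img, tensor):
    # Run NMS on the first model output; [0] selects the detections of the single image in the batch
    boxes = non_max_suppression(preds[0], conf_thres=self.conf_threshold, iou_thres=self.iou_threshold)[0]
    if boxes.size > 0:
        # Map box coordinates from the model input size (height, width) back to the original image
        boxes[:, :4] = scale_coords(tensor.shape[1:3], boxes[:, :4], img.shape).round()
    return boxes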

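Key points:

non_max_suppression: filters out overlapping boxes so that the same object is not reported by several boxes.
scale_coords: maps the predicted coordinates from the model input size back to the original image size.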
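
To illustrate what NMS does, here is a minimal pure-Python sketch of greedy NMS over boxes in (x1, y1, x2, y2, score) form. It is a simplified illustration and not the implementation in yolov5_utils, which presumably also applies the confidence threshold and handles classes. The names iou and greedy_nms and the default threshold of 0.45 are assumptions made for this example.

```python
from typing import List, Sequence

Box = Sequence[float]  # (x1, y1, x2, y2, score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], iou_thres: float = 0.45) -> List[Box]:
    """Keep the highest-scoring boxes, dropping any box that overlaps a kept one too much."""
    kept: List[Box] = []
    # Visit boxes from highest to lowest score.
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= iou_thres for k in kept):
            kept.append(box)
    return kept


candidates = [
    (10, 10, 110, 110, 0.90),
    (12, 12, 112, 112, 0.80),    # overlaps heavily with the first box
    (200, 200, 300, 300, 0.70),  # a separate object
]
result = greedy_nms(candidates)
print(result)  # the first and third boxes survive
```

The second candidate is dropped because its IoU with the first box is about 0.92, well above the threshold, while the third box does not overlap anything and is kept.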

2.4 Putting the Pipeline Together

The complete detection pipeline looks like this:

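import cv2
from typing import List, Tuple

def run(self, img_path: str) -> List[Tuple]:
    # 1. Load the input image (cv2.imread returns None if the file cannot be read)
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError(f"Cannot read image: {img_path}")
    # 2. Pre-processing
    tensor = self.preprocess(img)
    # 3. Model inference
    preds = self.process(tensor)
    # 4. Post-processing
    boxes = self.postprocess(preds, img, tensor)
    # Return the final detections as (x1, y1, x2, y2, confidence, class name)
    return [
        (int(box[0]), int(box[1]), int(box[2]), int(box[3]), round(float(box[4]), 2), self.class_names[int(box[5])])
        for box in boxes
    ]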

2.5 Usage Example

Run the following code to detect objects in an image:

if __name__ == "__main__":
    model_path = "yolov5_model.om"
    class_names = ["person", "car", "bicycle", "dog", "cat", ...]  # Replace with your own class names
    detector = YOLOv5Detector(model_path=model_path, class_names=class_names)
    image_path = "example.jpg"
    boxes = detector.run(image_path)

    # Draw the results and save the image
    img = cv2.imread(image_path)
    for box in boxes:
        x1, y1, x2, y2, conf, cls_name = box
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(img, f'{cls_name} {conf:.2f}', (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
    cv2.imwrite("detected_result.jpg", img)