YOLOv8 Object Detection: A Detailed Record of Inference Deployment with ONNX Runtime in C++ and Python

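Overview

An earlier post covered YOLOv8 end to end, from environment setup to training. This post explains how ONNX Runtime works and how to use it to accelerate inference, with implementations in both Python and C++.
https://blog.csdn.net/MariLN/article/details/143924548?spm=1001.2014.3001.5501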

1. ONNX Runtime

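ONNX Runtime is a cross-platform machine learning model accelerator from Microsoft. It runs models in the ONNX format, so a model trained in another framework has to be converted to ONNX first. It targets desktops, servers, and mobile devices.

Multi-framework support: models trained in common frameworks such as PyTorch, TensorFlow, Keras, and scikit-learn can be exported to ONNX and then run efficiently in ONNX Runtime, which makes it easier to share and move models between frameworks.
Cross-platform compatibility: it runs on Linux, Windows, and macOS, and can be deployed in cloud, edge, web, and mobile environments.
Hardware optimization: it is optimized for CPUs, GPUs, and AI accelerators through execution providers and vendor libraries (for example Intel MKL, cuDNN, and TensorRT), so it can make good use of the available hardware. On a GPU, for instance, the computation runs in parallel, which usually speeds up inference considerably.
Efficient memory management: memory pooling and zero-copy handling of input buffers reduce data-transfer overhead, which matters most when processing large amounts of data.
Dynamic shape support: when the model is exported with dynamic axes, the input size can change at run time and the model still handles it correctly, which makes it easier to adapt to different input data.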

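2. Model conversion

2.1 .pt and .onnx models

2.1.1 .pt models

A .pt file is a common storage format for PyTorch models. PyTorch is a widely used deep learning framework. When a neural network is trained, its parameters (weights, biases, and so on) can be saved to disk as a .pt file. For example, after training an image classifier such as ResNet in PyTorch, calling torch.save() writes the trained model to a .pt file.

Under the hood it is a binary file. Depending on how it was saved, it holds only the parameters (a state_dict) or the whole model object, that is, the structure definition together with the parameters. The structure definition covers the number of layers, the type of each layer (linear, convolution, pooling, and so on), and the activation functions. The parameters are the values learned during training, and they determine how the model processes its input.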

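2.1.2 .onnx models

ONNX (Open Neural Network Exchange) is an open exchange format for neural networks, and an .onnx file is a model stored in that format. It exists to solve the problem of converting and reusing models across deep learning frameworks. Many frameworks (PyTorch, TensorFlow, and others) can export their models to ONNX. In PyTorch, for example, torch.onnx.export() converts a model to .onnx.

An .onnx file is also a structured file. It stores the model's computation graph as an intermediate representation. The graph contains the model's operations (addition, multiplication, convolution, and so on), the connections between them, and the model's input and output information. This intermediate representation lets models trained in different frameworks be converted and run in one common format.

.onnx models are used mainly for cross-framework deployment and inference. Because several inference engines support the format (ONNX Runtime, TensorRT, and others), a model trained in one framework can be converted to .onnx and then run efficiently somewhere else. In an industrial setting, for example, a model may be trained in PyTorch but deployed on the production line with an inference engine that has stricter performance and efficiency requirements; converting the model to .onnx and deploying it with an engine such as ONNX Runtime makes this straightforward. It also helps teams collaborate: even when teams use different frameworks, they can share and integrate models through .onnx files.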

2.2 Converting .pt to .onnx

Convert the trained YOLOv8 .pt model to an .onnx model. The ultralytics library can do this, either from the command line or from Python.

yolo task=detect mode=export model=./runs/detect/train/weights/best.pt format=onnx

or

from ultralytics import YOLO

# Load a custom-trained model
model = YOLO('./runs/detect/train/weights/best.pt')
# Export the model to ONNX
success = model.export(format='onnx')

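3. Model inference

3.1 Python implementation

3.1.1 Environment setup

Install onnxruntime, numpy, and opencv-python (imported as cv2). For GPU inference, install onnxruntime-gpu in place of onnxruntime; only one of the two packages should be installed in a given environment.

pip install onnxruntime        # CPU build
pip install onnxruntime-gpu    # GPU build, use instead of the line above
pip install opencv-python
pip install numpy
pip install gradio

3.1.2 Inference steps

(1) Image preprocessing

Read the image and convert its color space from BGR to RGB. OpenCV uses BGR by default, whereas YOLOv8, like most models trained with deep learning frameworks, expects RGB input.
Resize the image to the input size the model expects, for example 640x640.
Normalize the pixel values to the range [0, 1].
Reorder the axes from HWC (Height, Width, Channel) to CHW and add a batch dimension, giving NCHW, where N is the batch size and is usually 1.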

import cv2
import numpy as np


def prepare_input(image, input_width, input_height):
    # Convert to RGB; images read by cv2.imread are BGR by default
    input_img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # Resize to the width and height the model expects
    input_img = cv2.resize(input_img, (input_width, input_height))
    # Normalize to 0-1
    input_img = input_img / 255.0
    # Reorder the axes and add a batch dimension: HWC -> NCHW
    input_img = input_img.transpose(2, 0, 1)
    # np.newaxis adds a new leading dimension
    input_tensor = input_img[np.newaxis, :, :, :].astype(np.float32)
    return input_tensor


image_path = "test.jpg"
image = cv2.imread(image_path)
input_tensor = prepare_input(image, 640, 640)

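(2) Model inference

Create an onnxruntime.InferenceSession object to load the converted .onnx model.
Pass the preprocessed image to the model as input, run inference, and collect the outputs.

import time
import onnxruntime


def inference(model_path, input_tensor):
    # Load the ONNX model
    session = onnxruntime.InferenceSession(model_path, providers=onnxruntime.get_available_providers())
    # Get the input and output names
    input_names = [model_input.name for model_input in session.get_inputs()]
    output_names = [model_output.name for model_output in session.get_outputs()]
    # time.perf_counter() is a high-resolution clock intended for measuring intervals;
    # it is usually far more precise than time.time(). The timer starts after the model
    # is loaded so that only the inference call is measured.
    start = time.perf_counter()
    # Run inference
    outputs = session.run(output_names, {input_names[0]: input_tensor})
    print(f"Inference time: {(time.perf_counter() - start)*1000:.2f} ms")
    return outputs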

(3) Postprocessing

Remove the batch dimension from the model output.
For each candidate box, take the class with the highest confidence, then drop boxes whose confidence is below the threshold.
Convert the coordinates: scale the predicted boxes back to the original image size, and change the box representation from center point plus size (x_center, y_center, w, h) to top-left and bottom-right corners (x1, y1, x2, y2).
Apply non-maximum suppression (NMS) to remove boxes that overlap too much, which gives the final detections.

def xywh2xyxy(x):
    # Convert boxes from (x_center, y_center, w, h) to (x1, y1, x2, y2)
    y = np.copy(x)
    # Top-left corner
    y[..., 0] = x[..., 0] - x[..., 2] / 2  # x1 = x_center - w / 2
    y[..., 1] = x[..., 1] - x[..., 3] / 2  # y1 = y_center - h / 2
    # Bottom-right corner
    y[..., 2] = x[..., 0] + x[..., 2] / 2  # x2 = x_center + w / 2
    y[..., 3] = x[..., 1] + x[..., 3] / 2  # y2 = y_center + h / 2
    return y


def multiclass_nms(boxes, scores, class_ids, iou_threshold):
    # All distinct class ids
    unique_class_ids = np.unique(class_ids)
    keep_boxes = []  # indices of the boxes that are finally kept
    for class_id in unique_class_ids:
        # Indices of the boxes that belong to this class (np.where returns a tuple)
        class_indices = np.where(class_ids == class_id)[0]
        # Boxes and scores of this class
        class_boxes = boxes[class_indices, :]
        class_scores = scores[class_indices]
        # Run NMS and get the indices that survive
        class_keep_boxes = nms(class_boxes, class_scores, iou_threshold)
        # Map the surviving indices back to indices into the original arrays
        keep_boxes.extend(class_indices[class_keep_boxes])
    return keep_boxes


def nms(boxes, scores, iou_threshold):
    # Sort the boxes by score, highest first ([::-1] reverses the ascending order)
    sorted_indices = np.argsort(scores)[::-1]
    keep_boxes = []
    while sorted_indices.size > 0:
        # Keep the box with the highest score
        box_id = sorted_indices[0]
        keep_boxes.append(box_id)
        # IoU of that box with all remaining boxes
        ious = compute_iou(boxes[box_id, :], boxes[sorted_indices[1:], :])
        # Keep only boxes whose IoU is below the threshold; the rest overlap too much
        keep_indices = np.where(ious < iou_threshold)[0]
        # keep_indices is relative to sorted_indices[1:], so shift by +1
        # to index into sorted_indices
        sorted_indices = sorted_indices[keep_indices + 1]
    return keep_boxes


def compute_iou(box, boxes):
    # Corners of the intersection: (xmin, ymin) is its top-left, (xmax, ymax) its bottom-right
    xmin = np.maximum(box[0], boxes[:, 0])
    ymin = np.maximum(box[1], boxes[:, 1])
    xmax = np.minimum(box[2], boxes[:, 2])
    ymax = np.minimum(box[3], boxes[:, 3])
    # Intersection area; without overlap the width or height is negative,
    # so np.maximum clamps it to zero
    intersection_area = np.maximum(0, xmax - xmin) * np.maximum(0, ymax - ymin)
    # Area of each box
    box_area = (box[2] - box[0]) * (box[3] - box[1])
    boxes_area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    # Union area
    union_area = box_area + boxes_area - intersection_area
    # IoU = intersection area / union area
    iou = intersection_area / union_area
    return iou


def process_output(outputs, conf_threshold, iou_threshold, input_width, input_height, img_width, img_height):
    # Drop the size-1 batch dimension and transpose: (1, N, M) -> (M, N)
    predictions = np.squeeze(outputs[0]).T
    # Highest class confidence of each candidate box (maximum along each row)
    scores = np.max(predictions[:, 4:], axis=1)
    # Drop boxes below the confidence threshold
    predictions = predictions[scores > conf_threshold, :]
    scores = scores[scores > conf_threshold]
    if len(scores) == 0:
        return [], [], []
    # Index of the class with the highest confidence
    class_ids = np.argmax(predictions[:, 4:], axis=1)
    # Extract the boxes
    boxes = predictions[:, :4]
    # Rescale the boxes from input-size coordinates to the original image size
    input_shape = np.array([input_width, input_height, input_width, input_height])
    # The boxes are in input-image pixels; dividing normalizes them to [0, 1]
    boxes = np.divide(boxes, input_shape, dtype=np.float32)
    # Scale the normalized coordinates up to the original image size
    boxes *= np.array([img_width, img_height, img_width, img_height])
    # Convert to xyxy format
    boxes = xywh2xyxy(boxes)
    # Apply non-maximum suppression (NMS)
    indices = multiclass_nms(boxes, scores, class_ids, iou_threshold)
    return boxes[indices], scores[indices], class_ids[indices]
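
To make the IoU and NMS steps concrete, here is a small worked example in plain Python with no NumPy. It is only a sketch: the three boxes, their scores, and the helper names iou and greedy_nms are made up for illustration and are not part of the code in this post.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero so that boxes without overlap give an empty intersection
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def greedy_nms(boxes, scores, iou_threshold):
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


# Hypothetical detections of a single class
boxes = [
    (100.0, 100.0, 200.0, 200.0),  # box 0
    (110.0, 110.0, 210.0, 210.0),  # box 1, overlaps box 0 heavily
    (300.0, 300.0, 400.0, 400.0),  # box 2, far from both
]
scores = [0.9, 0.8, 0.7]

overlap = iou(boxes[0], boxes[1])
kept = greedy_nms(boxes, scores, iou_threshold=0.5)
print(f"IoU(box 0, box 1) = {overlap:.3f}")  # 8100 / 11900, about 0.681
print("kept indices:", kept)                 # [0, 2]
```

Box 1 is suppressed because its IoU with the higher-scoring box 0 exceeds the 0.5 threshold, while box 2 does not overlap anything and is kept.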

3.1.3 Complete code

utils.py

import numpy as np
import cv2

class_names = ['person', 'head', 'helmet']

# Create a list of colors for each class where each color is a tuple of 3 integer values
rng = np.random.default_rng(3)
colors = rng.uniform(0, 255, size=(len(class_names), 3))


def nms(boxes, scores, iou_threshold):
    # Sort by score
    sorted_indices = np.argsort(scores)[::-1]
    keep_boxes = []
    while sorted_indices.size > 0:
        # Pick the box with the highest score
        box_id = sorted_indices[0]
        keep_boxes.append(box_id)
        # Compute IoU of the picked box with the rest
        ious = compute_iou(boxes[box_id, :], boxes[sorted_indices[1:], :])
        # Remove boxes with IoU over the threshold
        keep_indices = np.where(ious < iou_threshold)[0]
        # print(keep_indices.shape, sorted_indices.shape)
        sorted_indices = sorted_indices[keep_indices + 1]
    return keep_boxes


def multiclass_nms(boxes, scores, class_ids, iou_threshold):
    unique_class_ids = np.unique(class_ids)
    keep_boxes = []
    for class_id in unique_class_ids:
        class_indices = np.where(class_ids == class_id)[0]
        class_boxes = boxes[class_indices, :]
        class_scores = scores[class_indices]
        class_keep_boxes = nms(class_boxes, class_scores, iou_threshold)
        keep_boxes.extend(class_indices[class_keep_boxes])
    return keep_boxes


def compute_iou(box, boxes):
    # Compute xmin, ymin, xmax, ymax of the intersection
    xmin = np.maximum(box[0], boxes[:, 0])
    ymin = np.maximum(box[1], boxes[:, 1])
    xmax = np.minimum(box[2], boxes[:, 2])
    ymax = np.minimum(box[3], boxes[:, 3])
    # Compute intersection area
    intersection_area = np.maximum(0, xmax - xmin) * np.maximum(0, ymax - ymin)
    # Compute union area
    box_area = (box[2] - box[0]) * (box[3] - box[1])
    boxes_area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union_area = box_area + boxes_area - intersection_area
    # Compute IoU
    iou = intersection_area / union_area
    return iou


def xywh2xyxy(x):
    # Convert bounding box (x, y, w, h) to bounding box (x1, y1, x2, y2)
    y = np.copy(x)
    y[..., 0] = x[..., 0] - x[..., 2] / 2
    y[..., 1] = x[..., 1] - x[..., 3] / 2
    y[..., 2] = x[..., 0] + x[..., 2] / 2
    y[..., 3] = x[..., 1] + x[..., 3] / 2
    return y


def draw_detections(image, boxes, scores, class_ids, mask_alpha=0.3):
    det_img = image.copy()
    img_height, img_width = image.shape[:2]
    font_size = min([img_height, img_width]) * 0.0006
    text_thickness = int(min([img_height, img_width]) * 0.001)
    det_img = draw_masks(det_img, boxes, class_ids, mask_alpha)
    # Draw bounding boxes and labels of detections
    for class_id, box, score in zip(class_ids, boxes, scores):
        color = colors[class_id]
        draw_box(det_img, box, color)
        label = class_names[class_id]
        caption = f'{label} {int(score * 100)}%'
        draw_text(det_img, caption, box, color, font_size, text_thickness)
    return det_img


def draw_box(image: np.ndarray, box: np.ndarray, color: tuple[int, int, int] = (0, 0, 255),
             thickness: int = 2) -> np.ndarray:
    x1, y1, x2, y2 = box.astype(int)
    return cv2.rectangle(image, (x1, y1), (x2, y2), color, thickness)


def draw_text(image: np.ndarray, text: str, box: np.ndarray, color: tuple[int, int, int] = (0, 0, 255),
              font_size: float = 0.001, text_thickness: int = 2) -> np.ndarray:
    x1, y1, x2, y2 = box.astype(int)
    (tw, th), _ = cv2.getTextSize(text=text, fontFace=cv2.FONT_HERSHEY_SIMPLEX,
                                  fontScale=font_size, thickness=text_thickness)
    th = int(th * 1.2)
    cv2.rectangle(image, (x1, y1), (x1 + tw, y1 - th), color, -1)
    return cv2.putText(image, text, (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, font_size,
                       (255, 255, 255), text_thickness, cv2.LINE_AA)


def draw_masks(image: np.ndarray, boxes: np.ndarray, classes: np.ndarray, mask_alpha: float = 0.3) -> np.ndarray:
    mask_img = image.copy()
    # Fill each detection box with its class color
    for box, class_id in zip(boxes, classes):
        color = colors[class_id]
        x1, y1, x2, y2 = box.astype(int)
        # Draw fill rectangle in mask image
        cv2.rectangle(mask_img, (x1, y1), (x2, y2), color, -1)
    return cv2.addWeighted(mask_img, mask_alpha, image, 1 - mask_alpha, 0)

target_detection.py

import time
import cv2
import numpy as np
import onnxruntime

from detection.utils import xywh2xyxy, draw_detections, multiclass_nms


class TargetDetection:
    def __init__(self, path, conf_thres=0.7, iou_thres=0.5):
        self.conf_threshold = conf_thres
        self.iou_threshold = iou_thres
        # Initialize model
        self.initialize_model(path)

    def __call__(self, image):
        return self.detect_objects(image)

    def initialize_model(self, path):
        self.session = onnxruntime.InferenceSession(path, providers=onnxruntime.get_available_providers())
        # Get model info
        self.get_input_details()
        self.get_output_details()

    def detect_objects(self, image):
        input_tensor = self.prepare_input(image)
        # Perform inference on the image
        outputs = self.inference(input_tensor)
        self.boxes, self.scores, self.class_ids = self.process_output(outputs)
        return self.boxes, self.scores, self.class_ids

    def prepare_input(self, image):
        self.img_height, self.img_width = image.shape[:2]
        input_img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        # Resize input image
        input_img = cv2.resize(input_img, (self.input_width, self.input_height))
        # Scale input pixel values to 0 to 1
        input_img = input_img / 255.0
        input_img = input_img.transpose(2, 0, 1)
        input_tensor = input_img[np.newaxis, :, :, :].astype(np.float32)
        return input_tensor

    def inference(self, input_tensor):
        start = time.perf_counter()
        outputs = self.session.run(self.output_names, {self.input_names[0]: input_tensor})
        # print(f"Inference time: {(time.perf_counter() - start)*1000:.2f} ms")
        return outputs

    def process_output(self, output):
        predictions = np.squeeze(output[0]).T
        # Filter out object confidence scores below threshold
        scores = np.max(predictions[:, 4:], axis=1)
        predictions = predictions[scores > self.conf_threshold, :]
        scores = scores[scores > self.conf_threshold]
        if len(scores) == 0:
            return [], [], []
        # Get the class with the highest confidence
        class_ids = np.argmax(predictions[:, 4:], axis=1)
        # Get bounding boxes for each object
        boxes = self.extract_boxes(predictions)
        # Apply non-maxima suppression to suppress weak, overlapping bounding boxes
        # indices = nms(boxes, scores, self.iou_threshold)
        indices = multiclass_nms(boxes, scores, class_ids, self.iou_threshold)
        return boxes[indices], scores[indices], class_ids[indices]

    def extract_boxes(self, predictions):
        # Extract boxes from predictions
        boxes = predictions[:, :4]
        # Scale boxes to original image dimensions
        boxes = self.rescale_boxes(boxes)
        # Convert boxes to xyxy format
        boxes = xywh2xyxy(boxes)
        return boxes

    def rescale_boxes(self, boxes):
        # Rescale boxes to original image dimensions
        input_shape = np.array([self.input_width, self.input_height, self.input_width, self.input_height])
        boxes = np.divide(boxes, input_shape, dtype=np.float32)
        boxes *= np.array([self.img_width, self.img_height, self.img_width, self.img_height])
        return boxes

    def draw_detections(self, image, draw_scores=True, mask_alpha=0.4):
        return draw_detections(image, self.boxes, self.scores, self.class_ids, mask_alpha)

    def get_input_details(self):
        model_inputs = self.session.get_inputs()
        self.input_names = [model_inputs[i].name for i in range(len(model_inputs))]
        self.input_shape = model_inputs[0].shape
        self.input_height = self.input_shape[2]
        self.input_width = self.input_shape[3]
        print(self.input_width, self.input_height)

    def get_output_details(self):
        model_outputs = self.session.get_outputs()
        self.output_names = [model_outputs[i].name for i in range(len(model_outputs))]

ATDetector.py

import cv2
from detection.target_detection import TargetDetection
from detection.utils import draw_detections


# YOLOv8 ONNX model inference
class ATDetector():
    def __init__(self):
        super(ATDetector, self).__init__()
        self.model_path = "../yolov8s_best.onnx"
        self.detector = TargetDetection(self.model_path, conf_thres=0.5, iou_thres=0.3)

    def detect_image(self, input_image, output_image):
        cv_img = cv2.imread(input_image)
        boxes, scores, class_ids = self.detector.detect_objects(cv_img)
        cv_img = draw_detections(cv_img, boxes, scores, class_ids)
        cv2.namedWindow("output", cv2.WINDOW_NORMAL)
        cv2.imwrite(output_image, cv_img)
        cv2.imshow('output', cv_img)
        cv2.waitKey(0)

    def detect_video(self, input_video, output_video):
        cap = cv2.VideoCapture(input_video)
        fps = int(cap.get(5))  # property 5 is cv2.CAP_PROP_FPS
        videoWriter = None
        while True:
            _, cv_img = cap.read()
            if cv_img is None:
                break
            boxes, scores, class_ids = self.detector.detect_objects(cv_img)
            cv_img = draw_detections(cv_img, boxes, scores, class_ids)
            # Create the video writer on the first frame, once the frame size is known;
            # after that it is no longer None, so this branch runs only once
            if videoWriter is None:
                fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
                videoWriter = cv2.VideoWriter(output_video, fourcc, fps, (cv_img.shape[1], cv_img.shape[0]))
            videoWriter.write(cv_img)
            cv2.imshow("aod", cv_img)
            cv2.waitKey(5)
            # Stop when the window has been closed with the X button
            if cv2.getWindowProperty("aod", cv2.WND_PROP_AUTOSIZE) < 1:
                break
        cap.release()
        # The writer is still None if no frame could be read
        if videoWriter is not None:
            videoWriter.release()
        cv2.destroyAllWindows()


if __name__ == '__main__':
    det = ATDetector()
    # input_image = "../data/A_905.jpg"
    # output_image = '../data/output.jpg'
    # det.detect_image(input_image, output_image)
    input_video = r"E:\dataset\MOT\video\A13.mp4"
    output_video = "../data/output.mp4"
    det.detect_video(input_video, output_video)

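3.2 C++ implementation

3.2.1 Why C++?

Python is an interpreted language: the code is interpreted as it runs. In an inference pipeline the heavy operators (convolution, pooling, and so on) already execute inside ONNX Runtime's native code, but everything around them, such as preprocessing, postprocessing, and each call across the Python boundary, goes through the interpreter and adds overhead. C++ is a compiled language: the code is compiled to machine code that the processor executes directly. For a compute-intensive task such as YOLOv8 inference, a C++ pipeline has no interpreter overhead and runs faster.
Python code is fairly portable, but it can run into limits on special hardware platforms and embedded systems. On an embedded device with very limited resources, for example, installing the Python interpreter and its dependencies (NumPy, ONNX Runtime for Python, and so on) may take too much storage, and the interpreter itself needs resources to run. Python programs can also hit compatibility problems across operating systems because of dependency versions. C++ is highly portable, and compiler options and platform-specific code let it adapt well to different hardware. When a model such as YOLOv8 has to be deployed on embedded or industrial control devices, C++ makes optimization and customization easier. In an embedded vision system with strict performance and size requirements, for example, C++ compiles to efficient machine code and can be tuned to the hardware, for instance by using hardware acceleration instruction sets.
In short, a C++ implementation can deliver better real-time performance.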

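3.2.2 Installing the dependencies

(1) Download ONNX Runtime

The author's environment is Windows 11, CUDA 11.7, and cuDNN 8.5, with Visual Studio 2019 as the IDE. The CPU and GPU builds of ONNX Runtime used here are both version 1.14.1, available at https://github.com/microsoft/onnxruntime/releases/tag/v1.14.1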


(2) Download OpenCV

The author uses OpenCV 4.7.0, available at https://opencv.org/releases/


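(3) Configure ONNX Runtime and OpenCV

After downloading, extract both packages and configure them in the project properties.
First, add the ONNX Runtime and OpenCV include folders (the ones that contain the header files) to the Include Directories.
Next, add the ONNX Runtime and OpenCV library folders (the ones that contain the .lib files) to the Library Directories.

Then, add the names of the ONNX Runtime and OpenCV .lib files to the linker input.

Finally, copy the ONNX Runtime and OpenCV .dll files into the project's Release folder.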

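3.2.3 Inference steps

As in the Python implementation, deployment has three main stages: preprocessing, model inference, and postprocessing. This section focuses on the model inference flow in C++.

(1) Image preprocessing

Color space conversion: OpenCV reads images as BGR by default, while YOLO models usually expect RGB input.
Resize the image to the fixed input size the network requires, keeping the original aspect ratio and padding around the image (letterboxing).
Normalize the pixel values to the range [0, 1].
Convert the data layout (HWC -> CHW).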
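
The arithmetic behind the aspect-preserving resize can be sketched in a few lines of Python. This is a simplified illustration that always pads to the full target size, which corresponds to the fixed-input-shape case. The helper name letterbox_params and the 1280x720 source size are assumptions chosen for the example.

```python
def letterbox_params(src_w, src_h, dst_w=640, dst_h=640):
    """Return (scale, new_w, new_h, pad_left, pad_top) for a letterbox resize."""
    # One scale factor for both axes keeps the aspect ratio
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    # The leftover space is split evenly between the two sides
    pad_left = (dst_w - new_w) // 2
    pad_top = (dst_h - new_h) // 2
    return scale, new_w, new_h, pad_left, pad_top


scale, new_w, new_h, pad_left, pad_top = letterbox_params(1280, 720)
print(scale, new_w, new_h, pad_left, pad_top)  # 0.5 640 360 0 140
```

A 1280x720 frame is therefore scaled by 0.5 to 640x360 and padded with 140 rows above and below to reach 640x640.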

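(2) Model inference

a. Include the header

#include <onnxruntime_cxx_api.h>

b. Initialize the ONNX Runtime environment and session

Step 1: Create the ONNX Runtime environment

env = Ort::Env(OrtLoggingLevel::ORT_LOGGING_LEVEL_WARNING, "YOLOV8");

Ort::Env is the ONNX Runtime environment object. It is global in nature and is used to initialize and manage the ONNX Runtime runtime environment.
It does the following:

Initializes the ONNX Runtime library.
Controls the logging level and log output.
Provides a name identifier that helps with debugging and tracing.

Logging levels supported by ONNX Runtime (each level also includes the more severe ones):

ORT_LOGGING_LEVEL_VERBOSE: logs everything (most detailed).
ORT_LOGGING_LEVEL_INFO: logs informational messages and above.
ORT_LOGGING_LEVEL_WARNING: logs warnings and above.
ORT_LOGGING_LEVEL_ERROR: logs errors and fatal errors only.
ORT_LOGGING_LEVEL_FATAL: logs fatal errors only.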
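
The levels act as a threshold: a message is emitted when its severity is at least the configured level. The sketch below models that rule in Python. The enum mirrors the numeric values of OrtLoggingLevel in the C header (0 to 4), while is_logged is a hypothetical helper written for this illustration, not an ONNX Runtime function.

```python
from enum import IntEnum


class OrtLoggingLevel(IntEnum):
    # Same numeric order as the ORT_LOGGING_LEVEL_* constants
    VERBOSE = 0
    INFO = 1
    WARNING = 2
    ERROR = 3
    FATAL = 4


def is_logged(message_level, configured_level):
    """A message passes when it is at least as severe as the configured level."""
    return message_level >= configured_level


visible = [level.name for level in OrtLoggingLevel
           if is_logged(level, OrtLoggingLevel.WARNING)]
print(visible)  # ['WARNING', 'ERROR', 'FATAL']
```

With the WARNING level used in this post, verbose and informational messages are suppressed while warnings, errors, and fatal errors are still reported.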

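Step 2: Create the ONNX Runtime session options

Set the options for the ONNX Runtime session. These can include GPU usage, the optimization level, the execution mode, and so on.

sessionOptions = Ort::SessionOptions();

The session options control how the ONNX model behaves during inference, including the number of threads (parallelism), the optimization level (graph optimizations applied to the model), CUDA usage (GPU acceleration), the memory allocator, and session logging.

// Set the number of threads
sessionOptions.SetIntraOpNumThreads(1);
// Enable GPU acceleration.
// OrtCUDAProviderOptions is a struct provided by ONNX Runtime for configuring CUDA inference;
// CUDA-related parameters are passed through it when running on a GPU.
OrtCUDAProviderOptions cudaOption;
sessionOptions.AppendExecutionProvider_CUDA(cudaOption);
// Set the graph optimization level to all optimizations (the maximum)
sessionOptions.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);

SetGraphOptimizationLevel sets the level of graph optimization, which affects how efficiently the model executes. Graph optimization helps increase inference speed and reduce memory use. Each level optimizes the nodes and the computation graph to a different degree.

Common graph optimization levels

ORT_ENABLE_BASIC:
- The basic optimization level.
- Enables basic optimizations of the computation graph, such as node fusion, constant folding, and removal of redundant operations.
- Gives some performance gain compared with no optimization.
ORT_ENABLE_EXTENDED:
- Enables more advanced strategies, such as more complex operator fusions, to speed up inference.
- Optimizes more aggressively and may further reduce memory use and computation.
ORT_ENABLE_ALL:
- Enables every available optimization, including the most aggressive ones.
- The maximum level; it tries to improve inference performance as far as possible.
- Includes node fusion, constant folding, redundant node removal, graph simplification, and other passes.
- Suited to scenarios that need the best performance, but it may increase model load time, especially for complex models.

Why use ORT_ENABLE_ALL?

Performance: ORT_ENABLE_ALL applies more optimizations to the computation graph, which can improve inference speed, sometimes substantially.
Memory: the optimized graph is usually smaller, so memory use drops as well.
Use cases: in production with high performance requirements, or when running large volumes of inference, enabling all optimizations can noticeably reduce execution time and memory consumption.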
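
For comparison, the same session settings can be applied from Python. The following is a minimal sketch, assuming the onnxruntime Python package is installed. pick_providers and create_session are hypothetical helper names introduced here. onnxruntime is imported inside create_session so that the provider-selection logic can be tried without it.

```python
def pick_providers(available, use_gpu=True):
    """Order execution providers: CUDA first when requested and available, CPU as fallback."""
    providers = []
    if use_gpu and "CUDAExecutionProvider" in available:
        providers.append("CUDAExecutionProvider")
    providers.append("CPUExecutionProvider")
    return providers


def create_session(model_path, use_gpu=True):
    """Build an InferenceSession configured like the C++ snippet above."""
    import onnxruntime as ort  # imported lazily; requires onnxruntime to be installed

    options = ort.SessionOptions()
    options.intra_op_num_threads = 1
    options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    providers = pick_providers(ort.get_available_providers(), use_gpu)
    return ort.InferenceSession(model_path, sess_options=options, providers=providers)


print(pick_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
print(pick_providers(["CPUExecutionProvider"]))
```

Listing the CPU provider last gives ONNX Runtime a fallback for any operator the CUDA provider cannot run.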

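Step 3: Load the ONNX model file

Load the pretrained ONNX model file.
Create an Ort::Session object from the runtime environment, the session options, and the model.

const wchar_t* modelPath = L"yolov8.onnx";
Ort::Session session(env, modelPath, sessionOptions);

The second argument, modelPath, must be passed as a wide-character string (const wchar_t*) on Windows, because the ONNX Runtime API uses wide-character file paths on that platform.

The c_str() method returns a pointer to the character data of a std::wstring, which is the form the Ort::Session constructor needs. It is the usual way to pass a string to C-style functions or libraries (such as OpenCV and ONNX Runtime) that take const char* or const wchar_t*.

OpenCV: cv::imread() accepts a std::string, so a const char* can be passed as well.
ONNX Runtime (Windows): Ort::Session needs a const wchar_t*.
For std::string, c_str() returns const char*.
For std::wstring, c_str() returns const wchar_t*.

If the model path is a std::string, convert it to a std::wstring with a helper function, for example:

std::wstring utils::charToWstring(const char *str)
{
    // std::codecvt_utf8<wchar_t> converts between UTF-8 strings and wchar_t wide strings.
    // Both it and std::wstring_convert need <codecvt> and <locale>, and are deprecated since C++17.
    typedef std::codecvt_utf8<wchar_t> convert_type;
    // std::wstring_convert takes a conversion facet (such as std::codecvt_utf8)
    // and a wide character type (such as wchar_t)
    std::wstring_convert<convert_type, wchar_t> converter;
    return converter.from_bytes(str);
}

std::wstring w_modelPath = utils::charToWstring(modelPath.c_str());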

c. Get the model's input and output information

Query the Ort::Session object for details of the model's inputs and outputs: their number, names, types, and shapes.

ONNX Runtime has offered two ways to get input and output names from an Ort::Session:

GetInputName
A legacy method that newer releases replace with GetInputNameAllocated.
Uses an allocator supplied by the caller, such as Ort::AllocatorWithDefaultOptions.
Returns a char* that points to the allocated memory.
The caller is responsible for that memory, because ONNX Runtime does not free it automatically. Forgetting to free it leaks memory.
It has to be paired with allocator.Free(inputName); // free the name

GetInputNameAllocated
Returns an Ort::AllocatedStringPtr object (which wraps the allocated string pointer together with the logic that frees it) instead of a plain char*.
Memory management is safer, because Ort::AllocatedStringPtr is an RAII-style object that frees the memory automatically.

// Default allocator provided by ONNX Runtime; used here when fetching
// input/output metadata such as names and shapes
Ort::AllocatorWithDefaultOptions allocator;

// Input information
std::vector<const char *> inputNames;
std::vector<Ort::AllocatedStringPtr> input_names_ptr;
std::vector<std::vector<int64_t>> inputShapes;
bool isDynamicInputShape = false;

size_t numInputNodes = session.GetInputCount(); // number of inputs
for (size_t i = 0; i < numInputNodes; ++i)
{
    // Input name
    auto input_name = session.GetInputNameAllocated(i, allocator);
    inputNames.push_back(input_name.get()); // get() returns the raw const char* pointer
    input_names_ptr.push_back(std::move(input_name)); // keep the owner alive so the pointer stays valid
    // Input type and shape
    Ort::TypeInfo inputTypeInfo = session.GetInputTypeInfo(i);
    std::vector<int64_t> inputTensorShape = inputTypeInfo.GetTensorTypeAndShapeInfo().GetShape();
    inputShapes.push_back(inputTensorShape);
    // Check whether height and width are dynamic
    if (inputTensorShape[2] == -1 && inputTensorShape[3] == -1)
    {
        std::cout << "Dynamic input shape" << std::endl;
        isDynamicInputShape = true;
    }
}

// Output information
std::vector<const char *> outputNames;
std::vector<Ort::AllocatedStringPtr> output_names_ptr;
std::vector<std::vector<int64_t>> outputShapes;
int classNums = 3;
bool hasMask = false;

size_t numOutputNodes = session.GetOutputCount();
// More than one output means a segmentation model
if (numOutputNodes > 1)
{
    hasMask = true;
    std::cout << "Instance Segmentation" << std::endl;
}
else
    std::cout << "Object Detection" << std::endl;

for (size_t i = 0; i < numOutputNodes; ++i)
{
    // Output name
    auto output_name = session.GetOutputNameAllocated(i, allocator);
    outputNames.push_back(output_name.get());
    output_names_ptr.push_back(std::move(output_name));
    // Output type and shape
    Ort::TypeInfo outputTypeInfo = session.GetOutputTypeInfo(i);
    std::vector<int64_t> outputTensorShape = outputTypeInfo.GetTensorTypeAndShapeInfo().GetShape();
    outputShapes.push_back(outputTensorShape);
    if (i == 0)
    {
        // The first 4 channels are the box; a segmentation model adds 32 mask coefficients
        if (!hasMask)
            classNums = (int)outputTensorShape[1] - 4;
        else
            classNums = (int)outputTensorShape[1] - 4 - 32;
    }
}

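To inspect the model's input and output layers, open the .onnx file in Netron, a model visualizer that also runs as a website; just load the model directly.

(Figures: Netron views of the model's input layer and output layer.)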

d. Create the input tensor

std::vector<Ort::Value> inputTensors;
// The input tensor data is stored in CPU memory
Ort::MemoryInfo memoryInfo = Ort::MemoryInfo::CreateCpu(
    OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);
// Wrap the data in an ONNX Runtime tensor
inputTensors.push_back(Ort::Value::CreateTensor<float>(
    memoryInfo, inputTensorValues.data(), inputTensorSize,
    inputTensorShape.data(), inputTensorShape.size()));

CreateTensor parameters:
memoryInfo: memory information; here it says the data lives on the CPU.
inputTensorValues.data(): pointer to the start of the tensor data. The tensor does not copy this buffer, so it must stay alive until inference has finished.
inputTensorSize: number of elements in the tensor.
inputTensorShape.data(): pointer to the tensor shape.
inputTensorShape.size(): number of dimensions in the shape.
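
The element count passed to CreateTensor is the product of the shape's dimensions. A quick check of that arithmetic for a typical 640x640 input, as a small Python sketch (tensor_element_count is a hypothetical helper with the same role as utils::vectorProduct in this post's C++ code):

```python
import math


def tensor_element_count(shape):
    """Number of elements in a tensor of the given shape; 0 for an empty shape."""
    if not shape:
        return 0
    if any(dim < 0 for dim in shape):
        # A dimension of -1 marks a dynamic axis and must be resolved first
        raise ValueError("shape contains a dynamic dimension")
    return math.prod(shape)


shape = [1, 3, 640, 640]            # NCHW
count = tensor_element_count(shape)
size_in_bytes = count * 4           # float32 takes 4 bytes per element
print(count, size_in_bytes)         # 1228800 4915200
```

The buffer behind inputTensorValues therefore has to hold 1,228,800 floats, or about 4.7 MiB.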

e. Run inference

std::vector<Ort::Value> outputTensors = session.Run(
    Ort::RunOptions{nullptr},
    inputNames.data(), inputTensors.data(), 1,
    outputNames.data(), outputNames.size());

Run parameters:
Ort::RunOptions{nullptr}: RunOptions is the ONNX Runtime run configuration object; passing nullptr uses the default configuration.
inputNames.data(): pointer to the array of input tensor names, which identifies the model inputs.
inputTensors.data(): pointer to the input tensors, which supplies the input data.
1: number of input tensors.
outputNames.data(): pointer to the array of output tensor names, which selects the outputs to compute.
outputNames.size(): number of output tensors.

Run returns a std::vector<Ort::Value> of output tensors; each Ort::Value holds one model output.
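
The shape of the first output can be predicted from the input size. The sketch below assumes the default YOLOv8 detection head, with three feature maps at strides 8, 16, and 32 and 4 box values plus one score per class for every grid cell. The function name yolov8_output_shape is made up for this illustration.

```python
def yolov8_output_shape(input_w, input_h, num_classes, strides=(8, 16, 32)):
    """Expected (batch, channels, candidates) of a YOLOv8 detection output."""
    # One candidate box per grid cell on each feature map
    num_candidates = sum((input_w // s) * (input_h // s) for s in strides)
    return (1, 4 + num_classes, num_candidates)


shape = yolov8_output_shape(640, 640, num_classes=3)
print(shape)          # (1, 7, 8400)
print(shape[1] - 4)   # 3, the class count recovered as in the C++ code
```

For a 640x640 input the grids are 80x80, 40x40, and 20x20, which add up to 8400 candidate boxes. This is why the class count can be recovered as outputTensorShape[1] - 4.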

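(3) Postprocessing

Read the data from the output tensor and wrap it in a cv::Mat, reordering it so that each candidate box becomes one row (CHW -> HWC).
Take the class with the highest confidence and its score, and drop low-confidence candidates.
Convert the center coordinates (cx, cy) and size (w, h) to a top-left corner (x, y) plus size.
Remove redundant boxes that overlap heavily, keeping the one with the highest confidence.
Map the boxes from the network input size back to the original image size.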
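
The last step undoes the letterbox transform: subtract the padding, then divide by the scale factor. Here is a minimal Python sketch of that mapping, modeled on the box part of utils::scaleCoords in the full C++ code. The function name and the sample box are assumptions for illustration.

```python
def scale_box_to_original(box, input_size, original_size):
    """Map a box (x, y, w, h) from letterboxed input pixels to original-image pixels.

    input_size and original_size are (width, height) tuples.
    """
    in_w, in_h = input_size
    orig_w, orig_h = original_size
    gain = min(in_w / orig_w, in_h / orig_h)      # scale used by the letterbox
    pad_x = int((in_w - orig_w * gain) / 2)       # padding on the left
    pad_y = int((in_h - orig_h * gain) / 2)       # padding on the top
    x = max(0, round((box[0] - pad_x) / gain))
    y = max(0, round((box[1] - pad_y) / gain))
    # Clamp the size so that the box stays inside the original image
    w = min(round(box[2] / gain), orig_w - x)
    h = min(round(box[3] / gain), orig_h - y)
    return x, y, w, h


mapped = scale_box_to_original((320, 320, 64, 36), (640, 640), (1280, 720))
print(mapped)  # (640, 360, 128, 72)
```

For a 1280x720 frame letterboxed to 640x640, the gain is 0.5 and the vertical padding is 140 pixels, so the box at the center of the network input maps to the center of the original frame.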

3.2.4 Complete code

utils.cpp

#include "utils.h"

size_t utils::vectorProduct(const std::vector<int64_t> &vector)
{
    if (vector.empty())
        return 0;

    size_t product = 1;
    for (const auto &element : vector)
        product *= element;

    return product;
}

std::wstring utils::charToWstring(const char *str)
{
    // std::codecvt_utf8<wchar_t> converts between UTF-8 strings and wchar_t wide strings.
    // On Windows, wchar_t is usually UTF-16; on Linux/Unix it is usually UTF-32.
    typedef std::codecvt_utf8<wchar_t> convert_type;
    // std::wstring_convert takes a conversion facet (such as std::codecvt_utf8)
    // and a wide character type (such as wchar_t)
    std::wstring_convert<convert_type, wchar_t> converter;

    return converter.from_bytes(str);
}

std::vector<std::string> utils::loadNames(const std::string &path)
{
    // load class names
    std::vector<std::string> classNames;
    std::ifstream infile(path);
    if (infile.good())
    {
        std::string line;
        while (getline(infile, line))
        {
            // Strip a trailing '\r' left by Windows line endings
            if (!line.empty() && line.back() == '\r')
                line.pop_back();
            classNames.emplace_back(line);
        }
        infile.close();
    }
    else
    {
        std::cerr << "ERROR: Failed to access class name path: " << path << std::endl;
    }
    // set colors: two per class, one for the box and one for the mask
    srand((unsigned int)time(0));
    for (int i = 0; i < 2 * classNames.size(); i++)
    {
        int b = rand() % 256;
        int g = rand() % 256;
        int r = rand() % 256;
        colors.push_back(cv::Scalar(b, g, r));
    }
    return classNames;
}

void utils::visualizeDetection(cv::Mat &im, std::vector<Yolov8Result> &results,
                               const std::vector<std::string> &classNames)
{
    cv::Mat image = im.clone();
    for (const Yolov8Result &result : results)
    {
        int x = result.box.x;
        int y = result.box.y;

        int classId = result.classId;
        // Format the confidence with two decimals, for example "person 0.87"
        std::string label = classNames[classId] + cv::format(" %.2f", result.conf);

        int baseline = 0;
        cv::Size size = cv::getTextSize(label, cv::FONT_ITALIC, 0.4, 1, &baseline);
        image(result.box).setTo(colors[classId + classNames.size()], result.boxMask);
        cv::rectangle(image, result.box, colors[classId], 2);
        cv::rectangle(image,
                      cv::Point(x, y), cv::Point(x + size.width, y + 12),
                      colors[classId], -1);
        cv::putText(image, label,
                    cv::Point(x, y - 3 + 12), cv::FONT_ITALIC,
                    0.4, cv::Scalar(0, 0, 0), 1);
    }
    cv::addWeighted(im, 0.4, image, 0.6, 0, im);
}

void utils::letterbox(const cv::Mat &image, cv::Mat &outImage,
                      const cv::Size &newShape = cv::Size(640, 640),
                      const cv::Scalar &color = cv::Scalar(114, 114, 114),
                      bool auto_ = true,      // adjust the padding automatically according to the stride
                      bool scaleFill = false, // stretch the image to the target size (ignore the aspect ratio)
                      bool scaleUp = true,    // allow upscaling; if false the image is only shrunk or kept as is
                      int stride = 32)        // alignment stride that controls the padding size
{
    cv::Size shape = image.size();
    // Compute the scale ratio
    float r = std::min((float)newShape.height / (float)shape.height,
                       (float)newShape.width / (float)shape.width);
    // With scaleUp == false the ratio is capped at 1.0 so the image is never enlarged
    if (!scaleUp)
        r = std::min(r, 1.0f);

    float ratio[2]{r, r};
    // Image size after resizing, before padding
    int newUnpad[2]{(int)std::round((float)shape.width * r),
                    (int)std::round((float)shape.height * r)};
    // Compute the padding
    auto dw = (float)(newShape.width - newUnpad[0]);
    auto dh = (float)(newShape.height - newUnpad[1]);

    if (auto_)
    {
        dw = (float)((int)dw % stride);
        dh = (float)((int)dh % stride);
    }
    else if (scaleFill)
    {
        dw = 0.0f;
        dh = 0.0f;
        newUnpad[0] = newShape.width;
        newUnpad[1] = newShape.height;
        ratio[0] = (float)newShape.width / (float)shape.width;
        ratio[1] = (float)newShape.height / (float)shape.height;
    }

    dw /= 2.0f;
    dh /= 2.0f;

    // Resize whenever either dimension changes
    if (shape.width != newUnpad[0] || shape.height != newUnpad[1])
    {
        cv::resize(image, outImage, cv::Size(newUnpad[0], newUnpad[1]));
    }

    int top = int(std::round(dh - 0.1f));
    int bottom = int(std::round(dh + 0.1f));
    int left = int(std::round(dw - 0.1f));
    int right = int(std::round(dw + 0.1f));
    // Add the padding
    cv::copyMakeBorder(outImage, outImage, top, bottom, left, right, cv::BORDER_CONSTANT, color);
}

void utils::scaleCoords(cv::Rect &coords,
                        cv::Mat &mask,
                        const float maskThreshold,
                        const cv::Size &imageShape,
                        const cv::Size &imageOriginalShape)
{
    // Scale ratio used by the letterbox
    float gain = std::min((float)imageShape.height / (float)imageOriginalShape.height,
                          (float)imageShape.width / (float)imageOriginalShape.width);
    // Padding added by the letterbox
    int pad[2] = {(int)(((float)imageShape.width - (float)imageOriginalShape.width * gain) / 2.0f),
                  (int)(((float)imageShape.height - (float)imageOriginalShape.height * gain) / 2.0f)};

    // Map back to original image coordinates
    coords.x = (int)std::round(((float)(coords.x - pad[0]) / gain));
    coords.x = std::max(0, coords.x);
    coords.y = (int)std::round(((float)(coords.y - pad[1]) / gain));
    coords.y = std::max(0, coords.y);

    coords.width = (int)std::round(((float)coords.width / gain));
    coords.width = std::min(coords.width, imageOriginalShape.width - coords.x);
    coords.height = (int)std::round(((float)coords.height / gain));
    coords.height = std::min(coords.height, imageOriginalShape.height - coords.y);

    // Crop the mask to remove the padded border
    mask = mask(cv::Rect(pad[0], pad[1], imageShape.width - 2 * pad[0], imageShape.height - 2 * pad[1]));
    cv::resize(mask, mask, imageOriginalShape, cv::INTER_LINEAR);
    mask = mask(coords) > maskThreshold;
}

template <typename T>
T utils::clip(const T &n, const T &lower, const T &upper)
{
    return std::max(lower, std::min(n, upper));
}

predictor.cpp

#include "yolov8Predictor.h"YOLOPredictor::YOLOPredictor(const std::string &modelPath,const bool &isGPU,float confThreshold,float iouThreshold,float maskThreshold)
{this->confThreshold = confThreshold;this->iouThreshold = iouThreshold;this->maskThreshold = maskThreshold;//初始化一个 ONNX 运行时环境 env，并设置日志级别为警告。env = Ort::Env(OrtLoggingLevel::ORT_LOGGING_LEVEL_WARNING, "YOLOV8");//创建一个会话选项sessionOptions = Ort::SessionOptions();//获取当前 ONNX 运行时支持的执行提供程序，并检查是否支持 CUDA 执行提供程序。std::vector<std::string> availableProviders = Ort::GetAvailableProviders();std::cout << "--------------------" << std::endl;for (int i = 0; i < availableProviders.size(); ++i){std::cout << availableProviders.at(i) << std::endl;}auto cudaAvailable = std::find(availableProviders.begin(), availableProviders.end(), "CUDAExecutionProvider");//在指定的 范围 内搜索 第一个等于给定值的元素，并返回一个指向该元素的迭代器。//如果未找到匹配的元素，std::find 返回指向范围末尾的迭代器（即 end()）。OrtCUDAProviderOptions cudaOption;//OrtCUDAProviderOptions 是 ONNX Runtime 提供的一个结构体，用于配置 CUDA GPU 推理选项，当在 GPU 上使用 ONNX Runtime 时，需要通过该结构体指定 CUDA 相关参数。//根据是否使用 GPU 和 CUDA 提供程序是否可用，选择相应的执行提供程序，并输出相应的推断设备信息。if (isGPU && (cudaAvailable == availableProviders.end()))//end()指向 容器末尾的下一个位置 的迭代器{std::cout << "GPU is not supported by your ONNXRuntime build. Fallback to CPU." << std::endl;std::cout << "Inference device: CPU" << std::endl;}else if (isGPU && (cudaAvailable != availableProviders.end())){std::cout << "Inference device: GPU" << std::endl;sessionOptions.AppendExecutionProvider_CUDA(cudaOption);}else{std::cout << "Inference device: CPU" << std::endl;}#ifdef _WIN32//Windows 系统中的文件路径通常使用 宽字符（Unicode） 编码（wchar_t）std::wstring w_modelPath = utils::charToWstring(modelPath.c_str());//c_str()将 std::string 或 std::wstring 转换为以 '\0' 结尾的 C 风格字符串，方便与需要 const char* 或 const wchar_t* 类型的 C 风格函数或库（如 OpenCV、ONNX Runtime 等）兼容。//OpenCV：cv::imread() 接收 const char*。//ONNX Runtime（Windows 平台）：Ort::Session 需要 const wchar_t* 。session = Ort::Session(env, w_modelPath.c_str(), sessionOptions);//创建一个 Ort::Session 会话，通过会话来执行推理任务。
#elsesession = Ort::Session(env, modelPath.c_str(), sessionOptions);
#endif//获取输入节点和输出节点的数量，并判断是否存在掩码输出。const size_t num_input_nodes = session.GetInputCount();   //==1const size_t num_output_nodes = session.GetOutputCount(); //==1,2if (num_output_nodes > 1){this->hasMask = true;std::cout << "Instance Segmentation" << std::endl;}elsestd::cout << "Object Detection" << std::endl;Ort::AllocatorWithDefaultOptions allocator;//Ort::AllocatorWithDefaultOptions 是 ONNX Runtime 提供的一个默认内存分配器类，用于管理内存资源，特别是在获取模型输入/输出的元数据（如名称、形状）时非常有用//遍历输入节点，获取其名称、形状信息，并检查输入形状是否为动态形状。for (int i = 0; i < num_input_nodes; i++){auto input_name = session.GetInputNameAllocated(i, allocator);//返回的是一个 Ort::AllocatedStringPtr 对象，而不是简单的 char*//GetInputName返回的字符串指针是一个 C 风格字符串（char*）this->inputNames.push_back(input_name.get());//get 返回指向的原始字符串指针，也就是 const char* 类型input_names_ptr.push_back(std::move(input_name));Ort::TypeInfo inputTypeInfo = session.GetInputTypeInfo(i);std::vector<int64_t> inputTensorShape = inputTypeInfo.GetTensorTypeAndShapeInfo().GetShape();this->inputShapes.push_back(inputTensorShape);this->isDynamicInputShape = false;// checking if width and height are dynamicif (inputTensorShape[2] == -1 && inputTensorShape[3] == -1){std::cout << "Dynamic input shape" << std::endl;this->isDynamicInputShape = true;}}//遍历输出节点，获取其名称和形状信息，并根据输出节点的数量和是否存在掩码输出来确定类别数量。for (int i = 0; i < num_output_nodes; i++){auto output_name = session.GetOutputNameAllocated(i, allocator);this->outputNames.push_back(output_name.get());output_names_ptr.push_back(std::move(output_name));Ort::TypeInfo outputTypeInfo = session.GetOutputTypeInfo(i);std::vector<int64_t> outputTensorShape = outputTypeInfo.GetTensorTypeAndShapeInfo().GetShape();this->outputShapes.push_back(outputTensorShape);if (i == 0){if (!this->hasMask)classNums = outputTensorShape[1] - 4;elseclassNums = outputTensorShape[1] - 4 - 32;}}
}

void YOLOPredictor::getBestClassInfo(std::vector<float>::iterator it,
                                     float &bestConf,
                                     int &bestClassId,
                                     const int _classNums)
{
    // first 4 element are box
    bestClassId = 4;
    bestConf = 0;
    for (int i = 4; i < _classNums + 4; i++)
    {
        if (it[i] > bestConf)
        {
            bestConf = it[i];
            bestClassId = i - 4;
        }
    }
}
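
The class-selection loop above can also be written with the standard library. The following is only a minimal sketch, assuming each proposal row is laid out as [cx, cy, w, h, score_0 ... score_{n-1}] as in the code above. The `BestClass` struct and the `pickBestClass` function are hypothetical names for illustration and are not part of the original `YOLOPredictor` class.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical result type, used only in this illustration.
struct BestClass {
    int classId;
    float conf;
};

// Scans the class scores that follow the 4 box values of one proposal row
// and returns the index and value of the largest score.
// Assumes classNums >= 1 and row.size() >= 4 + classNums.
BestClass pickBestClass(const std::vector<float>& row, int classNums)
{
    auto first = row.begin() + 4;        // skip cx, cy, w, h
    auto last = first + classNums;       // one past the last class score
    auto best = std::max_element(first, last);
    return {static_cast<int>(std::distance(first, best)), *best};
}
```

For a row `{10, 20, 30, 40, 0.1, 0.7, 0.2}` with three classes, this picks class 1 with a score of 0.7. Unlike the loop above, it does not start from a zero baseline, so it always returns one of the actual scores.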
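
cv::Mat YOLOPredictor::getMask(const cv::Mat &maskProposals,
                               const cv::Mat &maskProtos)
{
    // Flatten the prototypes from [1, C, mh, mw] to a C x (mh * mw) matrix.
    cv::Mat protos = maskProtos.reshape(0, {(int)this->outputShapes[1][1], (int)this->outputShapes[1][2] * (int)this->outputShapes[1][3]});

    // Combine the prototypes with the mask coefficients of one detection,
    // then reshape the result back to an mh x mw map.
    cv::Mat matmul_res = (maskProposals * protos).t();
    cv::Mat masks = matmul_res.reshape(1, {(int)this->outputShapes[1][2], (int)this->outputShapes[1][3]});
    cv::Mat dest;

    // sigmoid
    cv::exp(-masks, dest);
    dest = 1.0 / (1.0 + dest);
    // Resize the mask to the network input size. Note that cv::Size is
    // (width, height); the (H, W) order used here is only equivalent for
    // square inputs such as 640x640.
    cv::resize(dest, dest, cv::Size((int)this->inputShapes[0][2], (int)this->inputShapes[0][3]), cv::INTER_LINEAR);
    return dest;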
}

void YOLOPredictor::preprocessing(cv::Mat &image, float *&blob, std::vector<int64_t> &inputTensorShape)
{
    cv::Mat resizedImage, floatImage;
    cv::cvtColor(image, resizedImage, cv::COLOR_BGR2RGB); // BGR -> RGB
    // letterbox resizes the image to the size the network expects while keeping
    // the original aspect ratio. The border is padded with cv::Scalar(114, 114, 114),
    // the usual default padding color for YOLO-style models.
    utils::letterbox(resizedImage, resizedImage, cv::Size((int)this->inputShapes[0][2], (int)this->inputShapes[0][3]),
                     cv::Scalar(114, 114, 114), this->isDynamicInputShape,
                     false, true, 32);

    inputTensorShape[2] = resizedImage.rows;
    inputTensorShape[3] = resizedImage.cols;

    // Normalize every pixel value to [0, 1].
    resizedImage.convertTo(floatImage, CV_32FC3, 1 / 255.0);
    // Allocate width x height x channels floats for the image data.
    // The caller owns this buffer and must release it with delete[].
    blob = new float[floatImage.cols * floatImage.rows * floatImage.channels()];
    cv::Size floatImageSize{floatImage.cols, floatImage.rows};

    // hwc -> chw
    std::vector<cv::Mat> chw(floatImage.channels());
    for (int i = 0; i < floatImage.channels(); ++i)
    {
        // This cv::Mat does not copy any data; it is a "view" over the part of
        // blob reserved for channel i, starting at offset i * width * height.
        chw[i] = cv::Mat(floatImageSize, CV_32FC1, blob + i * floatImageSize.width * floatImageSize.height);
    }
    // Split the image by channel; because each chw[i] points into blob,
    // the planar (CHW) data ends up in the memory blob points to.
    cv::split(floatImage, chw);
}

std::vector<Yolov8Result> YOLOPredictor::postprocessing(const cv::Size &resizedImageShape,
                                                        const cv::Size &originalImageShape,
                                                        std::vector<Ort::Value> &outputTensors)
{
    // for box
    std::vector<cv::Rect> boxes;
    std::vector<float> confs;
    std::vector<int> classIds;

    // Pointer to the data of the first output tensor.
    float *boxOutput = outputTensors[0].GetTensorMutableData<float>();
    // [1,4+n,8400]=>[1,8400,4+n] or [1,4+n+32,8400]=>[1,8400,4+n+32]
    // Transpose so that each row holds one proposal.
    cv::Mat output0 = cv::Mat(cv::Size((int)this->outputShapes[0][2], (int)this->outputShapes[0][1]), CV_32F, boxOutput).t();
    float *output0ptr = (float *)output0.data;
    int rows = (int)this->outputShapes[0][2];
    int cols = (int)this->outputShapes[0][1];
    // std::cout << rows << cols << std::endl;
    // if hasMask
    std::vector<std::vector<float>> picked_proposals;
    cv::Mat mask_protos;

    for (int i = 0; i < rows; i++)
    {
        // Copy out one row (one proposal).
        std::vector<float> it(output0ptr + i * cols, output0ptr + (i + 1) * cols);
        float confidence;
        int classId;
        // Find the class with the highest score and that score.
        this->getBestClassInfo(it.begin(), confidence, classId, classNums);

        if (confidence > this->confThreshold) // drop low-confidence proposals
        {
            if (this->hasMask)
            {
                // Skip the 4 box values and the classNums class scores to reach
                // the start of the mask coefficients.
                std::vector<float> temp(it.begin() + 4 + classNums, it.end());
                picked_proposals.push_back(temp);
            }
            // Convert the box from (centerX, centerY, width, height) to
            // (left, top, width, height) and store it in boxes.
            int centerX = (int)(it[0]);
            int centerY = (int)(it[1]);
            int width = (int)(it[2]);
            int height = (int)(it[3]);
            int left = centerX - width / 2;
            int top = centerY - height / 2;
            boxes.emplace_back(left, top, width, height);
            confs.emplace_back(confidence);
            classIds.emplace_back(classId);
        }
    }

    // Apply non-maximum suppression to remove redundant, heavily overlapping boxes.
    std::vector<int> indices; // indices of the boxes that are kept
    cv::dnn::NMSBoxes(boxes, confs, this->confThreshold, this->iouThreshold, indices);

    if (this->hasMask)
    {
        float *maskOutput = outputTensors[1].GetTensorMutableData<float>();
        std::vector<int> mask_protos_shape = {1, (int)this->outputShapes[1][1], (int)this->outputShapes[1][2], (int)this->outputShapes[1][3]};
        mask_protos = cv::Mat(mask_protos_shape, CV_32F, maskOutput);
    }

    std::vector<Yolov8Result> results;
    for (int idx : indices)
    {
        Yolov8Result res;
        res.box = cv::Rect(boxes[idx]);
        if (this->hasMask)
            // A mask output exists: call getMask to build the instance segmentation mask.
            res.boxMask = this->getMask(cv::Mat(picked_proposals[idx]).t(), mask_protos);
        else
            res.boxMask = cv::Mat::zeros((int)this->inputShapes[0][2], (int)this->inputShapes[0][3], CV_8U);

        // Map the box and the mask from the network input size back to the
        // coordinate system of the original image.
        utils::scaleCoords(res.box, res.boxMask, this->maskThreshold, resizedImageShape, originalImageShape);
        res.conf = confs[idx];
        res.classId = classIds[idx];
        results.emplace_back(res);
    }

    return results;
}

std::vector<Yolov8Result> YOLOPredictor::predict(cv::Mat &image)
{
    float *blob = nullptr; // receives the preprocessed image data
    // The two -1 entries stand for the height and width, which are filled in
    // at run time from the letterboxed image.
    std::vector<int64_t> inputTensorShape{1, 3, -1, -1};
    this->preprocessing(image, blob, inputTensorShape); // preprocessing

    // Number of elements in the input tensor.
    size_t inputTensorSize = utils::vectorProduct(inputTensorShape);

    // Copy the preprocessed data into a vector; blob points to its first element.
    std::vector<float> inputTensorValues(blob, blob + inputTensorSize);

    std::vector<Ort::Value> inputTensors;

    // The tensor data lives in CPU memory.
    Ort::MemoryInfo memoryInfo = Ort::MemoryInfo::CreateCpu(
        OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);

    // Wrap the data in an ONNX Runtime tensor:
    //   memoryInfo                 memory information (data is on the CPU)
    //   inputTensorValues.data()   pointer to the start of the tensor data
    //   inputTensorSize            number of elements in the tensor
    //   inputTensorShape.data()    pointer to the tensor shape
    //   inputTensorShape.size()    number of dimensions of the shape
    inputTensors.push_back(Ort::Value::CreateTensor<float>(
        memoryInfo, inputTensorValues.data(), inputTensorSize,
        inputTensorShape.data(), inputTensorShape.size()));

    // Run inference:
    //   Ort::RunOptions{nullptr}     run configuration; nullptr selects the defaults
    //   this->inputNames.data()      array of input tensor names
    //   inputTensors.data()          array of input tensors
    //   1                            number of input tensors
    //   this->outputNames.data()     array of the output node names to fetch
    //   this->outputNames.size()     number of output tensors
    // Run returns a std::vector<Ort::Value> with one Ort::Value per model output.
    std::vector<Ort::Value> outputTensors = this->session.Run(Ort::RunOptions{nullptr},
                                                              this->inputNames.data(),
                                                              inputTensors.data(),
                                                              1,
                                                              this->outputNames.data(),
                                                              this->outputNames.size());

    // Size of the letterboxed network input as (width, height).
    cv::Size resizedShape = cv::Size((int)inputTensorShape[3], (int)inputTensorShape[2]);
    // Post-processing.
    std::vector<Yolov8Result> result = this->postprocessing(resizedShape,
                                                            image.size(),
                                                            outputTensors);

    delete[] blob;

    return result;
}
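
The post-processing above leaves non-maximum suppression to `cv::dnn::NMSBoxes`. To make the idea concrete, here is a minimal sketch of greedy NMS in plain C++ with no OpenCV dependency. It is not OpenCV's implementation and may differ from it in details such as tie-breaking. The `Box` struct and the `iou` and `greedyNms` functions are hypothetical names introduced only for this illustration.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical axis-aligned box in (left, top, width, height) form,
// mirroring the layout of cv::Rect without depending on OpenCV.
struct Box {
    float left, top, width, height;
};

// Intersection-over-union of two boxes; 0 when they do not overlap.
float iou(const Box& a, const Box& b)
{
    const float x1 = std::max(a.left, b.left);
    const float y1 = std::max(a.top, b.top);
    const float x2 = std::min(a.left + a.width, b.left + b.width);
    const float y2 = std::min(a.top + a.height, b.top + b.height);
    const float inter = std::max(0.0f, x2 - x1) * std::max(0.0f, y2 - y1);
    const float uni = a.width * a.height + b.width * b.height - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: visit boxes from the highest to the lowest score and keep a
// box only if its IoU with every already-kept box is at most iouThreshold.
// Boxes whose score does not exceed scoreThreshold are skipped.
std::vector<int> greedyNms(const std::vector<Box>& boxes,
                           const std::vector<float>& scores,
                           float scoreThreshold, float iouThreshold)
{
    std::vector<int> order(boxes.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return scores[a] > scores[b]; });

    std::vector<int> keep;
    for (int idx : order) {
        if (scores[idx] <= scoreThreshold) continue;
        bool suppressed = false;
        for (int k : keep) {
            if (iou(boxes[idx], boxes[k]) > iouThreshold) {
                suppressed = true;
                break;
            }
        }
        if (!suppressed) keep.push_back(idx);
    }
    return keep;
}
```

As in `postprocessing`, the returned values are indices into the original box list, so they can be used to look up the matching confidence and class id of each surviving box.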