1. Introduction
I first came across SAHI as a way to improve small-object detection by slicing large images into tiles before inference. SAHI currently supports YOLOv5, YOLOv8, MMDetection, Detectron2, and other frameworks; this post experiments with loading the model through ONNX so that SAHI can also run YOLOv10.
2. Code
(1) Convert the model
First create a virtual environment with conda and set up the YOLOv10 environment, then install sahi with pip:
pip install sahi
To export the YOLOv10 model to ONNX format, open a command window, cd to the directory containing the model file, and run:
yolo export model=vitrolite_best.pt format=onnx opset=11 simplify
If the export succeeds, an ONNX file with the same name is generated in that directory.
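Before wiring the model into sahi, it is worth sanity-checking the exported file with onnxruntime. This is just a quick verification sketch, assuming onnxruntime is installed and the file sits in the current directory; the shapes in the comments are what a YOLOv10 export typically reports, not guaranteed values.

import onnxruntime as ort

session = ort.InferenceSession("vitrolite_best.onnx", providers=["CPUExecutionProvider"])
print([(i.name, i.shape) for i in session.get_inputs()])   # typically [('images', [1, 3, 640, 640])]
print([(o.name, o.shape) for o in session.get_outputs()])  # typically [('output0', [1, 300, 6])]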
(2) Inference code that loads the ONNX model
The script below follows the demo notebook inference_for_yolov8_onnx.ipynb shipped with sahi; the slice size and overlap ratios are configurable.
from sahi import AutoDetectionModel
from sahi.utils.cv import read_image
from sahi.utils.file import download_from_url
from sahi.predict import get_prediction, get_sliced_prediction, predict
import time

if __name__ == '__main__':
    yolov8_onnx_model_path = "runs\\detect\\train_v102\\weights\\vitrolite_best.onnx"  # load your own onnx model file
    # yolov8_onnx_model_path = "D:\\Project\\yolov8\\weights\\yolov8x.onnx"
    category_mapping = {'0': 'p0', '1': 'p1', '2': 'p2', '3': 'p3'}  # category mapping, replace with your own

    detection_model = AutoDetectionModel.from_pretrained(
        model_type='yolov8onnx',
        model_path=yolov8_onnx_model_path,
        confidence_threshold=0.3,
        category_mapping=category_mapping,
        device='cuda:0',  # use the GPU here
    )

    # img_path = "datasets\\vitrolite\\images\\val\\c144846_5_10.png"  # image path for inference
    # result = get_prediction(read_image(img_path), detection_model)  # the first GPU run is slow, so warm up once first

    # sliced prediction
    result = get_sliced_prediction(
        "D:\\Project\\vitroliteDefect\\tile_round1_train_20201231\\train_imgs\\197_2_t20201119084924170_CAM1.jpg",
        detection_model,
        slice_height=640,
        slice_width=640,
        overlap_height_ratio=0.05,
        overlap_width_ratio=0.05,
    )
    result.export_visuals(export_dir="demo_data/")
(3) Modify sahi's interface file
Find where pip installed sahi and open yolov8onnx.py there (the snippet below shows one way to locate it).
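A quick way to print the installed package directory; in recent sahi releases the file to edit lives at models/yolov8onnx.py under that directory, though older versions may lay it out differently.

import os
import sahi

# directory of the pip-installed sahi package; yolov8onnx.py is under models/
print(os.path.dirname(sahi.__file__))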
Modify the _post_process function. The change is needed because the YOLOv8 and YOLOv10 output shapes differ: YOLOv10 is NMS-free, so its ONNX model directly outputs a (300, 6) tensor of final detections.
def _post_process(
    self, outputs: np.ndarray, input_shape: Tuple[int, int], image_shape: Tuple[int, int]
) -> List[torch.Tensor]:
    image_h, image_w = image_shape
    input_w, input_h = input_shape
    predictions = np.squeeze(outputs[0])  # no .T transpose needed; YOLOv10 output shape is (300, 6)

    # Changed below (by zjy); self.confidence_threshold is 0.3
    scores = predictions[:, 4]  # in the (300, 6) output, column index 4 is the score
    predictions = predictions[scores > self.confidence_threshold, :]
    scores = scores[scores > self.confidence_threshold]
    boxes = predictions[:, :4]
    boxes = boxes.astype(np.int32)
    class_ids = predictions[:, 5].astype(np.int32)

    # Format the results
    prediction_result = []
    for bbox, score, label in zip(boxes, scores, class_ids):
        bbox = bbox.tolist()
        cls_id = int(label)
        prediction_result.append([bbox[0], bbox[1], bbox[2], bbox[3], score, cls_id])

    """ The original YOLOv8 post-processing, kept for reference:
    # Filter out object confidence scores below threshold
    scores = np.max(predictions[:, 4:], axis=1)
    predictions = predictions[scores > self.confidence_threshold, :]
    scores = scores[scores > self.confidence_threshold]
    class_ids = np.argmax(predictions[:, 4:], axis=1)
    boxes = predictions[:, :4]

    # Scale boxes to original dimensions
    input_shape = np.array([input_w, input_h, input_w, input_h])
    boxes = np.divide(boxes, input_shape, dtype=np.float32)
    boxes *= np.array([image_w, image_h, image_w, image_h])
    boxes = boxes.astype(np.int32)

    # Convert from xywh to xyxy
    boxes = xywh2xyxy(boxes).round().astype(np.int32)

    # Perform non-max suppression
    indices = non_max_supression(boxes, scores, self.iou_threshold)

    # Format the results
    prediction_result = []
    for bbox, score, label in zip(boxes[indices], scores[indices], class_ids[indices]):
        bbox = bbox.tolist()
        cls_id = int(label)
        prediction_result.append([bbox[0], bbox[1], bbox[2], bbox[3], score, cls_id])
    """

    prediction_result = [torch.tensor(prediction_result)]
    # prediction_result = [prediction_result]
    return prediction_result
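To see at a glance why the function changes, the two raw output layouts can be compared in a few lines. The arrays below are placeholders; only the shapes matter.

import numpy as np

# YOLOv8 ONNX head: (1, 4 + num_classes, 8400) raw candidates ->
# transpose, class argmax, xywh->xyxy conversion and NMS are all still required.
yolov8_out = np.zeros((1, 84, 8400), dtype=np.float32)

# YOLOv10 ONNX head: (1, 300, 6), each row already [x1, y1, x2, y2, score, class_id],
# so a confidence filter is the only post-processing left.
yolov10_out = np.zeros((1, 300, 6), dtype=np.float32)
preds = np.squeeze(yolov10_out)      # (300, 6)
keep = preds[preds[:, 4] > 0.3]      # same filter as in _post_process above
print(keep.shape)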
3. Results
Run the inference code; the result image is written to the demo_data folder. The test image is a high-resolution photo of a ceramic tile.
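Besides the exported visualization, the PredictionResult returned by get_sliced_prediction can be inspected programmatically. A short sketch appended to the inference script above, using attribute names from the current sahi API:

# 'result' is the PredictionResult from get_sliced_prediction(...) above
print(len(result.object_prediction_list), "detections")
for pred in result.object_prediction_list[:5]:
    print(pred.category.name, round(pred.score.value, 3), pred.bbox.to_xyxy())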