1. Introduction
I first came across SAHI as a way to improve small-object detection by slicing large images into tiles before inference. SAHI currently supports YOLOv5, YOLOv8, MMDetection, Detectron2, and other frameworks; this post experiments with loading the model through ONNX so that SAHI can also run YOLOv10.
2. Code
(1) Convert the model
First create a virtual environment with conda and set up the YOLOv10 environment, then install sahi with pip:
pip install sahi
To export the YOLOv10 model to ONNX format, open a command window, cd to the directory containing the model file, and run:
yolo export model=vitrolite_best.pt format=onnx opset=11 simplify
If the export succeeds, an ONNX file with the same name is generated in that directory.
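Before wiring the model into sahi, it is worth sanity-checking the exported file with onnxruntime. This is just a quick verification sketch, assuming onnxruntime is installed and the file sits in the current directory; the shapes in the comments are what a YOLOv10 export typically reports, not guaranteed values.

import onnxruntime as ort

session = ort.InferenceSession("vitrolite_best.onnx", providers=["CPUExecutionProvider"])
print([(i.name, i.shape) for i in session.get_inputs()])   # typically [('images', [1, 3, 640, 640])]
print([(o.name, o.shape) for o in session.get_outputs()])  # typically [('output0', [1, 300, 6])]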
(2) Inference code that loads the ONNX model
The script below follows the demo notebook inference_for_yolov8_onnx.ipynb shipped with sahi; the slice size and overlap ratios are configurable.
from sahi import AutoDetectionModel
from sahi.utils.cv import read_image
from sahi.utils.file import download_from_url
from sahi.predict import get_prediction, get_sliced_prediction, predict
import time

if __name__ == '__main__':
    yolov8_onnx_model_path = "runs\\detect\\train_v102\\weights\\vitrolite_best.onnx"  # load your own onnx model file
    # yolov8_onnx_model_path = "D:\\Project\\yolov8\\weights\\yolov8x.onnx"
    category_mapping = {'0': 'p0', '1': 'p1', '2': 'p2', '3': 'p3'}  # category mapping, replace with your own

    detection_model = AutoDetectionModel.from_pretrained(
        model_type='yolov8onnx',
        model_path=yolov8_onnx_model_path,
        confidence_threshold=0.3,
        category_mapping=category_mapping,
        device='cuda:0',  # use the GPU here
    )

    # img_path = "datasets\\vitrolite\\images\\val\\c144846_5_10.png"  # image path for inference
    # result = get_prediction(read_image(img_path), detection_model)  # the first GPU run is slow, so warm up once first

    # sliced prediction
    result = get_sliced_prediction(
        "D:\\Project\\vitroliteDefect\\tile_round1_train_20201231\\train_imgs\\197_2_t20201119084924170_CAM1.jpg",
        detection_model,
        slice_height=640,
        slice_width=640,
        overlap_height_ratio=0.05,
        overlap_width_ratio=0.05,
    )
    result.export_visuals(export_dir="demo_data/")
(3) Modify sahi's interface file
Find where pip installed sahi and open yolov8onnx.py there (the snippet below shows one way to locate it).
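A quick way to print the installed package directory; in recent sahi releases the file to edit lives at models/yolov8onnx.py under that directory, though older versions may lay it out differently.

import os
import sahi

# directory of the pip-installed sahi package; yolov8onnx.py is under models/
print(os.path.dirname(sahi.__file__))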
Modify the _post_process function. The change is needed because the YOLOv8 and YOLOv10 output shapes differ: YOLOv10 is NMS-free, so its ONNX model directly outputs a (300, 6) tensor of final detections.
def _post_process(
    self, outputs: np.ndarray, input_shape: Tuple[int, int], image_shape: Tuple[int, int]
) -> List[torch.Tensor]:
    image_h, image_w = image_shape
    input_w, input_h = input_shape
    predictions = np.squeeze(outputs[0])  # no .T transpose needed; YOLOv10 output shape is (300, 6)

    # Changed below (by zjy); self.confidence_threshold is 0.3
    scores = predictions[:, 4]  # in the (300, 6) output, column index 4 is the score
    predictions = predictions[scores > self.confidence_threshold, :]
    scores = scores[scores > self.confidence_threshold]
    boxes = predictions[:, :4]
    boxes = boxes.astype(np.int32)
    class_ids = predictions[:, 5].astype(np.int32)

    # Format the results
    prediction_result = []
    for bbox, score, label in zip(boxes, scores, class_ids):
        bbox = bbox.tolist()
        cls_id = int(label)
        prediction_result.append([bbox[0], bbox[1], bbox[2], bbox[3], score, cls_id])

    """ The original YOLOv8 post-processing, kept for reference:
    # Filter out object confidence scores below threshold
    scores = np.max(predictions[:, 4:], axis=1)
    predictions = predictions[scores > self.confidence_threshold, :]
    scores = scores[scores > self.confidence_threshold]
    class_ids = np.argmax(predictions[:, 4:], axis=1)
    boxes = predictions[:, :4]

    # Scale boxes to original dimensions
    input_shape = np.array([input_w, input_h, input_w, input_h])
    boxes = np.divide(boxes, input_shape, dtype=np.float32)
    boxes *= np.array([image_w, image_h, image_w, image_h])
    boxes = boxes.astype(np.int32)

    # Convert from xywh to xyxy
    boxes = xywh2xyxy(boxes).round().astype(np.int32)

    # Perform non-max suppression
    indices = non_max_supression(boxes, scores, self.iou_threshold)

    # Format the results
    prediction_result = []
    for bbox, score, label in zip(boxes[indices], scores[indices], class_ids[indices]):
        bbox = bbox.tolist()
        cls_id = int(label)
        prediction_result.append([bbox[0], bbox[1], bbox[2], bbox[3], score, cls_id])
    """

    prediction_result = [torch.tensor(prediction_result)]
    # prediction_result = [prediction_result]
    return prediction_result
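To see at a glance why the function changes, the two raw output layouts can be compared in a few lines. The arrays below are placeholders; only the shapes matter.

import numpy as np

# YOLOv8 ONNX head: (1, 4 + num_classes, 8400) raw candidates ->
# transpose, class argmax, xywh->xyxy conversion and NMS are all still required.
yolov8_out = np.zeros((1, 84, 8400), dtype=np.float32)

# YOLOv10 ONNX head: (1, 300, 6), each row already [x1, y1, x2, y2, score, class_id],
# so a confidence filter is the only post-processing left.
yolov10_out = np.zeros((1, 300, 6), dtype=np.float32)
preds = np.squeeze(yolov10_out)      # (300, 6)
keep = preds[preds[:, 4] > 0.3]      # same filter as in _post_process above
print(keep.shape)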
3. Results
Run the inference code; the result image is written to the demo_data folder. The test image is a high-resolution photo of a ceramic tile.
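Besides the exported visualization, the PredictionResult returned by get_sliced_prediction can be inspected programmatically. A short sketch appended to the inference script above, using attribute names from the current sahi API:

# 'result' is the PredictionResult from get_sliced_prediction(...) above
print(len(result.object_prediction_list), "detections")
for pred in result.object_prediction_list[:5]:
    print(pred.category.name, round(pred.score.value, 3), pred.bbox.to_xyxy())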