1. YOLOv9 model overview
1.1 YOLOv9
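YOLOv9 marks a significant advance in real-time object detection, introducing Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN). The model shows notable improvements in efficiency, accuracy, and adaptability, and sets new benchmarks on the MS COCO dataset. The YOLOv9 project was developed by an independent open-source team and builds on the codebase provided by Ultralytics YOLOv5, reflecting the collaborative spirit of the AI research community.
The performance of the YOLOv9 models on the COCO dataset is shown below.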
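1.2 Introduction to GDIP
GDIP-YOLO, proposed in 2022, is an end-to-end image-adaptive object detection framework, and the results in its paper show good image enhancement. It introduces the GDIP module, the MGDIP module, and the GDIP regularizer, and the paper reports that these are the key to the improved results.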
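2. GDIP-YOLOv9 implementation
The implementation builds on the project code from the earlier article "Integrating the GDIP module and IPAM into the ultralytics project to implement gdip-yolov8 and ipam-yolov8 with pretrained-weight support".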
2.1 Create the YAML file
Save the following configuration as yolov9c-gdip.yaml. To use the IPAM module instead, change the line
  - [-1, 1, GatedDIP, [256,7,"gdip-RTTS.pt"]] # GDIP module
to
  - [-1, 1, IPAM, []] # ia-seg module
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv9c (figures below are for the base model, without GDIP)
# 618 layers, 25590912 parameters, 104.0 GFLOPs

# parameters
nc: 80 # number of classes

# gelan backbone
backbone:
  - [-1, 1, GatedDIP, [256, 7, "gdip-RTTS.pt"]] # 0 GDIP module
  - [-1, 1, Conv, [64, 3, 2]] # 1-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 2-P2/4
  - [-1, 1, RepNCSPELAN4, [256, 128, 64, 1]] # 3
  - [-1, 1, ADown, [256]] # 4-P3/8
  - [-1, 1, RepNCSPELAN4, [512, 256, 128, 1]] # 5
  - [-1, 1, ADown, [512]] # 6-P4/16
  - [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]] # 7
  - [-1, 1, ADown, [512]] # 8-P5/32
  - [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]] # 9
  - [-1, 1, SPPELAN, [512, 256]] # 10

head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 7], 1, Concat, [1]] # cat backbone P4
  - [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]] # 13

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 5], 1, Concat, [1]] # cat backbone P3
  - [-1, 1, RepNCSPELAN4, [256, 256, 128, 1]] # 16 (P3/8-small)

  - [-1, 1, ADown, [256]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]] # 19 (P4/16-medium)

  - [-1, 1, ADown, [512]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]] # 22 (P5/32-large)

  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)
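Because GatedDIP is placed at index 0, every existing layer moves down one slot, so the absolute layer references in the head have to be one higher than in the stock layout. The sketch below illustrates that bookkeeping on plain Python lists rather than on the parsed YAML. The helper name prepend_layer is hypothetical and not part of ultralytics, and the stock references (6, 4 and [15, 18, 21]) are an assumption about the unmodified YOLOv9c configuration.

```python
def prepend_layer(layers, new_layer):
    """Return a new layer list with new_layer inserted at index 0.

    Each layer is [from, repeats, module, args]. Relative references
    (negative ints) are left alone; absolute references (ints >= 0)
    are shifted by one because every existing layer moves down a slot.
    """
    def shift(ref):
        return ref + 1 if ref >= 0 else ref

    shifted = []
    for frm, repeats, module, args in layers:
        if isinstance(frm, list):
            frm = [shift(r) for r in frm]
        else:
            frm = shift(frm)
        shifted.append([frm, repeats, module, args])
    return [new_layer] + shifted


# A few head entries with the references assumed for stock YOLOv9c.
stock = [
    [[-1, 6], 1, "Concat", [1]],
    [[-1, 4], 1, "Concat", [1]],
    [[15, 18, 21], 1, "Detect", ["nc"]],
]
gdip = [-1, 1, "GatedDIP", [256, 7, "gdip-RTTS.pt"]]

for layer in prepend_layer(stock, gdip):
    print(layer)
```

The shifted references ([-1, 7], [-1, 5] and [16, 19, 22]) agree with the values used in the configuration above.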
2.2 Generate the GDIP-YOLOv9 model
Open https://docs.ultralytics.com/models/yolov9/#performance-on-ms-coco-dataset and download the YOLOv9c model.
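Following the code in section 3.3, "Using YOLOv8 pretrained weights", of the article "Integrating the GDIP module and IPAM into the ultralytics project to implement gdip-yolov8 and ipam-yolov8 with pretrained-weight support", save the resulting model as gidp-yolov9c.pt, which is the file that section 2.3 loads.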
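The code and its output are shown below. To generate a yolov9c-IPAM model instead, create that model and pass it as the third argument of the save_model function.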
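The save_model code from the referenced article is not reproduced here. As a rough illustration of how weights from a stock YOLOv9c checkpoint could be lined up with the GDIP variant, the sketch below renames state-dict keys so that layer i of the stock model maps to layer i + 1 once GatedDIP occupies index 0. This is an assumption about the general approach, not the author's save_model. The function name shift_layer_keys and the sample keys are hypothetical, and plain strings stand in for tensors.

```python
import re


def shift_layer_keys(state_dict, offset=1):
    """Rename 'model.<i>.<rest>' keys to 'model.<i + offset>.<rest>'.

    Keys that do not follow this pattern are copied unchanged.
    """
    pattern = re.compile(r"^model\.(\d+)\.(.+)$")
    renamed = {}
    for key, value in state_dict.items():
        match = pattern.match(key)
        if match:
            index = int(match.group(1)) + offset
            key = f"model.{index}.{match.group(2)}"
        renamed[key] = value
    return renamed


# Hypothetical keys; plain strings stand in for weight tensors.
stock_weights = {
    "model.0.conv.weight": "w0",
    "model.1.conv.weight": "w1",
    "model.22.dfl.conv.weight": "w22",
}
shifted = shift_layer_keys(stock_weights)
print(sorted(shifted))
```

With real checkpoints the same renaming would be applied to the tensors of a loaded state dict before copying them into the new model; entries whose shapes do not match would still need to be skipped.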
2.3 Use the yolov9c-gdip model
The usage code is shown below.
from ultralytics import YOLO

if __name__ == '__main__':
    path = "yolov9c-gdip.yaml"
    model = YOLO(path)
    model.load("gidp-yolov9c.pt")  # load the pretrained weights generated in section 2.2

    # Use the model
    model.train(data="coco128.yaml", epochs=3, batch=4)  # train the model
    metrics = model.val(data="coco128.yaml")  # evaluate on the validation set
    results = model("https://ultralytics.com/images/bus.jpg")  # run prediction on an image
    success = model.export(format="onnx")  # export to ONNX
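The log output from running the code is shown below. It shows that the pretrained weights load correctly, that training and validation accuracy look normal, and that the model exports to ONNX without problems.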
256 7
load pretrain model from gdip-RTTS.pt
WARNING ⚠️ The file 'gidp-yolov9c.pt' appears to be improperly saved or formatted. For optimal results, use model.save('filename.pt') to correctly save YOLO models.
Transferred 963/963 items from pretrained weights
New https://pypi.org/project/ultralytics/8.2.1 available 😃 Update with 'pip install -U ultralytics'
Ultralytics YOLOv8.2.0 🚀 Python-3.8.16 torch-2.1.1+cu121 CUDA:0 (NVIDIA GeForce RTX 3060 Laptop GPU, 12288MiB)
engine\trainer: task=detect, mode=train, model=yolov9c-gdip.yaml, data=coco128.yaml, epochs=3, time=None, patience=100, batch=4, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=train, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs\detect\train

              from  n    params  module                                       arguments
256 7
load pretrain model from gdip-RTTS.pt
  0             -1  1   6538646  ultralytics.nn.modules.GDIP.GatedDIP         [256, 7, 'gdip-RTTS.pt']
  1             -1  1      1856  ultralytics.nn.modules.conv.Conv             [3, 64, 3, 2]
  2             -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]
  3             -1  1    212864  ultralytics.nn.modules.block.RepNCSPELAN4    [128, 256, 128, 64, 1]
  4             -1  1    164352  ultralytics.nn.modules.block.ADown           [256, 256]
  5             -1  1    847616  ultralytics.nn.modules.block.RepNCSPELAN4    [256, 512, 256, 128, 1]
  6             -1  1    656384  ultralytics.nn.modules.block.ADown           [512, 512]
  7             -1  1   2857472  ultralytics.nn.modules.block.RepNCSPELAN4    [512, 512, 512, 256, 1]
  8             -1  1    656384  ultralytics.nn.modules.block.ADown           [512, 512]
  9             -1  1   2857472  ultralytics.nn.modules.block.RepNCSPELAN4    [512, 512, 512, 256, 1]
 10             -1  1    656896  ultralytics.nn.modules.block.SPPELAN         [512, 512, 256]
 11             -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 12        [-1, 7]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 13             -1  1   3119616  ultralytics.nn.modules.block.RepNCSPELAN4    [1024, 512, 512, 256, 1]
 14             -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 15        [-1, 5]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 16             -1  1    912640  ultralytics.nn.modules.block.RepNCSPELAN4    [1024, 256, 256, 128, 1]
 17             -1  1    164352  ultralytics.nn.modules.block.ADown           [256, 256]
 18       [-1, 13]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 19             -1  1   2988544  ultralytics.nn.modules.block.RepNCSPELAN4    [768, 512, 512, 256, 1]
 20             -1  1    656384  ultralytics.nn.modules.block.ADown           [512, 512]
 21       [-1, 10]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 22             -1  1   3119616  ultralytics.nn.modules.block.RepNCSPELAN4    [1024, 512, 512, 256, 1]
 23   [16, 19, 22]  1   5644480  ultralytics.nn.modules.head.Detect           [80, [256, 512, 512]]
YOLOv9c-gdip summary: 660 layers, 32129558 parameters, 32129542 gradients, 159.2 GFLOPs

Transferred 963/963 items from pretrained weights
Logging results to runs\detect\train
Starting training for 3 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
        1/3      5.35G     0.9777      1.222      1.194         54        640: 100%|██████████| 32/32 [00:11<00:00, 2.67it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 16/16 [00:04<00:00, 3
                   all        128        929      0.805      0.711      0.814       0.65

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
        2/3      5.83G     0.9605     0.9842      1.164         44        640: 100%|██████████| 32/32 [00:11<00:00, 2.83it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 16/16 [00:03<00:00, 4.
                   all        128        929      0.836      0.706      0.821      0.654

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
        3/3      5.22G     0.9349     0.8878      1.175         85        640: 100%|██████████| 32/32 [00:11<00:00, 2.84it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 16/16 [00:03<00:00, 4.
                   all        128        929      0.824      0.723      0.824      0.659

3 epochs completed in 0.022 hours.
Optimizer stripped from runs\detect\train\weights\last.pt, 64.8MB
Optimizer stripped from runs\detect\train\weights\best.pt, 64.8MB

Validating runs\detect\train\weights\best.pt...
Ultralytics YOLOv8.2.0 🚀 Python-3.8.16 torch-2.1.1+cu121 CUDA:0 (NVIDIA GeForce RTX 3060 Laptop GPU, 12288MiB)
YOLOv9c-gdip summary (fused): 426 layers, 31919574 parameters, 0 gradients, 157.8 GFLOPs
          Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 16/16 [00:03<00:00, 4.
            all        128        929      0.829      0.722      0.824       0.66
         person        128        254      0.959      0.646      0.858      0.666
        bicycle        128          6      0.847        0.5      0.687      0.529
            car        128         46          1      0.367      0.653      0.336
     motorcycle        128          5      0.916          1      0.995      0.831
       airplane        128          6       0.95          1      0.995      0.921
            bus        128          7       0.93      0.714      0.857      0.753
          train        128          3      0.896          1      0.995       0.93
          truck        128         12      0.927        0.5      0.715      0.431
           boat        128          6      0.648      0.333      0.571       0.47
  traffic light        128         14      0.958      0.429       0.47      0.275
      stop sign        128          2      0.872          1      0.995      0.946
          bench        128          9          1      0.633       0.94      0.724
           bird        128         16      0.985          1      0.995      0.711
            cat        128          4      0.908          1      0.995      0.946
            dog        128          9          1      0.876      0.995      0.884
          horse        128          2      0.778          1      0.995      0.754
       elephant        128         17      0.883      0.941      0.944      0.815
           bear        128          1      0.761          1      0.995      0.895
          zebra        128          4      0.919          1      0.995      0.943
        giraffe        128          9      0.921          1      0.995      0.858
       backpack        128          6      0.914        0.5       0.64      0.468
       umbrella        128         18      0.815      0.833      0.896      0.669
        handbag        128         19      0.693      0.263      0.507      0.401
            tie        128          7          1      0.694      0.839      0.665
       suitcase        128          4      0.924          1      0.995      0.648
        frisbee        128          5      0.984        0.8      0.962      0.788
           skis        128          1       0.83          1      0.995      0.895
      snowboard        128          7      0.679      0.714      0.855      0.637
    sports ball        128          6      0.625        0.5      0.533      0.304
           kite        128         10      0.788      0.376      0.582      0.165
   baseball bat        128          4      0.948          1      0.995      0.663
 baseball glove        128          7          1      0.407       0.44       0.31
     skateboard        128          5      0.588        0.6      0.646       0.53
  tennis racket        128          7          1      0.667      0.721      0.587
         bottle        128         18      0.761      0.556      0.694       0.45
     wine glass        128         16      0.643      0.812      0.788      0.538
            cup        128         36      0.849      0.782      0.862      0.612
           fork        128          6      0.585      0.333       0.75      0.589
          knife        128         16      0.673       0.75       0.79       0.58
          spoon        128         22      0.864      0.682      0.751       0.62
           bowl        128         28      0.825      0.786      0.812      0.732
         banana        128          1      0.782          1      0.995      0.995
       sandwich        128          2      0.639          1      0.995      0.995
         orange        128          4      0.934          1      0.995      0.765
       broccoli        128         11      0.766      0.302      0.531      0.375
         carrot        128         24      0.768      0.828      0.844      0.612
        hot dog        128          2      0.641          1      0.995      0.995
          pizza        128          5      0.826      0.954      0.962      0.874
          donut        128         14      0.664          1      0.972      0.901
           cake        128          4        0.9          1      0.995      0.904
          chair        128         35      0.721      0.514      0.751      0.547
          couch        128          6      0.805       0.69      0.839      0.697
   potted plant        128         14          1      0.623      0.868      0.672
            bed        128          3      0.671          1      0.995      0.929
   dining table        128         13      0.812      0.385      0.689      0.585
         toilet        128          2      0.416        0.5      0.497       0.45
             tv        128          2      0.852          1      0.995      0.895
         laptop        128          3      0.758      0.667      0.723       0.68
          mouse        128          2          1          0      0.497      0.204
         remote        128          8      0.921        0.5       0.69      0.625
     cell phone        128          8          1      0.462      0.614      0.426
      microwave        128          3      0.827          1      0.995      0.897
           oven        128          5      0.437        0.4        0.4      0.251
           sink        128          6          1      0.422      0.623      0.452
   refrigerator        128          5      0.529          1       0.92      0.787
           book        128         29      0.738      0.293      0.583      0.335
          clock        128          9      0.937      0.889      0.975       0.82
           vase        128          2      0.727          1      0.995      0.995
       scissors        128          1          1          0      0.995      0.199
     teddy bear        128         21      0.865      0.857      0.913       0.66
     toothbrush        128          5      0.855          1      0.995      0.856
Speed: 0.2ms preprocess, 26.0ms inference, 0.0ms loss, 0.8ms postprocess per image

Downloading https://ultralytics.com/images/bus.jpg to 'bus.jpg'...
100%|███████████████████████████████████████████████████████████████████████████████████████████| 476k/476k [00:00<00:00, 650kB/s]
image 1/1 D:\yolo_seq\ultralytics-main\bus.jpg: 640x480 5 persons, 1 bus, 143.0ms
Speed: 1.0ms preprocess, 143.0ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 480)
Ultralytics YOLOv8.2.0 🚀 Python-3.8.16 torch-2.1.1+cu121 CPU (12th Gen Intel Core(TM) i7-12700H)

PyTorch: starting from 'runs\detect\train\weights\best.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 84, 8400) (61.8 MB)

ONNX: starting export with onnx 1.13.1 opset 17...
ONNX: export success ✅ 104.5s, saved as 'runs\detect\train\weights\best.onnx' (124.5 MB)

Export complete (107.6s)
Results saved to D:\yolo_seq\ultralytics-main\runs\detect\train\weights
Predict: yolo predict task=detect model=runs\detect\train\weights\best.onnx imgsz=640
Validate: yolo val task=detect model=runs\detect\train\weights\best.onnx imgsz=640 data=D:\yolo_seq\ultralytics-main\ultralytics\cfg\datasets\coco128.yaml
Visualize: https://netron.app
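
As a quick consistency check on the log above, the per-layer parameter counts from the model table can be summed and compared with the total in the model summary. The sketch below uses only numbers copied from the log; subtracting the GatedDIP layer should give back the parameter count quoted for stock YOLOv9c in the header comment of the YAML file.

```python
# Per-layer parameter counts copied from the model table in the log (layers 0-23).
layer_params = [
    6538646, 1856, 73984, 212864, 164352, 847616, 656384, 2857472,
    656384, 2857472, 656896, 0, 0, 3119616, 0, 0, 912640, 164352,
    0, 2988544, 656384, 0, 3119616, 5644480,
]

total = sum(layer_params)
without_gdip = total - layer_params[0]  # drop the GatedDIP layer at index 0

print(total)         # 32129558, the total reported in the model summary
print(without_gdip)  # 25590912, the count in the YAML header comment
```

Both figures agree with the log, which suggests the GatedDIP front end accounts for the whole parameter increase of about 6.5 million over the base model.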