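Table of Contents
- Abstract
- Self-developed downsampling module and its variants
- First improvement method
- Official YoloV9 test results
- Improvement method
- Test results
- Summary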
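Abstract
This article introduces a downsampling module that I developed myself. It is a general-purpose improvement: you can use it in the backbone network of a classification model, and also in segmentation and super-resolution tasks. One reader has already used it to improve a ConvNext model with very good results, and I believe that, combined with some other improvements, it could support a submission to a top conference such as CVPR or ECCV.
Here I use the module to improve YoloV9 and raise its detection accuracy.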
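Self-developed downsampling module and its variants
First improvement method
The input is sent through two branches. One branch applies a stride-2 convolution. The other first passes through a 1x1 convolution and is then split into two halves along the channel dimension: one half goes through MaxPool and the other through AvgPool. The three results are concatenated at the end. The code is as follows: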
import torch
import torch.nn as nn


def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""
    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Apply convolution and activation only (batch normalization is skipped)."""
        return self.act(self.conv(x))


class DownSimper(nn.Module):
    """DownSimper: stride-2 dilated conv branch plus a max-pool / avg-pool branch."""

    def __init__(self, c1, c2):
        super().__init__()
        self.c = c2 // 2
        self.cv1 = Conv(c1, self.c, 3, 2, d=3)  # dilated 3x3 conv, stride 2
        self.cv2 = Conv(c1, self.c, 1, 1, 0)  # 1x1 conv feeding the pooling branch

    def forward(self, x):
        x1 = self.cv1(x)
        x = self.cv2(x)
        x2, x3 = x.chunk(2, 1)  # split the pooling branch along the channel dimension
        x2 = torch.nn.functional.max_pool2d(x2, 3, 2, 1)
        x3 = torch.nn.functional.avg_pool2d(x3, 3, 2, 1)
        return torch.cat((x1, x2, x3), 1)
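Structure diagram: [figure]
In the left-hand convolution, d=3 means that a dilated (atrous) convolution is used, which gives a larger receptive field. With d=3 and k=3, the effective kernel size is d*(k-1)+1 = 7, which is the same value that autopad computes.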
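The kernel arithmetic and the resulting feature-map sizes can be checked with a few lines of plain Python. This is only a minimal sketch: it assumes the usual floor-rounded output-size formula for convolution and pooling layers, and the helper names (effective_kernel, out_size, downsimper_shape) are made up for illustration and do not appear in the code above.

```python
def effective_kernel(k: int, d: int) -> int:
    """Span covered by a k-tap kernel with dilation d (same rule as autopad)."""
    return d * (k - 1) + 1


def out_size(n: int, k: int, s: int, p: int, d: int = 1) -> int:
    """Output length along one axis for a conv/pool layer (floor rounding)."""
    return (n + 2 * p - d * (k - 1) - 1) // s + 1


def downsimper_shape(c2: int, h: int, w: int):
    """Channels and spatial size that DownSimper(c1, c2) would give for an h x w input."""
    c = c2 // 2                         # width of the conv branch and of the 1x1 conv
    pad = effective_kernel(3, 3) // 2   # what autopad(3, None, 3) returns: 7 // 2 = 3
    conv_hw = (out_size(h, 3, 2, pad, d=3), out_size(w, 3, 2, pad, d=3))
    pool_hw = (out_size(h, 3, 2, 1), out_size(w, 3, 2, 1))
    assert conv_hw == pool_hw, "all branches must share one spatial size for torch.cat"
    channels = c + c                    # conv branch + (max-pool half + avg-pool half)
    return channels, conv_hw


print(effective_kernel(3, 3))
print(downsimper_shape(128, 640, 640))
```

Because the pooling branch is split in two, c2 should be a multiple of 4 if both halves are to have the same width; every value used in the yaml file of this article (128, 256, 512) satisfies this.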
Official YoloV9 test results
yolov9 summary: 580 layers, 60567520 parameters, 0 gradients, 264.3 GFLOPs
Class       Images  Instances       P       R   mAP50  mAP50-95: 100%|██████████| 15/15 00:02
all            230       1412   0.878   0.991   0.989     0.732
c17            230        131    0.92   0.992   0.994     0.797
c5             230         68   0.828       1   0.992     0.807
helicopter     230         43   0.895   0.977   0.969     0.634
c130           230         85   0.955   0.999   0.994     0.684
f16            230         57   0.839   0.965   0.966     0.689
b2             230          2       1   0.978   0.995     0.647
other          230         86    0.91   0.942   0.957     0.525
b52            230         70   0.917   0.971   0.979     0.806
kc10           230         62   0.958   0.984   0.987     0.826
command        230         40   0.964       1   0.995     0.815
f15            230        123   0.939   0.995   0.995     0.702
kc135          230         91   0.949   0.989   0.978     0.691
a10            230         27   0.863   0.963   0.982     0.458
b1             230         20   0.926       1   0.995     0.712
aew            230         25   0.929       1   0.993     0.812
f22            230         17   0.835       1   0.995     0.706
p3             230        105    0.97       1   0.995     0.804
p8             230          1   0.566       1   0.995     0.697
f35            230         32   0.908       1   0.995     0.547
f18            230        125   0.956   0.992   0.993     0.828
v22            230         41   0.921       1   0.995     0.682
su-27          230         31   0.925       1   0.994     0.832
il-38          230         27   0.899       1   0.995     0.816
tu-134         230          1   0.346       1   0.995     0.895
su-33          230          2    0.96       1   0.995     0.747
an-70          230          2   0.718       1   0.995     0.796
tu-22          230         98   0.912       1   0.995     0.804
Improvement method
Copy the code above into common.py.
In the parse_model function in yolo.py, add a branch for DownSimper.
Code:
elif m is DownSimper:
    c2 = args[0]
    c1 = ch[f]
    args = [c1, c2]
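To illustrate what this branch does, here is a simplified stand-in for the channel bookkeeping around it. It is a sketch under assumptions, not the real parse_model: the function trace_channels and the shortened layer list are hypothetical, every layer is assumed to read from the previous one (from index -1), the input is assumed to be a 3-channel image, and all modules other than DownSimper and Silence are treated as "output channels = first argument". Only the DownSimper rule itself (c1 = ch[f], c2 = args[0], args = [c1, c2]) is taken from the snippet above.

```python
def trace_channels(layers, in_ch=3):
    """Return (module_name, constructor_args) for each yaml-style row.

    layers: list of (from_index, module_name, args) tuples; from_index must be -1 here.
    """
    ch = [in_ch]               # ch[-1] is the output width of the previous layer
    resolved = []
    for f, name, args in layers:
        c1 = ch[f]             # input channels come from the referenced layer
        if name == "DownSimper":
            c2 = args[0]       # the yaml row only carries the output width
            ctor_args = [c1, c2]
        elif name == "Silence":
            c2 = c1            # pass-through layer, channel count unchanged
            ctor_args = []
        else:
            c2 = args[0]       # simplification: first argument is the output width
            ctor_args = [c1, *args]
        resolved.append((name, ctor_args))
        ch.append(c2)
    return resolved


rows = [
    (-1, "Silence", []),
    (-1, "DownSimper", [128]),
    (-1, "DownSimper", [256]),
    (-1, "RepNCSPELAN4", [256, 128, 64, 1]),
    (-1, "DownSimper", [512]),
]
resolved = trace_channels(rows)
for name, ctor_args in resolved:
    print(name, ctor_args)
```

The point of the branch is that the yaml row only needs to state the output width (for example [128]); the input width is filled in from the layer it is attached to.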
Modify the models/detect/yolov9.yaml configuration file as follows:
# YOLOv9 backbone
backbone:
  [
   [-1, 1, Silence, []],

   # conv down
   [-1, 1, DownSimper, [128]],  # 1-P1/2

   # conv down
   [-1, 1, DownSimper, [256]],  # 2-P2/4

   # elan-1 block
   [-1, 1, RepNCSPELAN4, [256, 128, 64, 1]],  # 3

   # conv down
   [-1, 1, DownSimper, [512]],  # 4-P3/8

   # elan-2 block
   [-1, 1, RepNCSPELAN4, [512, 256, 128, 1]],  # 5

   # conv down
   [-1, 1, DownSimper, [512]],  # 6-P4/16

   # elan-2 block
   [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]],  # 7

   # conv down
   [-1, 1, DownSimper, [512]],  # 8-P5/32

   # elan-2 block
   [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]],  # 9
  ]

# YOLOv9 head
head:
  [
   # elan-spp block
   [-1, 1, SPPELAN, [512, 256]],  # 10

   # up-concat merge
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 7], 1, Concat, [1]],  # cat backbone P4

   # elan-2 block
   [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]],  # 13

   # up-concat merge
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 5], 1, Concat, [1]],  # cat backbone P3

   # elan-2 block
   [-1, 1, RepNCSPELAN4, [256, 256, 128, 1]],  # 16 (P3/8-small)

   # conv-down merge
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 13], 1, Concat, [1]],  # cat head P4

   # elan-2 block
   [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]],  # 19 (P4/16-medium)

   # conv-down merge
   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5

   # elan-2 block
   [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]],  # 22 (P5/32-large)

   # routing
   [5, 1, CBLinear, [[256]]],  # 23
   [7, 1, CBLinear, [[256, 512]]],  # 24
   [9, 1, CBLinear, [[256, 512, 512]]],  # 25

   # conv down
   [0, 1, Conv, [64, 3, 2]],  # 26-P1/2

   # conv down
   [-1, 1, Conv, [128, 3, 2]],  # 27-P2/4

   # elan-1 block
   [-1, 1, RepNCSPELAN4, [256, 128, 64, 1]],  # 28

   # conv down fuse
   [-1, 1, Conv, [256, 3, 2]],  # 29-P3/8
   [[23, 24, 25, -1], 1, CBFuse, [[0, 0, 0]]],  # 30

   # elan-2 block
   [-1, 1, RepNCSPELAN4, [512, 256, 128, 1]],  # 31

   # conv down fuse
   [-1, 1, Conv, [512, 3, 2]],  # 32-P4/16
   [[24, 25, -1], 1, CBFuse, [[1, 1]]],  # 33

   # elan-2 block
   [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]],  # 34

   # conv down fuse
   [-1, 1, Conv, [512, 3, 2]],  # 35-P5/32
   [[25, -1], 1, CBFuse, [[2]]],  # 36

   # elan-2 block
   [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]],  # 37

   # detect
   [[31, 34, 37, 16, 19, 22], 1, DualDDetect, [nc]],  # DualDDetect(A3, A4, A5, P3, P4, P5)
  ]
Modify the training arguments (the argparse defaults) in the train_dual.py script as follows:
parser.add_argument('--weights', type=str, default='', help='initial weights path')
parser.add_argument('--cfg', type=str, default='models/detect/yolov9.yaml', help='model.yaml path')
parser.add_argument('--data', type=str, default=ROOT / 'data/VOC.yaml', help='dataset.yaml path')
parser.add_argument('--epochs', type=int, default=100, help='total training epochs')
parser.add_argument('--batch-size', type=int, default=8, help='total batch size for all GPUs, -1 for autobatch')
parser.add_argument('--imgsz', '--img', '--img-size', type=int, default=640, help='train, val image size (pixels)')
parser.add_argument('--workers', type=int, default=0, help='max dataloader workers (per RANK in DDP mode)')
parser.add_argument('--project', default=ROOT / 'runs/train', help='save to project/name')
parser.add_argument('--name', default='exp', help='save to project/name')
parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
Test results
yolov9 summary: 595 layers, 58708576 parameters, 0 gradients, 274.0 GFLOPs
Class       Images  Instances       P       R   mAP50  mAP50-95: 100%|██████████| 15/15 00:35
all            230       1412   0.952   0.974    0.99     0.738
c17            230        131   0.981   0.992   0.995     0.832
c5             230         68   0.963   0.985   0.995     0.847
helicopter     230         43   0.968    0.93   0.972     0.635
c130           230         85   0.988   0.996   0.995     0.669
f16            230         57   0.976   0.947   0.975     0.687
b2             230          2   0.767       1   0.995     0.516
other          230         86   0.981   0.907   0.968     0.573
b52            230         70   0.969   0.971   0.985     0.812
kc10           230         62   0.986   0.984   0.989     0.835
command        230         40   0.988       1   0.995      0.82
f15            230        123   0.965   0.992   0.989     0.697
kc135          230         91   0.984   0.989   0.981     0.725
a10            230         27       1   0.794   0.976     0.495
b1             230         20   0.979       1   0.995     0.682
aew            230         25   0.944       1   0.995     0.802
f22            230         17       1   0.881   0.992     0.717
p3             230        105   0.981   0.992   0.995      0.81
p8             230          1   0.756       1   0.995     0.697
f35            230         32    0.99   0.938   0.982      0.55
f18            230        125   0.981   0.992   0.991     0.829
v22            230         41    0.99       1   0.995     0.684
su-27          230         31   0.985       1   0.995     0.849
il-38          230         27   0.987       1   0.995      0.84
tu-134         230          1   0.756       1   0.995     0.895
su-33          230          2    0.99       1   0.995     0.747
an-70          230          2   0.838       1   0.995     0.848
tu-22          230         98       1       1   0.995     0.832
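To make the comparison with the official YoloV9 run easier to read, the "all" rows and the summary lines of the two logs can be compared with a short script. The numbers below are copied from the two logs in this article; the helper names (metric_deltas, relative_change) are only illustrative.

```python
# "all" rows copied from the two validation logs in this article
baseline = {"P": 0.878, "R": 0.991, "mAP50": 0.989, "mAP50-95": 0.732}
improved = {"P": 0.952, "R": 0.974, "mAP50": 0.990, "mAP50-95": 0.738}


def metric_deltas(before, after):
    """Absolute change of each metric (after minus before), rounded to 3 decimals."""
    return {name: round(after[name] - before[name], 3) for name in before}


def relative_change(before, after):
    """Relative change in percent, rounded to 2 decimals."""
    return round(100.0 * (after - before) / before, 2)


deltas = metric_deltas(baseline, improved)
params_pct = relative_change(60_567_520, 58_708_576)  # parameter counts from the summary lines
gflops_pct = relative_change(264.3, 274.0)            # GFLOPs from the summary lines

print(deltas)
print(f"parameters: {params_pct}%  GFLOPs: {gflops_pct}%")
```

In this run the largest change is in precision, while recall drops slightly and mAP50 is almost unchanged, so it is worth looking at all four columns rather than a single number.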
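Summary
This article presented a self-developed downsampling module and used it to improve YoloV9. On the dataset used here, overall precision rises from 0.878 to 0.952 and mAP50-95 from 0.732 to 0.738, while recall drops from 0.991 to 0.974. The parameter count falls from 60,567,520 to 58,708,576, and the compute rises from 264.3 to 274.0 GFLOPs. You are welcome to try it on your own dataset.
Code:
Link: https://pan.baidu.com/s/1QFrpJuHOaDpKTEmBIIxdYw?pwd=cpvq
Extraction code: cpvq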