First, here is the "three-in-one" convolution module from YOLOv8: convolution + batch normalization + activation.
import torch
import torch.nn as nn


def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""

    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))
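A quick sanity check of autopad's 'same' behavior (a minimal sketch; the input shape is an arbitrary assumption, and it relies on the Conv definition above):

x = torch.randn(1, 3, 64, 64)
print(Conv(3, 16, k=3, s=1)(x).shape)  # torch.Size([1, 16, 64, 64]) -- 'same' padding keeps H/W
print(Conv(3, 16, k=3, s=2)(x).shape)  # torch.Size([1, 16, 32, 32]) -- stride 2 halves H/W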
1. SPP
class SPP(nn.Module):
    """Spatial Pyramid Pooling (SPP) layer https://arxiv.org/abs/1406.4729."""

    def __init__(self, c1, c2, k=(5, 9, 13)):
        """Initialize the SPP layer with input/output channels and pooling kernel sizes."""
        super().__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)  # 1x1 conv halves the channel count
        self.cv2 = Conv(c_ * (len(k) + 1), c2, 1, 1)  # 1x1 conv adjusts the output channel count
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])  # pooling layers

    def forward(self, x):
        """Forward pass of the SPP layer, performing spatial pyramid pooling."""
        x = self.cv1(x)  # halve the channels
        x = torch.cat([x] + [m(x) for m in self.m], 1)  # pool at different scales, then concatenate
        return self.cv2(x)  # adjust the channel count for output
Diagram: SPP structure.
1.1 SPP, step by step:
1. Halve the feature map's channel count
2. Max-pool with pooling windows of several different sizes
3. Concatenate the different pooling results
4. Adjust the channel count of the output with a 1×1 convolution (see the shape walkthrough below)
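To make the four steps concrete, here is a minimal shape walkthrough (the input shape 1×64×32×32 is an arbitrary assumption; it relies on the SPP and Conv definitions above):

x = torch.randn(1, 64, 32, 32)       # hypothetical input feature map
spp = SPP(64, 128)                   # c1=64 -> hidden c_=32, output channels 128
h = spp.cv1(x)                       # step 1: torch.Size([1, 32, 32, 32]), channels halved
pooled = [m(h) for m in spp.m]       # step 2: three pools, each still [1, 32, 32, 32]
cat = torch.cat([h] + pooled, 1)     # step 3: torch.Size([1, 128, 32, 32]) = 32 * (3 + 1) channels
out = spp.cv2(cat)                   # step 4: torch.Size([1, 128, 32, 32])
print(out.shape)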
1.2 Reading the SPP source:
Only two lines here are likely to need explanation:
self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])
x = torch.cat([x] + [m(x) for m in self.m], 1)
ModuleList
ModuleList is a bit different from Sequential. Sequential is one whole module: once you call it, the input runs through every layer inside it. ModuleList is more like an array whose elements happen to be network layers; you can pick out any layer and call it on its own. In the code above, for example, we iterate over the ModuleList and apply one pooling layer at a time.
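A small illustration of the difference (a toy sketch with made-up layers, not part of YOLOv8):

import torch
import torch.nn as nn

seq = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
ml = nn.ModuleList([nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2)])

x = torch.randn(1, 4)
y = seq(x)             # Sequential: one call runs x through all three layers
z = ml[0](x)           # ModuleList: index into it and call a single layer
# ml(x) would raise an error -- ModuleList has no forward(); it only registers layers
outs = [layer(x) for layer in ml[:2]]  # e.g. apply just the first two layers, one at a time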
cat
The other is cat, which concatenates tensors; its second argument is the dimension to concatenate along. TensorFlow lays data out as (Batch, H, W, channel), while PyTorch uses (Batch, channel, H, W). Here each feature map is concatenated along the channel dimension, and since this is PyTorch, the channel dimension sits at index 1 (in TensorFlow you would use 3 or -1).
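A quick check of what dim=1 does to the shapes (toy tensors with assumed shapes; relies on the torch import above):

a = torch.randn(1, 32, 20, 20)
b = torch.randn(1, 32, 20, 20)
c = torch.cat([a, b], 1)       # concatenate along the channel dimension
print(c.shape)                 # torch.Size([1, 64, 20, 20]) -- channels add up, H/W unchanged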
2. SPPF
class SPPF(nn.Module):
    """Spatial Pyramid Pooling - Fast (SPPF) layer for YOLOv5 by Glenn Jocher."""

    def __init__(self, c1, c2, k=5):
        """Initializes the SPPF layer with given input/output channels and kernel size.

        This module is equivalent to SPP(k=(5, 9, 13)).
        """
        super().__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * 4, c2, 1, 1)
        self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        """Forward pass through the SPPF layer."""
        x = self.cv1(x)  # reduce the channel count
        print(x.size())  # debug print of the intermediate shape
        y1 = self.m(x)  # first pooling
        print(y1.size())
        y2 = self.m(y1)  # second pooling, applied to the first result
        print(y2.size())
        return self.cv2(torch.cat((x, y1, y2, self.m(y2)), 1))  # third pooling inline, then concatenate
Diagram: SPPF structure.
The difference from SPP is that SPPF does not use pooling layers with different window sizes. Instead it uses a single 5×5 pooling window, applies it three times in sequence, and then concatenates the results.
Note that all three pooling operations here use the same window size; a sketch verifying the equivalence to SPP follows.
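Because max is associative, chaining stride-1 5×5 max-pools enlarges the effective window: two in a row behave like one 9×9 pool, three like one 13×13 pool, which is why SPPF(k=5) matches SPP(k=(5, 9, 13)). A minimal check (toy input; the shapes are assumptions):

import torch
import torch.nn as nn

x = torch.randn(1, 8, 20, 20)
p5 = nn.MaxPool2d(5, stride=1, padding=2)
p9 = nn.MaxPool2d(9, stride=1, padding=4)
p13 = nn.MaxPool2d(13, stride=1, padding=6)

print(torch.equal(p5(p5(x)), p9(x)))       # True: two 5x5 pools == one 9x9 pool
print(torch.equal(p5(p5(p5(x))), p13(x)))  # True: three 5x5 pools == one 13x13 pool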
2.1 SPPF, step by step:
1. Adjust the channel count with a 1×1 convolution
2. Pool three times, each pooling applied to the result of the previous one
3. Concatenate the feature maps from every step along the channel dimension
4. Adjust the channel count with a 1×1 convolution (see the usage sketch below)
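Finally, a minimal usage sketch (the input shape is an arbitrary assumption; it relies on the SPPF and Conv definitions above):

x = torch.randn(1, 64, 32, 32)   # hypothetical input feature map
sppf = SPPF(64, 128)             # c_ = 32 hidden channels, concat gives 32 * 4 = 128
out = sppf(x)                    # also prints the three intermediate [1, 32, 32, 32] shapes
print(out.shape)                 # torch.Size([1, 128, 32, 32])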