First, here is the "three-in-one" convolution module from YOLOv8: convolution + batch normalization + activation.
import torch
import torch.nn as nn


def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""

    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))
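A quick sanity check of autopad's 'same' behavior (a minimal sketch; the input shape is an arbitrary assumption, and it relies on the Conv definition above):

x = torch.randn(1, 3, 64, 64)
print(Conv(3, 16, k=3, s=1)(x).shape)  # torch.Size([1, 16, 64, 64]) -- 'same' padding keeps H/W
print(Conv(3, 16, k=3, s=2)(x).shape)  # torch.Size([1, 16, 32, 32]) -- stride 2 halves H/W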
1. SPP
class SPP(nn.Module):
    """Spatial Pyramid Pooling (SPP) layer https://arxiv.org/abs/1406.4729."""

    def __init__(self, c1, c2, k=(5, 9, 13)):
        """Initialize the SPP layer with input/output channels and pooling kernel sizes."""
        super().__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)  # 1x1 conv halves the channel count
        self.cv2 = Conv(c_ * (len(k) + 1), c2, 1, 1)  # 1x1 conv adjusts the output channel count
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])  # pooling layers

    def forward(self, x):
        """Forward pass of the SPP layer, performing spatial pyramid pooling."""
        x = self.cv1(x)  # halve the channels
        x = torch.cat([x] + [m(x) for m in self.m], 1)  # pool at different scales, then concatenate
        return self.cv2(x)  # adjust the channel count for output
Diagram: SPP structure.
1.1 SPP, step by step:
1. Halve the feature map's channel count
2. Max-pool with pooling windows of several different sizes
3. Concatenate the different pooling results
4. Adjust the channel count of the output with a 1×1 convolution (see the shape walkthrough below)
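To make the four steps concrete, here is a minimal shape walkthrough (the input shape 1×64×32×32 is an arbitrary assumption; it relies on the SPP and Conv definitions above):

x = torch.randn(1, 64, 32, 32)       # hypothetical input feature map
spp = SPP(64, 128)                   # c1=64 -> hidden c_=32, output channels 128
h = spp.cv1(x)                       # step 1: torch.Size([1, 32, 32, 32]), channels halved
pooled = [m(h) for m in spp.m]       # step 2: three pools, each still [1, 32, 32, 32]
cat = torch.cat([h] + pooled, 1)     # step 3: torch.Size([1, 128, 32, 32]) = 32 * (3 + 1) channels
out = spp.cv2(cat)                   # step 4: torch.Size([1, 128, 32, 32])
print(out.shape)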
1.2 Reading the SPP source:
Only two lines here are likely to need explanation:
self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])
x = torch.cat([x] + [m(x) for m in self.m], 1)
ModuleList
ModuleList is a bit different from Sequential. Sequential is one whole module: once you call it, the input runs through every layer inside it. ModuleList is more like an array whose elements happen to be network layers; you can pick out any layer and call it on its own. In the code above, for example, we iterate over the ModuleList and apply one pooling layer at a time.
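A small illustration of the difference (a toy sketch with made-up layers, not part of YOLOv8):

import torch
import torch.nn as nn

seq = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
ml = nn.ModuleList([nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2)])

x = torch.randn(1, 4)
y = seq(x)             # Sequential: one call runs x through all three layers
z = ml[0](x)           # ModuleList: index into it and call a single layer
# ml(x) would raise an error -- ModuleList has no forward(); it only registers layers
outs = [layer(x) for layer in ml[:2]]  # e.g. apply just the first two layers, one at a time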
cat
The other is cat, which concatenates tensors; its second argument is the dimension to concatenate along. TensorFlow lays data out as (Batch, H, W, channel), while PyTorch uses (Batch, channel, H, W). Here each feature map is concatenated along the channel dimension, and since this is PyTorch, the channel dimension sits at index 1 (in TensorFlow you would use 3 or -1).
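A quick check of what dim=1 does to the shapes (toy tensors with assumed shapes; relies on the torch import above):

a = torch.randn(1, 32, 20, 20)
b = torch.randn(1, 32, 20, 20)
c = torch.cat([a, b], 1)       # concatenate along the channel dimension
print(c.shape)                 # torch.Size([1, 64, 20, 20]) -- channels add up, H/W unchanged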
2. SPPF
class SPPF(nn.Module):
    """Spatial Pyramid Pooling - Fast (SPPF) layer for YOLOv5 by Glenn Jocher."""

    def __init__(self, c1, c2, k=5):
        """Initializes the SPPF layer with given input/output channels and kernel size.

        This module is equivalent to SPP(k=(5, 9, 13)).
        """
        super().__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * 4, c2, 1, 1)
        self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        """Forward pass through the SPPF layer."""
        x = self.cv1(x)  # reduce the channel count
        print(x.size())  # debug print of the intermediate shape
        y1 = self.m(x)  # first pooling
        print(y1.size())
        y2 = self.m(y1)  # second pooling, applied to the first result
        print(y2.size())
        return self.cv2(torch.cat((x, y1, y2, self.m(y2)), 1))  # third pooling inline, then concatenate
Diagram: SPPF structure.
The difference from SPP is that SPPF does not use pooling layers with different window sizes. Instead it uses a single 5×5 pooling window, applies it three times in sequence, and then concatenates the results.
Note that all three pooling operations here use the same window size; a sketch verifying the equivalence to SPP follows.
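Because max is associative, chaining stride-1 5×5 max-pools enlarges the effective window: two in a row behave like one 9×9 pool, three like one 13×13 pool, which is why SPPF(k=5) matches SPP(k=(5, 9, 13)). A minimal check (toy input; the shapes are assumptions):

import torch
import torch.nn as nn

x = torch.randn(1, 8, 20, 20)
p5 = nn.MaxPool2d(5, stride=1, padding=2)
p9 = nn.MaxPool2d(9, stride=1, padding=4)
p13 = nn.MaxPool2d(13, stride=1, padding=6)

print(torch.equal(p5(p5(x)), p9(x)))       # True: two 5x5 pools == one 9x9 pool
print(torch.equal(p5(p5(p5(x))), p13(x)))  # True: three 5x5 pools == one 13x13 pool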
2.1 SPPF, step by step:
1. Adjust the channel count with a 1×1 convolution
2. Pool three times, each pooling applied to the result of the previous one
3. Concatenate the feature maps from every step along the channel dimension
4. Adjust the channel count with a 1×1 convolution (see the usage sketch below)
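Finally, a minimal usage sketch (the input shape is an arbitrary assumption; it relies on the SPPF and Conv definitions above):

x = torch.randn(1, 64, 32, 32)   # hypothetical input feature map
sppf = SPPF(64, 128)             # c_ = 32 hidden channels, concat gives 32 * 4 = 128
out = sppf(x)                    # also prints the three intermediate [1, 32, 32, 32] shapes
print(out.shape)                 # torch.Size([1, 128, 32, 32])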