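YOLOv11 with the CVPR 2024 Frequency-Adaptive Dilated Convolution (FADC) Module and Related Improvement Ideas | The Simplest YOLO Improvement Tutorial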


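YOLOv11/v10/v8 usage tutorial: YOLOv11 Usage Tutorial, from Getting Started to the Grave

YOLOv11 improvement roundup: Summary of Updates to YOLOv11 and Self-Developed Models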


《Frequency-Adaptive Dilated Convolution for Semantic Segmentation》

1. Module Introduction

        Paper: https://arxiv.org/abs/2403.05369

        Code: https://github.com/Linwei-Chen/FADC

Paper at a glance:

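         Dilated convolution enlarges the receptive field by inserting gaps between consecutive kernel elements and is widely used in computer vision. This paper proposes three strategies, motivated by spectral analysis, to improve individual phases of dilated convolution. Unlike the conventional practice of fixing a global dilation rate as a hyperparameter, Frequency-Adaptive Dilated Convolution (FADC) dynamically adjusts the dilation rate spatially according to the local frequency components. Two plug-in modules are then designed to directly increase the effective bandwidth and the receptive field size. The Adaptive Kernel (AdaKern) module decomposes the convolution weights into low-frequency and high-frequency components and dynamically adjusts the ratio between them on a per-channel basis. By increasing the high-frequency part of the convolution weights, AdaKern captures more high-frequency components and thereby improves the effective bandwidth. The Frequency Selection (FreqSelect) module balances the high- and low-frequency components of the feature representation through spatially variant reweighting. It suppresses high frequencies in the background to encourage FADC to learn larger dilations, which enlarges the receptive field. Extensive experiments on segmentation and object detection consistently validate the effectiveness of the approach.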
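To make the role of the dilation rate concrete, the sketch below computes the span covered by a dilated kernel along one axis, using the standard relation k_eff = k + (k - 1)(d - 1). The function name is illustrative and is not taken from the FADC code.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in pixels) covered by one dilated kernel along one axis."""
    if kernel_size < 1 or dilation < 1:
        raise ValueError("kernel_size and dilation must be >= 1")
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# A 3x3 kernel covers 3, 5, 9 and 17 pixels for dilation 1, 2, 4 and 8.
spans = [effective_kernel_size(3, d) for d in (1, 2, 4, 8)]
print(spans)  # [3, 5, 9, 17]
```

A fixed global rate applies one of these spans at every position; FADC instead picks the rate per spatial location.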

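Summary: the paper proposes an adaptive dilated convolution designed for semantic segmentation that can also be applied to object detection.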
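As one concrete reading of the AdaKern idea described in the overview, a kernel can be split into its spatial mean (the low-frequency part) and the residual (the high-frequency part), and the two parts rescaled independently. This is a minimal pure-Python sketch on a single 3x3 kernel; the function name and the scalar arguments `low_scale` and `high_scale` are illustrative stand-ins for the per-channel attention values that the real module predicts.

```python
def adakern_recombine(kernel, low_scale, high_scale):
    """Rescale the mean (low-frequency) and residual (high-frequency) parts of a 2D kernel."""
    values = [v for row in kernel for v in row]
    mean = sum(values) / len(values)
    return [[mean * low_scale + (v - mean) * high_scale for v in row] for row in kernel]


kernel = [[0.0, 1.0, 0.0],
          [1.0, 4.0, 1.0],
          [0.0, 1.0, 0.0]]

# Scales of 1.0 reproduce the original kernel (up to floating-point rounding).
same = adakern_recombine(kernel, 1.0, 1.0)
# A high-frequency scale of 0.0 leaves only the constant mean kernel.
smooth = adakern_recombine(kernel, 1.0, 0.0)
```

In the module code in section 2.2 the scales are twice a sigmoid output, so they lie in (0, 2) and equal 1 when the attention logits are zero.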


2. Adding the Module to YOLO

2.1 Create the script file

        First, create a script named blocks.py under ultralytics/nn to hold the module code.
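The module code in the next step imports torch, torch_dct and mmcv (for ModulatedDeformConv2d), so it may be worth confirming that all three are importable before editing the Ultralytics source. The helper below is a hypothetical convenience that uses only the standard library; it is not part of Ultralytics or FADC.

```python
import importlib.util


def missing_modules(names=("torch", "torch_dct", "mmcv")):
    """Return the top-level modules from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Note that finding mmcv is necessary but not sufficient: the installed build also has to ship the compiled ops, since the code imports from mmcv.ops.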

2.2 Copy the code

        Paste the following code into the blocks.py script you just created:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch_dct as dct
from mmcv.ops.modulated_deform_conv import ModulatedDeformConv2d, modulated_deform_conv2d


# -------------------------AdaptiveDilatedConv----------------------------------
class OmniAttention(nn.Module):
    """For adaptive kernel, AdaKern"""

    def __init__(self, in_planes, out_planes, kernel_size, groups=1, reduction=0.0625, kernel_num=4, min_channel=16):
        super(OmniAttention, self).__init__()
        attention_channel = max(int(in_planes * reduction), min_channel)
        self.kernel_size = kernel_size
        self.kernel_num = kernel_num
        self.temperature = 1.0

        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Conv2d(in_planes, attention_channel, 1, bias=False)
        # self.bn = nn.BatchNorm2d(attention_channel)
        self.relu = nn.ReLU(inplace=True)

        self.channel_fc = nn.Conv2d(attention_channel, in_planes, 1, bias=True)
        self.func_channel = self.get_channel_attention

        if in_planes == groups and in_planes == out_planes:  # depth-wise convolution
            self.func_filter = self.skip
        else:
            self.filter_fc = nn.Conv2d(attention_channel, out_planes, 1, bias=True)
            self.func_filter = self.get_filter_attention

        if kernel_size == 1:  # point-wise convolution
            self.func_spatial = self.skip
        else:
            self.spatial_fc = nn.Conv2d(attention_channel, kernel_size * kernel_size, 1, bias=True)
            self.func_spatial = self.get_spatial_attention

        if kernel_num == 1:
            self.func_kernel = self.skip
        else:
            self.kernel_fc = nn.Conv2d(attention_channel, kernel_num, 1, bias=True)
            self.func_kernel = self.get_kernel_attention

        self._initialize_weights()

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            if isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def update_temperature(self, temperature):
        self.temperature = temperature

    @staticmethod
    def skip(_):
        return 1.0

    def get_channel_attention(self, x):
        self.channel_fc.to(x.dtype)
        channel_attention = torch.sigmoid(self.channel_fc(x).view(x.size(0), -1, 1, 1) / self.temperature)
        return channel_attention

    def get_filter_attention(self, x):
        self.filter_fc.to(x.dtype)
        filter_attention = torch.sigmoid(self.filter_fc(x).view(x.size(0), -1, 1, 1) / self.temperature)
        return filter_attention

    def get_spatial_attention(self, x):
        self.spatial_fc.to(x.dtype)
        spatial_attention = self.spatial_fc(x).view(x.size(0), 1, 1, 1, self.kernel_size, self.kernel_size)
        spatial_attention = torch.sigmoid(spatial_attention / self.temperature)
        return spatial_attention

    def get_kernel_attention(self, x):
        self.kernel_fc.to(x.dtype)
        kernel_attention = self.kernel_fc(x).view(x.size(0), -1, 1, 1, 1, 1)
        kernel_attention = F.softmax(kernel_attention / self.temperature, dim=1)
        return kernel_attention

    def forward(self, x):
        self.fc.to(x.dtype)
        x = self.avgpool(x)
        x = self.fc(x)
        # x = self.bn(x)
        x = self.relu(x)
        return self.func_channel(x), self.func_filter(x), self.func_spatial(x), self.func_kernel(x)


def generate_laplacian_pyramid(input_tensor, num_levels, size_align=True, mode='bilinear'):
    """An alternative way to decompose features by frequency."""
    pyramid = []
    current_tensor = input_tensor
    _, _, H, W = current_tensor.shape
    for _ in range(num_levels):
        b, _, h, w = current_tensor.shape
        downsampled_tensor = F.interpolate(current_tensor, (h // 2 + h % 2, w // 2 + w % 2), mode=mode,
                                           align_corners=(H % 2) == 1)  # antialias=True
        if size_align:
            # upsampled_tensor = F.interpolate(downsampled_tensor, (h, w), mode='bilinear', align_corners=(H%2) == 1)
            # laplacian = current_tensor - upsampled_tensor
            # laplacian = F.interpolate(laplacian, (H, W), mode='bilinear', align_corners=(H%2) == 1)
            upsampled_tensor = F.interpolate(downsampled_tensor, (H, W), mode=mode, align_corners=(H % 2) == 1)
            laplacian = F.interpolate(current_tensor, (H, W), mode=mode, align_corners=(H % 2) == 1) - upsampled_tensor
            # print(laplacian.shape)
        else:
            upsampled_tensor = F.interpolate(downsampled_tensor, (h, w), mode=mode, align_corners=(H % 2) == 1)
            laplacian = current_tensor - upsampled_tensor
        pyramid.append(laplacian)
        current_tensor = downsampled_tensor
    if size_align: current_tensor = F.interpolate(current_tensor, (H, W), mode=mode, align_corners=(H % 2) == 1)
    pyramid.append(current_tensor)
    return pyramid


class FrequencySelection(nn.Module):
    def __init__(self,
                 in_channels,
                 k_list=[2],
                 # freq_list=[2, 3, 5, 7, 9, 11],
                 lowfreq_att=True,
                 fs_feat='feat',
                 lp_type='freq',
                 act='sigmoid',
                 spatial='conv',
                 spatial_group=1,
                 spatial_kernel=3,
                 init='zero',
                 global_selection=False,
                 ):
        super().__init__()
        # k_list.sort()
        # print()
        self.k_list = k_list
        # self.freq_list = freq_list
        self.lp_list = nn.ModuleList()
        self.freq_weight_conv_list = nn.ModuleList()
        self.fs_feat = fs_feat
        self.lp_type = lp_type
        self.in_channels = in_channels
        # self.residual = residual
        if spatial_group > 64: spatial_group = in_channels
        self.spatial_group = spatial_group
        self.lowfreq_att = lowfreq_att
        if spatial == 'conv':
            self.freq_weight_conv_list = nn.ModuleList()
            _n = len(k_list)
            if lowfreq_att: _n += 1
            for i in range(_n):
                freq_weight_conv = nn.Conv2d(in_channels=in_channels,
                                             out_channels=self.spatial_group,
                                             stride=1,
                                             kernel_size=spatial_kernel,
                                             groups=self.spatial_group,
                                             padding=spatial_kernel // 2,
                                             bias=True)
                if init == 'zero':
                    freq_weight_conv.weight.data.zero_()
                    freq_weight_conv.bias.data.zero_()
                else:
                    # raise NotImplementedError
                    pass
                self.freq_weight_conv_list.append(freq_weight_conv)
        else:
            raise NotImplementedError

        if self.lp_type == 'avgpool':
            for k in k_list:
                self.lp_list.append(nn.Sequential(
                    nn.ReplicationPad2d(padding=k // 2),
                    # nn.ZeroPad2d(padding= k // 2),
                    nn.AvgPool2d(kernel_size=k, padding=0, stride=1)
                ))
        elif self.lp_type == 'laplacian':
            pass
        elif self.lp_type == 'freq':
            pass
        else:
            raise NotImplementedError

        self.act = act
        # self.freq_weight_conv_list.append(nn.Conv2d(self.deform_groups * 3 * self.kernel_size[0] * self.kernel_size[1], 1, kernel_size=1, padding=0, bias=True))
        self.global_selection = global_selection
        if self.global_selection:
            self.global_selection_conv_real = nn.Conv2d(in_channels=in_channels,
                                                        out_channels=self.spatial_group,
                                                        stride=1,
                                                        kernel_size=1,
                                                        groups=self.spatial_group,
                                                        padding=0,
                                                        bias=True)
            self.global_selection_conv_imag = nn.Conv2d(in_channels=in_channels,
                                                        out_channels=self.spatial_group,
                                                        stride=1,
                                                        kernel_size=1,
                                                        groups=self.spatial_group,
                                                        padding=0,
                                                        bias=True)
            if init == 'zero':
                self.global_selection_conv_real.weight.data.zero_()
                self.global_selection_conv_real.bias.data.zero_()
                self.global_selection_conv_imag.weight.data.zero_()
                self.global_selection_conv_imag.bias.data.zero_()

    def sp_act(self, freq_weight):
        if self.act == 'sigmoid':
            freq_weight = freq_weight.sigmoid() * 2
        elif self.act == 'softmax':
            freq_weight = freq_weight.softmax(dim=1) * freq_weight.shape[1]
        else:
            raise NotImplementedError
        return freq_weight

    def forward(self, x, att_feat=None):
        """
        att_feat: feature map used to generate the attention weights
        """
        # freq_weight = self.freq_weight_conv(x)
        # self.sp_act(freq_weight)
        # if self.residual: x_residual = x.clone()
        if att_feat is None: att_feat = x
        x_list = []
        if self.lp_type == 'avgpool':
            # for avg, freq_weight in zip(self.avg_list, self.freq_weight_conv_list):
            pre_x = x
            b, _, h, w = x.shape
            for idx, avg in enumerate(self.lp_list):
                low_part = avg(x)
                high_part = pre_x - low_part
                pre_x = low_part
                # x_list.append(freq_weight[:, idx:idx+1] * high_part)
                freq_weight = self.freq_weight_conv_list[idx](att_feat)
                freq_weight = self.sp_act(freq_weight)
                # tmp = freq_weight[:, :, idx:idx+1] * high_part.reshape(b, self.spatial_group, -1, h, w)
                tmp = freq_weight.reshape(b, self.spatial_group, -1, h, w) * high_part.reshape(b, self.spatial_group,
                                                                                               -1, h, w)
                x_list.append(tmp.reshape(b, -1, h, w))
            if self.lowfreq_att:
                freq_weight = self.freq_weight_conv_list[len(x_list)](att_feat)
                # tmp = freq_weight[:, :, len(x_list):len(x_list)+1] * pre_x.reshape(b, self.spatial_group, -1, h, w)
                tmp = freq_weight.reshape(b, self.spatial_group, -1, h, w) * pre_x.reshape(b, self.spatial_group, -1, h,
                                                                                           w)
                x_list.append(tmp.reshape(b, -1, h, w))
            else:
                x_list.append(pre_x)
        elif self.lp_type == 'laplacian':
            # for avg, freq_weight in zip(self.avg_list, self.freq_weight_conv_list):
            # pre_x = x
            b, _, h, w = x.shape
            pyramids = generate_laplacian_pyramid(x, len(self.k_list), size_align=True)
            # print('pyramids', len(pyramids))
            for idx, avg in enumerate(self.k_list):
                # print(idx)
                high_part = pyramids[idx]
                freq_weight = self.freq_weight_conv_list[idx](att_feat)
                freq_weight = self.sp_act(freq_weight)
                # tmp = freq_weight[:, :, idx:idx+1] * high_part.reshape(b, self.spatial_group, -1, h, w)
                tmp = freq_weight.reshape(b, self.spatial_group, -1, h, w) * high_part.reshape(b, self.spatial_group,
                                                                                               -1, h, w)
                x_list.append(tmp.reshape(b, -1, h, w))
            if self.lowfreq_att:
                freq_weight = self.freq_weight_conv_list[len(x_list)](att_feat)
                # tmp = freq_weight[:, :, len(x_list):len(x_list)+1] * pre_x.reshape(b, self.spatial_group, -1, h, w)
                tmp = freq_weight.reshape(b, self.spatial_group, -1, h, w) * pyramids[-1].reshape(b, self.spatial_group,
                                                                                                  -1, h, w)
                x_list.append(tmp.reshape(b, -1, h, w))
            else:
                x_list.append(pyramids[-1])
        elif self.lp_type == 'freq':
            pre_x = x.clone()
            b, _, h, w = x.shape
            # b, _c, h, w = freq_weight.shape
            # freq_weight = freq_weight.reshape(b, self.spatial_group, -1, h, w)
            x_fft = torch.fft.fftshift(torch.fft.fft2(x.double(), norm='ortho'))
            # x_fft.to(torch.float32)
            if self.global_selection:
                # global_att_real = self.global_selection_conv_real(x_fft.real)
                # global_att_real = self.sp_act(global_att_real).reshape(b, self.spatial_group, -1, h, w)
                # global_att_imag = self.global_selection_conv_imag(x_fft.imag)
                # global_att_imag = self.sp_act(global_att_imag).reshape(b, self.spatial_group, -1, h, w)
                # x_fft = x_fft.reshape(b, self.spatial_group, -1, h, w)
                # x_fft.real *= global_att_real
                # x_fft.imag *= global_att_imag
                # x_fft = x_fft.reshape(b, -1, h, w)
                # Split the complex x_fft into its real and imaginary parts
                x_real = x_fft.real
                x_imag = x_fft.imag
                # Global attention for the real part
                global_att_real = self.global_selection_conv_real(x_real)
                global_att_real = self.sp_act(global_att_real).reshape(b, self.spatial_group, -1, h, w)
                # Global attention for the imaginary part
                global_att_imag = self.global_selection_conv_imag(x_imag)
                global_att_imag = self.sp_act(global_att_imag).reshape(b, self.spatial_group, -1, h, w)
                # Reshape both parts to (b, self.spatial_group, -1, h, w)
                x_real = x_real.reshape(b, self.spatial_group, -1, h, w)
                x_imag = x_imag.reshape(b, self.spatial_group, -1, h, w)
                # Apply the global attention to the real and imaginary parts separately
                x_fft_real_updated = x_real * global_att_real
                x_fft_imag_updated = x_imag * global_att_imag
                # Merge back into a complex tensor
                x_fft_updated = torch.complex(x_fft_real_updated, x_fft_imag_updated)
                # Reshape x_fft back to (b, -1, h, w)
                x_fft = x_fft_updated.reshape(b, -1, h, w)

            for idx, freq in enumerate(self.k_list):
                mask = torch.zeros_like(x[:, 0:1, :, :], device=x.device)
                mask[:, :, round(h / 2 - h / (2 * freq)):round(h / 2 + h / (2 * freq)),
                round(w / 2 - w / (2 * freq)):round(w / 2 + w / (2 * freq))] = 1.0
                low_part = torch.fft.ifft2(torch.fft.ifftshift(x_fft * mask), norm='ortho').real
                high_part = pre_x - low_part
                pre_x = low_part
                freq_weight = self.freq_weight_conv_list[idx](att_feat)
                freq_weight = self.sp_act(freq_weight)
                # tmp = freq_weight[:, :, idx:idx+1] * high_part.reshape(b, self.spatial_group, -1, h, w)
                tmp = freq_weight.reshape(b, self.spatial_group, -1, h, w) * high_part.reshape(b, self.spatial_group,
                                                                                               -1, h, w)
                x_list.append(tmp.reshape(b, -1, h, w))
            if self.lowfreq_att:
                freq_weight = self.freq_weight_conv_list[len(x_list)](att_feat)
                # tmp = freq_weight[:, :, len(x_list):len(x_list)+1] * pre_x.reshape(b, self.spatial_group, -1, h, w)
                tmp = freq_weight.reshape(b, self.spatial_group, -1, h, w) * pre_x.reshape(b, self.spatial_group, -1, h,
                                                                                           w)
                x_list.append(tmp.reshape(b, -1, h, w))
            else:
                x_list.append(pre_x)
        x = sum(x_list)
        return x


class AdaptiveDilatedConv(ModulatedDeformConv2d):
    """A ModulatedDeformable Conv Encapsulation that acts as normal Conv
    layers.

    Args:
        in_channels (int): Same as nn.Conv2d.
        out_channels (int): Same as nn.Conv2d.
        kernel_size (int or tuple[int]): Same as nn.Conv2d.
        stride (int): Same as nn.Conv2d, while tuple is not supported.
        padding (int): Same as nn.Conv2d, while tuple is not supported.
        dilation (int): Same as nn.Conv2d, while tuple is not supported.
        groups (int): Same as nn.Conv2d.
        bias (bool or str): If specified as `auto`, it will be decided by the
            norm_cfg. Bias will be set as True if norm_cfg is None, otherwise
            False.
    """

    _version = 2

    def __init__(self,
                 *args,
                 offset_freq=None,  # deprecated
                 padding_mode='repeat',
                 kernel_decompose='both',
                 conv_type='conv',
                 sp_att=False,
                 pre_fs=True,  # False, use dilation
                 epsilon=1e-4,
                 use_zero_dilation=False,
                 use_dct=False,
                 fs_cfg={
                     'k_list': [2, 4, 8],
                     'fs_feat': 'feat',
                     'lowfreq_att': False,
                     'lp_type': 'freq',
                     # 'lp_type':'laplacian',
                     'act': 'sigmoid',
                     'spatial': 'conv',
                     'spatial_group': 1,
                 },
                 **kwargs):
        super().__init__(*args, **kwargs)
        if padding_mode == 'zero':
            self.PAD = nn.ZeroPad2d(self.kernel_size[0] // 2)
        elif padding_mode == 'repeat':
            self.PAD = nn.ReplicationPad2d(self.kernel_size[0] // 2)
        else:
            self.PAD = nn.Identity()

        self.kernel_decompose = kernel_decompose
        self.use_dct = use_dct
        if kernel_decompose == 'both':
            self.OMNI_ATT1 = OmniAttention(in_planes=self.in_channels, out_planes=self.out_channels, kernel_size=1,
                                           groups=1, reduction=0.0625, kernel_num=1, min_channel=16)
            self.OMNI_ATT2 = OmniAttention(in_planes=self.in_channels, out_planes=self.out_channels,
                                           kernel_size=self.kernel_size[0] if self.use_dct else 1, groups=1,
                                           reduction=0.0625, kernel_num=1, min_channel=16)
        elif kernel_decompose == 'high':
            self.OMNI_ATT = OmniAttention(in_planes=self.in_channels, out_planes=self.out_channels, kernel_size=1,
                                          groups=1, reduction=0.0625, kernel_num=1, min_channel=16)
        elif kernel_decompose == 'low':
            self.OMNI_ATT = OmniAttention(in_planes=self.in_channels, out_planes=self.out_channels, kernel_size=1,
                                          groups=1, reduction=0.0625, kernel_num=1, min_channel=16)
        self.conv_type = conv_type
        if conv_type == 'conv':
            self.conv_offset = nn.Conv2d(
                self.in_channels,
                self.deform_groups * 1,
                kernel_size=self.kernel_size,
                stride=self.stride,
                padding=self.kernel_size[0] // 2 if isinstance(self.PAD, nn.Identity) else 0,
                dilation=1,
                bias=True)

        self.conv_mask = nn.Conv2d(
            self.in_channels,
            self.deform_groups * 1 * self.kernel_size[0] * self.kernel_size[1],
            kernel_size=self.kernel_size,
            stride=self.stride,
            padding=self.kernel_size[0] // 2 if isinstance(self.PAD, nn.Identity) else 0,
            dilation=1,
            bias=True)
        if sp_att:
            self.conv_mask_mean_level = nn.Conv2d(
                self.in_channels,
                self.deform_groups * 1,
                kernel_size=self.kernel_size,
                stride=self.stride,
                padding=self.kernel_size[0] // 2 if isinstance(self.PAD, nn.Identity) else 0,
                dilation=1,
                bias=True)

        # NOTE: FLC_Pooling and StaticLP are not defined in this file, so only
        # offset_freq=None (the default) works as-is.
        self.offset_freq = offset_freq
        if self.offset_freq in ('FLC_high', 'FLC_res'):
            self.LP = FLC_Pooling(freq_thres=min(0.5 * 1 / self.dilation[0], 0.25))
        elif self.offset_freq in ('SLP_high', 'SLP_res'):
            self.LP = StaticLP(self.in_channels, kernel_size=3, stride=1, padding=1, alpha=8)
        elif self.offset_freq is None:
            pass
        else:
            raise NotImplementedError

        # An offset is like [y0, x0, y1, x1, y2, x2, ⋯, y8, x8]
        offset = [-1, -1, -1, 0, -1, 1,
                  0, -1, 0, 0, 0, 1,
                  1, -1, 1, 0, 1, 1]
        offset = torch.Tensor(offset)
        # offset[0::2] *= self.dilation[0]
        # offset[1::2] *= self.dilation[1]
        # a tuple of two ints - in which case, the first int is used for the height dimension, and the second int for the width dimension
        self.register_buffer('dilated_offset', torch.Tensor(offset[None, None, ..., None, None]))  # B, G, 18, 1, 1
        if fs_cfg is not None:
            if pre_fs:
                self.FS = FrequencySelection(self.in_channels, **fs_cfg)
            else:
                self.FS = FrequencySelection(1, **fs_cfg)  # use dilation
        self.pre_fs = pre_fs
        self.epsilon = epsilon
        self.use_zero_dilation = use_zero_dilation
        self.init_weights()

    def freq_select(self, x):
        if self.offset_freq is None:
            res = x
        elif self.offset_freq in ('FLC_high', 'SLP_high'):
            res = x - self.LP(x)
        elif self.offset_freq in ('FLC_res', 'SLP_res'):
            res = 2 * x - self.LP(x)
        else:
            raise NotImplementedError
        return res

    def init_weights(self):
        super().init_weights()
        if hasattr(self, 'conv_offset'):
            # if isinstanace(self.conv_offset, nn.Conv2d):
            if self.conv_type == 'conv':
                self.conv_offset.weight.data.zero_()
                # self.conv_offset.bias.data.fill_((self.dilation[0] - 1) / self.dilation[0] + 1e-4)
                self.conv_offset.bias.data.fill_((self.dilation[0] - 1) / self.dilation[0] + self.epsilon)
                # self.conv_offset.bias.data.zero_()
        # if hasattr(self, 'conv_offset'):
        #     self.conv_offset_low[1].weight.data.zero_()
        # if hasattr(self, 'conv_offset_high'):
        #     self.conv_offset_high[1].weight.data.zero_()
        #     self.conv_offset_high[1].bias.data.zero_()
        if hasattr(self, 'conv_mask'):
            self.conv_mask.weight.data.zero_()
            self.conv_mask.bias.data.zero_()

        # NOTE: this branch zeroes conv_mask again, exactly as in the source;
        # conv_mask_mean_level itself keeps its default initialization.
        if hasattr(self, 'conv_mask_mean_level'):
            self.conv_mask.weight.data.zero_()
            self.conv_mask.bias.data.zero_()

    # @force_fp32(apply_to=('x',))
    # @force_fp32
    def forward(self, x):
        x_type = x.dtype
        # offset = self.conv_offset(self.freq_select(x)) + self.conv_offset_low(self.freq_select(x))
        if hasattr(self, 'FS') and self.pre_fs: x = self.FS(x)
        # x = x.to(torch.float32)
        if hasattr(self, 'OMNI_ATT1') and hasattr(self, 'OMNI_ATT2'):
            c_att1, f_att1, _, _, = self.OMNI_ATT1(x)
            c_att2, f_att2, spatial_att2, _, = self.OMNI_ATT2(x)
        elif hasattr(self, 'OMNI_ATT'):
            c_att, f_att, _, _, = self.OMNI_ATT(x)

        if self.conv_type == 'conv':
            self.conv_offset.to(x.dtype)
            offset = self.conv_offset(self.PAD(self.freq_select(x)))
        elif self.conv_type == 'multifreqband':
            self.conv_offset.to(x.dtype)
            offset = self.conv_offset(self.freq_select(x))
        # high_gate = self.conv_offset_high(x)
        # high_gate = torch.exp(-0.5 * high_gate ** 2)
        # offset = F.relu(offset, inplace=True) * self.dilation[0] - 1 # ensure > 0
        if self.use_zero_dilation:
            offset = (F.relu(offset + 1, inplace=True) - 1) * self.dilation[0]  # ensure > 0
        else:
            # offset = F.relu(offset, inplace=True) * self.dilation[0] # ensure > 0
            offset = offset.abs() * self.dilation[0]  # ensure > 0
            # offset[offset<0] = offset[offset<0].exp() - 1
            # print(offset.mean(), offset.std(), offset.max(), offset.min())
        if hasattr(self, 'FS') and (self.pre_fs == False): x = self.FS(x, F.interpolate(offset, x.shape[-2:],
                                                                                        mode='bilinear',
                                                                                        align_corners=(x.shape[-1] % 2) == 1))
        # print(offset.max(), offset.abs().min(), offset.abs().mean())
        # offset *= high_gate # ensure > 0
        b, _, h, w = offset.shape
        offset = offset.reshape(b, self.deform_groups, -1, h, w) * self.dilated_offset
        # offset = offset.reshape(b, self.deform_groups, -1, h, w).repeat(1, 1, 9, 1, 1)
        # offset[:, :, 0::2, ] *= self.dilated_offset[:, :, 0::2, ]
        # offset[:, :, 1::2, ] *= self.dilated_offset[:, :, 1::2, ]
        offset = offset.reshape(b, -1, h, w)

        x = self.PAD(x)
        self.conv_mask.to(x.dtype)
        mask = self.conv_mask(x)
        mask = mask.sigmoid()
        # print(mask.shape)
        # mask = mask.reshape(b, self.deform_groups, -1, h, w).softmax(dim=2)
        if hasattr(self, 'conv_mask_mean_level'):
            mask_mean_level = torch.sigmoid(self.conv_mask_mean_level(x)).reshape(b, self.deform_groups, -1, h, w)
            mask = mask * mask_mean_level
        mask = mask.reshape(b, -1, h, w)

        if hasattr(self, 'OMNI_ATT1') and hasattr(self, 'OMNI_ATT2'):
            offset = offset.reshape(1, -1, h, w)
            mask = mask.reshape(1, -1, h, w)
            x = x.reshape(1, -1, x.size(-2), x.size(-1))
            adaptive_weight = self.weight.unsqueeze(0).repeat(b, 1, 1, 1, 1)  # b, c_out, c_in, k, k
            adaptive_weight_mean = adaptive_weight.mean(dim=(-1, -2), keepdim=True)
            adaptive_weight_res = adaptive_weight - adaptive_weight_mean
            _, c_out, c_in, k, k = adaptive_weight.shape
            if self.use_dct:
                dct_coefficients = dct.dct_2d(adaptive_weight_res)
                # print(adaptive_weight_res.shape, dct_coefficients.shape)
                spatial_att2 = spatial_att2.reshape(b, 1, 1, k, k)
                dct_coefficients = dct_coefficients * (spatial_att2 * 2)
                # print(dct_coefficients.shape)
                adaptive_weight_res = dct.idct_2d(dct_coefficients)
                # adaptive_weight_res = adaptive_weight_res.reshape(b, c_out, c_in, k, k)
                # print(adaptive_weight_res.shape, dct_coefficients.shape)
            # adaptive_weight = adaptive_weight_mean * (2 * c_att.unsqueeze(1)) * (2 * f_att.unsqueeze(2)) + adaptive_weight - adaptive_weight_mean
            # adaptive_weight = adaptive_weight_mean * (c_att1.unsqueeze(1) * 2) * (f_att1.unsqueeze(2) * 2) + (adaptive_weight - adaptive_weight_mean) * (c_att2.unsqueeze(1) * 2) * (f_att2.unsqueeze(2) * 2)
            adaptive_weight = adaptive_weight_mean * (c_att1.unsqueeze(1) * 2) * (
                    f_att1.unsqueeze(2) * 2) + adaptive_weight_res * (c_att2.unsqueeze(1) * 2) * (
                                      f_att2.unsqueeze(2) * 2)
            adaptive_weight = adaptive_weight.reshape(-1, self.in_channels // self.groups, 3, 3)
            if self.bias is not None:
                bias = self.bias.repeat(b)
            else:
                bias = self.bias
            # print(adaptive_weight.shape)
            # print(bias.shape)
            # print(x.shape)
            x = modulated_deform_conv2d(x, offset, mask, adaptive_weight, bias,
                                        self.stride,
                                        (self.kernel_size[0] // 2, self.kernel_size[1] // 2) if isinstance(
                                            self.PAD, nn.Identity) else (0, 0),  # padding
                                        (1, 1),  # dilation
                                        self.groups * b, self.deform_groups * b)
        elif hasattr(self, 'OMNI_ATT'):
            offset = offset.reshape(1, -1, h, w)
            mask = mask.reshape(1, -1, h, w)
            x = x.reshape(1, -1, x.size(-2), x.size(-1))
            adaptive_weight = self.weight.unsqueeze(0).repeat(b, 1, 1, 1, 1)  # b, c_out, c_in, k, k
            adaptive_weight_mean = adaptive_weight.mean(dim=(-1, -2), keepdim=True)
            # adaptive_weight = adaptive_weight_mean * (2 * c_att.unsqueeze(1)) * (2 * f_att.unsqueeze(2)) + adaptive_weight - adaptive_weight_mean
            if self.kernel_decompose == 'high':
                adaptive_weight = adaptive_weight_mean + (adaptive_weight - adaptive_weight_mean) * (
                        c_att.unsqueeze(1) * 2) * (f_att.unsqueeze(2) * 2)
            elif self.kernel_decompose == 'low':
                adaptive_weight = adaptive_weight_mean * (c_att.unsqueeze(1) * 2) * (f_att.unsqueeze(2) * 2) + (
                        adaptive_weight - adaptive_weight_mean)

            adaptive_weight = adaptive_weight.reshape(-1, self.in_channels // self.groups, 3, 3)
            # adaptive_bias = self.unsqueeze(0).repeat(b, 1, 1, 1, 1)
            # print(adaptive_weight.shape)
            # print(offset.shape)
            # print(mask.shape)
            # print(x.shape)
            x = modulated_deform_conv2d(x, offset, mask, adaptive_weight, self.bias,
                                        self.stride,
                                        (self.kernel_size[0] // 2, self.kernel_size[1] // 2) if isinstance(
                                            self.PAD, nn.Identity) else (0, 0),  # padding
                                        (1, 1),  # dilation
                                        self.groups * b, self.deform_groups * b)
        else:
            x = modulated_deform_conv2d(x, offset, mask, self.weight, self.bias,
                                        self.stride,
                                        (self.kernel_size[0] // 2, self.kernel_size[1] // 2) if isinstance(
                                            self.PAD, nn.Identity) else (0, 0),  # padding
                                        (1, 1),  # dilation
                                        self.groups, self.deform_groups)
        # x = modulated_deform_conv2d(x, offset, mask, self.weight, self.bias,
        #                                self.stride, self.padding,
        #                                self.dilation, self.groups,
        #                                self.deform_groups)
        # if hasattr(self, 'OMNI_ATT'): x = x * f_att
        return x.reshape(b, -1, h, w).to(x_type)
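To see how the learned offset acts as a dilation rate, the sketch below retraces the arithmetic from init_weights and forward in plain Python. It assumes the default use_zero_dilation=False branch and the state right after initialization, where the offset convolution has zero weights and therefore outputs only its bias. The function names are illustrative and do not appear in the module.

```python
def initial_effective_dilation(dilation: int = 1, epsilon: float = 1e-4) -> float:
    """Effective dilation of the 3x3 sampling grid right after init_weights."""
    bias = (dilation - 1) / dilation + epsilon  # conv_offset bias set in init_weights
    scale = abs(bias) * dilation                # offset.abs() * self.dilation[0] in forward
    return 1.0 + scale                          # base grid step of 1 plus the scaled offset


def sampling_rows(effective_dilation: float):
    """Vertical sampling positions of a 3x3 kernel relative to its centre."""
    return [step * effective_dilation for step in (-1, 0, 1)]


print(initial_effective_dilation(1, 1e-4))
print(sampling_rows(2.0))  # [-2.0, 0.0, 2.0]
```

Because the predicted scalar multiplies the fixed grid stored in dilated_offset, each spatial position samples a regular 3x3 grid whose spacing is one plus that scalar, which is how the dilation rate varies from pixel to pixel.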

2.3 Modify tasks.py

       Open ultralytics/nn/tasks.py and import the new module in a blank area of the script.

from ultralytics.nn.blocks import *

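        Next, find the model-parsing function parse_model (around line 940 of tasks.py; the exact line may differ between code versions) and add the parsing code below just above the function's last else branch.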

        elif m is AdaptiveDilatedConv:
            c2 = args[0]
            args = [ch[f], *args]
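The effect of this branch can be traced outside Ultralytics with a stand-in function. The sketch below only mimics the two assignments; `resolve_adaptive_dilated_conv` and the channel list `ch` are hypothetical and chosen for illustration.

```python
def resolve_adaptive_dilated_conv(ch, f, args):
    """Mimic the added branch: return (c2, constructor_args)."""
    c2 = args[0]               # output channels exactly as written in the yaml
    new_args = [ch[f], *args]  # prepend the input channels of the source layer
    return c2, new_args


# Hypothetical channel list: the previous layer (index -1) outputs 128 channels.
ch = [16, 32, 64, 64, 128, 128]
c2, ctor_args = resolve_adaptive_dilated_conv(ch, -1, [512, 3])
print(c2, ctor_args)  # 512 [128, 512, 3]
```

With these values the module would be built as AdaptiveDilatedConv(128, 512, 3). As written, the branch takes the output channel count from the yaml unchanged; it does not apply the width scaling that parse_model applies to the built-in modules.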

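2.4 Modify the yaml file

Guide to the yaml format: Understanding YOLO-series ".yaml" files (CSDN blog)

       Open yolo11.yaml under ultralytics/cfg/models/11 and replace the original module. (This placement only demonstrates that the module can be inserted; its effect on accuracy has not been measured. The author has limited time and has only tested second-stage combinations of this module with other modules; the structure diagram is at the end of the article, and the code is in the group file updates.)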

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, AdaptiveDilatedConv, [512, 3]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)

  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)
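Because the replacement keeps the number of backbone layers unchanged, the head's references to layers 4, 6 and 10 still resolve to the same positions. The sketch below checks the index of the new layer and shows how its repeat count of 2 is depth-scaled. `backbone_modules` is a hand-copied list of the module names in the backbone above, and `scaled_repeats` mirrors, as an assumption, the depth-gain rule used by parse_model.

```python
backbone_modules = [
    "Conv", "Conv", "C3k2", "Conv", "C3k2", "Conv",
    "AdaptiveDilatedConv", "Conv", "C3k2", "SPPF", "C2PSA",
]


def scaled_repeats(n: int, depth: float) -> int:
    """Depth scaling of the repeat count: only layers with n > 1 are scaled."""
    return max(round(n * depth), 1) if n > 1 else n


idx = backbone_modules.index("AdaptiveDilatedConv")
print(idx)                     # 6, the layer the first head Concat reads from
print(scaled_repeats(2, 0.5))  # 1 repeat at depth 0.50 (scales n, s, m)
print(scaled_repeats(2, 1.0))  # 2 repeats at depth 1.00 (scales l, x)
```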


 2.5 Modify train.py

       Create a train.py script for training.

from ultralytics.models import YOLO
import os

os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'

if __name__ == '__main__':
    model = YOLO(model='ultralytics/cfg/models/11/yolo11.yaml')
    # model.load('yolov8n.pt')
    model.train(data='./data.yaml', epochs=2, batch=1, device='0', imgsz=640, workers=2, cache=False,
                amp=True, mosaic=False, project='runs/train', name='exp')

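         Fill in the path to the modified yaml file in train.py and run the script to start training. For a dataset creation tutorial, see the link below.

YOLOv11 Usage Tutorial, from Getting Started to the Grave (with structure diagrams) (CSDN blog)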
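The training script above reads its dataset description from ./data.yaml. As a rough sketch of what that file contains, the helper below writes a minimal detection dataset config with the path, train, val and names keys. The helper name, the dataset root, the image subfolders and the class names are all placeholders to be replaced with your own.

```python
from pathlib import Path


def write_data_yaml(path, dataset_root, class_names):
    """Write a minimal Ultralytics-style detection dataset config."""
    lines = [
        f"path: {dataset_root}",
        "train: images/train",
        "val: images/val",
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return Path(path)


if __name__ == "__main__":
    # Hypothetical dataset location and class names; replace with your own.
    write_data_yaml("data.yaml", "datasets/my_dataset", ["cat", "dog"])
```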

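3. Related Improvement Ideas (group files, 2024/11/23)

        This module can replace the Bottleneck part of the C2f and C3 modules; the code is in the group files and the structure is shown in the figure. The code and yaml files for combining this module with the author's self-developed modules are also in the group files.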


