The MLCA Attention Mechanism
Summary: Attention mechanisms are among the most widely used components in computer vision; they help a neural network emphasize important elements and suppress irrelevant ones. However, most channel attention mechanisms use only channel feature information and ignore spatial feature information, which limits model representation and object detection performance, while spatial attention modules are often complex and computationally expensive. To strike a balance between performance and complexity, the paper proposes a lightweight Mixed Local Channel Attention (MLCA) module to boost the performance of object detection networks. The module combines channel information with spatial information, and local information with global information, to improve the network's representational power. On this basis, the MobileNet-Attention-YOLO (MAY) algorithm is proposed for comparing the performance of different attention modules. On the Pascal VOC and SMID datasets, MLCA achieves a better balance between representational efficacy, performance, and complexity than other attention techniques: compared with the Squeeze-and-Excitation (SE) attention mechanism on PASCAL VOC and the Coordinate Attention (CA) method on SMID, mAP improves by 1.0% and 1.5%, respectively.
Original paper: Mixed local channel attention for object detection
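A note on why the module stays lightweight: as the comments in the implementation below indicate, MLCA borrows ECA's rule for sizing its 1D channel convolutions, computing t = |log2(C) + b| / gamma from the channel count C and bumping the result to an odd kernel size. The short sketch here just factors that rule out for a quick check; the helper name eca_kernel_size is illustrative and not part of the paper or its code.

import math

def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    # ECA-style adaptive kernel size: t = |log2(C) + b| / gamma,
    # bumped to the next odd number so the 1D conv padding stays symmetric.
    t = int(abs(math.log(channels, 2) + b) / gamma)
    return t if t % 2 else t + 1

print(eca_kernel_size(256))  # 5 -- the kernel size MLCA picks for a 256-channel input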
PyTorch implementation of MLCA
import math, torch
from torch import nn
import torch.nn.functional as F


class MLCA(nn.Module):
    def __init__(self, in_size, local_size=5, gamma=2, b=1, local_weight=0.5):
        super(MLCA, self).__init__()
        # ECA-style adaptive kernel size for the 1D channel convolutions
        self.local_size = local_size
        self.gamma = gamma
        self.b = b
        t = int(abs(math.log(in_size, 2) + self.b) / self.gamma)  # ECA rule, gamma = 2
        k = t if t % 2 else t + 1
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=(k - 1) // 2, bias=False)
        self.conv_local = nn.Conv1d(1, 1, kernel_size=k, padding=(k - 1) // 2, bias=False)
        self.local_weight = local_weight
        self.local_arv_pool = nn.AdaptiveAvgPool2d(local_size)
        self.global_arv_pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        local_arv = self.local_arv_pool(x)
        global_arv = self.global_arv_pool(local_arv)
        b, c, m, n = x.shape
        b_local, c_local, m_local, n_local = local_arv.shape

        # (b,c,local_size,local_size) -> (b,c,local_size*local_size) -> (b,local_size*local_size,c) -> (b,1,local_size*local_size*c)
        temp_local = local_arv.view(b, c_local, -1).transpose(-1, -2).reshape(b, 1, -1)
        # (b,c,1,1) -> (b,c,1) -> (b,1,c)
        temp_global = global_arv.view(b, c, -1).transpose(-1, -2)

        y_local = self.conv_local(temp_local)
        y_global = self.conv(temp_global)

        # (b,c,local_size,local_size) <- (b,c,local_size*local_size) <- (b,local_size*local_size,c) <- (b,1,local_size*local_size*c)
        y_local_transpose = y_local.reshape(b, self.local_size * self.local_size, c).transpose(-1, -2).view(b, c, self.local_size, self.local_size)
        # (b,1,c) -> (b,c,1) -> (b,c,1,1)
        y_global_transpose = y_global.transpose(-1, -2).unsqueeze(-1)

        # "Un-pooling": fuse the local and global attention maps, then broadcast back to the input resolution
        att_local = y_local_transpose.sigmoid()
        att_global = F.adaptive_avg_pool2d(y_global_transpose.sigmoid(), [self.local_size, self.local_size])
        att_all = F.adaptive_avg_pool2d(att_global * (1 - self.local_weight) + (att_local * self.local_weight), [m, n])
        x = x * att_all
        return x


if __name__ == '__main__':
    attention = MLCA(in_size=256)
    inputs = torch.randn((2, 256, 16, 16))
    result = attention(inputs)
    print(result.size())
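For context on usage: the MAY experiments place attention modules inside a MobileNet-based YOLO detector, but the paper's exact wiring is not reproduced here. The sketch below is only a minimal illustration of attaching MLCA to the output of a standard Conv-BN-SiLU block; the ConvMLCA wrapper and its layout are assumptions for this example, and it relies on the MLCA class defined above being in scope.

import torch
from torch import nn

class ConvMLCA(nn.Module):
    # Hypothetical wrapper: a plain Conv-BN-SiLU block with MLCA applied to its output feature map.
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()
        self.att = MLCA(in_size=c_out)  # MLCA module from the code above

    def forward(self, x):
        return self.att(self.act(self.bn(self.conv(x))))

block = ConvMLCA(64, 128)
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 128, 32, 32])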