**Typical CNN Architectures**

LeNet: the earliest CNN, used for handwritten digit recognition

AlexNet: winner of ILSVRC 2012 by a wide margin over the runner-up; deeper than LeNet, stacking multiple small convolutions in place of a single large one

ZF Net: winner of ILSVRC 2013

GoogLeNet: winner of ILSVRC 2014

VGGNet: strong ILSVRC 2014 entry, with results slightly below GoogLeNet's

ResNet: winner of ILSVRC 2015; its structure was redesigned so that much deeper CNNs can be trained

DenseNet: CVPR 2017 best paper

Typical CNN Architectures - LeNet

[Figure: LeNet network architecture]

```python
import torch
import torch.nn as nn


class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=20, kernel_size=(5, 5), stride=(1, 1), padding=0),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2)),
            nn.Conv2d(in_channels=20, out_channels=50, kernel_size=(5, 5), stride=(1, 1), padding=0),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(output_size=(4, 4))
        )
        self.classify = nn.Sequential(
            nn.Linear(800, 500),
            nn.ReLU(),
            nn.Linear(500, 10)
        )

    def forward(self, x):
        z = self.features(x)  # [N, 1, 28, 28] -> [N, 50, 4, 4]
        z = z.view(-1, 800)   # flatten: [N, 50, 4, 4] -> [N, 800]
        z = self.classify(z)  # [N, 800] -> [N, 10]
        return z


if __name__ == '__main__':
    net = LeNet()
    img = torch.randn(2, 1, 28, 28)
    score = net(img)
    print(score)
    probs = torch.softmax(score, dim=1)
    print(probs)
```

LeNet-5

[Figure: LeNet-5 architecture]

C1 is a convolutional layer.

6 feature maps; each neuron in a feature map is connected to a 5×5 neighborhood of the input, and each feature map is 28×28.

Parameters per convolutional neuron: 5×5 = 25 weights plus 1 bias.

Number of connections: (5×5+1)×6×(28×28) = 122,304.

Parameter sharing: parameters are shared within each feature map, so the total is (5×5+1)×6 = 156 trainable parameters.
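As a quick check (a sketch of mine, not from the original post), the 156-parameter count for C1 can be reproduced with `nn.Conv2d`:

```python
import torch.nn as nn

# C1: 6 feature maps, 5x5 kernels over a single input channel.
c1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
n_params = sum(p.numel() for p in c1.parameters())
print(n_params)  # 6 * (5*5*1 + 1) = 156
```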

S2 is a subsampling (pooling) layer.

6 feature maps of size 14×14; each unit is connected to a 2×2 neighborhood of the corresponding C1 feature map, with no overlap.

Unlike max pooling and average pooling, each S2 unit sums its 4 inputs, multiplies the sum by a trainable weight w, adds a trainable bias b, and passes the result through a sigmoid to produce the pooled value (see the sketch below).

Number of connections: (2×2+1)×14×14×6 = 5880.

Parameter sharing: parameters are shared within each feature map, giving 2×6 = 12 trainable parameters.
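A minimal PyTorch sketch of this original subsampling unit (the class name and the AvgPool2d trick are my own; LeNet-5 predates these APIs):

```python
import torch
import torch.nn as nn

class LeNetSubsample(nn.Module):
    """LeNet-5 style S2 unit: sum each 2x2 window, scale by a trainable
    weight, add a trainable bias, then apply sigmoid (one w/b per map)."""
    def __init__(self, channels):
        super().__init__()
        self.w = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.b = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.pool = nn.AvgPool2d(2, 2)  # average * 4 == sum of the window

    def forward(self, x):
        s = self.pool(x) * 4.0  # sum of the 4 inputs in each window
        return torch.sigmoid(self.w * s + self.b)

s2 = LeNetSubsample(6)
print(s2(torch.randn(1, 6, 28, 28)).shape)  # torch.Size([1, 6, 14, 14])
print(sum(p.numel() for p in s2.parameters()))  # 2 * 6 = 12 parameters
```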

C3 is a convolutional layer.

The input is 6 feature maps, each 14×14; 16 convolution kernels produce 16 feature maps of size 10×10.

Each neuron in a feature map is connected to 5×5 neighborhoods in several of the S2 maps.

For example, each node of C3's feature map 0 is connected to feature maps 0–2 of S2, i.e. to 3 groups of 5×5 nodes.

S4 is a subsampling layer (same as S2).

It consists of 16 feature maps of size 5×5; each unit is connected to a 2×2 neighborhood of the corresponding C3 feature map.

Number of connections: (2×2+1)×5×5×16 = 2000.

Parameter sharing: parameters are shared within each feature map; each map needs one scale factor and one bias, giving 2×16 = 32 trainable parameters.

C5 is a convolutional layer.

It has 120 neurons, which can be viewed as 120 feature maps of size 1×1.

Each unit is connected to 5×5 neighborhoods of all 16 feature maps of S4 (a full connection between S4 and C5).

Connections = trainable parameters: (5×5×16+1)×120 = 48,120.

F6 is a fully connected layer.

It has 84 units, fully connected to C5.

F6 computes the dot product between its input vector and weight vector, adds a bias (wx + b), and applies a sigmoid to the weighted sum.

Connections = trainable parameters: (120+1)×84 = 10,164.

The choice of 84 units comes from the paper: standard printable ASCII characters are drawn as 7×12 bitmaps, and each of the 84 dimensions is meant to reflect one pixel of such a bitmap.

The final layer is the output layer.

It is composed of Euclidean radial basis function (RBF) units, one per class. Each RBF unit takes the 84-dimensional vector as input and produces one output, so the output layer emits a 10-dimensional vector. The RBF formula is given below.
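The formula referred to above, as given in the LeNet-5 paper, is:

y_i = Σ_j (x_j − w_ij)²

Each output y_i is the squared Euclidean distance between the 84-dimensional input x and the parameter vector w_i for class i; the smaller the distance, the more likely the input belongs to that class.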

Typical CNN Architectures - AlexNet

[Figure: AlexNet architecture]

AlexNet introduced a special network layer, Local Response Normalization (LRN), which applies a local normalization to the outputs of the ReLU activation across neighboring channels (loosely similar in spirit to LN).

AlexNet structural improvements:

Non-linear activation: ReLU.

Max pooling with overlapping windows: the stride is chosen smaller than the pooling kernel, so adjacent pooling regions overlap, which enriches the features.

Methods to prevent overfitting: Dropout and data augmentation.

Big-data training: millions of ImageNet images.

GPU implementation: half of the kernels (neurons) are placed on each GPU, with an extra trick: the two GPUs communicate only at certain layers.

LRN: creates a competition mechanism among local neurons, so that relatively large responses become even larger while weaker neighboring neurons are suppressed, improving the model's generalization. LRN essentially mimics the lateral inhibition observed between active biological neurons.
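PyTorch exposes this as `nn.LocalResponseNorm`; a minimal usage sketch (the hyperparameters shown are the ones commonly cited for AlexNet):

```python
import torch
import torch.nn as nn

# LRN normalizes each activation by the activity of `size` neighboring
# channels at the same spatial position (AlexNet used n=5, k=2,
# alpha=1e-4, beta=0.75).
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)
x = torch.relu(torch.randn(2, 96, 55, 55))
print(lrn(x).shape)  # torch.Size([2, 96, 55, 55])
```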

```python
import torch
import torch.nn as nn


class AlexNet(nn.Module):
    def __init__(self, device1, device2):
        super(AlexNet, self).__init__()
        self.device1 = device1
        self.device2 = device2
        # First half of the two-tower feature extractor (one tower per device).
        self.feature11 = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=48, kernel_size=(11, 11), stride=(4, 4), padding=2),
            nn.ReLU(),
            nn.LocalResponseNorm(size=5),
            nn.MaxPool2d(3, 2),
            nn.Conv2d(in_channels=48, out_channels=128, kernel_size=(5, 5), stride=(1, 1), padding=2),
            nn.ReLU(),
            nn.MaxPool2d(3, 2)
        ).to(self.device1)
        self.feature21 = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=48, kernel_size=(11, 11), stride=(4, 4), padding=2),
            nn.ReLU(),
            nn.LocalResponseNorm(size=5),
            nn.MaxPool2d(3, 2),
            nn.Conv2d(in_channels=48, out_channels=128, kernel_size=(5, 5), stride=(1, 1), padding=2),
            nn.ReLU(),
            nn.MaxPool2d(3, 2)
        ).to(self.device2)
        # Second half; the towers exchange feature maps before these layers.
        self.feature12 = nn.Sequential(
            nn.Conv2d(in_channels=256, out_channels=192, kernel_size=(3, 3), stride=(1, 1), padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=192, out_channels=192, kernel_size=(3, 3), stride=(1, 1), padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=192, out_channels=128, kernel_size=(3, 3), stride=(1, 1), padding=1),
            nn.ReLU(),
            nn.MaxPool2d(3, 2)
        ).to(self.device1)
        self.feature22 = nn.Sequential(
            nn.Conv2d(in_channels=384, out_channels=192, kernel_size=(3, 3), stride=(1, 1), padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=192, out_channels=192, kernel_size=(3, 3), stride=(1, 1), padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=192, out_channels=128, kernel_size=(3, 3), stride=(1, 1), padding=1),
            nn.ReLU(),
            nn.MaxPool2d(3, 2)
        ).to(self.device2)
        self.classify = nn.Sequential(
            nn.Linear(6 * 6 * 128 * 2, 4096),
            nn.ReLU(),
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Linear(4096, 1000)
        ).to(self.device1)

    def forward(self, x):
        x1 = x.to(self.device1)
        x2 = x.to(self.device2)
        z1 = self.feature11(x1)  # [N, 128, 13, 13] on device1
        z2 = self.feature21(x2)  # [N, 128, 13, 13] on device2
        # Cross-device communication: each tower sees the other's feature maps.
        z1 = torch.concat([z1, z2.to(self.device1)], dim=1)  # [N, 256, 13, 13]
        z2 = torch.concat([z2, z1.to(self.device2)], dim=1)  # [N, 384, 13, 13]
        z1 = self.feature12(z1)  # [N, 128, 6, 6]
        z2 = self.feature22(z2)  # [N, 128, 6, 6]
        z = torch.concat([z1, z2.to(self.device1)], dim=1)   # [N, 256, 6, 6]
        z = z.view(-1, 6 * 6 * 128 * 2)
        z = self.classify(z)
        return z


if __name__ == '__main__':
    print(torch.cuda.is_available())
    device1 = torch.device('cpu')
    # Fall back to CPU when no GPU is present so the demo still runs.
    device2 = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
    net = AlexNet(device1, device2)
    img = torch.randn(2, 3, 224, 224)
    score = net(img)
    print(score)
```

Typical CNN Architectures - ZF Net

ZF Net

Fine-tuned on top of AlexNet

Modified the (first-layer) filter window size and stride

Replaces AlexNet's sparse dual-GPU structure with a dense single-GPU structure

Top-5 error rate: 11.2%

Uses the ReLU activation and the cross-entropy loss

[Figure: ZF Net architecture]

```python
import torch
import torch.nn as nn


class ZFNet(nn.Module):
    def __init__(self):
        super(ZFNet, self).__init__()
        self.feature = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=96, kernel_size=(7, 7), stride=(2, 2), padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(3, 3), stride=(2, 2), padding=1),
            nn.LocalResponseNorm(size=30),
            nn.Conv2d(in_channels=96, out_channels=256, kernel_size=(5, 5), stride=(2, 2)),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(3, 3), stride=(2, 2), padding=1),
            nn.Conv2d(in_channels=256, out_channels=384, kernel_size=(3, 3), stride=(1, 1), padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=384, out_channels=384, kernel_size=(3, 3), stride=(1, 1), padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=384, out_channels=256, kernel_size=(3, 3), stride=(1, 1), padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(3, 3), stride=(2, 2))
        )
        self.classify = nn.Sequential(
            nn.Linear(6 * 6 * 256, 4096),
            nn.ReLU(),
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Linear(4096, 1000)
        )

    def forward(self, x):
        z = self.feature(x)
        z = z.view(-1, 6 * 6 * 256)
        z = self.classify(z)
        return z


if __name__ == '__main__':
    net = ZFNet()
    img = torch.randn(2, 3, 224, 224)
    score = net(img)
    print(score)
    probs = torch.softmax(score, dim=1)
    print(probs)
```

Typical CNN Architectures - VGGNet

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VggBlock(nn.Module):
    def __init__(self, in_channel, out_channel, n, use_11=False):
        super(VggBlock, self).__init__()
        layers = []
        kernel_size = (3, 3)
        for i in range(n):
            if use_11 and (i == n - 1):
                kernel_size = (1, 1)
            conv = nn.Sequential(
                nn.Conv2d(in_channel, out_channel, kernel_size=kernel_size, stride=(1, 1), padding='same'),
                nn.ReLU()
            )
            in_channel = out_channel
            layers.append(conv)
        layers.append(nn.MaxPool2d(2, 2))
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)


class VggNet(nn.Module):
    def __init__(self, features, num_classes, classify_input_channel):
        super(VggNet, self).__init__()
        self.num_classes = num_classes
        self.features = features
        self.pooling = nn.AdaptiveAvgPool2d(output_size=(7, 7))
        self.classify = nn.Sequential(
            nn.Linear(in_features=7 * 7 * classify_input_channel, out_features=4096),
            nn.ReLU(),
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Linear(4096, self.num_classes),
        )

    def forward(self, images):
        """
        images: [N, 3, H, W] raw image tensor
        return: [N, num_classes] class scores
        """
        z = self.features(images)  # [N, 3, H, W] -> [N, classify_input_channel, ?, ?]
        z = self.pooling(z)        # -> [N, classify_input_channel, 7, 7]
        z = z.flatten(1)
        return self.classify(z)


class Vgg16cNet(nn.Module):
    def __init__(self, num_classes):
        super(Vgg16cNet, self).__init__()
        features = nn.Sequential(
            VggBlock(3, 64, 2),
            VggBlock(64, 128, 2),
            VggBlock(128, 256, 3, use_11=True),
            VggBlock(256, 512, 3, use_11=True),
            VggBlock(512, 512, 3, use_11=True)
        )
        self.vgg = VggNet(features=features, num_classes=num_classes, classify_input_channel=512)

    def forward(self, images):
        return self.vgg(images)


class Vgg16Net(nn.Module):
    def __init__(self, num_classes):
        super(Vgg16Net, self).__init__()
        features = nn.Sequential(
            VggBlock(3, 64, 2),
            VggBlock(64, 128, 2),
            VggBlock(128, 256, 3),
            VggBlock(256, 512, 3),
            VggBlock(512, 512, 3)
        )
        self.vgg = VggNet(features=features, num_classes=num_classes, classify_input_channel=512)

    def forward(self, images):
        return self.vgg(images)


class Vgg19Net(nn.Module):
    def __init__(self, num_classes):
        super(Vgg19Net, self).__init__()
        features = nn.Sequential(
            VggBlock(3, 64, 2),
            VggBlock(64, 128, 2),
            VggBlock(128, 256, 4),
            VggBlock(256, 512, 4),
            VggBlock(512, 512, 4)
        )
        self.vgg = VggNet(features=features, num_classes=num_classes, classify_input_channel=512)

    def forward(self, images):
        return self.vgg(images)


class VggLabelNet(nn.Module):
    def __init__(self, vgg):
        super(VggLabelNet, self).__init__()
        self.vgg = vgg
        self.id2name = {
            0: 'dog',
            1: 'cat',
            2: 'cow',
            3: 'sheep'
        }

    def forward(self, images):
        scores = self.vgg(images)  # [N, 3, H, W] -> [N, num_classes]
        pred_index = torch.argmax(scores, dim=1).detach().numpy()  # [N, num_classes] -> [N]
        result = [self.id2name[idx] for idx in pred_index]  # map class ids to names
        return result


if __name__ == '__main__':
    vgg16 = Vgg16cNet(num_classes=4)
    vgg_label = VggLabelNet(vgg16)
    print(vgg_label)
    r = vgg_label(torch.rand(4, 3, 224, 224))
    print(r)
```
Visualization (feature maps captured with forward hooks):

```python
from pathlib import Path
from typing import Union, List

import torch
from torchvision import models, transforms
from PIL import Image
import torch.nn as nn
import torchvision


class VggHook(object):
    def __init__(self, vgg, indexes: Union[int, List[int]] = 44):
        if isinstance(indexes, int):
            indexes = list(range(indexes))
        self.images = {}
        self.hooks = []
        for idx in indexes:
            # Register a forward hook on each selected feature layer.
            self.hooks.append(vgg.features[idx].register_forward_hook(self._build_hook(idx)))

    def _build_hook(self, idx):
        def hook(module, module_input, module_output):
            self.images[idx] = module_output.cpu()  # save the module's output
        return hook

    def remove(self):
        for hook in self.hooks:
            hook.remove()


if __name__ == '__main__':
    vgg = models.vgg16_bn(pretrained=True)  # download the pretrained VGG16 weights
    vgg_hooks = VggHook(vgg)
    vgg.eval().cpu()
    print(vgg)

    tfs = transforms.ToTensor()
    resize = transforms.Resize(size=(50, 60))
    image_path = {
        '小狗': r'../datas/小狗.png',
        '小狗2': r'../datas/小狗2.png',
        '小猫': r'../datas/小猫.jpg',
        '飞机': r'../datas/飞机.jpg',
        '飞机2': r'../datas/飞机2.jpg'
    }

    # img = Image.open(image_path['飞机']).convert("RGB")
    # img = tfs(img)
    # print(type(img))
    # print(img.shape)
    # img = img[None]  # [3, H, W] -> [1, 3, H, W]
    # for i in range(1):
    #     score = vgg(img)
    #     print(score.shape)
    #     pred_indexes = torch.argmax(score, dim=1)
    #     print(pred_indexes)
    #     prob = torch.softmax(score, dim=1)
    #     top5 = torch.topk(prob, 5, dim=1)
    #     print(top5)
    #     print(top5.indices)

    output_dir = Path('./output/vgg/features/')
    for name in image_path.keys():
        img = Image.open(image_path[name]).convert("RGB")
        img = tfs(img)   # [3, H, W]
        img = img[None]  # [3, H, W] -> [1, 3, H, W]
        score = vgg(img)  # [1, 1000]
        prob = torch.softmax(score, dim=1)
        top5 = torch.topk(prob, 5, dim=1)
        print(name)
        print(top5)
        # Save the captured feature maps of every hooked layer.
        _output_dir = output_dir / name
        _output_dir.mkdir(parents=True, exist_ok=True)
        for layer_idx in vgg_hooks.images.keys():
            features = vgg_hooks.images[layer_idx]  # [1, C, H, W]
            n, c, h, w = features.shape
            for i in range(n):
                imgs = features[i: i + 1]
                imgs = torch.permute(imgs, dims=(1, 0, 2, 3))  # [1, C, H, W] -> [C, 1, H, W]
                imgs = resize(imgs)
                torchvision.utils.save_image(
                    imgs,
                    output_dir / name / f'{i}_{layer_idx}.png',
                    nrow=8,
                    padding=5,
                    pad_value=128
                )
    vgg_hooks.remove()
```

GoogLeNet

Visualization:
```python
from pathlib import Path
from typing import Union, List, Optional

import torch
from torchvision import models, transforms
from PIL import Image
import torch.nn as nn
import torchvision


class GoogLeNetHook(object):
    def __init__(self, net, names: Optional[List[str]] = None):
        if names is None:
            names = ['conv1', 'maxpool1', 'conv2', 'conv3', 'maxpool2', 'inception3a',
                     'inception3b', 'maxpool3', 'inception4b', 'inception4c', 'inception4d',
                     'inception4e', 'maxpool4', 'inception5a', 'inception5b']
        self.images = {}
        self.hooks = []
        for name in names:
            if name.startswith('inception'):
                # Hook all four branches of the Inception module.
                inception = getattr(net, name)
                branch1 = inception.branch1.register_forward_hook(self._build_hook(f"{name}.branch1"))
                branch2 = inception.branch2.register_forward_hook(self._build_hook(f"{name}.branch2"))
                branch3 = inception.branch3.register_forward_hook(self._build_hook(f"{name}.branch3"))
                branch4 = inception.branch4.register_forward_hook(self._build_hook(f"{name}.branch4"))
                self.hooks.extend([branch1, branch2, branch3, branch4])
            else:
                hook = getattr(net, name).register_forward_hook(self._build_hook(name))
                self.hooks.append(hook)

    def _build_hook(self, idx):
        def hook(module, module_input, module_output):
            self.images[idx] = module_output.cpu()  # save the module's output
        return hook

    def remove(self):
        for hook in self.hooks:
            hook.remove()


if __name__ == '__main__':
    model = models.googlenet(pretrained=True)  # download the pretrained GoogLeNet weights
    model.eval().cpu()
    hooks = GoogLeNetHook(model)
    print(model)

    tfs = transforms.ToTensor()
    resize = transforms.Resize(size=(50, 60))
    image_path = {
        '小狗': r'../datas/小狗.png',
        '小狗2': r'../datas/小狗2.png',
        '小猫': r'../datas/小猫.jpg',
        '飞机': r'../datas/飞机.jpg',
        '飞机2': r'../datas/飞机2.jpg'
    }

    output_dir = Path('./output/googlenet/features/')
    for name in image_path.keys():
        img = Image.open(image_path[name]).convert("RGB")
        img = tfs(img)   # [3, H, W]
        img = img[None]  # [3, H, W] -> [1, 3, H, W]
        score = model(img)  # [1, 1000]
        prob = torch.softmax(score, dim=1)
        top5 = torch.topk(prob, 5, dim=1)
        print("=" * 100)
        print(name)
        print(top5)
        # Save the captured feature maps of every hooked layer.
        _output_dir = output_dir / name
        _output_dir.mkdir(parents=True, exist_ok=True)
        for layer_name in hooks.images.keys():
            features = hooks.images[layer_name]  # [1, C, H, W]
            n, c, h, w = features.shape
            for i in range(n):
                imgs = features[i: i + 1]
                imgs = torch.permute(imgs, dims=(1, 0, 2, 3))  # [1, C, H, W] -> [C, 1, H, W]
                imgs = resize(imgs)
                torchvision.utils.save_image(
                    imgs,
                    output_dir / name / f'{i}_{layer_name}.png',
                    nrow=8,
                    padding=5,
                    pad_value=128
                )
    hooks.remove()
```

Custom implementation:

[Figure: GoogLeNet overall architecture]

The main idea of the Inception architecture is to work out how an optimal local sparse structure in a convolutional vision network can be approximated and covered by readily available dense components.

To avoid patch-alignment issues, the filter sizes are restricted to 1x1, 3x3, and 5x5; this is for convenience rather than necessity.

In addition, a parallel pooling path is added alongside the convolutions in each stage, which was also found to be beneficial.
[Figure: Inception module, naive version]

The second main idea of the architecture: apply dimensionality reduction and projection wherever the computational requirements would otherwise grow too much. That is, a 1x1 convolution is placed before the 3x3 and 5x5 convolutions to reduce computation, followed by rectified linear activation.

[Figure: Inception module with dimension reduction]

The main idea of Network-in-Network (NIN) is to replace the conventional convolution with a fully connected multilayer perceptron, to obtain a more complete representation of the features. Since this already strengthens the feature representation, the traditional final fully connected layers of a CNN are replaced by a global average pooling layer: the authors argue that by this point the feature maps are already credible enough for classification, so the loss can be computed directly through softmax.

GoogLeNet borrows this NIN idea, attaching 1x1 convolutions with ReLU activation to the original convolution stages.

This not only deepens the network and improves its representation power; the paper also uses the 1x1 convolutions for dimensionality reduction, which greatly cuts the number of parameters to update.
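A rough parameter count (my own numbers, matched to the 5x5 branch of inception3a, biases ignored) shows the effect:

```python
# 5x5 branch of inception3a: 192 input channels, 32 output channels.
direct = 5 * 5 * 192 * 32                # single 5x5 conv: 153,600 weights
bottleneck = 192 * 16 + 5 * 5 * 16 * 32  # 1x1 down to 16 channels, then 5x5: 15,872 weights
print(direct, bottleneck, direct / bottleneck)  # roughly 9.7x fewer parameters
```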

```python
from pathlib import Path

import torch
import torch.nn as nn


class GlobalAvgPool2d(nn.Module):
    def __init__(self):
        super(GlobalAvgPool2d, self).__init__()

    def forward(self, x):
        """[N, C, H, W] -> [N, C, 1, 1]"""
        return torch.mean(x, dim=(2, 3), keepdim=True)


class BasicConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding):
        super(BasicConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.conv(x))


class Inception(nn.Module):
    def __init__(self, in_channels, out_channels):
        """
        in_channels: number of input channels, e.g. 192
        out_channels: output channels of each branch, e.g. [[64], [96, 128], [16, 32], [32]]
        """
        super(Inception, self).__init__()
        self.branch1 = nn.Sequential(
            BasicConv2d(in_channels, out_channels[0][0], kernel_size=1, stride=1, padding=0)
        )
        self.branch2 = nn.Sequential(
            BasicConv2d(in_channels, out_channels[1][0], kernel_size=1, stride=1, padding=0),
            BasicConv2d(out_channels[1][0], out_channels[1][1], kernel_size=3, stride=1, padding=1)
        )
        self.branch3 = nn.Sequential(
            BasicConv2d(in_channels, out_channels[2][0], kernel_size=1, stride=1, padding=0),
            BasicConv2d(out_channels[2][0], out_channels[2][1], kernel_size=5, stride=1, padding=2)
        )
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(3, 1, padding=1),
            BasicConv2d(in_channels, out_channels[3][0], kernel_size=1, stride=1, padding=0)
        )

    def forward(self, x):
        x1 = self.branch1(x)  # [N, C, H, W] -> [N, C1, H, W]
        x2 = self.branch2(x)  # [N, C, H, W] -> [N, C2, H, W]
        x3 = self.branch3(x)  # [N, C, H, W] -> [N, C3, H, W]
        x4 = self.branch4(x)  # [N, C, H, W] -> [N, C4, H, W]
        x = torch.concat([x1, x2, x3, x4], dim=1)  # [N, C1+C2+C3+C4, H, W]
        return x


class GoogLeNet(nn.Module):
    def __init__(self, num_class, add_aux_stage=False):
        super(GoogLeNet, self).__init__()
        self.stage1 = nn.Sequential(
            BasicConv2d(3, 64, 7, 2, 3),
            nn.MaxPool2d(3, 2, padding=1),
            # nn.LocalResponseNorm(size=10),
            BasicConv2d(64, 64, 1, 1, 0),
            BasicConv2d(64, 192, 3, 1, 1),
            nn.MaxPool2d(3, 2, padding=1),
            Inception(192, [[64], [96, 128], [16, 32], [32]]),    # inception3a
            Inception(256, [[128], [128, 192], [32, 96], [64]]),  # inception3b
            nn.MaxPool2d(3, 2, padding=1),
            Inception(480, [[192], [96, 208], [16, 48], [64]])    # inception4a
        )
        self.stage2 = nn.Sequential(
            Inception(512, [[160], [112, 224], [24, 64], [64]]),  # inception4b
            Inception(512, [[128], [128, 256], [24, 64], [64]]),  # inception4c
            Inception(512, [[112], [144, 288], [32, 64], [64]]),  # inception4d
        )
        self.stage3 = nn.Sequential(
            Inception(528, [[256], [160, 320], [32, 128], [128]]),  # inception4e
            nn.MaxPool2d(3, 2, padding=1),
            Inception(832, [[256], [160, 320], [32, 128], [128]]),  # inception5a
            Inception(832, [[384], [192, 384], [48, 128], [128]]),  # inception5b
            GlobalAvgPool2d()
        )
        self.classify = nn.Conv2d(1024, num_class, kernel_size=(1, 1), stride=(1, 1), padding=0)
        if add_aux_stage:
            self.aux_stage1 = nn.Sequential(
                nn.MaxPool2d(5, 3, padding=0),
                nn.Conv2d(512, 1024, kernel_size=(1, 1), stride=(1, 1), padding=0),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(output_size=(2, 2)),
                nn.Flatten(1),
                nn.Linear(4096, 2048),
                nn.Dropout(p=0.4),
                nn.ReLU(),
                nn.Linear(2048, num_class)
            )
            self.aux_stage2 = nn.Sequential(
                nn.MaxPool2d(5, 3, padding=0),
                nn.Conv2d(528, 1024, kernel_size=(1, 1), stride=(1, 1), padding=0),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(output_size=(2, 2)),
                nn.Flatten(1),
                nn.Linear(4096, 2048),
                nn.Dropout(p=0.4),
                nn.ReLU(),
                nn.Linear(2048, num_class)
            )
        else:
            self.aux_stage1 = None
            self.aux_stage2 = None

    def forward(self, x):
        """[N, C, H, W]"""
        z1 = self.stage1(x)   # [N, C, H, W] -> [N, 512, H1, W1]
        z2 = self.stage2(z1)  # [N, 512, H1, W1] -> [N, 528, H2, W2]
        z3 = self.stage3(z2)  # [N, 528, H2, W2] -> [N, 1024, 1, 1]
        # Main classification branch.
        scores3 = torch.squeeze(self.classify(z3))  # [N, 1024, 1, 1] -> [N, num_class]
        if self.aux_stage1 is not None:
            # Two auxiliary classifiers, used only during training.
            score1 = self.aux_stage1(z1)
            score2 = self.aux_stage2(z2)
            return score1, score2, scores3
        else:
            return scores3


def t1():
    net = GoogLeNet(num_class=4, add_aux_stage=True)
    loss_fn = nn.CrossEntropyLoss()
    _x = torch.rand(2, 3, 224, 224)
    _y = torch.tensor([0, 3], dtype=torch.long)  # simulated ground-truth label ids
    _r1, _r2, _r3 = net(_x)  # predictions of the three branches; all contribute to the loss
    _loss1 = loss_fn(_r1, _y)
    _loss2 = loss_fn(_r2, _y)
    _loss3 = loss_fn(_r3, _y)
    _loss = _loss1 + _loss2 + _loss3
    print(_r1)
    print(_r2)
    print(_r3)
    print(_r3.shape)

    Path('./output/modules').mkdir(parents=True, exist_ok=True)  # make sure the output directory exists
    traced_script_module = torch.jit.trace(net.eval(), _x)
    traced_script_module.save('./output/modules/googlenet.pt')
    # Persist the whole model object.
    torch.save(net, './output/modules/googlenet.pkl')


def t2():
    net1 = torch.load('./output/modules/googlenet.pkl')
    net2 = GoogLeNet(num_class=4, add_aux_stage=False)
    # net2 has no aux-stage parameters, so restore with strict=False:
    # unexpected_keys lists checkpoint parameters that net2 does not have.
    missing_keys, unexpected_keys = net2.load_state_dict(net1.state_dict(), strict=False)
    if len(missing_keys) > 0:
        raise ValueError(f"Some parameters were not restored: {missing_keys}")

    _x = torch.rand(2, 3, 224, 224)
    traced_script_module = torch.jit.trace(net2.eval(), _x)
    traced_script_module.save('./output/modules/googlenet.pt')

    # Export to ONNX with dynamic batch size and spatial dimensions.
    torch.onnx.export(
        model=net2.eval().cpu(),  # the model object
        args=_x,                  # example input for the forward pass
        f='./output/modules/googlenet_dynamic.onnx',  # output file name
        do_constant_folding=True,
        input_names=['images'],   # names of the input tensors
        output_names=['scores'],  # names of the output tensors
        opset_version=12,
        dynamic_axes={
            'images': {0: 'n', 2: 'h', 3: 'w'},
            'scores': {0: 'n'}
        }
    )


if __name__ == '__main__':
    # inception = Inception(192, [[64], [96, 128], [16, 32], [32]])
    # print(inception)
    # _x = torch.rand(4, 192, 100, 100)
    # _r = inception(_x)
    # print(_r.shape)
    t1()
    t2()
```

ResNet

ResNet uses a connection pattern called a "shortcut connection": as the name suggests, the signal skips ahead along a shortcut.

[Figure: residual blocks with solid (identity) and dashed (projection) shortcut connections]

For the solid-line connections (the first and third pink blocks), both feature maps are 3x3x64: the channel counts match, so the computation is y = F(x) + x.

For the dashed-line connections (the first and third green blocks), the feature maps are 3x3x64 and 3x3x128: the channel counts differ (64 vs. 128), so the computation is y = F(x) + Wx,

where W is a convolution used to adjust the channel dimension of x.
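A minimal sketch of both cases (class and helper names are mine, not from the paper):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, stride, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, 3, 1, 1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        if stride != 1 or in_channels != out_channels:
            # Dashed shortcut: a 1x1 conv W adjusts channels and resolution.
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            # Solid shortcut: identity.
            self.shortcut = nn.Identity()

    def forward(self, x):
        return torch.relu(self.f(x) + self.shortcut(x))  # y = F(x) + (W)x

blk = ResidualBlock(64, 128, stride=2)
print(blk(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 128, 28, 28])
```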

Visualization:
```python
from pathlib import Path
from typing import Optional, List

import torch
import torch.nn as nn
import torchvision.models as models
from torchvision import transforms
import torchvision
from PIL import Image


class ResNetHook(object):
    def __init__(self, net, names: Optional[List[str]] = None):
        if names is None:
            names = ['conv1', 'bn1', 'relu', 'maxpool', 'layer1',
                     'layer2', 'layer3', 'layer4', 'avgpool']
        self.images = {}
        self.hooks = []
        for name in names:
            hook = getattr(net, name).register_forward_hook(self._build_hook(name))
            self.hooks.append(hook)

    def _build_hook(self, idx):
        def hook(module, module_input, module_output):
            self.images[idx] = module_output.cpu()  # save the module's output
        return hook

    def reset_images(self):
        self.images = {}

    def remove(self):
        for hook in self.hooks:
            hook.remove()


if __name__ == '__main__':
    model = models.resnet18(pretrained=True)
    model.eval().cpu()
    hooks = ResNetHook(model)
    print(model)

    tfs = transforms.ToTensor()
    resize = transforms.Resize(size=(50, 60))
    image_path = {
        '小狗': r'../datas/小狗.png',
        '小狗2': r'../datas/小狗2.png',
        '小猫': r'../datas/小猫.jpg',
        '飞机': r'../datas/飞机.jpg',
        '飞机2': r'../datas/飞机2.jpg'
    }

    output_dir = Path('./output/resnet18/features/')
    for name in image_path.keys():
        img = Image.open(image_path[name]).convert("RGB")
        img = tfs(img)   # [3, H, W]
        img = img[None]  # [3, H, W] -> [1, 3, H, W]
        score = model(img)  # [1, 1000]
        prob = torch.softmax(score, dim=1)
        top5 = torch.topk(prob, 5, dim=1)
        print("=" * 100)
        print(name)
        print(top5)
        # Save the captured feature maps of every hooked layer.
        _output_dir = output_dir / name
        _output_dir.mkdir(parents=True, exist_ok=True)
        for layer_name in hooks.images.keys():
            features = hooks.images[layer_name]  # [1, C, H, W]
            n, c, h, w = features.shape
            for i in range(n):
                imgs = features[i: i + 1]
                imgs = torch.permute(imgs, dims=(1, 0, 2, 3))  # [1, C, H, W] -> [C, 1, H, W]
                imgs = resize(imgs)
                torchvision.utils.save_image(
                    imgs,
                    output_dir / name / f'{i}_{layer_name}.png',
                    nrow=8,
                    padding=5,
                    pad_value=128
                )
        hooks.reset_images()
    hooks.remove()
```

DenseNet

DenseNet (Dense Convolutional Network) is a convolutional network with dense connectivity: any two layers within it are directly connected. Each layer's input is the concatenation of the outputs of all preceding layers, and the feature maps it learns are passed directly to all subsequent layers as input.

NOTE: dense connectivity in DenseNet exists only within a dense block; there is no dense connectivity between different dense blocks.

Advantages of dense connections: they alleviate vanishing gradients, strengthen feature propagation, encourage feature reuse, and greatly reduce the number of parameters.

A dense layer inside a DenseNet dense block is similar to a ResNet block, namely BN-ReLU-Conv(1x1) -> BN-ReLU-Conv(3x3), and each dense block stacks several such layers.

The layer between dense blocks is called a transition layer, consisting of BN-Conv(1x1)-AveragePooling(2x2).
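A minimal sketch of a standard dense block and transition layer (the names are mine, chosen for illustration):

```python
import torch
import torch.nn as nn

def dense_layer(in_channels, growth_rate, bn_size=4):
    # BN-ReLU-Conv(1x1) -> BN-ReLU-Conv(3x3), emitting growth_rate new channels.
    return nn.Sequential(
        nn.BatchNorm2d(in_channels), nn.ReLU(),
        nn.Conv2d(in_channels, bn_size * growth_rate, 1, bias=False),
        nn.BatchNorm2d(bn_size * growth_rate), nn.ReLU(),
        nn.Conv2d(bn_size * growth_rate, growth_rate, 3, padding=1, bias=False),
    )

class DenseBlock(nn.Module):
    def __init__(self, num_layers, in_channels, growth_rate=32):
        super().__init__()
        self.layers = nn.ModuleList(
            dense_layer(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Each layer consumes the concatenation of all previous outputs.
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

def transition(in_channels, out_channels):
    # BN -> Conv(1x1) -> AvgPool(2x2) between dense blocks.
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.Conv2d(in_channels, out_channels, 1, bias=False),
        nn.AvgPool2d(2, 2),
    )

block = DenseBlock(num_layers=6, in_channels=64)
y = block(torch.randn(1, 64, 56, 56))
print(y.shape)  # torch.Size([1, 256, 56, 56]): 64 + 6 * 32 channels
```

The fuller implementation below adapts torchvision's DenseNet, swapping the standard dense layer for an Inception module whose branches together still emit growth_rate new channels per layer.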

```python
from collections import OrderedDict
from functools import partial
from typing import Any, List, Optional, Tuple

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import Tensor
from torchvision.models._api import register_model, Weights, WeightsEnum
from torchvision.models._utils import _ovewrite_named_param, handle_legacy_interface
from torchvision.transforms._presets import ImageClassification
from torchvision.utils import _log_api_usage_once
from torchvision.models._meta import _IMAGENET_CATEGORIES


class BasicConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding):
        super(BasicConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.conv(x))


class Inception(nn.Module):
    def __init__(self, in_channels, out_channels):
        """
        in_channels: number of input channels, e.g. 192
        out_channels: output channels of each branch, e.g. [[64], [96, 128], [16, 32], [32]]
        """
        super(Inception, self).__init__()
        self.branch1 = nn.Sequential(
            BasicConv2d(in_channels, out_channels[0][0], kernel_size=1, stride=1, padding=0)
        )
        self.branch2 = nn.Sequential(
            BasicConv2d(in_channels, out_channels[1][0], kernel_size=1, stride=1, padding=0),
            BasicConv2d(out_channels[1][0], out_channels[1][1], kernel_size=3, stride=1, padding=1)
        )
        self.branch3 = nn.Sequential(
            BasicConv2d(in_channels, out_channels[2][0], kernel_size=1, stride=1, padding=0),
            BasicConv2d(out_channels[2][0], out_channels[2][1], kernel_size=5, stride=1, padding=2)
        )
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(3, 1, padding=1),
            BasicConv2d(in_channels, out_channels[3][0], kernel_size=1, stride=1, padding=0)
        )

    def forward(self, x):
        x1 = self.branch1(x)  # [N, C, H, W] -> [N, C1, H, W]
        x2 = self.branch2(x)  # [N, C, H, W] -> [N, C2, H, W]
        x3 = self.branch3(x)  # [N, C, H, W] -> [N, C3, H, W]
        x4 = self.branch4(x)  # [N, C, H, W] -> [N, C4, H, W]
        x = torch.concat([x1, x2, x3, x4], dim=1)  # [N, C1+C2+C3+C4, H, W]
        return x


class _DenseLayer(nn.Module):
    def __init__(
        self, num_input_features: int, growth_rate: int, bn_size: int, drop_rate: float,
        memory_efficient: bool = False
    ) -> None:
        super().__init__()
        # The four Inception branches together emit exactly growth_rate channels.
        conv_growth_rate = int(0.25 * growth_rate)
        out_channels = [
            [conv_growth_rate],
            [bn_size * conv_growth_rate, conv_growth_rate],
            [bn_size * conv_growth_rate, conv_growth_rate],
            [growth_rate - 3 * conv_growth_rate]
        ]
        self.model = Inception(in_channels=num_input_features, out_channels=out_channels)

    # torchscript does not yet support *args, so we overload the method,
    # allowing it to take either a List[Tensor] or a single Tensor
    def forward(self, input: Tensor) -> Tensor:  # noqa: F811
        if isinstance(input, Tensor):
            prev_features = input
        else:
            prev_features = torch.concat(input, dim=1)
        new_features = self.model(prev_features)
        return new_features


class _DenseBlock(nn.ModuleDict):
    _version = 2

    def __init__(
        self,
        num_layers: int,
        num_input_features: int,
        bn_size: int,
        growth_rate: int,
        drop_rate: float,
        memory_efficient: bool = False,
    ) -> None:
        super().__init__()
        for i in range(num_layers):
            layer = _DenseLayer(
                num_input_features + i * growth_rate,
                growth_rate=growth_rate,
                bn_size=bn_size,
                drop_rate=drop_rate,
                memory_efficient=memory_efficient,
            )
            self.add_module("denselayer%d" % (i + 1), layer)

    def forward(self, init_features: Tensor) -> Tensor:
        features = [init_features]
        for name, layer in self.items():
            new_features = layer(features)
            features.append(new_features)
        return torch.cat(features, 1)


class _Transition(nn.Sequential):
    def __init__(self, num_input_features: int, num_output_features: int) -> None:
        super().__init__()
        self.norm = nn.BatchNorm2d(num_input_features)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(num_input_features, num_output_features, kernel_size=1, stride=1, bias=False)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)


class DenseNet(nn.Module):
    def __init__(
        self,
        growth_rate: int = 32,
        block_config: Tuple[int, int, int, int] = (6, 12, 24, 16),
        num_init_features: int = 64,
        bn_size: int = 4,
        drop_rate: float = 0,
        num_classes: int = 1000,
        memory_efficient: bool = False,
    ) -> None:
        super().__init__()
        _log_api_usage_once(self)

        # First convolution
        self.features = nn.Sequential(
            OrderedDict([
                ("conv0", nn.Conv2d(3, num_init_features, kernel_size=7, stride=2, padding=3, bias=False)),
                ("norm0", nn.BatchNorm2d(num_init_features)),
                ("relu0", nn.ReLU(inplace=True)),
                ("pool0", nn.MaxPool2d(kernel_size=3, stride=2, padding=1)),
            ])
        )

        # Each dense block
        num_features = num_init_features
        for i, num_layers in enumerate(block_config):
            block = _DenseBlock(
                num_layers=num_layers,
                num_input_features=num_features,
                bn_size=bn_size,
                growth_rate=growth_rate,
                drop_rate=drop_rate,
                memory_efficient=memory_efficient,
            )
            self.features.add_module("denseblock%d" % (i + 1), block)
            num_features = num_features + num_layers * growth_rate
            if i != len(block_config) - 1:
                trans = _Transition(num_input_features=num_features, num_output_features=num_features // 2)
                self.features.add_module("transition%d" % (i + 1), trans)
                num_features = num_features // 2

        # Final batch norm
        self.features.add_module("norm5", nn.BatchNorm2d(num_features))

        # Linear layer
        self.classifier = nn.Linear(num_features, num_classes)

        # Official init from torch repo.
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.constant_(m.bias, 0)

    def forward(self, x: Tensor) -> Tensor:
        features = self.features(x)
        out = F.relu(features, inplace=True)
        out = F.adaptive_avg_pool2d(out, (1, 1))
        out = torch.flatten(out, 1)
        out = self.classifier(out)
        return out


def _densenet(
    growth_rate: int,
    block_config: Tuple[int, int, int, int],
    num_init_features: int,
    weights: Optional[WeightsEnum],
    progress: bool,
    **kwargs: Any,
) -> DenseNet:
    if weights is not None:
        _ovewrite_named_param(kwargs, "num_classes", len(weights.meta["categories"]))
    model = DenseNet(growth_rate, block_config, num_init_features, **kwargs)
    return model


_COMMON_META = {
    "min_size": (29, 29),
    "categories": _IMAGENET_CATEGORIES,
    "recipe": "https://github.com/pytorch/vision/pull/116",
    "_docs": """These weights are ported from LuaTorch.""",
}


class DenseNet121_Weights(WeightsEnum):
    IMAGENET1K_V1 = Weights(
        url="https://download.pytorch.org/models/densenet121-a639ec97.pth",
        transforms=partial(ImageClassification, crop_size=224),
        meta={
            **_COMMON_META,
            "num_params": 7978856,
            "_metrics": {"ImageNet-1K": {"acc@1": 74.434, "acc@5": 91.972}},
            "_ops": 2.834,
            "_file_size": 30.845,
        },
    )
    DEFAULT = IMAGENET1K_V1


class DenseNet161_Weights(WeightsEnum):
    IMAGENET1K_V1 = Weights(
        url="https://download.pytorch.org/models/densenet161-8d451a50.pth",
        transforms=partial(ImageClassification, crop_size=224),
        meta={
            **_COMMON_META,
            "num_params": 28681000,
            "_metrics": {"ImageNet-1K": {"acc@1": 77.138, "acc@5": 93.560}},
            "_ops": 7.728,
            "_file_size": 110.369,
        },
    )
    DEFAULT = IMAGENET1K_V1


class DenseNet169_Weights(WeightsEnum):
    IMAGENET1K_V1 = Weights(
        url="https://download.pytorch.org/models/densenet169-b2777c0a.pth",
        transforms=partial(ImageClassification, crop_size=224),
        meta={
            **_COMMON_META,
            "num_params": 14149480,
            "_metrics": {"ImageNet-1K": {"acc@1": 75.600, "acc@5": 92.806}},
            "_ops": 3.36,
            "_file_size": 54.708,
        },
    )
    DEFAULT = IMAGENET1K_V1


class DenseNet201_Weights(WeightsEnum):
    IMAGENET1K_V1 = Weights(
        url="https://download.pytorch.org/models/densenet201-c1103571.pth",
        transforms=partial(ImageClassification, crop_size=224),
        meta={
            **_COMMON_META,
            "num_params": 20013928,
            "_metrics": {"ImageNet-1K": {"acc@1": 76.896, "acc@5": 93.370}},
            "_ops": 4.291,
            "_file_size": 77.373,
        },
    )
    DEFAULT = IMAGENET1K_V1


@register_model()
@handle_legacy_interface(weights=("pretrained", DenseNet121_Weights.IMAGENET1K_V1))
def my_densenet121(*, weights: Optional[DenseNet121_Weights] = None, progress: bool = True, **kwargs: Any) -> DenseNet:
    weights = DenseNet121_Weights.verify(weights)
    return _densenet(32, (6, 12, 24, 16), 64, weights, progress, **kwargs)


if __name__ == '__main__':
    net = my_densenet121()
    print(net)
```
