Visualizing Convolutions in PyTorch

(Figure: a filter and its feature map. Image by the author.)

When dealing with images and image data, CNNs are the go-to architecture. Convolutional neural networks have delivered many state-of-the-art solutions in deep learning and computer vision. Image recognition, object detection and self-driving cars as we know them would not be possible without CNNs.

But when it comes to how a CNN sees an image and recognizes it the way it does, things get trickier.

How does a CNN decide whether an image shows a cat or a dog?
What makes a CNN more powerful than other models for image classification problems?
How do CNNs see an image, and what do they see in it?

These were some of the questions I had back when I first learned about CNNs. The questions only multiply as you dive deeper.

Back then I had heard the terms filters and feature maps, but I didn't know what they were or what they did. Later I learned what they were, but not what they looked like. Now I know. Filters and feature maps are central to deep convolutional networks: the filters produce the feature maps, and the feature maps are what the model sees.

What Are Filters and Feature Maps in a CNN?

Filters are sets of weights learned with the backpropagation algorithm. If you write a lot of practical deep learning code, you may know them as kernels. Typical filter sizes are 3×3, 5×5 or even 7×7.
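In PyTorch, a convolutional layer's filters live in its `weight` tensor. A minimal sketch, assuming a freshly constructed `nn.Conv2d` with the same settings as `conv1` in the ResNet-50 structure shown in this article (it is untrained, so the values stay random until backpropagation updates them):

```python
import torch.nn as nn

# Same configuration as ResNet-50's first convolutional layer (conv1).
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7,
                 stride=2, padding=3, bias=False)

# One 7x7 kernel per (output channel, input channel) pair:
# the shape is [out_channels, in_channels, kernel_h, kernel_w].
print(conv.weight.shape)          # torch.Size([64, 3, 7, 7])

# The weights are a learnable Parameter, so backpropagation can update them.
print(conv.weight.requires_grad)  # True
```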

Filters in a CNN layer learn to detect features such as the boundary of a face or the edges of a building. By stacking more and more convolutional layers on top of each other, a CNN can extract increasingly abstract and in-depth information.

Feature maps are the results we get after applying a filter across the pixel values of the image (or, in deeper layers, across the previous layer's output). This process is called the convolution operation, and the resulting feature maps are what the model sees in an image. We visualize feature maps to gain a deeper understanding of how a CNN works.
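To make "applying a filter across the pixel values" concrete, here is a small pure-Python sketch of a single-channel convolution with stride 1 and no padding. The image and the vertical-edge filter are made-up examples; a trained CNN learns its filter values instead. Like `nn.Conv2d`, it computes a cross-correlation (the kernel is not flipped).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the kernel and the image patch under it.
            total = 0
            for u in range(kh):
                for v in range(kw):
                    total += image[i + u][j + v] * kernel[u][v]
            row.append(total)
        feature_map.append(row)
    return feature_map


# A 4x5 "image": dark on the left, bright on the right.
image = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
]
# A hand-written vertical-edge filter.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

print(conv2d_valid(image, kernel))  # [[0, 27, 27], [0, 27, 27]]
```

The feature map is zero over the flat dark region and strongly positive where the patch straddles the dark-to-bright edge, which is the kind of response the visualizations below show at a much larger scale.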

Selecting the Model

We will use the ResNet-50 model to visualize filters and feature maps. ResNet-50 is not the most convenient choice for this, because ResNet models are fairly complex and traversing their inner convolutional layers can be quite difficult. That is also the point: you will learn how to access the inner convolutional layers of a difficult architecture, and in the future you will feel much more comfortable working with similar or more complex architectures.

The image I used is a photo from Pexels, one I collected to train my face-detection classifier.

Model Structure

At first glance the model structure can look intimidating, but it is actually easy to get what we want from it. Once you know how to extract the layers of this model, you will be able to extract the layers of more complex models. Below is the model structure as printed by PyTorch (abridged).

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
...
    (2): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=2048, out_features=1000, bias=True)
)

Extracting the CNN Layers

First, we initialize a counter variable to keep track of the number of convolutional layers.
Next, we go through all the layers of the ResNet-50 model.
Specifically, we look for convolutional layers in a structure nested three levels deep (model, then Sequential stage, then Bottleneck block).
First, we check whether any direct child of the model is a convolutional layer.
Then, we check whether any Bottleneck block inside the Sequential blocks contains convolutional layers.
If either condition is satisfied, we append that child module and its weights to conv_layers and model_weights, respectively.

The code above is simple and self-explanatory, but it is limited to existing models with the same layout, such as the other ResNet models (ResNet-18, 34, 101 and 152). For a custom model things are different: if, say, a Sequential layer sits inside another Sequential layer and contains a CNN layer, the program never checks it. This is where the extractor.py module I wrote can be useful.
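To make the limitation concrete, here is a small hypothetical custom model with an extra level of Sequential nesting. A fixed-depth loop like the one above misses the innermost convolution, whereas PyTorch's `Module.modules()`, which walks every submodule recursively, finds it. This is a swapped-in generic technique for illustration, not how extractor.py works internally.

```python
import torch.nn as nn

# Hypothetical custom model: the last conv is one nesting level deeper than usual.
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.Sequential(
        nn.Sequential(                                   # plays the role of a "block"
            nn.Conv2d(8, 16, kernel_size=3, padding=1),  # found by the fixed-depth loop
            nn.Sequential(                               # extra level of nesting
                nn.ReLU(),
                nn.Conv2d(16, 16, kernel_size=3, padding=1),  # missed by it
            ),
        ),
    ),
)


def fixed_depth_convs(model):
    """Same traversal as the ResNet-specific loop: children, then children of blocks."""
    found = []
    for child in model.children():
        if isinstance(child, nn.Conv2d):
            found.append(child)
        elif isinstance(child, nn.Sequential):
            for block in child:
                for layer in block.children():
                    if isinstance(layer, nn.Conv2d):
                        found.append(layer)
    return found


def all_convs(model):
    """Recursive alternative: modules() yields every submodule at any depth."""
    return [m for m in model.modules() if isinstance(m, nn.Conv2d)]


print(len(fixed_depth_convs(net)))  # 2
print(len(all_convs(net)))          # 3
```

Note that a fully recursive search on ResNet-50 would also return the 4 convolutions inside the downsample branches (53 in total), so you may still want to filter by module name.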

Extractor Class

The Extractor class can find every CNN layer (except down-sampling layers), along with its weights, in any ResNet model and in almost any custom ResNet or VGG model. It is not limited to CNN layers: it can also find Linear layers, and if you give it the name of the down-sampling layer, it can find those too. It can also report useful information such as the number of CNN, Linear and Sequential layers in a model.

How to Use

In the Extractor class, the model parameter takes a model, and the DS_layer_name parameter is optional. DS_layer_name is used to find the down-sampling layers; in ResNet models these are normally named 'downsample', so that is the default.

extractor = Extractor(model = resnet, DS_layer_name = 'downsample')

Calling extractor.activate() activates (runs) the program.

You can get the relevant details as a dictionary by calling extractor.info():

{'Down-sample layers name': 'downsample', 'Total CNN Layers': 49, 'Total Sequential Layers': 4, 'Total Downsampling Layers': 4, 'Total Linear Layers': 1, 'Total number of Bottleneck and Basicblock': 16, 'Total Execution time': '0.00137 sec'}

Accessing the Weights and the Layers

extractor.CNN_layers -----> Gives all the CNN layers in a model
extractor.Linear_layers --> Gives all the Linear layers in a model
extractor.DS_layers ------> Gives all the Down-sample layers in a model if there are any
extractor.CNN_weights ----> Gives all the CNN layer's weights in a model
extractor.Linear_weights -> Gives all the Linear layer's weights in a model

Without any extra coding, you can get the CNN and Linear layers and their weights from almost every ResNet model. The class offers more than the attributes listed here, so do go through the entire script.

Visualizing

Convolutional Layer Filters

Here we will visualize the convolutional layer filters. For simplicity, we will only visualize the filters of the first convolutional layer.

We loop through the weights of the first convolutional layer. This layer has 64 filters (one per output channel), each with a 7×7 kernel for each of the 3 input (RGB) channels, so its weight tensor has shape [64, 3, 7, 7].

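In the plot, each small box is one filter shown in grayscale, where 0 is completely black and 255 is white. The learned weights are not stored as 0 to 255 pixel values, though: they are small floating-point numbers that can be negative, so they are rescaled to a display range (0 to 255, 0 to 1, or -1 to 1 centered on 0) before plotting. matplotlib's imshow does this rescaling automatically for single-channel data.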
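If you want to do the rescaling yourself, a min-max mapping to 0 to 255 is one simple option. A minimal sketch; the helper name `to_uint8_image` and the example weights are made up for illustration:

```python
import torch


def to_uint8_image(filt: torch.Tensor) -> torch.Tensor:
    """Min-max rescale filter weights to 0..255 (0 = black, 255 = white)."""
    lo, hi = filt.min(), filt.max()
    # The small epsilon avoids division by zero for a constant filter.
    scaled = (filt - lo) / (hi - lo + 1e-8)  # now in [0, 1]
    return (scaled * 255).round().to(torch.uint8)


w = torch.tensor([[-0.2, 0.0], [0.1, 0.3]])  # made-up example weights
img = to_uint8_image(w)
print(img)
```

The most negative weight maps to black and the most positive to white, so the image shows relative, not absolute, weight values.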

The Feature Maps

Transforming

To visualize the feature maps, the image first needs to be converted to a tensor. Using the transforms from torchvision, the image can be normalized and converted to a tensor.

The last line after the transforms applies them to the image. You can store the result in a new variable instead, but then make sure to use the new name in the code that follows. The .unsqueeze(0) call adds an extra dimension to the tensor img. Adding this batch dimension is an important step: the image size becomes [1, 3, 128, 128] instead of [3, 128, 128], indicating that there is only one image in the batch.

Passing the Input Image Through Each Convolutional Layer

Next, we pass the image through each convolutional layer.

We first give the image as input to the first convolutional layer. After that, we use a for loop to pass each layer's output to the next layer, until we reach the last convolutional layer.

First, we give the image as input to the first convolutional layer.
Then we iterate from the second to the last convolutional layer using a for loop.
We give the previous layer's output (featuremaps[-1]) as the input to the next convolutional layer.
We also append each layer's output to the featuremaps list.

Visualizing the Feature Maps

This is the final step: writing the code to visualize the feature maps. Notice that the final CNN layers have many feature maps, from 512 to 2048 of them. We will only visualize 64 feature maps from each layer, because any more than that makes the output really cluttered.

First, we iterate through featuremaps.
For each layer's output, we take layers = featuremaps[x][0, :, :, :].detach(), which drops the batch dimension and detaches the tensor from the autograd graph.
Then we iterate through the feature maps in layers (one per filter), breaking out of the loop once we reach the 64th feature map.
After that, we plot the feature maps and save them if necessary.
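A minimal sketch of these steps as a function, assuming matplotlib with the Agg backend. The function name `plot_feature_maps`, the file name and the random stand-in tensor are made up for illustration; in practice you would call it on each entry of `featuremaps`.

```python
import math

import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import torch


def plot_feature_maps(layer_output, max_maps=64, path=None):
    """Plot up to `max_maps` feature maps of one conv layer output shaped [1, C, H, W]."""
    layers = layer_output[0, :, :, :].detach()  # drop batch dim, detach from autograd
    n = min(max_maps, layers.shape[0])
    cols = 8
    rows = math.ceil(n / cols)
    fig = plt.figure(figsize=(2 * cols, 2 * rows))
    for i, fmap in enumerate(layers):
        if i == max_maps:  # stop once `max_maps` feature maps have been drawn
            break
        ax = fig.add_subplot(rows, cols, i + 1)
        ax.imshow(fmap.numpy(), cmap="gray")
        ax.axis("off")
    if path is not None:
        fig.savefig(path)
    plt.close(fig)
    return fig


# Stand-in layer output; e.g. loop over featuremaps and call this on each entry.
fig = plot_feature_maps(torch.rand(1, 256, 32, 32), path="feature_maps_example.png")
print(len(fig.axes))  # 64
```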

Results

You can see that different filters focus on different aspects of the image when creating its feature maps.

Some feature maps focus on the background of the image. Others create an outline of it. A few filters create feature maps where the background is dark but the face is bright. This is due to the corresponding weights of those filters. The images above make it clear that in the early layers, the neural network gets to see very detailed feature maps of the input image.

Let’s take a look at a few other feature maps.

You can observe that as the image progresses through the layers, the details in the images slowly disappear. The feature maps start to look like noise, but they presumably still contain patterns that human eyes cannot detect and a neural network can.

By the time the image reaches the last convolutional layer, it is impossible for a human being to tell what it shows. These last-layer outputs are nonetheless essential for the fully connected layers, which form the classification part of a convolutional neural network.

Conclusions

A big thanks to @sovitrath5, author of the machine learning blog DebuggerCafe, for the content.

Translated from: https://medium.com/swlh/visualizing-filters-and-feature-maps-in-convolutional-neural-networks-using-pytorch-110d4c1cfdeb