UFLDL Tutorial: Exercise: Convolution and Pooling

Deep Learning and Unsupervised Feature Learning Tutorial Solutions

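The basic structure of a CNN consists of two kinds of layers.
The first is the feature-extraction layer: the input of each neuron is connected to a local receptive field in the previous layer, and the neuron extracts the features of that local region. Once a local feature has been extracted, its position relative to the other features is fixed as well.
The second is the feature-mapping layer: each computational layer of the network consists of several feature maps; each feature map is a plane, and all neurons on that plane share the same weights.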

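Because the neurons on one feature map share weights, the number of free parameters in the network is reduced.
Because neurons on the same feature map have identical weights, the network can learn in parallel, which is a major advantage of convolutional networks over networks in which every neuron is connected to every other neuron.
Weight sharing lowers the complexity of the network. In particular, images (multi-dimensional input vectors) can be fed into the network directly, which avoids the complexity of reconstructing the data during feature extraction and classification.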

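In a convolutional neural network, every convolutional layer is followed by a computational layer that performs local averaging and a second round of feature extraction; this characteristic two-stage feature extraction reduces the resolution of the features.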

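Each unit in a hidden layer of the convolutional network extracts local features of the image and maps them onto a plane (a feature map). The sigmoid function is used as the activation function of the convolutional network, and because every unit in a feature map applies the same weights, a shifted input produces a correspondingly shifted feature map, which is what gives the feature maps their shift invariance. Each neuron is connected to a local receptive field in the previous layer.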

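Note that it is not locally connected neurons in general that share weights, but the neurons in the same plane (feature map); this gives the network a certain degree of invariance to shifts and rotations. Each feature-extraction layer is followed by a subsampling layer that computes local averages and performs a second round of extraction, and this two-stage structure gives the network a high tolerance to distortions of the input samples. In other words, a convolutional neural network relies on local receptive fields, shared weights and subsampling to make it robust to shifts, scaling and distortions of the image.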

CNNs have two key tricks for reducing the number of parameters: the first is the local receptive field, and the second is weight sharing.

Local receptive fields

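It is generally believed that humans perceive the outside world from the local to the global, and in an image, nearby pixels are strongly correlated while distant pixels are only weakly correlated. Therefore each neuron does not need to perceive the whole image; it only needs to perceive a local region, and the local information is combined in higher layers to obtain global information. The idea of partially connected networks is also inspired by the structure of the biological visual system: neurons in the visual cortex receive information locally (that is, they respond only to stimuli in certain specific regions).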


Weight sharing

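How should weight sharing be understood? We can regard these 100 parameters (for example, the weights of one 10×10 receptive field, i.e., the convolution operation) as a way of extracting features that does not depend on position. The underlying principle is that the statistics of one part of an image are the same as those of any other part. This means that features learned on one part can also be used on another part, so we can use the same learned features at every position of the image.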

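Weight sharing does not mean that all the connections drawn as red lines share one weight; rather, for every line of a given color there is a red line with the same weight.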

Convolution

Fully connected networks

For large images, learning features of the whole image with a fully connected network is very time-consuming.

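In the sparse autoencoder chapter, the input layer and the hidden layer were designed to be fully connected.
Computationally, learning features over the whole image is feasible for the relatively small images used in other chapters (for example, the 8x8 patches in the sparse autoencoder exercise and the 28x28 images of the MNIST dataset).
However, for larger images (such as 96x96 images), learning features over the whole image with such a fully connected network becomes very expensive computationally.
For example, with 96*96 inputs and 100 features to learn in the hidden layer, connecting every input pixel to every hidden unit requires about 10^6 parameters (96 × 96 × 100 = 921,600 weights), which makes training with the backpropagation algorithm noticeably slower.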
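As a quick check of the 10^6 figure, the sketch below counts the parameters of one fully connected hidden layer. It is written in Python rather than MATLAB so it runs on its own; it assumes single-channel input and counts only the encoder weights and biases, and the function name `fully_connected_params` is hypothetical.

```python
def fully_connected_params(height, width, hidden_units, channels=1):
    """Count weights and biases of one hidden layer connected to every input pixel.

    Assumes single-channel input by default; only the encoder side is counted.
    """
    inputs = height * width * channels
    weights = inputs * hidden_units   # one weight per (input pixel, hidden unit) pair
    biases = hidden_units             # one bias per hidden unit
    return weights, biases


print(fully_connected_params(96, 96, 100))  # (921600, 100): roughly 10^6 weights
print(fully_connected_params(28, 28, 100))  # (78400, 100): MNIST-sized input
```

The weight count grows with the number of pixels, so going from 28x28 to 96x96 multiplies it by more than ten.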

Locally connected networks

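A simple way to solve this problem is to restrict the connections between the hidden units and the input units: each hidden unit connects to only a part of the input units.
For example, each hidden unit connects only to a small contiguous region of the input image. (For inputs other than images there are also natural ways to choose the "connection region" of a single hidden unit. For audio, for instance, the subset of inputs that a hidden unit connects to might be just the signal within a certain time segment.)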


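Natural images have an inherent property: they are stationary, meaning that the statistics of one part of an image are the same as those of any other part. This means that features learned on one part can also be used on another part, so we can use the same learned features at every position of the image.
More concretely, suppose we randomly pick a small patch, say 8x8, from a large image as a sample and learn some features from it. We can then use the features learned from the 8x8 samples as detectors and apply them anywhere in the image. In particular, we can convolve the features learned from the 8x8 samples with the original large image, obtaining an activation value for each feature at every location of the large image.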
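A minimal sketch of applying one learned patch as a detector across a whole image, assuming a grayscale image stored as a list of lists, stride 1, no padding, and a sigmoid activation as in the sparse autoencoder. The kernel is applied without flipping (cross-correlation), which is what "using the feature as a detector" amounts to; the name `feature_map` and the toy edge kernel are illustrative, not taken from the exercise.

```python
import math


def feature_map(image, kernel, bias=0.0):
    """Slide one shared kernel over every valid position (stride 1, no padding)
    and return the sigmoid activation at each position."""
    n_rows, n_cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    out = []
    for r in range(n_rows - k_rows + 1):
        row = []
        for c in range(n_cols - k_cols + 1):
            s = bias
            for i in range(k_rows):
                for j in range(k_cols):
                    s += kernel[i][j] * image[r + i][c + j]   # same weights at every position
            row.append(1.0 / (1.0 + math.exp(-s)))
        out.append(row)
    return out


image = [[0, 0, 1, 1, 1] for _ in range(5)]   # a vertical edge between columns 1 and 2
kernel = [[-1, 1], [-1, 1]]                   # toy vertical-edge detector
fm = feature_map(image, kernel)
print(len(fm), len(fm[0]))                    # 4 4  (5 - 2 + 1 = 4)
print([round(v, 3) for v in fm[0]])           # [0.5, 0.881, 0.5, 0.5]
```

The response peaks only where the edge is, and the same four weights were reused at all 16 positions.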


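Convolution windows overlap as they move (whenever the stride is smaller than the kernel size). A convolution in a CNN processes a block of the image rather than a single pixel, which strengthens the continuity of the image information and lets the network see shapes rather than isolated points, helping it understand the image. For an image, the fine-grained feature of a single pixel is meaningless, whereas the features of a block may contain edge information, which is much more useful for understanding the image.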

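Layer type: Convolution (Caffe)
lr_mult: learning-rate multiplier; the effective learning rate is this value multiplied by base_lr in the solver.prototxt configuration file. If two lr_mult values are given, the first applies to the weights and the second to the bias; the bias learning rate is commonly set to twice the weight learning rate.
The layer-specific parameters are set in the convolution_param block that follows.
Required parameters:
num_output: the number of convolution kernels (filters).
kernel_size: the size of each kernel. If the kernel height and width differ, set them separately with kernel_h and kernel_w.
Other parameters:
stride: the step of the kernel, default 1. It can also be set separately with stride_h and stride_w.
pad: edge padding, default 0 (no padding). Padding is symmetric left/right and top/bottom: for example, with a 5*5 kernel and pad set to 2, each of the four edges is extended by 2 pixels, so the width and height each grow by 4 pixels and, with stride 1, the feature map does not shrink after the convolution. It can also be set separately with pad_h and pad_w.
weight_filler: weight initialization. The default is "constant" with all values 0; "xavier" is often used instead, and "gaussian" is another option.
bias_filler: bias initialization, usually "constant" with all values 0.
bias_term: whether to use a bias term; default true.
Input: n*c0*w0*h0
Output: n*c1*w1*h1
where c1 is the num_output parameter, i.e., the number of generated feature maps, and
w1=floor((w0+2*pad-kernel_size)/stride)+1;
h1=floor((h0+2*pad-kernel_size)/stride)+1;
With stride 1, consecutive convolution windows overlap. With stride 1 and an odd kernel_size, setting pad=(kernel_size-1)/2 keeps the width and height unchanged.
In short, the output size is determined jointly by pad, kernel_size and stride.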
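The output-size formula can be checked with a small Python helper (`conv_output_size` is a hypothetical name, not part of Caffe). It reproduces the 89x89 and 57x57 maps used later in this document and shows that pad=(kernel_size-1)/2 preserves the size only when the stride is 1.

```python
def conv_output_size(in_size, kernel_size, stride=1, pad=0):
    """Output width (or height) of a convolution:
    floor((in_size + 2*pad - kernel_size) / stride) + 1.
    Integer floor division (//) equals floor() for the non-negative values used here."""
    return (in_size + 2 * pad - kernel_size) // stride + 1


print(conv_output_size(96, 8))                   # 89: 'valid' 8x8 convolution on 96x96
print(conv_output_size(64, 8))                   # 57: the 64x64 images of the exercise
print(conv_output_size(32, 5, pad=2))            # 32: pad = (5-1)/2 keeps the size
print(conv_output_size(32, 5, stride=2, pad=2))  # 16: with stride 2 the map still shrinks
```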

Pooling

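After obtaining features through convolution, the next step is to use them for classification. In theory, one could train a classifier such as a softmax classifier on all the extracted features, but doing so runs into a computational challenge.
For example, for a 96X96-pixel image, suppose we have learned 400 features defined over 8X8 inputs. Convolving each feature with the image gives a (96 - 8 + 1) * (96 - 8 + 1) = 7921-dimensional convolved feature, and since there are 400 features, each example yields an 89*89*400 = 3,168,400-dimensional convolved feature vector. Learning a classifier with more than 3 million input features is unwieldy and prone to over-fitting. By comparison, the fully connected network above (with 100 hidden units) produces only a 100-dimensional output.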

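Convolved features are used because images have the property of "stationarity": a feature that is useful in one region of an image is very likely to be useful in another region as well. Therefore, to describe a large image, a natural idea is to aggregate statistics of the features at different locations.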

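For example, one can compute the mean (or maximum) of a particular feature over a region of the image. These summary statistics not only have a much lower dimension than the full set of extracted features, but also tend to improve results (they are less prone to over-fitting). This aggregation operation is called pooling, and depending on how it is computed it is also called mean pooling, max pooling, or stochastic pooling. By summarizing regional information, pooling also reduces noise and provides some invariance to translation, rotation and scaling.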

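Pooling can be understood as subsampling. Pooling is applied layer by layer: pooling in lower layers blurs low-level features such as lines, and pooling in higher layers blurs high-level semantic features. Convolution and pooling alternate, so features are extracted while being deliberately blurred, which adds some rotation invariance to the features.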

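Computing statistics over the results of a convolution is called pooling. Common variants are general (non-overlapping) pooling, overlapping pooling and spatial pyramid pooling.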

The figure below shows pooling applied to four non-overlapping regions of an image.


Invariance of pooling

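If we choose contiguous regions of the image as pooling regions and only pool features produced by the same (replicated) hidden units, these pooling units are translation invariant. This means that even after the image undergoes a small translation, the same (pooled) features are produced.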

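Pooling provides basic invariance to translation and rotation: the max operation picks out the same value after a translation or rotation, as long as the change is small enough that the maximum stays inside the same pooling window.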
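To make the invariance claim concrete, here is a one-dimensional Python sketch (1-D only to keep it short; `max_pool_1d` is a hypothetical helper). Moving each peak by one position inside its window leaves the max-pooled output unchanged, while moving a peak across a window boundary changes it, so the invariance only holds for small shifts.

```python
def max_pool_1d(values, size):
    """Non-overlapping max pooling of a 1-D signal (window == stride == size)."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


signal   = [0, 9, 0, 0, 0, 0, 5, 0]
shifted  = [9, 0, 0, 0, 0, 0, 0, 5]   # each peak moved by one, still inside its window
crossing = [0, 0, 0, 0, 9, 0, 5, 0]   # the 9 moved across the window boundary

print(max_pool_1d(signal, 4))    # [9, 5]
print(max_pool_1d(shifted, 4))   # [9, 5]  unchanged
print(max_pool_1d(crossing, 4))  # [0, 9]  changed
```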

General Pooling

Pooling is applied to non-overlapping regions of the image (unlike convolution); the process is shown in the figure below.


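Let the pooling window size be sizeX, i.e., the side length of the red square in the figure below, and let stride be the horizontal/vertical displacement between two adjacent pooling windows. In general pooling the windows do not overlap, so sizeX = stride.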


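The most common pooling operations are mean pooling and max pooling:

Mean pooling: the mean of an image region is taken as the pooled value of that region.
Max pooling: the maximum of an image region is taken as the pooled value of that region.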
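Both operations can be sketched with one Python function (a hypothetical helper, not from the exercise) that slides a sizeX x sizeX window with stride = sizeX and combines each window with either max or mean. Rows and columns that do not fill a whole window are dropped, which is the floor behaviour used elsewhere in this document.

```python
def _mean(xs):
    return sum(xs) / len(xs)


def pool2d(feature_map, size, mode="max"):
    """Non-overlapping pooling with sizeX == stride == size."""
    if mode == "max":
        combine = max
    elif mode == "mean":
        combine = _mean
    else:
        raise ValueError("mode must be 'max' or 'mean'")
    out_rows = len(feature_map) // size
    out_cols = len(feature_map[0]) // size
    pooled = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            window = [feature_map[r * size + i][c * size + j]
                      for i in range(size) for j in range(size)]
            row.append(combine(window))
        pooled.append(row)
    return pooled


fm = [[1, 2, 5, 6],
      [3, 4, 7, 8],
      [9, 10, 13, 14],
      [11, 12, 15, 16]]
print(pool2d(fm, 2, "max"))   # [[4, 8], [12, 16]]
print(pool2d(fm, 2, "mean"))  # [[2.5, 6.5], [10.5, 14.5]]
```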

Overlapping Pooling

As the name suggests, in overlapping pooling adjacent pooling windows overlap, i.e., sizeX > stride.

In A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," in NIPS, 2012, the authors used overlapping pooling; with all other settings unchanged, it reduced the top-1 and top-5 error rates by 0.4% and 0.3%, respectively.
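The only change in overlapping pooling is that the window side sizeX exceeds the stride. The Python sketch below (the helper name `max_pool2d` is hypothetical, and it assumes a square map without padding) uses sizeX = 3 and stride = 2, the setting used in the paper above, and compares it with non-overlapping 2x2 pooling on the same 5x5 map.

```python
def max_pool2d(feature_map, size_x, stride):
    """Max pooling of a square map with window side size_x and step stride.

    size_x == stride gives general pooling; size_x > stride gives overlapping
    pooling. No padding: the output side is (n - size_x) // stride + 1.
    """
    n = len(feature_map)
    out_dim = (n - size_x) // stride + 1
    return [[max(feature_map[r * stride + i][c * stride + j]
                 for i in range(size_x) for j in range(size_x))
             for c in range(out_dim)]
            for r in range(out_dim)]


fm = [[5 * r + c for c in range(5)] for r in range(5)]   # 5x5 map holding 0..24
print(max_pool2d(fm, 3, 2))  # overlapping:     [[12, 14], [22, 24]]
print(max_pool2d(fm, 2, 2))  # non-overlapping: [[6, 8], [16, 18]]
```

With sizeX = 3 and stride = 2, row 2 and column 2 of the map fall into two windows each; with 2x2 windows the last row and column are not covered at all.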

Spatial Pyramid Pooling (SPP)

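Spatial pyramid pooling converts the convolutional features of an image of any size into a vector of fixed dimension. This not only lets a CNN handle images of arbitrary size, but also avoids the cropping and warping operations that lose part of the image information, which makes it very significant.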

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition," ECCV 2014; the method was also entered in the ILSVRC 2014 contest.

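A typical CNN requires input images of a fixed size, because the fully connected layers need a fixed input dimension, while the convolution itself places no restriction on the image size. The authors therefore proposed spatial pyramid pooling: the image is first convolved, and the result is then converted into features of a fixed dimension that are fed into the fully connected layers. This extends CNNs to images of arbitrary size.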


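The pooling layer is computed in essentially the same way as the convolution layer.
Input: n*c*w0*h0
Output: n*c*w1*h1
The difference from the convolution layer is that c stays unchanged; the spatial size follows the same formula:
w1=floor((w0+2*pad-kernel_size)/stride)+1;
h1=floor((h0+2*pad-kernel_size)/stride)+1;
(Caffe's pooling layer actually rounds up, using ceil instead of floor, so the result can be one larger when the division is not exact.)
If kernel_size and stride are both 2, adjacent pooling windows do not overlap, and a 100*100 feature map becomes 50*50 after pooling.
Again, the output size is determined jointly by pad, kernel_size and stride.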
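The difference between rounding down and rounding up only shows when (w0+2*pad-kernel_size) is not divisible by stride. This Python sketch (`pool_output_size` is a hypothetical helper) computes both; `round_up=True` is meant to approximate Caffe's pooling layer for pad = 0 only, since Caffe also clips the last window in some padded cases, which is not modeled here.

```python
import math


def pool_output_size(in_size, kernel_size, stride, pad=0, round_up=False):
    """Pooled width/height: floor(...) + 1 by default, ceil(...) + 1 if round_up."""
    span = (in_size + 2 * pad - kernel_size) / stride
    return (math.ceil(span) if round_up else math.floor(span)) + 1


print(pool_output_size(100, 2, 2))               # 50: the 100*100 -> 50*50 example
print(pool_output_size(6, 3, 2))                 # 2 (floor)
print(pool_output_size(6, 3, 2, round_up=True))  # 3 (ceil)
```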

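The idea of spatial pyramid pooling comes from the Spatial Pyramid Model: it replaces a single pooling with pooling at several scales. Applying pooling windows of different sizes to the convolutional features gives pooling results of 1X1, 2X2 and 4X4. Since conv5 has 256 filters, this yields one 256-dimensional feature, four 256-dimensional features and sixteen 256-dimensional features; these 21 256-dimensional features are concatenated and fed into the fully connected layer. In this way, images of different sizes are converted into features of the same dimension.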


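To obtain pooling results of the same size for images of different sizes, the pooling window size and stride must be computed dynamically from the image size. Suppose conv5 outputs a map of size a*a and an n*n pooling result is required; then the window size can be set to sizeX = ceil(a/n) and the stride to floor(a/n). The figure below uses a 13*13 conv5 output as an example.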
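The window and stride rule can be tried on the 13*13 example with a short Python sketch (helper names are hypothetical; it assumes a square map, no padding and a >= n). It also totals the length of the concatenated vector for the 1X1, 2X2 and 4X4 levels with 256 filters.

```python
import math


def spp_level(a, n):
    """Window size, stride and bins per side for one SPP level.

    a: side of the square conv5 map; n: desired bins per side (assumes a >= n).
    sizeX = ceil(a / n), stride = floor(a / n); bins are counted as for a
    sliding window without padding.
    """
    size_x = math.ceil(a / n)
    stride = a // n
    bins = (a - size_x) // stride + 1
    return size_x, stride, bins


def spp_output_dim(a, levels, channels):
    """Length of the concatenated SPP vector for an a x a map with `channels` filters."""
    return sum(spp_level(a, n)[2] ** 2 * channels for n in levels)


for n in (1, 2, 4):
    print(n, spp_level(13, n))
# 1 (13, 13, 1)
# 2 (7, 6, 2)
# 4 (4, 3, 4)
print(spp_output_dim(13, (1, 2, 4), 256))  # (1 + 4 + 16) * 256 = 5376
```

The rule does not give exactly n bins for every a: for a = 7 and n = 4 it yields sizeX = 2, stride = 1 and 6 bins per side, so it should be checked for the map sizes actually used.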


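SPP is essentially pooling at multiple scales, which captures multi-scale information in the image; adding SPP to a CNN lets it process inputs of any size, which makes the model more flexible.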

Remarks

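Convolution addresses the computational cost of the unsupervised feature learning described earlier,
while pooling serves the supervised classifier that follows and also reduces the number of system parameters that need to be trained.
In other words, we extract the features of the target with an unsupervised method and train the classifier with a supervised method.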

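Max pooling removes redundant information from the feature maps produced by convolution; pooling is a form of information aggregation that makes the information coarser-grained.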

Experiment steps

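1. Initialize the parameters and load the results of the previous exercise, i.e., the color features learned from 100,000 8*8 RGB patches, and visualize the features.
2. Load 8 images of size 64*64 (used to test whether convolution and pooling are correct), then implement the convolution function cnnConvolve.m and check that it is correct.
3. Implement the pooling function cnnPool.m and check that it is correct.
4. Load 2000 64*64 RGB images, use the convolution function implemented above to extract the convolved features convolvedFeaturesThis, then use the pooling function to extract the pooled features pooledFeaturesTrain from convolvedFeaturesThis; these form the training set for the softmax classifier. Load 3200 64*64 RGB images and, in the same way, extract convolved features convolvedFeaturesThis and then pooled features pooledFeaturesTest; these form the test set for the softmax classifier.
5. Train the softmax classifier on the training set pooledFeaturesTrain and its labels to obtain the model parameters softmaxModel.
6. Use the trained softmaxModel to classify the test set pooledFeaturesTest, i.e., obtain the classification results for the 3200 64*64 RGB images.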
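The array sizes in these steps follow from the parameters set in cnnExercise.m (imageDim = 64, patchDim = 8, poolDim = 19, hiddenSize = 400). The short Python calculation below only restates that arithmetic, and shows why each image ends up with a 3600-dimensional feature vector for the softmax classifier instead of more than a million convolved values.

```python
image_dim, patch_dim, pool_dim = 64, 8, 19   # values set in cnnExercise.m
hidden_size = 400                            # number of learned features
num_train, num_test = 2000, 3200             # image counts from the steps above

conv_dim = image_dim - patch_dim + 1             # 57: side of each 'valid' convolved map
pooled_dim = conv_dim // pool_dim                # 3: floor(57 / 19) regions per side
conv_features = hidden_size * conv_dim ** 2      # 1,299,600 values per image before pooling
pooled_features = hidden_size * pooled_dim ** 2  # 3,600 values per image after pooling

print(conv_dim, pooled_dim, conv_features, pooled_features)
print("pooledFeaturesTrain:", (hidden_size, num_train, pooled_dim, pooled_dim))
print("pooledFeaturesTest: ", (hidden_size, num_test, pooled_dim, pooled_dim))
```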

cnnExercise.m

%% CS294A/CS294W Convolutional Neural Networks Exercise

%  Instructions
%  ------------
% 
%  This file contains code that helps you get started on the
%  convolutional neural networks exercise. In this exercise, you will only
%  need to modify cnnConvolve.m and cnnPool.m. You will not need to modify
%  this file.

%%======================================================================
%% STEP 0: Initialization
%  Here we initialize some parameters used for the exercise.

imageDim = 64;         % image dimension
imageChannels = 3;     % number of channels (rgb, so 3)

patchDim = 8;          % patch dimension
numPatches = 50000;    % number of patches

visibleSize = patchDim * patchDim * imageChannels;  % number of input units 
outputSize = visibleSize;   % number of output units
hiddenSize = 400;           % number of hidden units 

epsilon = 0.1;         % epsilon for ZCA whitening

poolDim = 19;          % dimension of pooling region

%%======================================================================
%% STEP 1: Train a sparse autoencoder (with a linear decoder) to learn 
%  features from color patches. If you have completed the linear decoder
%  exercise, use the features that you have obtained from that exercise, 
%  loading them into optTheta. Recall that we have to keep around the 
%  parameters used in whitening (i.e., the ZCA whitening matrix and the
%  meanPatch)

% --------------------------- YOUR CODE HERE --------------------------
% Train the sparse autoencoder and fill the following variables with 
% the optimal parameters:

optTheta =  zeros(2*hiddenSize*visibleSize+hiddenSize+visibleSize, 1);
ZCAWhite =  zeros(visibleSize, visibleSize);
meanPatch = zeros(visibleSize, 1);
% Load the parameters saved by the linear decoder exercise; this replaces
% the zero placeholders above with optTheta, ZCAWhite and meanPatch
load STL10Features.mat;

% --------------------------------------------------------------------

% Display and check to see that the features look good
W = reshape(optTheta(1:visibleSize * hiddenSize), hiddenSize, visibleSize);
b = optTheta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);

displayColorNetwork( (W*ZCAWhite)');

%%======================================================================
%% STEP 2: Implement and test convolution and pooling
%  In this step, you will implement convolution and pooling, and test them
%  on a small part of the data set to ensure that you have implemented
%  these two functions correctly. In the next step, you will actually
%  convolve and pool the features with the STL10 images.

%% STEP 2a: Implement convolution
%  Implement convolution in the function cnnConvolve in cnnConvolve.m

% Note that we have to preprocess the images in the exact same way 
% we preprocessed the patches before we can obtain the feature activations.

load stlTrainSubset.mat % loads numTrainImages, trainImages, trainLabels

%% Use only the first 8 images for testing
convImages = trainImages(:, :, :, 1:8); 

% NOTE: Implement cnnConvolve in cnnConvolve.m first!
% W and b are already in matrix/vector form at this point
convolvedFeatures = cnnConvolve(patchDim, hiddenSize, convImages, W, b, ZCAWhite, meanPatch);

%% STEP 2b: Checking your convolution
%  To ensure that you have convolved the features correctly, we have
%  provided some code to compare the results of your convolution with
%  activations from the sparse autoencoder

% For 1000 random points
for i = 1:1000    
    featureNum = randi([1, hiddenSize]);              % random feature
    imageNum = randi([1, 8]);                         % random image
    imageRow = randi([1, imageDim - patchDim + 1]);   % random top-left corner
    imageCol = randi([1, imageDim - patchDim + 1]);    

    % Cut the patchDim x patchDim patch whose top-left corner was chosen above
    patch = convImages(imageRow:imageRow + patchDim - 1, imageCol:imageCol + patchDim - 1, :, imageNum);
    patch = patch(:);            % flatten in column-major order
    patch = patch - meanPatch;
    patch = ZCAWhite * patch;    % whiten with the same parameters used for training

    % Hidden-layer activations of the autoencoder for this patch
    features = feedForwardAutoencoder(optTheta, hiddenSize, visibleSize, patch);

    if abs(features(featureNum, 1) - convolvedFeatures(featureNum, imageNum, imageRow, imageCol)) > 1e-9
        fprintf('Convolved feature does not match activation from autoencoder\n');
        fprintf('Feature Number    : %d\n', featureNum);
        fprintf('Image Number      : %d\n', imageNum);
        fprintf('Image Row         : %d\n', imageRow);
        fprintf('Image Column      : %d\n', imageCol);
        fprintf('Convolved feature : %0.5f\n', convolvedFeatures(featureNum, imageNum, imageRow, imageCol));
        fprintf('Sparse AE feature : %0.5f\n', features(featureNum, 1));       
        error('Convolved feature does not match activation from autoencoder');
    end 
end

disp('Congratulations! Your convolution code passed the test.');

%% STEP 2c: Implement pooling
%  Implement pooling in the function cnnPool in cnnPool.m

% NOTE: Implement cnnPool in cnnPool.m first!
pooledFeatures = cnnPool(poolDim, convolvedFeatures);

%% STEP 2d: Checking your pooling
%  To ensure that you have implemented pooling, we will use your pooling
%  function to pool over a test matrix and check the results.

% An 8x8 matrix holding 1..64, filled column by column
testMatrix = reshape(1:64, 8, 8);
% Mean pooling over the four 4x4 blocks, computed directly
expectedMatrix = [mean(mean(testMatrix(1:4, 1:4))) mean(mean(testMatrix(1:4, 5:8))); ...
                  mean(mean(testMatrix(5:8, 1:4))) mean(mean(testMatrix(5:8, 5:8))); ];

testMatrix = reshape(testMatrix, 1, 1, 8, 8);

% poolDim = 4 pools over 4x4 regions; squeeze drops the singleton dimensions
pooledFeatures = squeeze(cnnPool(4, testMatrix));

if ~isequal(pooledFeatures, expectedMatrix)
    disp('Pooling incorrect');
    disp('Expected');
    disp(expectedMatrix);
    disp('Got');
    disp(pooledFeatures);
else
    disp('Congratulations! Your pooling code passed the test.');
end

%%======================================================================
%% STEP 3: Convolve and pool with the dataset
%  In this step, you will convolve each of the features you learned with
%  the full large images to obtain the convolved features. You will then
%  pool the convolved features to obtain the pooled features for
%  classification.
%
%  Because the convolved features matrix is very large, we will do the
%  convolution and pooling 50 features at a time to avoid running out of
%  memory. Reduce this number if necessary

stepSize = 50;
% hiddenSize/stepSize must be an integer; with 400 features this gives 8 batches
assert(mod(hiddenSize, stepSize) == 0, 'stepSize should divide hiddenSize');

load stlTrainSubset.mat % loads numTrainImages, trainImages, trainLabels
load stlTestSubset.mat  % loads numTestImages,  testImages,  testLabels

% imageDim is the side of the large images (64 here) and poolDim the side of
% each pooling region (19 here), so pooledFeaturesTrain is 400 x 2000 x 3 x 3
pooledFeaturesTrain = zeros(hiddenSize, numTrainImages, ...
    floor((imageDim - patchDim + 1) / poolDim), ...
    floor((imageDim - patchDim + 1) / poolDim) );
% pooledFeaturesTest is 400 x 3200 x 3 x 3
pooledFeaturesTest = zeros(hiddenSize, numTestImages, ...
    floor((imageDim - patchDim + 1) / poolDim), ...
    floor((imageDim - patchDim + 1) / poolDim) );

tic();

% Extract features in batches of stepSize hidden units to limit memory use
for convPart = 1:(hiddenSize / stepSize)
    featureStart = (convPart - 1) * stepSize + 1;   % first feature in this batch
    featureEnd = convPart * stepSize;               % last feature in this batch

    fprintf('Step %d: features %d to %d\n', convPart, featureStart, featureEnd);
    Wt = W(featureStart:featureEnd, :);
    bt = b(featureStart:featureEnd);

    fprintf('Convolving and pooling train images\n');
    % the second argument is the number of hidden units in this batch
    convolvedFeaturesThis = cnnConvolve(patchDim, stepSize, ...
        trainImages, Wt, bt, ZCAWhite, meanPatch);
    pooledFeaturesThis = cnnPool(poolDim, convolvedFeaturesThis);
    pooledFeaturesTrain(featureStart:featureEnd, :, :, :) = pooledFeaturesThis;
    toc();
    % free the large intermediate arrays before processing the test images
    clear convolvedFeaturesThis pooledFeaturesThis;

    fprintf('Convolving and pooling test images\n');
    convolvedFeaturesThis = cnnConvolve(patchDim, stepSize, ...
        testImages, Wt, bt, ZCAWhite, meanPatch);
    pooledFeaturesThis = cnnPool(poolDim, convolvedFeaturesThis);
    pooledFeaturesTest(featureStart:featureEnd, :, :, :) = pooledFeaturesThis;
    toc();

    clear convolvedFeaturesThis pooledFeaturesThis;
end

% You might want to save the pooled features since convolution and pooling takes a long time
save('cnnPooledFeatures.mat', 'pooledFeaturesTrain', 'pooledFeaturesTest');
toc();

%%======================================================================
%% STEP 4: Use pooled features for classification
%  Now, you will use your pooled features to train a softmax classifier,
%  using softmaxTrain from the softmax exercise.
%  Training the softmax classifier for 1000 iterations should take less than
%  10 minutes.

% Add the path to your softmax solution, if necessary
% addpath /path/to/solution/

% Setup parameters for softmax
softmaxLambda = 1e-4;   % weight decay (penalty) coefficient
numClasses = 4;
% Reshape the pooledFeatures to form an input vector for softmax
softmaxX = permute(pooledFeaturesTrain, [1 3 4 2]);  % move the image index to the last dimension
softmaxX = reshape(softmaxX, numel(pooledFeaturesTrain) / numTrainImages,...
    numTrainImages);   % one column per image, holding that image's feature vector
softmaxY = trainLabels;

options = struct;
options.maxIter = 200;
% the first argument is inputSize, the feature-vector length per image
softmaxModel = softmaxTrain(numel(pooledFeaturesTrain) / numTrainImages,...
    numClasses, softmaxLambda, softmaxX, softmaxY, options);

%%======================================================================
%% STEP 5: Test classifier
%  Now you will test your trained classifier against the test images

softmaxX = permute(pooledFeaturesTest, [1 3 4 2]);
softmaxX = reshape(softmaxX, numel(pooledFeaturesTest) / numTestImages, numTestImages);
softmaxY = testLabels;

[pred] = softmaxPredict(softmaxModel, softmaxX);
acc = (pred(:) == softmaxY(:));
acc = sum(acc) / size(acc, 1);
fprintf('Accuracy: %2.3f%%\n', acc * 100);   % prediction accuracy

% You should expect to get an accuracy of around 80% on the test images.

cnnConvolve.m

function convolvedFeatures = cnnConvolve(patchDim, numFeatures, images, W, b, ZCAWhite, meanPatch)
%cnnConvolve Returns the convolution of the features given by W and b with
%the given images
%
% Parameters:
%  patchDim - patch (feature) dimension
%  numFeatures - number of features
%  images - large images to convolve with, matrix in the form
%           images(r, c, channel, image number)
%  W, b - W, b for features from the sparse autoencoder
%  ZCAWhite, meanPatch - ZCAWhitening and meanPatch matrices used for
%                        preprocessing
%
% Returns:
%  convolvedFeatures - matrix of convolved features in the form
%                      convolvedFeatures(featureNum, imageNum, imageRow, imageCol)

patchSize = patchDim*patchDim;
assert(numFeatures == size(W,1), 'W should have numFeatures rows');
numImages = size(images, 4);      % size of dimension 4: number of images
imageDim = size(images, 1);       % size of dimension 1: number of image rows
imageChannels = size(images, 3);  % size of dimension 3: number of channels
assert(patchSize*imageChannels == size(W,2), 'W should have patchSize*imageChannels cols');

% Instructions:
%   Convolve every feature with every large image here to produce the 
%   numFeatures x numImages x (imageDim - patchDim + 1) x (imageDim - patchDim + 1) 
%   matrix convolvedFeatures, such that 
%   convolvedFeatures(featureNum, imageNum, imageRow, imageCol) is the
%   value of the convolved featureNum feature for the imageNum image over
%   the region (imageRow, imageCol) to (imageRow + patchDim - 1, imageCol + patchDim - 1)
%
% Expected running times: 
%   Convolving with 100 images should take less than 3 minutes 
%   Convolving with 5000 images should take around an hour
%   (So to save time when testing, you should convolve with less images, as
%   described earlier)

% -------------------- YOUR CODE HERE --------------------
% Precompute the matrices that will be used during the convolution. Recall
% that you need to take into account the whitening and mean subtraction
% steps

WT = W*ZCAWhite;            % equivalent weights applied directly to the raw patch
b_mean = b - WT*meanPatch;  % bias that also accounts for the skipped mean subtraction

% --------------------------------------------------------

convolvedFeatures = zeros(numFeatures, numImages, imageDim - patchDim + 1, imageDim - patchDim + 1);
for imageNum = 1:numImages
    for featureNum = 1:numFeatures

        % convolution of image with feature matrix for each channel
        convolvedImage = zeros(imageDim - patchDim + 1, imageDim - patchDim + 1);
        for channel = 1:imageChannels

            % Obtain the feature (patchDim x patchDim) needed during the convolution
            % ---- YOUR CODE HERE ----
            offset = (channel-1)*patchSize;
            % weights of this feature for this channel, as a patchDim x patchDim block
            feature = reshape(WT(featureNum,offset+1:offset+patchSize), patchDim, patchDim);

            % Flip the feature matrix because of the definition of convolution, as explained later
            feature = flipud(fliplr(squeeze(feature)));

            % Obtain the image (one channel of one image)
            im = squeeze(images(:, :, channel, imageNum));

            % Convolve "feature" with "im", adding the result to convolvedImage
            % be sure to do a 'valid' convolution
            % ---- YOUR CODE HERE ----
            convolvedOneChannel = conv2(im, feature, 'valid');
            % Sum over the channels: the 3 color channels act like 3 input
            % feature maps, as in the input to the second and later CNN layers
            convolvedImage = convolvedImage + convolvedOneChannel;
            % ------------------------
        end

        % Add the bias unit (corrected for the mean subtraction)
        % Then, apply the sigmoid function to get the hidden activation
        % ---- YOUR CODE HERE ----
        convolvedImage = sigmoid(convolvedImage + b_mean(featureNum));
        % ------------------------

        % The convolved feature is the sum of the convolved values for all channels
        convolvedFeatures(featureNum, imageNum, :, :) = convolvedImage;
    end
end

end

function sigm = sigmoid(x)
    sigm = 1./(1+exp(-x));
end
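cnnConvolve relies on two small identities. First, conv2 flips its kernel, so pre-flipping the weights with flipud(fliplr(...)) turns the convolution into the plain correlation that a feature detector needs. Second, whitening and mean subtraction can be folded into the weights: W*(ZCAWhite*(x - meanPatch)) + b equals WT*x + b_mean with WT = W*ZCAWhite and b_mean = b - WT*meanPatch. The Python check below verifies both on tiny hand-made matrices; the helpers mimic MATLAB's conv2(..., 'valid') and matrix products but are hypothetical, and all numeric values are toy data.

```python
def conv2_valid(im, k):
    """True 2-D 'valid' convolution (kernel flipped), like MATLAB conv2(im, k, 'valid')."""
    kr, kc = len(k), len(k[0])
    return [[sum(im[r + i][c + j] * k[kr - 1 - i][kc - 1 - j]
                 for i in range(kr) for j in range(kc))
             for c in range(len(im[0]) - kc + 1)]
            for r in range(len(im) - kr + 1)]


def correlate_valid(im, k):
    """'Valid' cross-correlation: the kernel is applied without flipping."""
    kr, kc = len(k), len(k[0])
    return [[sum(im[r + i][c + j] * k[i][j] for i in range(kr) for j in range(kc))
             for c in range(len(im[0]) - kc + 1)]
            for r in range(len(im) - kr + 1)]


def flip(k):
    """Rotate a kernel by 180 degrees, like flipud(fliplr(k)) in MATLAB."""
    return [row[::-1] for row in k[::-1]]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


# 1) Pre-flipping the kernel makes conv2 compute the detector response.
im = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
k = [[1, 0], [2, -1]]                                      # deliberately asymmetric
print(conv2_valid(im, flip(k)) == correlate_valid(im, k))  # True
print(conv2_valid(im, k) == correlate_valid(im, k))        # False without the flip

# 2) Folding whitening and the mean into WT and b_mean (toy 2x2 values).
W = [[1.0, 2.0], [0.5, -1.0]]   # hypothetical weights
Z = [[2.0, 0.0], [1.0, 3.0]]    # hypothetical ZCAWhite
m = [0.25, -0.5]                # hypothetical meanPatch
b = [0.1, -0.2]
x = [1.0, 2.0]                  # a raw (not mean-subtracted) patch

direct = [v + bi for v, bi in zip(matvec(W, matvec(Z, [xi - mi for xi, mi in zip(x, m)])), b)]
WT = matmul(W, Z)
b_mean = [bi - v for bi, v in zip(b, matvec(WT, m))]
folded = [v + bi for v, bi in zip(matvec(WT, x), b_mean)]
print(all(abs(d - f) < 1e-12 for d, f in zip(direct, folded)))  # True
```

Folding the preprocessing into WT and b_mean means the 64x64 images never have to be whitened patch by patch; conv2 can run on the raw images directly.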

cnnPool.m

function pooledFeatures = cnnPool(poolDim, convolvedFeatures)
%cnnPool Pools the given convolved features
%
% Parameters:
%  poolDim - dimension of pooling region
%  convolvedFeatures - convolved features to pool (as given by cnnConvolve)
%                      convolvedFeatures(featureNum, imageNum, imageRow, imageCol)
%
% Returns:
%  pooledFeatures - matrix of pooled features in the form
%                   pooledFeatures(featureNum, imageNum, poolRow, poolCol)
%

numImages = size(convolvedFeatures, 2);     % number of images
numFeatures = size(convolvedFeatures, 1);   % number of features
convolvedDim = size(convolvedFeatures, 3);  % side length (rows) of each convolved map

resultDim  = floor(convolvedDim / poolDim);
pooledFeatures = zeros(numFeatures, numImages, resultDim, resultDim);

% -------------------- YOUR CODE HERE --------------------
% Instructions:
%   Now pool the convolved features in regions of poolDim x poolDim,
%   to obtain the 
%   numFeatures x numImages x (convolvedDim/poolDim) x (convolvedDim/poolDim) 
%   matrix pooledFeatures, such that
%   pooledFeatures(featureNum, imageNum, poolRow, poolCol) is the 
%   value of the featureNum feature for the imageNum image pooled over the
%   corresponding (poolRow, poolCol) pooling region 
%   (see http://ufldl/wiki/index.php/Pooling )
%   
%   Use mean pooling here.
% -------------------- YOUR CODE HERE --------------------
for imageNum = 1:numImages
    for featureNum = 1:numFeatures
        for poolRow = 1:resultDim
            offsetRow = 1+(poolRow-1)*poolDim;
            for poolCol = 1:resultDim
                offsetCol = 1+(poolCol-1)*poolDim;
                % take one poolDim x poolDim block of the convolved map
                patch = convolvedFeatures(featureNum,imageNum,offsetRow:offsetRow+poolDim-1,...
                    offsetCol:offsetCol+poolDim-1);
                % mean pooling
                pooledFeatures(featureNum,imageNum,poolRow,poolCol) = mean(patch(:));
            end
        end
    end
end

end
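To see what STEP 2d in cnnExercise.m expects from cnnPool, the Python sketch below rebuilds the test matrix with MATLAB's column-major reshape order and mean-pools it over 4x4 regions; the result should match expectedMatrix. The helper names are hypothetical and the code only mirrors the logic of cnnPool for a single feature map.

```python
def reshape_column_major(values, rows, cols):
    """Like MATLAB reshape(values, rows, cols): fill the matrix column by column."""
    return [[values[c * rows + r] for c in range(cols)] for r in range(rows)]


def mean_pool(matrix, pool_dim):
    """Mean pooling over non-overlapping pool_dim x pool_dim regions (floor size)."""
    result_dim = len(matrix) // pool_dim
    out = []
    for pr in range(result_dim):
        row = []
        for pc in range(result_dim):
            block = [matrix[pr * pool_dim + i][pc * pool_dim + j]
                     for i in range(pool_dim) for j in range(pool_dim)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


test_matrix = reshape_column_major(list(range(1, 65)), 8, 8)
print(test_matrix[0][:3])          # [1, 9, 17]: consecutive values run down the columns
print(mean_pool(test_matrix, 4))   # [[14.5, 46.5], [18.5, 50.5]]
```

If a MATLAB implementation swaps rows and columns, the off-diagonal values 46.5 and 18.5 trade places, which is the kind of error the isequal check catches.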