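[TensorFlow] conv2d parameters explained, and understanding padding

Convolution: conv2d

CNNs play a pivotal role in deep learning, where they are mainly used for feature extraction. The TensorFlow function involved is tf.nn.conv2d.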

tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=True, data_format="NHWC", dilations=[1, 1, 1, 1], name=None)

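input: the input image Tensor to convolve. Its shape must be [batch, in_height, in_width, in_channels], meaning [number of images in one training batch, image height, image width, number of image channels]; its dtype is float32 or float64;
filter: the CNN's convolution kernel. This Tensor must have shape [filter_height, filter_width, in_channels, out_channels], meaning [kernel height, kernel width, number of image channels, number of kernels]. It must have the same dtype as input, and its third dimension (in_channels) must equal the fourth dimension of input;
strides: [1, stride_h, stride_w, 1], the step by which the kernel moves each time along the height and width;
padding: a string that must be either "SAME" or "VALID"; this value selects the convolution mode;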

The output has shape [batch, out_height, out_width, out_channels]. batch is taken from input, out_channels is taken from filter, and out_height and out_width depend on all of the parameters, as given by the formulas below.

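SAME mode (pad): the input is zero-padded so that every input position is covered
- out_height = ceil(float(in_height) / float(stride_h))
- out_width = ceil(float(in_width) / float(stride_w))
VALID mode (drop): no padding is added, and trailing rows and columns that do not fill a whole window are dropped
- out_height = ceil(float(in_height - filter_height + 1) / float(stride_h))
- out_width = ceil(float(in_width - filter_width + 1) / float(stride_w))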
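The two sets of formulas above can be checked without TensorFlow. The following is a minimal sketch in plain Python; the helper name conv_output_size is hypothetical and is not part of the TensorFlow API.

```python
import math


def conv_output_size(in_size, filter_size, stride, padding):
    """Output length along one spatial dimension (hypothetical helper)."""
    if padding == "SAME":
        return math.ceil(in_size / stride)
    if padding == "VALID":
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


# A 16x64 input, a 3x5 kernel, and stride 2 in both directions.
print(conv_output_size(16, 3, 2, "VALID"), conv_output_size(64, 5, 2, "VALID"))  # 7 30
print(conv_output_size(16, 3, 2, "SAME"), conv_output_size(64, 5, 2, "SAME"))    # 8 32
```

Note that in SAME mode the kernel size does not affect the output size at all; only the stride does.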

In SAME mode the padding is computed as follows:

Rows to pad: pad_along_height = max((out_height - 1) * strides[1] + filter_height - in_height, 0)
Columns to pad: pad_along_width = max((out_width - 1) * strides[2] + filter_width - in_width, 0)
pad_top = pad_along_height // 2
pad_bottom = pad_along_height - pad_top
pad_left = pad_along_width // 2
pad_right = pad_along_width - pad_left
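To see how the total padding is split, here is a small sketch of the same arithmetic for one spatial dimension. The function name same_padding is hypothetical; it only restates the formulas above in plain Python.

```python
import math


def same_padding(in_size, filter_size, stride):
    """Return (pad_before, pad_after) for one spatial dimension in SAME mode."""
    out_size = math.ceil(in_size / stride)
    pad_total = max((out_size - 1) * stride + filter_size - in_size, 0)
    pad_before = pad_total // 2          # pad_top or pad_left
    pad_after = pad_total - pad_before   # pad_bottom or pad_right
    return pad_before, pad_after


for size in (7, 8):
    print(size, same_padding(size, filter_size=3, stride=2))
# 7 (1, 1)
# 8 (0, 1)
```

When the total padding is odd, the extra row or column goes to the bottom or right. This is why padding an input by a fixed symmetric amount and then convolving in VALID mode does not always reproduce SAME mode exactly: for the size-8 case, SAME pads (0, 1) rather than (1, 1).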

Test example

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # hide the GPUs so the example runs on the CPU
import tensorflow as tf  # TensorFlow 1.x API

input = tf.Variable(tf.random_normal([1, 16, 64, 3]))
filter = tf.Variable(tf.random_normal([3, 5, 3, 32]))
op = tf.nn.conv2d(input, filter, strides=[1, 2, 2, 1], padding='VALID')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    res = sess.run(op)
    print(res.shape)
    # (1, 7, 30, 32)

Atrous (dilated) convolution: atrous_conv2d

tf.nn.atrous_conv2d(value, filters, rate, padding, name=None)

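value: the input image Tensor to convolve. Its shape must be [batch, in_height, in_width, in_channels], meaning [number of images in one training batch, image height, image width, number of image channels]; its dtype is float32 or float64;
filters: the CNN's convolution kernel. This Tensor must have shape [filter_height, filter_width, in_channels, out_channels], meaning [kernel height, kernel width, number of image channels, number of kernels]. It must have the same dtype as value, and its third dimension (in_channels) must equal the fourth dimension of value;
rate: a positive int. An ordinary convolution has a stride (the sliding step of the kernel), but note in particular that atrous convolution has no stride parameter. It takes a rate parameter instead, defined as the sampling interval over the input image during convolution. You can think of it as inserting (rate - 1) zeros between neighbouring kernel elements, punching "holes" into the original kernel, so that the convolution samples the input at a wider interval. How the zeros are inserted is covered in the more detailed material referenced further down. With rate=1 no zeros are inserted and the function reduces to an ordinary convolution;
padding: the padding mode, which must be either "SAME" or "VALID";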

A positive int32. The stride with which we sample input values across the height and width dimensions. Equivalently, the rate by which we upsample the filter values by inserting zeros across the height and width dimensions. In the literature, the same parameter is sometimes called input stride or dilation.
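The "inserting zeros" view of rate can be made concrete with a short sketch. The helper dilate_kernel below is hypothetical and works on a plain nested list for a single-channel kernel; it illustrates the idea only and is not how TensorFlow implements the operation.

```python
def dilate_kernel(kernel, rate):
    """Insert (rate - 1) zeros between neighbouring elements of a 2-D kernel."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = k_h + (k_h - 1) * (rate - 1)  # effective kernel height
    out_w = k_w + (k_w - 1) * (rate - 1)  # effective kernel width
    dilated = [[0] * out_w for _ in range(out_h)]
    for i in range(k_h):
        for j in range(k_w):
            dilated[i * rate][j * rate] = kernel[i][j]
    return dilated


kernel = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
for row in dilate_kernel(kernel, rate=2):
    print(row)
# [1, 0, 2, 0, 3]
# [0, 0, 0, 0, 0]
# [4, 0, 5, 0, 6]
# [0, 0, 0, 0, 0]
# [7, 0, 8, 0, 9]
```

With rate=1 the function returns the kernel unchanged, matching the statement that atrous convolution then reduces to an ordinary convolution.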

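Output shape:

VALID
[batch, height - rate * (filter_height - 1), width - rate * (filter_width - 1), out_channels]
SAME
[batch, height, width, out_channels]

The VALID case follows from the effective kernel size filter_height + (filter_height - 1) * (rate - 1), which removes rate * (filter_height - 1) rows in total; for rate = 2 this is 2 * (filter_height - 1).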
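As a quick check of the output sizes, the sketch below computes one spatial dimension from the effective kernel size k + (k - 1) * (rate - 1). It assumes the window moves one position at a time, since atrous_conv2d has no stride parameter; the helper name atrous_output_size is hypothetical.

```python
def atrous_output_size(in_size, filter_size, rate, padding):
    """Output length along one spatial dimension (hypothetical helper)."""
    if padding == "SAME":
        return in_size
    if padding == "VALID":
        effective = filter_size + (filter_size - 1) * (rate - 1)
        return in_size - effective + 1
    raise ValueError("padding must be 'SAME' or 'VALID'")


print(atrous_output_size(16, 3, 2, "VALID"))  # 12
print(atrous_output_size(16, 3, 1, "VALID"))  # 14
print(atrous_output_size(16, 3, 2, "SAME"))   # 16
```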

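In depth:

See the diagrams in "[Tensorflow] How does tf.nn.atrous_conv2d implement atrous convolution?"
See the function's documentation, which includes a section on optimization
Further reading:
A very good explanation of Xception
[Tensorflow] How does tf.nn.depthwise_conv2d implement depthwise convolution?
[Tensorflow] How does tf.nn.separable_conv2d implement depthwise separable convolution?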

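Deconvolution (transposed convolution): conv2d_transpose

Deconvolution runs a convolution in reverse: it maps a tensor with the shape of a convolution's output back to the shape of that convolution's input. It is the transpose of the convolution, not an exact numerical inverse.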

tf.nn.conv2d_transpose(value, filter, output_shape, strides, padding='SAME', data_format='NHWC', name=None)

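value: the input image to deconvolve; it must be a Tensor
filter: the convolution kernel; it must be a Tensor with shape [filter_height, filter_width, out_channels, in_channels], meaning [kernel height, kernel width, number of kernels, number of image channels]
output_shape: the shape of the deconvolution's output
strides: the stride along each dimension of the image during deconvolution; a one-dimensional vector of length 4
padding: a string that must be either "SAME" or "VALID"; this value selects the convolution mode
data_format: a string, either 'NHWC' or 'NCHW'. This parameter was added in newer TensorFlow versions and describes the data layout of value. 'NHWC' is TensorFlow's standard layout, [batch, height, width, in_channels]; 'NCHW' is the layout used by Theano, [batch, in_channels, height, width]. The default is 'NHWC'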

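A note on the output_shape parameter: deconvolution is the reverse of convolution, and the output size of a convolution is rounded up (ceil), so several different input sizes can produce the same output size. When running the deconvolution, the user therefore has to say which of those input sizes is wanted, and output_shape is where that is specified.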
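The ambiguity can be shown directly. The sketch below assumes SAME padding, where the forward output size is ceil(in_size / stride), and lists every input size that maps to a given output size; both helper names are hypothetical.

```python
import math


def same_output_size(in_size, stride):
    """Forward SAME-mode output length along one spatial dimension."""
    return math.ceil(in_size / stride)


def candidate_input_sizes(out_size, stride):
    """All input sizes that a SAME convolution with this stride maps to out_size."""
    return [n for n in range(1, out_size * stride + 1)
            if same_output_size(n, stride) == out_size]


print(candidate_input_sizes(4, stride=2))  # [7, 8]
print(candidate_input_sizes(4, stride=3))  # [10, 11, 12]
```

A deconvolution with stride 2 applied to a feature map of height 4 could therefore legitimately produce height 7 or height 8, and output_shape is what picks one of them.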

Further study materials

A roundup of classic CNN models: LeNet, AlexNet, GoogLeNet, VGG, Deep Residual Learning
ResNet explained
ResNetV2: ResNet in depth

The following script compares tf.nn.conv2d in SAME and VALID mode with an explicitly padded convolution (conv2d_same) on a 7x7 and an 8x8 input:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # hide the GPUs so the example runs on the CPU
import tensorflow as tf  # TensorFlow 1.x API

tf.enable_eager_execution()
tf.set_random_seed(1234)


def conv2d_same(inputs, kernel, stride, rate=1):
    # For stride 1, the built-in SAME padding is used directly.
    if stride == 1:
        return tf.nn.conv2d(inputs, kernel, [1, stride, stride, 1], padding='SAME')
    else:
        # Otherwise pad explicitly by (effective kernel size - 1), then convolve in VALID mode.
        kernel_size = kernel.shape.as_list()[0]
        kernel_size_effective = kernel_size + (kernel_size - 1) * (rate - 1)
        pad_total = kernel_size_effective - 1
        pad_beg = pad_total // 2
        pad_end = pad_total - pad_beg
        paddings = [[0, 0], [pad_beg, pad_end], [pad_beg, pad_end], [0, 0]]
        print(paddings)
        inputs_ = tf.pad(inputs, paddings)
        print(tf.squeeze(inputs_))
        same_conv = tf.nn.conv2d(inputs_, kernel, [1, stride, stride, 1], padding='VALID')
        print("=============conv2d_same=============")
        print(tf.squeeze(same_conv))
        return same_conv


def conv2d_fun(inputs, kernel, stride, padding):
    conv = tf.nn.conv2d(inputs, kernel, [1, stride, stride, 1], padding=padding)
    print("============" + padding + "==" + "S(" + str(stride) + ")" + "============")
    print(tf.squeeze(conv))


def test(k_size):
    # Random integer-valued input of shape [1, k_size, k_size, 1].
    src = tf.random_uniform((1, k_size, k_size, 1), 0, 5, tf.int32, seed=0)
    src = tf.cast(src, tf.float32)
    print("=============inputs{}==============".format(src.shape.as_list()))
    print(tf.squeeze(src))
    # Random integer-valued 3x3 kernel with one input and one output channel.
    kernel = tf.random_uniform((3, 3, 1, 1), -1, 2, tf.int32, seed=0)
    kernel = tf.cast(kernel, tf.float32)
    print("=============kernel{}==============".format(kernel.shape.as_list()))
    print(tf.squeeze(kernel))
    conv2d_fun(src, kernel, 1, "SAME")
    conv2d_fun(src, kernel, 2, "SAME")
    conv2d_fun(src, kernel, 1, "VALID")
    conv2d_fun(src, kernel, 2, "VALID")
    conv2d_same(src, kernel, 2)


if __name__ == "__main__":
    test(7)
    print("=" * 80)
    test(8)
