PyTorch Review Summary 1

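PyTorch review notes for the author's own use. Reference textbook:

Dive into Deep Learning (《动手学深度学习》)

This post covers common PyTorch tensor operations, linear algebra, calculus, and probability.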

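Common tensor operations, linear algebra, calculus, and probability are covered in PyTorch Review Summary 1;
Linear neural networks are covered in PyTorch Review Summary 2;
Multilayer perceptrons are covered in PyTorch Review Summary 3;
Deep learning computation is covered in PyTorch Review Summary 4;
Convolutional neural networks are covered in PyTorch Review Summary 5;
Modern convolutional neural networks are covered in PyTorch Review Summary 6;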

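I. Data Operations

The tensor is the core data structure in PyTorch, analogous to NumPy's ndarray. A tensor is essentially an n-dimensional array that supports GPU-accelerated computation and automatic differentiation. To use tensors, import the torch package; the type of a tensor is torch.Tensor.

1. Creating Tensors

torch.arange(start=0, end, step=1): creates a 1-D tensor holding an arithmetic sequence over the half-open interval [start, end);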

import torch
x = torch.arange(6)         # tensor([0, 1, 2, 3, 4, 5])
y = torch.arange(1,6,2)     # tensor([1, 3, 5])

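torch.zeros((a, b, ...)) / torch.ones((a, b, ...)) / torch.randn(a, b, ...) / torch.tensor([...]): creates a tensor filled with zeros / filled with ones / filled with samples from the standard normal distribution / holding the given values;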

import torch
a = torch.zeros((2, 3, 4))
b = torch.ones((5))
c = torch.randn(3, 4)
d = torch.tensor([[[2, 1], [4, 3]], [[1, 2], [3, 4]], [[4, 3], [2, 1]]])

torch.zeros_like(x): creates an all-zero tensor with the same shape as x;

import torch
x = torch.tensor([[1, 2, 3], [4, 5, 6]])
y = torch.zeros_like(x)     # tensor([[0, 0, 0], [0, 0, 0]])

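The element type of a tensor can be specified through the dtype argument:

Type	Description
torch.float64	double-precision floating point
torch.float32	single-precision floating point
torch.float16	half-precision floating point
torch.int64	64-bit signed integer
torch.int32	32-bit signed integer
torch.int16	16-bit signed integer
torch.int8	8-bit signed integer
torch.uint8	8-bit unsigned integer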
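
As a small illustration of the dtype argument (a sketch only; the variable names are chosen for this example and the defaults shown assume PyTorch's default settings have not been changed), a dtype can be set at creation time and converted later with .to():

```python
import torch

# Without an explicit dtype, integer literals give int64 and float literals give float32.
a = torch.tensor([1, 2, 3])
b = torch.tensor([1.0, 2.0, 3.0])
print(a.dtype, b.dtype)     # torch.int64 torch.float32

# dtype can be set explicitly when the tensor is created.
c = torch.zeros((2, 3), dtype=torch.float64)
d = torch.arange(4, dtype=torch.int8)

# .to() returns a converted tensor; the original keeps its dtype.
e = a.to(torch.float16)
print(c.dtype, d.dtype, e.dtype, a.dtype)
```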

2. Basic Tensor Operations

x.shape / x.numel(): returns the shape / total number of elements of the tensor;

import torch
x = torch.randn(3, 4)
print(x.shape)      # torch.Size([3, 4])
print(x.numel())    # 12

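x.reshape(a, b): returns a tensor with the new shape while x itself keeps its shape; -1 may be used for one dimension, whose size is then inferred from the others;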

import torch
x = torch.arange(12)
y = x.reshape(3, 4)
print(y.shape)      # torch.Size([3, 4])
z = x.reshape(2, 3, -1)
print(z.shape)      # torch.Size([2, 3, 2])
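
One detail worth adding to these notes: for a contiguous tensor such as the one above, reshape returns a view rather than a copy, so the two tensors share the same data. A minimal sketch:

```python
import torch

x = torch.arange(12)
y = x.reshape(3, 4)

# x keeps its own shape; only the returned tensor has the new one.
print(x.shape, y.shape)     # torch.Size([12]) torch.Size([3, 4])

# y is a view of x here, so a write through y is visible in x.
y[0, 0] = 100
print(x[0].item())          # 100
```

If an independent tensor is needed, calling .clone() on the result avoids this sharing.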

torch.cat((a, b), dim=n): concatenates the tensors along axis n;

import torch
a = torch.arange(12, dtype=torch.float32).reshape((3,4))
b = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
x = torch.cat((a, b), dim=0)    # tensor([[ 0.,  1.,  2.,  3.], [ 4.,  5.,  6.,  7.], [ 8.,  9., 10., 11.], [ 2.,  1.,  4.,  3.], [ 1.,  2.,  3.,  4.], [ 4.,  3.,  2.,  1.]])
y = torch.cat((a, b), dim=1)    # tensor([[ 0.,  1.,  2.,  3.,  2.,  1.,  4.,  3.], [ 4.,  5.,  6.,  7.,  1.,  2.,  3.,  4.], [ 8.,  9., 10., 11.,  4.,  3.,  2.,  1.]])

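x.clone(): returns a deep copy of the tensor (plain assignment with = only binds a second name to the same tensor, so both names share the same memory);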
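
The difference between assignment and clone() can be checked with a short sketch (alias and copy are names made up for this example):

```python
import torch

x = torch.tensor([1, 2, 3])
alias = x           # a second name for the same tensor
copy = x.clone()    # an independent tensor with its own memory

x[0] = 100
print(alias.tolist())   # [100, 2, 3]
print(copy.tolist())    # [1, 2, 3]
print(alias is x)       # True
```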

3. Element-wise Operations

+ / - / * / / / ** / %: element-wise addition / subtraction / multiplication / division / exponentiation / modulo;

import torch
x = torch.tensor([1.0, 2, 4, 8])
y = torch.tensor([2, 2, 2, 2])
a = x + y   # tensor([ 3.,  4.,  6., 10.])
b = x - y   # tensor([-1.,  0.,  2.,  6.])
c = x * y   # tensor([ 2.,  4.,  8., 16.])
d = x / y   # tensor([0.5000, 1.0000, 2.0000, 4.0000])
e = x ** y  # tensor([ 1.,  4., 16., 64.])
f = x % y   # tensor([1., 0., 0., 0.])

torch.sin(x) / torch.cos(x) / torch.tan(x) / torch.sinh(x) / torch.cosh(x) / torch.tanh(x): element-wise trigonometric and hyperbolic functions;
torch.exp(x) / torch.log(x): element-wise exponential / natural logarithm;
torch.logical_and(a, b) / torch.logical_or(a, b) / torch.logical_not(a): element-wise logical AND / OR / NOT;
>, <, ==, torch.eq(a, b) / torch.gt(a, b) / torch.lt(a, b) / torch.ge(a, b) / torch.le(a, b): element-wise comparison;

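The element-wise operations above were applied to two tensors of the same shape. When two tensors of different but compatible shapes are combined element-wise, the operation follows the broadcasting mechanism: each tensor is first expanded by replicating its elements so that both have the same shape, and the operation is then applied element by element:

import torch
a = torch.arange(3).reshape((3, 1))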
b = torch.arange(2).reshape((1, 2))
print(a+b)      # tensor([[0, 1], [1, 2], [2, 3]])
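
To make the expansion step concrete, the sketch below spells out the broadcast by hand with Tensor.expand (used here purely for illustration; PyTorch does this implicitly) and checks that the result matches a + b. Comparison operators broadcast in the same way:

```python
import torch

a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))

# Expand both operands to the common shape (3, 2).
a_full = a.expand(3, 2)
b_full = b.expand(3, 2)
print(a_full.tolist())      # [[0, 0], [1, 1], [2, 2]]
print(b_full.tolist())      # [[0, 1], [0, 1], [0, 1]]

# The implicit broadcast gives the same result as the explicit expansion.
print(torch.equal(a + b, a_full + b_full))  # True

# Element-wise comparison follows the same broadcasting rule.
print((a > b).tolist())     # [[False, False], [True, False], [True, True]]
```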

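4. In-place Operations

The operations in the previous sections all allocate new memory for the tensor they return. This can be checked with Python's id() function, which identifies the object a variable refers to:

import torch
x = torch.tensor([1.0, 2, 4, 8])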
y = torch.tensor([2, 2, 2, 2])
print(id(x))    # 2479607808944
x = x + y
print(id(x))    # 2479608211632

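In settings such as deep learning training, parameters are updated over and over, and allocating a new tensor on every update wastes a large amount of memory. In such cases the operation can be performed in place, for example with Z[:] = <expression> or X += Y:

import torch
x = torch.tensor([1, 2, 4, 8])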
y = torch.tensor([1, 3, 5, 7])
z = torch.zeros_like(x)
print(id(x))    # 2049176339296
print(id(z))    # 2049176741984
z[:] = x + y
print(id(z))    # 2049176741984
x+=y
print(id(x))    # 2049176339296
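
As a complementary check (an addition to these notes, not taken from the textbook), PyTorch methods whose names end in an underscore, such as add_, also work in place, and data_ptr() reports the address of the underlying storage directly rather than the identity of the Python object:

```python
import torch

x = torch.tensor([1, 2, 4, 8])
y = torch.tensor([1, 3, 5, 7])

ptr_before = x.data_ptr()
x.add_(y)                       # in place, same effect as x += y
ptr_in_place = x.data_ptr()

x = x + y                       # allocates a new tensor and rebinds x
ptr_new = x.data_ptr()

print(ptr_before == ptr_in_place)   # True
print(ptr_before == ptr_new)        # False
print(x.tolist())                   # [3, 8, 14, 22]
```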

5. Indexing and Slicing

Indexing and slicing in PyTorch work the same way as for Python lists and NumPy arrays:

import torch
x = torch.arange(12).reshape((3,4))
print(x[-1])        # tensor([ 8,  9, 10, 11])
print(x[1:3])       # tensor([[ 4,  5,  6,  7], [ 8,  9, 10, 11]])
print(x[:, 1:3])    # tensor([[ 1,  2], [ 5,  6], [ 9, 10]])
x[2, 2] = 15
x[0:2, :] = 12
print(x)            # tensor([[12, 12, 12, 12], [12, 12, 12, 12], [ 8,  9, 15, 11]])

6. Data Type Conversion

Tensors can be converted to and from list and ndarray with torch.tensor(), .numpy(), and .tolist():

import numpy as np
import torch
numpy_array = np.array([[1, 2, 3], [4, 5, 6]])
tensor = torch.tensor([[7, 8, 9], [10, 11, 12]])
python_list = [[1, 2, 3], [4, 5, 6]]
tensor_from_numpy = torch.tensor(numpy_array)
numpy_from_tensor = tensor.numpy()
tensor_from_list = torch.tensor(python_list)
list_from_tensor = tensor.tolist()
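
A point that is easy to miss with these conversions is whether the data is copied or shared. The sketch below assumes CPU tensors; torch.from_numpy is not used in the notes above and is included only for contrast with torch.tensor():

```python
import numpy as np
import torch

arr = np.array([1, 2, 3])
copied = torch.tensor(arr)       # copies the data
shared = torch.from_numpy(arr)   # shares memory with arr

arr[0] = 100
print(copied.tolist())   # [1, 2, 3]
print(shared.tolist())   # [100, 2, 3]

t = torch.tensor([7, 8, 9])
view = t.numpy()         # shares memory with the CPU tensor t
t[0] = 0
print(view.tolist())     # [0, 8, 9]
```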

When a tensor contains exactly one element, .item() converts it to a Python scalar:

import torch
x = torch.tensor([3])
n = x.item()    # 3

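II. Mathematical Operations

1. Linear Algebra

A.T: matrix transpose; the transposed matrix shares memory with the original matrix;
A*B: Hadamard product, i.e. element-wise multiplication, written $A \odot B$ ;
torch.dot(x, y): dot product of two vectors (1-D tensors), written $\langle x, y\rangle$ or $x^Ty$ ;
torch.mv(A, x): matrix-vector product, written $A x$ ;
torch.mm(A, B): matrix-matrix product, written $A B$ ;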

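torch.abs(u).sum() / torch.norm(u): the $L_1$ / $L_2$ norm of a vector or matrix (for a matrix, torch.norm(u) returns the Frobenius norm, i.e. the $L_2$ norm of its flattened elements);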

import torch
x = torch.tensor([3.0, -4.0])
y = torch.tensor([[1, 2], [2, 3], [3, 4]], dtype=torch.float32)
print(torch.abs(x).sum())   # tensor(7.)
print(torch.norm(x))        # tensor(5.)
print(torch.abs(y).sum())   # tensor(15.)
print(torch.norm(y))        # tensor(6.5574)
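
The transpose and the products listed at the start of this section can be sketched in the same way (A, B, and x are small made-up inputs chosen so the results are easy to verify by hand):

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
B = torch.ones(3, 2)
x = torch.tensor([1.0, 2.0, 3.0])

print(A.T.shape)                    # torch.Size([3, 2])
print((A * A).tolist())             # [[0.0, 1.0, 4.0], [9.0, 16.0, 25.0]]
print(torch.dot(x, x).item())       # 14.0
print(torch.mv(A, x).tolist())      # [8.0, 26.0]
print(torch.mm(A, B).tolist())      # [[3.0, 3.0], [12.0, 12.0]]
```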

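2. Calculus

To compute the gradient with respect to a tensor, set requires_grad=True when creating it. Then call .backward() on a function value of that tensor to compute its gradient (i.e. the partial derivatives); the gradient has the same shape as the original tensor. .backward() accepts the following optional arguments:

gradient: the gradient that is propagated back from the tensor on which .backward() is called, i.e. the weights applied to its elements; it must be supplied when that tensor is not a scalar;
retain_graph: keeps the computation graph after the backward pass so that it can be used for further backward passes. Without retain_graph=True, backpropagating through the same graph a second time generally raises a RuntimeError, because the graph has already been freed;
create_graph: builds a computation graph of the gradient computation itself, which is needed for computing higher-order derivatives;

After .backward() is called, PyTorch automatically computes the gradient and stores it in the tensor's .grad attribute. Gradients accumulate in .grad across calls, so before computing another gradient such as a higher-order derivative, clear it with x.grad = None or x.grad.zero_().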
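
The accumulation behaviour and the retain_graph argument can be seen in a minimal sketch (x and y are illustrative names; the intermediate values are stored only so they can be printed):

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2

y.backward(retain_graph=True)   # keep the graph for a second backward pass
first = x.grad.item()
print(first)                    # 6.0

y.backward()                    # second pass; the new gradient is added to .grad
accumulated = x.grad.item()
print(accumulated)              # 12.0

x.grad.zero_()                  # reset the stored gradient in place
cleared = x.grad.item()
print(cleared)                  # 0.0
```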

First derivative of a tensor:

import torch
x = torch.tensor(2.0, requires_grad=True)
y = x**2
y.backward()
print("Gradient of x:", x.grad.item())  # Gradient of x: 4.0
print("Type of x.grad:", type(x.grad)) 	# Type of x.grad: <class 'torch.Tensor'>

Second derivative of a tensor:

import torch
x = torch.tensor(2.0, requires_grad=True)
y = x**3
y.backward(create_graph=True)       # compute the first derivative
first_derivative = x.grad.clone()   # save the first derivative
x.grad = None                       # clear .grad so it can hold the second derivative
first_derivative.backward()         # compute the second derivative
second_derivative = x.grad          # read the second derivative
print("Second derivative of x:", second_derivative.item())  # Second derivative of x: 12.0

However, calling .backward() with create_graph=True creates a reference cycle between the tensor and its gradient, which can cause a memory leak. To avoid this, torch.autograd.grad() is commonly used when the gradient graph has to be created:

import torch
x = torch.tensor(2.0, requires_grad=True)
y = x**4
first_derivative = torch.autograd.grad(y, x, create_graph=True)[0]  # compute the first derivative
second_derivative = torch.autograd.grad(first_derivative, x)[0]     # compute the second derivative
print("First derivative of x:", first_derivative.item())            # First derivative of x: 32.0
print("Second derivative of x:", second_derivative.item())          # Second derivative of x: 48.0

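Note that the function being differentiated must be a scalar; otherwise the gradient cannot be created implicitly, and the gradient argument of .backward() has to be passed explicitly.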
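
For a non-scalar output there are two common ways around this restriction, sketched below under the assumption that the plain sum of the outputs is the quantity of interest: reduce the output to a scalar first, or pass a gradient tensor of ones, which is equivalent.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Option 1: reduce the vector output to a scalar before calling backward().
y = x ** 2
y.sum().backward()
grad_from_sum = x.grad.clone()
print(grad_from_sum.tolist())       # [2.0, 4.0, 6.0]

# Option 2: pass the gradient argument explicitly.
x.grad = None                       # clear the accumulated gradient first
y = x ** 2                          # rebuild the graph; the first pass freed it
y.backward(gradient=torch.ones_like(y))
grad_from_arg = x.grad.clone()
print(grad_from_arg.tolist())       # [2.0, 4.0, 6.0]
```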

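3. Probability

Aggregation functions: torch.sum(x) / torch.mean(x) / torch.max(x) / torch.min(x) / torch.std(x) / torch.var(x) compute the sum / mean / maximum / minimum / standard deviation / variance of a tensor's elements, and are used the same way as x.sum() / x.mean() / x.max() / x.min() / x.std() / x.var(). The dimension or dimensions to aggregate over can be chosen with axis: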

import torch
x = torch.tensor([[[1, 2], [2, 3], [3, 4]], [[4, 5], [5, 6], [6, 7]]], dtype=torch.float32)
a = x.sum()                 # tensor(48.)
b = x.sum(axis=0)           # tensor([[ 5.,  7.], [ 7.,  9.], [ 9., 11.]])
c = x.sum(axis=1)           # tensor([[ 6.,  9.], [15., 18.]])
d = x.sum(axis=2)           # tensor([[ 3.,  5.,  7.], [ 9., 11., 13.]])
e = x.sum(axis=[0, 1])      # tensor([21., 27.])

Passing keepdims=True keeps the number of axes unchanged:

import torch
x = torch.tensor([[[1, 2], [2, 3], [3, 4]], [[4, 5], [5, 6], [6, 7]]], dtype=torch.float32)
a = x.sum(axis=1, keepdims=True)
print(a)
'''
tensor([[[ 6.,  9.]],[[15., 18.]]])
'''
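
A typical reason to keep the reduced axis is broadcasting. As a minimal sketch with made-up values, dividing each row by its own sum only lines up correctly when the row sums keep shape (2, 1):

```python
import torch

x = torch.tensor([[1.0, 3.0], [2.0, 6.0]])

row_sum = x.sum(axis=1, keepdims=True)  # shape (2, 1), one sum per row
normalized = x / row_sum                # each row is divided by its own sum

print(row_sum.shape)            # torch.Size([2, 1])
print(normalized.tolist())      # [[0.25, 0.75], [0.25, 0.75]]
```

Without keepdims=True the sums would have shape (2,) and would be broadcast along the columns instead, which is not what row normalization needs.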

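Sampling functions: the .sample() method draws samples from the probability distribution objects in the torch.distributions module. Common distributions include:
- torch.distributions.Categorical: categorical distribution, a discrete distribution that gives the probability of the random variable taking each category;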
```
import torch
import torch.distributions as dist
probs = torch.tensor([0.2, 0.3, 0.5])
categorical_dist = dist.Categorical(probs)
sample = categorical_dist.sample((20,))
```
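- torch.distributions.Bernoulli: Bernoulli distribution, also called the two-point distribution, a discrete distribution that gives the probability of each value of a binary variable;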
```
import torch.distributions as dist
p = 0.6
bernoulli_dist = dist.Bernoulli(p)
sample = bernoulli_dist.sample((20,))
```
- torch.distributions.Multinomial: multinomial distribution, a discrete distribution that generalizes the binomial distribution to more than two outcomes;
```
import torch
import torch.distributions as dist
n = 10
probs = torch.tensor([0.1, 0.2, 0.7])
multinomial_dist = dist.Multinomial(n, probs)
sample = multinomial_dist.sample()
```
- torch.distributions.Poisson: Poisson distribution, a discrete distribution that gives the probability of the number of events occurring in a fixed interval;
```
import torch.distributions as dist
lam = 3
poisson_dist = dist.Poisson(lam)
sample = poisson_dist.sample((20,))
```
- torch.distributions.Uniform: uniform distribution, a continuous distribution in which the random variable is equally likely to take any value in an interval;
```
import torch.distributions as dist
a, b = 3, 5
uniform_dist = dist.Uniform(a, b)
sample = uniform_dist.sample((5,))
```
- torch.distributions.Exponential: exponential distribution, a continuous distribution that is the special case of the Gamma distribution with shape parameter 1;
```
import torch.distributions as dist
lam = 3
exponential_dist = dist.Exponential(lam)
sample = exponential_dist.sample((20,))
```
- torch.distributions.Normal: normal distribution, a continuous distribution and the most commonly used one;
```
import torch.distributions as dist
mu = 0
sigma = 1
normal_dist = dist.Normal(mu, sigma)
sample = normal_dist.sample((20,5))
```
- torch.distributions.Gamma: Gamma distribution, a continuous distribution often used to describe the waiting time until an event occurs;
```
import torch.distributions as dist
gamma_dist = dist.Gamma(2, 1)
sample = gamma_dist.sample((20,))
```
- torch.distributions.Beta: Beta distribution, a continuous distribution for a random variable taking values in the bounded interval [0, 1];
```
import torch.distributions as dist
alpha, beta = 2, 5
beta_dist = dist.Beta(alpha, beta)
sample = beta_dist.sample((5,))
```
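.sample() accepts the shape of the sample tensor as its argument; with no argument, a single sample is drawn. The shape is passed as a tuple, so a one-dimensional shape needs a trailing comma, e.g. .sample((20,)).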
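
How the shape argument affects the result can be checked with a short sketch. The call to torch.manual_seed is an addition for reproducibility and the seed value is arbitrary; only the shapes and the row totals are deterministic facts here:

```python
import torch
import torch.distributions as dist

torch.manual_seed(0)    # fix the random stream so reruns draw the same samples

normal_dist = dist.Normal(0.0, 1.0)
single = normal_dist.sample()           # no argument: one scalar sample
vector = normal_dist.sample((20,))      # 20 samples in a 1-D tensor
matrix = normal_dist.sample((20, 5))    # 20 x 5 samples
print(single.shape, vector.shape, matrix.shape)

# For Multinomial, every sample is itself a vector of counts over the categories.
multinomial_dist = dist.Multinomial(10, torch.tensor([0.1, 0.2, 0.7]))
counts = multinomial_dist.sample((4,))
print(counts.shape)                     # torch.Size([4, 3])
print(counts.sum(dim=1).tolist())       # [10.0, 10.0, 10.0, 10.0]
```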

To list more of the functions and classes in a module, call dir():

import torch
print(dir(torch.distributions))

To look up how a particular function or class is used, call help():

import torch
help(torch.ones)