PyTorch Review Summary 2

PyTorch review notes, for the author's own use. Reference materials:

Dive into Deep Learning (《动手学深度学习》)
Stanford University: Practical Machine Learning

This post covers linear neural networks in PyTorch.

It uses the two basic problems of machine learning, regression and classification, as examples of how to use linear neural networks.

PyTorch syntax summary series:

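Common tensor operations, linear algebra, calculus, and probability: see PyTorch Review Summary 1;
Linear neural networks: see PyTorch Review Summary 2;
Multilayer perceptrons: see PyTorch Review Summary 3;
Deep learning computation: see PyTorch Review Summary 4;
Convolutional neural networks: see PyTorch Review Summary 5;
Modern convolutional neural networks: see PyTorch Review Summary 6.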

I. Linear Regression

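A linear regression model solves regression problems by fitting a linear relationship between the independent variables and the dependent variable. The workflow involves a data iterator, network layers, a loss function, and an optimizer; the running example is $y = Xw + b$.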

1. Reading / Generating the Dataset

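The sample data for a linear regression model consists of features $X$ and labels $y$. If the data is stored in a file such as a .csv, it can be read directly (the code below assumes no header row and the label in the last column):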

import csv
import torch

def load_csv_data(filename):
    features = []
    labels = []
    with open(filename, 'r') as csvfile:
        csv_reader = csv.reader(csvfile)
        for row in csv_reader:
            features.append(list(map(float, row[:-1])))   # all columns but the last are features
            labels.append(float(row[-1]))                 # the last column is the label
    X = torch.tensor(features, dtype=torch.float32)
    y = torch.tensor(labels, dtype=torch.float32).reshape(-1, 1)
    return X, y

filename = "path_to_dataset.csv"
features, labels = load_csv_data(filename)

To generate data randomly instead, add noise to a known linear model:

def synthetic_data(w, b, num_examples):
    """Generate y = Xw + b + noise."""
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)     # add noise
    return X, y.reshape((-1, 1))

true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)
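One way to sanity-check the generated data is to recover the parameters with ordinary least squares. The sketch below is only an illustration: the fixed seed, the appended column of ones, and the names A, w_hat, and b_hat are assumptions made here, not part of the original notes.

```python
import torch

torch.manual_seed(0)  # fixed seed so the sketch is reproducible (an assumption)

def synthetic_data(w, b, num_examples):
    """Generate y = Xw + b + noise."""
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))

true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)

# Append a column of ones so the bias is estimated as one extra weight.
A = torch.cat([features, torch.ones(len(features), 1)], dim=1)
solution = torch.linalg.lstsq(A, labels).solution   # shape (3, 1)

w_hat = solution[:2].reshape(-1)    # estimated weights
b_hat = solution[2].item()          # estimated bias
print(features.shape, labels.shape)
print(w_hat, b_hat)
```

Because the noise has standard deviation 0.01, the estimates should land very close to true_w and true_b.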

2. Data Iterator

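A data iterator walks through the samples of a dataset, splitting the whole dataset into minibatches of a specified size. With shuffle=True it reshuffles the samples at the start of every epoch. Calling next() on an iterator over it returns the next minibatch: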

from torch.utils import data

def load_array(data_arrays, batch_size, is_train=True):
    """Construct a PyTorch data iterator."""
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)

batch_size = 10
data_iter = load_array((features, labels), batch_size)
# print(next(iter(data_iter)))

In the example above, the data iterator splits the 1000 training samples into 100 batches of 10 samples each.
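The batch arithmetic can be checked directly on a DataLoader. This is a small sketch on random stand-in tensors; the batch size of 64 and the names uneven and dropped are illustrative assumptions, chosen to show what happens when the batch size does not divide the dataset size.

```python
import torch
from torch.utils import data

features = torch.randn(1000, 2)     # stand-in features
labels = torch.randn(1000, 1)       # stand-in labels
dataset = data.TensorDataset(features, labels)

data_iter = data.DataLoader(dataset, batch_size=10, shuffle=True)
num_batches = len(data_iter)        # ceil(1000 / 10)
X, y = next(iter(data_iter))        # first minibatch of this pass

# When the batch size does not divide 1000, the last batch is smaller
# unless drop_last=True is passed.
uneven = data.DataLoader(dataset, batch_size=64)
dropped = data.DataLoader(dataset, batch_size=64, drop_last=True)
print(num_batches, X.shape, y.shape, len(uneven), len(dropped))
```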

3. Neural Network Model

A model can be built with the sequential container nn.Sequential(), which chains several layers in order:

from torch import nn

net = nn.Sequential(
    nn.Linear(2, 64),
    nn.ReLU(),
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Linear(32, 1)
)

For each linear layer (also called a fully connected layer), the weights and the bias are initialized separately:

net[0].weight.data.normal_(0, 0.01)
net[0].bias.data.fill_(0)
net[2].weight.data.normal_(0, 0.01)
net[2].bias.data.fill_(0)
net[4].weight.data.normal_(0, 0.01)
net[4].bias.data.fill_(0)
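As an alternative sketch, the same initialization can be written as one loop over the container, followed by a quick check of the output shape and parameter count. The use of nn.init here and the names out and num_params are choices made for this illustration, not the original author's code.

```python
import torch
from torch import nn

net = nn.Sequential(
    nn.Linear(2, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

# Initialize every linear layer in one loop instead of indexing each one.
for layer in net:
    if isinstance(layer, nn.Linear):
        nn.init.normal_(layer.weight, mean=0.0, std=0.01)
        nn.init.zeros_(layer.bias)

X = torch.randn(10, 2)      # a minibatch of 10 samples with 2 features
out = net(X)                # one prediction per sample
num_params = sum(p.numel() for p in net.parameters())
print(out.shape, num_params)
```

The parameter count is (2·64 + 64) + (64·32 + 32) + (32·1 + 1) = 2305.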

A network can also include convolutional, recurrent, and batch normalization layers:

# Convolutional layer
conv_layer = nn.Conv2d(2, 32, 3)
conv_layer.weight.data.normal_(0, 0.01)
conv_layer.bias.data.fill_(0)

# Recurrent layer
lstm_layer = nn.LSTM(2, 16)
lstm_layer.weight_ih_l0.data.normal_(0, 0.01)   # input-to-hidden weights
lstm_layer.weight_hh_l0.data.normal_(0, 0.01)   # hidden-to-hidden weights
lstm_layer.bias_ih_l0.data.fill_(0)             # input-to-hidden bias
lstm_layer.bias_hh_l0.data.fill_(0)             # hidden-to-hidden bias

# Batch normalization layer
batchnorm_layer = nn.BatchNorm2d(8)
batchnorm_layer.weight.data.fill_(1)
batchnorm_layer.bias.data.fill_(0)
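To make the constructor arguments concrete, the sketch below feeds random tensors through the three layer types and records the output shapes. The input sizes (8×8 images, sequence length 5, batch sizes 1, 3, and 4) are arbitrary assumptions for illustration.

```python
import torch
from torch import nn

conv_layer = nn.Conv2d(2, 32, 3)        # 2 input channels, 32 output channels, 3x3 kernel
lstm_layer = nn.LSTM(2, 16)             # input size 2, hidden size 16
batchnorm_layer = nn.BatchNorm2d(8)     # normalizes 8 channels

# Conv2d expects (batch, channels, height, width); a 3x3 kernel without
# padding shrinks each spatial dimension by 2.
conv_out = conv_layer(torch.randn(1, 2, 8, 8))

# LSTM expects (seq_len, batch, input_size) by default.
lstm_out, (h_n, c_n) = lstm_layer(torch.randn(5, 3, 2))

# BatchNorm2d keeps the input shape.
bn_out = batchnorm_layer(torch.randn(4, 8, 5, 5))
print(conv_out.shape, lstm_out.shape, h_n.shape, bn_out.shape)
```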

4. Loss Function

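When training a neural network, a loss is computed between the model's output and the true labels, and backpropagation supplies the gradients used to update the parameters so as to minimize the loss. Taking mean squared error as an example: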

loss = nn.MSELoss()

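The mathematical expression is given below; the factor of 1/2 only makes the derivative tidier (nn.MSELoss itself averages the squared errors without it):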
$L(\mathbf{w}, b)=\frac{1}{n} \sum_{i=1}^n \frac{1}{2}\left(\mathbf{w}^{\top} \mathbf{x}^{(i)}+b-y^{(i)}\right)^2$
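A small numeric sketch of how the formula relates to nn.MSELoss: with the default reduction, nn.MSELoss returns the mean of the squared errors, so the expression with the 1/2 factor is exactly half of it. The three sample values are made up for illustration.

```python
import torch
from torch import nn

y_hat = torch.tensor([[2.5], [0.0], [2.0]])   # made-up predictions
y = torch.tensor([[3.0], [-0.5], [2.0]])      # made-up labels

mse = nn.MSELoss()(y_hat, y)                  # mean of squared errors
textbook = (0.5 * (y_hat - y) ** 2).mean()    # the formula with the 1/2 factor
print(mse.item(), textbook.item())
```

The constant factor does not move the minimum; it only rescales the gradient, which is equivalent to rescaling the learning rate.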

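Besides mean squared error, other options include the cross-entropy loss nn.CrossEntropyLoss, the binary cross-entropy loss nn.BCELoss, the binary cross-entropy loss on logits nn.BCEWithLogitsLoss (often used for multi-label classification), the KL divergence loss nn.KLDivLoss, the hinge embedding loss nn.HingeEmbeddingLoss, the triplet margin loss nn.TripletMarginLoss, and the cosine embedding loss nn.CosineEmbeddingLoss.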
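These losses differ in what inputs they expect. As a hedged illustration with made-up values: nn.BCEWithLogitsLoss takes raw scores and applies the sigmoid internally, so it matches nn.BCELoss applied to sigmoid outputs, while nn.CrossEntropyLoss takes raw scores together with one class index per sample.

```python
import torch
from torch import nn
import torch.nn.functional as F

logits = torch.tensor([[1.2, -0.7, 0.3]])     # made-up raw scores
targets = torch.tensor([[1.0, 0.0, 1.0]])     # independent 0/1 targets

with_logits = nn.BCEWithLogitsLoss()(logits, targets)
via_sigmoid = nn.BCELoss()(torch.sigmoid(logits), targets)

# CrossEntropyLoss also takes raw scores, but with a single class index per sample.
ce = nn.CrossEntropyLoss()(logits, torch.tensor([0]))
ce_manual = -F.log_softmax(logits, dim=1)[0, 0]
print(with_logits.item(), via_sigmoid.item(), ce.item())
```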

5. Optimizer

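During training, the optimizer updates the model parameters from the gradients of the loss; net.parameters() returns all of the model's parameters. Stochastic gradient descent is a common choice: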

trainer = torch.optim.SGD(net.parameters(), lr=0.03)

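In practice the update uses minibatch stochastic gradient descent (mini-batch SGD). To update the parameters, sum the per-sample gradients over the minibatch, multiply by the learning rate $\eta$, divide by the batch size $|\mathcal{B}|$, and subtract the result from the current parameter values: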
$(\mathbf{w}, b) \leftarrow(\mathbf{w}, b)-\frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_{(\mathbf{w}, b)} l^{(i)}(\mathbf{w}, b)$

Taking the linear regression loss $L(\mathbf{w}, b)$ as an example, the updates for $\mathbf{w}$ and $b$ expand to:
$\mathbf{w} \leftarrow \mathbf{w}-\frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_{\mathbf{w}} l^{(i)}(\mathbf{w}, b)=\mathbf{w}-\frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \mathbf{x}^{(i)}\left(\mathbf{w}^{\top} \mathbf{x}^{(i)}+b-y^{(i)}\right)$
$b \leftarrow b-\frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_b l^{(i)}(\mathbf{w}, b)=b-\frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}}\left(\mathbf{w}^{\top} \mathbf{x}^{(i)}+b-y^{(i)}\right)$
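The update rule can be checked numerically. The sketch below applies the expanded formulas by hand to one minibatch and compares the result with one step of torch.optim.SGD on the loss with the 1/2 factor. The random data, the zero initialization, and the names w_manual and b_manual are assumptions for illustration.

```python
import torch

torch.manual_seed(0)
X = torch.randn(10, 2)          # one minibatch of 10 samples
y = torch.randn(10, 1)
w = torch.zeros(2, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.03

# Manual update, following the expanded formulas term by term.
with torch.no_grad():
    err = X @ w + b - y                           # w^T x + b - y for each sample
    w_manual = w - lr / len(X) * (X.T @ err)      # sum of x * err over the batch
    b_manual = b - lr / len(X) * err.sum()        # sum of err over the batch

# The same step through autograd and the optimizer.
opt = torch.optim.SGD([w, b], lr=lr)
loss = (0.5 * (X @ w + b - y) ** 2).mean()        # averaged loss with the 1/2 factor
opt.zero_grad()
loss.backward()
opt.step()
print(w.detach().reshape(-1), w_manual.reshape(-1))
```

Averaging the loss over the batch is what produces the division by $|\mathcal{B}|$ in the update.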

6. Training

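In each training epoch, iterate over the minibatches produced by the data iterator, fetching one minibatch of inputs and its labels at a time. For each minibatch, run the model to predict the labels and compute the loss, backpropagate to compute the gradients, and finally call the optimizer to update the model parameters: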

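num_epochs = 3
for epoch in range(num_epochs):     # iterate over epochs
    for X, y in data_iter:          # iterate over minibatches
        l = loss(net(X), y)
        trainer.zero_grad()         # clear the gradients held by the optimizer
        l.backward()                # compute gradients from the loss automatically
        trainer.step()              # update parameters from gradients and learning rate
    l = loss(net(features), labels)
    print(f'epoch {epoch + 1}, loss {l:f}')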
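The pieces above can be put together in one self-contained sketch. Note one deliberate swap: it uses a single nn.Linear(2, 1) layer instead of the deeper network, an assumption made here so that the learned parameters can be compared directly with true_w and true_b. The fixed seed is also an assumption.

```python
import torch
from torch import nn
from torch.utils import data

torch.manual_seed(0)

# Synthetic data from y = Xw + b + noise.
true_w, true_b = torch.tensor([2, -3.4]), 4.2
features = torch.normal(0, 1, (1000, 2))
noise = torch.normal(0, 0.01, (1000,))
labels = (features @ true_w + true_b + noise).reshape(-1, 1)

data_iter = data.DataLoader(data.TensorDataset(features, labels),
                            batch_size=10, shuffle=True)
net = nn.Sequential(nn.Linear(2, 1))    # single linear layer (assumption of this sketch)
loss = nn.MSELoss()
trainer = torch.optim.SGD(net.parameters(), lr=0.03)

for epoch in range(3):
    for X, y in data_iter:
        l = loss(net(X), y)
        trainer.zero_grad()
        l.backward()
        trainer.step()

w_hat = net[0].weight.data.reshape(-1)
b_hat = net[0].bias.data.item()
print('error in w:', true_w - w_hat)
print('error in b:', true_b - b_hat)
```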

II. Softmax Regression

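Softmax regression is a multiclass classification model. It computes a score for each class from the input features, then applies the softmax function to turn those scores into a probability for each class.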
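The softmax function itself is short enough to write out by hand. The following is a plain-Python sketch; the helper name softmax, the example scores, and the max-subtraction trick for numerical stability are illustrative choices, not taken from the original notes.

```python
import math

def softmax(scores):
    """Convert a list of class scores into probabilities that sum to 1."""
    m = max(scores)                             # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])                # made-up scores for 3 classes
predicted_class = probs.index(max(probs))       # class with the highest probability
print(probs, predicted_class)
```

Softmax preserves the ordering of the scores, so the predicted class is simply the one with the largest score.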

1. Reading the Dataset

Taking Fashion-MNIST as an example, download the dataset, load it into memory, and return data iterators for the training set and the test set:

import torch
import torchvision
from torch.utils import data
from torchvision import transforms

def load_data_fashion_mnist(batch_size, resize=None):
    """Download the Fashion-MNIST dataset and load it into memory."""
    trans = [transforms.ToTensor()]
    if resize:
        trans.insert(0, transforms.Resize(resize))
    trans = transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(root="./data", train=True, transform=trans, download=True)
    mnist_test = torchvision.datasets.FashionMNIST(root="./data", train=False, transform=trans, download=True)
    return (data.DataLoader(mnist_train, batch_size, shuffle=True),
            data.DataLoader(mnist_test, batch_size, shuffle=False))

batch_size = 256
train_iter, test_iter = load_data_fashion_mnist(batch_size)
# for X, y in train_iter:
#     print(X.shape, X.dtype, y.shape, y.dtype)

2. Neural Network Model

The output layer of softmax regression is a fully connected layer, so add a fully connected layer with 10 outputs to the Sequential container:

from torch import nn
net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 10)
)

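The flatten layer nn.Flatten() flattens each input image into a one-dimensional vector to match the input the fully connected layer expects. Each Fashion-MNIST image is 28×28, so the flatten layer outputs 784 values per image, and the fully connected layer therefore takes 784 inputs.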
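The shapes can be confirmed without downloading the dataset. In this sketch a random tensor stands in for one minibatch of 256 single-channel 28×28 images; the stand-in tensor and the names flat and scores are assumptions for illustration.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))

X = torch.randn(256, 1, 28, 28)     # stand-in for one Fashion-MNIST minibatch
flat = net[0](X)                    # Flatten keeps dim 0 (the batch) and merges the rest
scores = net(X)                     # one score per class for each image
print(flat.shape, scores.shape)
```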

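Next, initialize the model: net.apply() calls init_weights() on every submodule, initializing the weights of all fully connected layers from a normal distribution: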

def init_weights(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, std=0.01)

net.apply(init_weights)

3. Loss Function

loss = nn.CrossEntropyLoss(reduction='none')
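With reduction='none', the loss returns one value per sample rather than a single average, which is why the training loop later calls l.mean() before backward() and l.sum() when accumulating. A small sketch with made-up scores, also showing that cross-entropy is the negative log of the softmax probability of the true class:

```python
import torch
from torch import nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 2.5, 0.3]])    # made-up scores: 2 samples, 3 classes
y = torch.tensor([0, 2])                    # true class indices

per_sample = nn.CrossEntropyLoss(reduction='none')(logits, y)   # shape (2,)
averaged = nn.CrossEntropyLoss()(logits, y)                     # default reduction='mean'

# Negative log-probability of the true class, computed by hand.
manual = -F.log_softmax(logits, dim=1)[torch.arange(2), y]
print(per_sample, averaged)
```

nn.CrossEntropyLoss applies the softmax internally, so the network's output layer produces raw scores and needs no softmax layer of its own.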

4. Optimizer

trainer = torch.optim.SGD(net.parameters(), lr=0.1)

5. Training

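To measure classification accuracy, first define accuracy(y_hat, y), which counts the correct predictions in the network output y_hat. y_hat.argmax(axis=1) returns the index of the largest value in each row of y_hat; after a type conversion this is compared with the true labels y to get a boolean tensor, whose sum is the number of correct predictions: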

def accuracy(y_hat, y):
    """Count the number of correct predictions."""
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(axis=1)
    cmp = y_hat.type(y.dtype) == y
    return float(cmp.type(y.dtype).sum())
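A short usage sketch on two made-up samples. Both rows of y_hat have their maximum at index 2, while the true labels are 0 and 2, so exactly one prediction is correct:

```python
import torch

def accuracy(y_hat, y):
    """Count the number of correct predictions."""
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(axis=1)
    cmp = y_hat.type(y.dtype) == y
    return float(cmp.type(y.dtype).sum())

y_hat = torch.tensor([[0.1, 0.3, 0.6],
                      [0.3, 0.2, 0.5]])     # made-up class probabilities
y = torch.tensor([0, 2])                    # true labels

num_correct = accuracy(y_hat, y)            # 1.0
acc = num_correct / len(y)                  # 0.5
print(num_correct, acc)
```

Because the function returns a count rather than a ratio, counts can be summed across minibatches and divided by the total number of samples at the end.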

Now the model can be trained:

num_epochs = 10
for epoch in range(num_epochs):     # iterate over epochs
    net.train()                     # put the model in training mode
    train_loss_sum = 0.0            # running sum of the training loss
    train_acc_sum = 0.0             # running count of correct training predictions
    sample_num = 0                  # number of samples seen
    for X, y in train_iter:
        y_hat = net(X)
        l = loss(y_hat, y)
        trainer.zero_grad()
        l.mean().backward()
        trainer.step()
        train_loss_sum += float(l.sum())    # float() so the sum does not hold on to the autograd graph
        train_acc_sum += accuracy(y_hat, y)
        sample_num += y.numel()
    train_loss = train_loss_sum / sample_num
    train_acc = train_acc_sum / sample_num

    net.eval()                      # put the model in evaluation mode
    test_acc_sum = 0.0
    test_sample_num = 0
    with torch.no_grad():           # gradients are not needed for evaluation
        for X, y in test_iter:
            test_acc_sum += accuracy(net(X), y)
            test_sample_num += y.numel()
    test_acc = test_acc_sum / test_sample_num
    print(f'epoch {epoch + 1}, '
          f'train loss {train_loss:.4f}, train acc {train_acc:.4f}, '
          f'test acc {test_acc:.4f}')