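This post mainly follows Mu Li's video tutorial https://www.bilibili.com/video/BV1UK4y1o7dy/vd_source=c7bfc6ce0ea0cbe43aa288ba2713e56d
Text tutorial: https://zh-v2.d2l.ai/
These are my notes on the parts of Mu Li's code that I did not fully understand. They are not rigorous and are for reference only.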
1. Function index
1.1 python
Function | Section |
---|---|
Lambda function | 3.1.1 |
1.2 torch
Function | Section |
---|---|
torch.utils.data.TensorDataset | 3.1.2 |
torch.optim.SGD | 3.6.1 |
torch.nn.MSELoss | 3.6.2 |
torch.Tensor.uniform_ | 4.1.1 |
torch.nn.Dropout | 4.2.1 |
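2. Weight decay
There are two main ways to control the capacity of a model:
- use fewer parameters
- restrict each parameter to a smaller range of values
Weight decay takes the second route: it controls the magnitude of the parameter values.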
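2.1 Squared norm as a hard constraint
- Control model capacity by limiting the range of values the parameters can take:
$$\min\ l(w,b) \quad \text{subject to}\ \|w\|^2 \le \theta$$
- The bias $b$ is usually not constrained (constraining it or not makes little difference)
- A smaller $\theta$ means stronger regularization
- Regularization is a technique used to keep a model from overfitting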
2.2 Squared norm as a soft constraint
- For every $\theta$ there is a $\lambda$ such that the previous objective is equivalent to
$$\min\ l(w,b) + \frac{\lambda}{2}\|w\|^2$$
- The hyperparameter $\lambda$ controls the importance of the regularization term
- $\lambda = 0$: no effect
- $\lambda \to +\infty$: $w^* \to 0$
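To see how $\lambda$ acts, here is a minimal sketch with an assumed one-dimensional loss $l(w)=\frac{1}{2}(w-a)^2$. The loss, the helper name `penalized_minimizer`, and the value of `a` are illustrative choices of mine, not part of the course. For this loss the penalized objective is minimized at $w^*=a/(1+\lambda)$, which equals $a$ when $\lambda=0$ and shrinks toward 0 as $\lambda$ grows.

```python
def penalized_minimizer(a, lam):
    """Minimizer of 0.5*(w - a)**2 + 0.5*lam*w**2.

    Setting the derivative (w - a) + lam*w to zero gives w = a / (1 + lam).
    """
    return a / (1.0 + lam)


a = 2.0  # hypothetical optimum of the unregularized loss
for lam in [0.0, 1.0, 10.0, 1000.0]:
    print(f"lambda={lam:>7}: w* = {penalized_minimizer(a, lam):.4f}")
```

Larger values of `lam` pull the solution toward zero, which is the soft counterpart of shrinking $\theta$ in the hard-constraint form.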
2.3 Parameter update rule
- Compute the gradient
$$\frac{d}{dw}\left(l(w,b)+\frac{\lambda}{2}\|w\|^2\right)=\frac{dl(w,b)}{dw}+\lambda w$$
- Update the parameters at step $t$
$$w_{t+1}=w_t-\eta\lambda w_t-\eta\frac{dl(w,b)}{dw}$$
- Usually $\eta\lambda<1$, so each step first shrinks $w_t$ by the factor $(1-\eta\lambda)$ and then takes the ordinary gradient step. In deep learning this is commonly called weight decay.
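The update rule can be checked by hand on a single scalar weight. The sketch below uses made-up values for the weight, the loss gradient, the learning rate, and $\lambda$; the function name `weight_decay_step` is also just for illustration. It only shows that one step amounts to "shrink, then descend".

```python
def weight_decay_step(w, grad, lr, lam):
    """One update w_{t+1} = w_t - lr*lam*w_t - lr*grad."""
    shrunk = (1.0 - lr * lam) * w   # decay the weight first
    return shrunk - lr * grad       # then take the usual gradient step


w_t, grad = 1.0, 0.5   # assumed current weight and loss gradient
lr, lam = 0.1, 2.0     # assumed learning rate and regularization strength
w_next = weight_decay_step(w_t, grad, lr, lam)
print(w_next)  # (1 - 0.2) * 1.0 - 0.1 * 0.5, i.e. 0.75 up to floating-point rounding
```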
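3. Code implementation
3.1 High-dimensional linear regression
3.1.1 Lambda functions
A lambda function, also called an anonymous function, is Python's way of creating a small, throwaway function. It has no name of its own and is typically used once, either passed as an argument to another function or wherever a short function is needed, which keeps the code compact.
lambda arguments: expression
- The lambda keyword defines the anonymous function.
- arguments are the input parameters; there can be several, separated by commas.
- expression is a single expression whose value is computed and returned.
# Define a lambda function that adds two numbers
# (named add rather than sum so the built-in sum is not shadowed)
add = lambda x, y: x + y
print(add(3, 5))  # prints 8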
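Since lambdas are most often passed to other functions, here is a small illustration with Python's built-in `sorted` and `map`. The records are made-up data used only for this example.

```python
# Hypothetical (name, value) records used only for illustration
records = [("w1", 0.3), ("w2", -1.2), ("w3", 0.7)]

# Sort by absolute value, largest first; the lambda serves as the sort key
by_magnitude = sorted(records, key=lambda r: abs(r[1]), reverse=True)
print(by_magnitude)  # [('w2', -1.2), ('w3', 0.7), ('w1', 0.3)]

# Apply a lambda to every element with map
squares = list(map(lambda r: r[1] ** 2, records))
print(squares)
```

The same pattern appears later in these notes, where `net = lambda X: linreg(X, w, b)` wraps the model so it can be passed around as a single-argument function.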
- Generating the linear data
# true_w.shape = [200, 1]
def synthetic_data(true_w, true_b, n_train):
    """Generate y = Xw + b + noise."""
    X = torch.normal(0, 1, size=(n_train, len(true_w)))  # X.shape = [20, 200]
    y = torch.matmul(X, true_w) + true_b
    y += torch.normal(0, 0.01, y.shape)
    return X, d2l.reshape(y, (-1, 1))
3.1.2 torch.utils.data.TensorDataset
torch.utils.data.TensorDataset is a PyTorch data utility class that wraps several tensors into a single dataset object. The dataset can then be handed to a DataLoader for convenient batching and iteration.
Each sample is retrieved by indexing the tensors along their first dimension (dimension 0), so all wrapped tensors must have the same size in that dimension.
import torch
from torch.utils.data import TensorDataset, DataLoader
# Example data
x = torch.tensor([[1, 2], [3, 4], [5, 6]])
y = torch.tensor([1, 2, 3])
# Create the TensorDataset
dataset = TensorDataset(x, y)
# Create the DataLoader
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)
# Iterate over the data
for batch in dataloader:
    features, labels = batch
    print(features, labels)
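Indexing along dimension 0 can also be seen by indexing the dataset directly, without a DataLoader. A small sketch using the same toy tensors as in the example above:

```python
import torch
from torch.utils.data import TensorDataset

x = torch.tensor([[1, 2], [3, 4], [5, 6]])
y = torch.tensor([1, 2, 3])
dataset = TensorDataset(x, y)

print(len(dataset))            # 3, the size of dimension 0
features, label = dataset[1]   # the second sample: (x[1], y[1])
print(features, label)         # tensor([3, 4]) tensor(2)
```

This is why `load_array` can unpack a tuple such as `(X, y)` into `TensorDataset(*data_arrays)`: row `i` of every tensor belongs to sample `i`.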
- Loading the data
def load_array(data_arrays, batch_size, is_train=True):
    """Construct a PyTorch data iterator."""
    dataset = torch.utils.data.TensorDataset(*data_arrays)
    return torch.utils.data.DataLoader(dataset, batch_size, shuffle=is_train)
n_train, n_test, num_inputs, batch_size = 20, 100, 200, 5
true_w, true_b = torch.ones((num_inputs, 1)) * 0.01, 0.05
# true_w.shape = [200, 1]
train_data = synthetic_data(true_w, true_b, n_train)
train_iter = load_array(train_data, batch_size)
test_data = synthetic_data(true_w, true_b, n_test)
test_iter = load_array(test_data, batch_size)
X, y = next(iter(test_iter))
print(X.shape, y.shape)  # torch.Size([5, 200]) torch.Size([5, 1])
3.2 Initializing the model parameters
def init_params():
    w = torch.normal(0, 1, size=(num_inputs, 1), requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    return [w, b]
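3.3 Defining the $l_2$ norm penalty
The most convenient way to implement this penalty is to square every entry and sum the results; dividing by 2 matches the $\frac{\lambda}{2}\|w\|^2$ term from section 2.2.
# Define the L2 norm penalty
def l2_penalty(w):
    return torch.sum(w.pow(2)) / 2
a = torch.tensor([1, 2, 3])
print(l2_penalty(a))  # prints tensor(7.)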
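Because of the factor 1/2, the gradient of the penalty with respect to `w` is `w` itself, which is where the $\lambda w$ term in the update rule of section 2.3 comes from. A quick autograd check on an arbitrary example vector (the values are mine, chosen only for illustration):

```python
import torch


def l2_penalty(w):
    return torch.sum(w.pow(2)) / 2


w = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)  # arbitrary example values
l2_penalty(w).backward()
print(w.grad)  # tensor([ 1., -2.,  3.])
```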
3.4 Training
def train(lambd):
    w, b = init_params()
    net, loss = lambda X: linreg(X, w, b), d2l.squared_loss
    num_epochs, lr = 100, 0.003
    # Training-set loss for each epoch
    train_loss_all = []
    # Validation-set loss for each epoch
    val_loss_all = []
    for epoch in range(num_epochs):
        for X, y in train_iter:
            # The scalar penalty is broadcast onto the per-sample losses
            l = loss(net(X), y) + lambd * l2_penalty(w)
            l.sum().backward()
            d2l.sgd([w, b], lr, batch_size)
        train_loss = d2l.evaluate_loss(net, train_iter, loss)
        val_loss = d2l.evaluate_loss(net, test_iter, loss)
        train_loss_all.append(train_loss)
        val_loss_all.append(val_loss)
    train_process = pd.DataFrame(data={"epoch": range(num_epochs),
                                       "train_loss_all": train_loss_all,
                                       "val_loss_all": val_loss_all})
    return train_process
3.5 Complete code
import pandas as pd
import torch
import matplotlib.pyplot as plt
from torch import nn
from d2l import torch as d2l

def synthetic_data(true_w, true_b, n_train):
    """Generate y = Xw + b + noise."""
    X = torch.normal(0, 1, size=(n_train, len(true_w)))  # X.shape = [20, 200]
    y = torch.matmul(X, true_w) + true_b
    y += torch.normal(0, 0.01, y.shape)
    return X, d2l.reshape(y, (-1, 1))

def load_array(data_arrays, batch_size, is_train=True):
    """Construct a PyTorch data iterator."""
    dataset = torch.utils.data.TensorDataset(*data_arrays)
    return torch.utils.data.DataLoader(dataset, batch_size, shuffle=is_train)

def matplot_acc_loss(train_process):
    # Plot the training and validation loss after each epoch
    plt.figure(figsize=(12, 4))
    plt.subplot(1, 1, 1)
    plt.plot(train_process['epoch'], train_process.train_loss_all, "ro-", label="Train loss")
    plt.plot(train_process['epoch'], train_process.val_loss_all, "bs-", label="Val loss")
    plt.legend()
    plt.xlabel("epoch")
    plt.ylabel("Loss")
    plt.show()

def linreg(X, w, b):
    """The linear regression model.

    Defined in :numref:`sec_utils`"""
    return torch.matmul(X, w) + b

n_train, n_test, num_inputs, batch_size = 20, 100, 200, 5
true_w, true_b = torch.ones((num_inputs, 1)) * 0.01, 0.05
# true_w.shape = [200, 1]
train_data = synthetic_data(true_w, true_b, n_train)
train_iter = load_array(train_data, batch_size)
test_data = synthetic_data(true_w, true_b, n_test)
test_iter = load_array(test_data, batch_size)
X, y = next(iter(test_iter))
print(X.shape, y.shape)

# Initialize the model parameters
def init_params():
    w = torch.normal(0, 1, size=(num_inputs, 1), requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    return [w, b]

# Define the L2 norm penalty
def l2_penalty(w):
    return torch.sum(w.pow(2)) / 2
# a = torch.tensor([1, 2, 3])
# print(l2_penalty(a))

# Training
def train(lambd):
    w, b = init_params()
    net, loss = lambda X: linreg(X, w, b), d2l.squared_loss
    num_epochs, lr = 100, 0.003
    # Training-set loss for each epoch
    train_loss_all = []
    # Validation-set loss for each epoch
    val_loss_all = []
    for epoch in range(num_epochs):
        for X, y in train_iter:
            l = loss(net(X), y) + lambd * l2_penalty(w)
            l.sum().backward()
            d2l.sgd([w, b], lr, batch_size)
        train_loss = d2l.evaluate_loss(net, train_iter, loss)
        val_loss = d2l.evaluate_loss(net, test_iter, loss)
        train_loss_all.append(train_loss)
        val_loss_all.append(val_loss)
    train_process = pd.DataFrame(data={"epoch": range(num_epochs),
                                       "train_loss_all": train_loss_all,
                                       "val_loss_all": val_loss_all})
    return train_process

train_process = train(3)
matplot_acc_loss(train_process)
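3.6 Concise implementation
3.6.1 torch.optim.SGD
torch.optim.SGD is the standard stochastic gradient descent (SGD) optimizer provided by PyTorch. It suits the optimization of most deep learning models and can be extended with momentum, weight decay (L2 regularization), and Nesterov accelerated gradient.
torch.optim.SGD(params, lr=<required parameter>, momentum=0, dampening=0, weight_decay=0, nesterov=False)
- Parameters
- params: the model parameters to optimize, usually passed as model.parameters().
- lr (float): the learning rate, the key hyperparameter that controls the step size of each update.
- momentum (float, optional): momentum factor, used to speed up convergence and reduce oscillation (default: 0).
- dampening (float, optional): dampening for momentum, which controls how much of the current gradient is added to the momentum buffer (default: 0).
- weight_decay (float, optional): weight decay (L2 regularization) factor, used to reduce overfitting (default: 0).
- nesterov (bool, optional): whether to use Nesterov accelerated gradient (default: False).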
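As far as I understand it, for plain SGD without momentum the `weight_decay` argument adds `weight_decay * w` to the gradient before the step, which matches the update rule in section 2.3 with $\lambda$ = `weight_decay`. The sketch below checks this on a toy parameter; the numbers and the toy loss are assumptions chosen only for illustration.

```python
import torch

lr, wd = 0.1, 0.5
w = torch.tensor([1.0, -2.0], requires_grad=True)
w0 = w.detach().clone()          # keep the starting value for the manual update

opt = torch.optim.SGD([w], lr=lr, weight_decay=wd)
loss = (3.0 * w).sum()           # toy loss whose gradient is 3 for every entry
opt.zero_grad()
loss.backward()
grad = w.grad.clone()            # gradient of the loss alone
opt.step()

# Manual update from section 2.3: w - lr * wd * w - lr * grad
expected = w0 - lr * wd * w0 - lr * grad
print(w.detach(), expected)
```

Both printed tensors should agree, which suggests that the hand-written `lambd * l2_penalty(w)` term and the optimizer's `weight_decay` play the same role in this setting.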
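3.6.2 torch.nn.MSELoss
nn.MSELoss is a PyTorch loss class that computes the mean squared error (MSE). It is typically used in regression tasks to measure the difference between predictions and targets. The reduction parameter controls how the loss values are aggregated.
torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')
- Parameters
- reduction (string, optional): how the loss values are aggregated. It can take the following values:
- 'none': no aggregation; the squared error of every element is returned, with the same shape as the input.
- 'mean': the average of all element-wise losses (default).
- 'sum': the sum of all element-wise losses.
- size_average and reduce are deprecated in favor of reduction.
import torch
import torch.nn as nn
# Define the MSE loss without aggregation
loss_fn = nn.MSELoss(reduction='none')
# Some example predictions and targets
predictions = torch.tensor([[2.5, 0.0], [1.5, -0.5], [3.0, 2.0]], requires_grad=True)
targets = torch.tensor([[3.0, -0.5], [1.0, 0.0], [2.0, 2.0]])
# Compute the loss
loss = loss_fn(predictions, targets)
print(loss)  # the loss of every element
# If needed, compute the mean or the sum by hand
mean_loss = loss.mean()
sum_loss = loss.sum()
print(mean_loss)  # the mean of the losses
print(sum_loss)  # the sum of the losses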
def train_concise(wd):
    net = nn.Sequential(nn.Linear(num_inputs, 1))
    for param in net.parameters():
        param.data.normal_()
    loss = nn.MSELoss(reduction='none')
    num_epochs, lr = 100, 0.003
    # Apply weight_decay to net[0].weight but not to net[0].bias
    trainer = torch.optim.SGD([
        {"params": net[0].weight, "weight_decay": wd},
        {"params": net[0].bias}], lr=lr)
    train_loss_all = []
    val_loss_all = []
    for epoch in range(num_epochs):
        for X, y in train_iter:
            trainer.zero_grad()
            l = loss(net(X), y)
            l.mean().backward()
            trainer.step()
        train_loss = d2l.evaluate_loss(net, train_iter, loss)
        val_loss = d2l.evaluate_loss(net, test_iter, loss)
        train_loss_all.append(train_loss)
        val_loss_all.append(val_loss)
    train_process = pd.DataFrame(data={"epoch": range(num_epochs),
                                       "train_loss_all": train_loss_all,
                                       "val_loss_all": val_loss_all})
    return train_process
4. Dropout
A good model should be robust to perturbations of its input data
- Training on noisy data is equivalent to Tikhonov regularization
- Dropout: inject noise between the layers
Adding noise without bias
- Adding noise to $x$ gives $x'$, and we want
$$E[x']=x$$
- Dropout perturbs each element as follows:
$$x'_i=\begin{cases} 0 & \text{with probability } p \\ \dfrac{x_i}{1-p} & \text{otherwise} \end{cases}$$
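One way to convince yourself that this perturbation is unbiased is to compute the expectation exactly and then compare it with a simulation. The sketch below uses plain Python; the value of `x`, the probability `p`, the sample count, and the helper name `dropout_scalar` are arbitrary choices of mine.

```python
import random


def dropout_scalar(x, p, rng):
    """Return 0 with probability p, otherwise x / (1 - p)."""
    return 0.0 if rng.random() < p else x / (1.0 - p)


x, p = 3.0, 0.25
# Exact expectation: p * 0 + (1 - p) * x / (1 - p) = x
exact = p * 0.0 + (1.0 - p) * (x / (1.0 - p))

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000
empirical = sum(dropout_scalar(x, p, rng) for _ in range(n)) / n
print(exact, empirical)  # the empirical mean should be close to x
```

The rescaling by $1/(1-p)$ is what keeps the mean unchanged; without it the expected value would drop to $(1-p)x$.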
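Dropout is usually applied to the outputs of hidden fully connected layers
- Dropout controls model complexity by randomly setting some outputs to 0
- It is commonly applied to the hidden-layer outputs of a multilayer perceptron
- The drop probability is a hyperparameter that controls model complexity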
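4.1 Implementing dropout
4.1.1 torch.Tensor.uniform_()
torch.Tensor.uniform_() is an in-place method that fills a tensor with random numbers drawn from a uniform distribution, replacing its existing elements. It is often used to initialize model parameters.
uniform_(from=0, to=1) → Tensor
Parameters
- from: lower bound of the uniform distribution (default 0).
- to: upper bound of the uniform distribution (default 1).
import torch
# Create a tensor of shape (3, 3)
tensor = torch.empty(3, 3)
# Fill the tensor in place with uniform random numbers from [0, 1)
tensor.uniform_(0, 1)
print(tensor)
# Fill the tensor in place with uniform random numbers from [-1, 1)
tensor.uniform_(-1, 1)
print(tensor)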
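def dropout_layer(X, dropout):
    assert 0 <= dropout <= 1
    # With dropout == 1 every element is dropped
    if dropout == 1:
        return torch.zeros_like(X)
    # With dropout == 0 every element is kept
    if dropout == 0:
        return X
    # Keep an element when its uniform sample exceeds the drop probability
    mask = (torch.empty_like(X).uniform_(0, 1) > dropout).float()
    return mask * X / (1.0 - dropout)
x = torch.arange(16, dtype=torch.float32).reshape((2, 8))
print(x)
print(dropout_layer(x, 0))    # unchanged
print(dropout_layer(x, 0.5))  # about half the entries zeroed, the rest doubled
print(dropout_layer(x, 1.0))  # all zeros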
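4.2 Defining the model
4.2.1 nn.Dropout
nn.Dropout is PyTorch's layer for the dropout regularization technique, used to reduce overfitting in neural networks. During training it randomly sets a fraction of the neuron outputs to 0. This forces the rest of the network to learn more robust features, because it cannot rely on any particular neuron.
import torch.nn as nn
dropout = nn.Dropout(p=0.5)
- p: the probability that each neuron output is dropped during training. It must lie in [0, 1]; the default is 0.5.
Training and evaluation modes
A Dropout layer behaves differently in training mode and in evaluation (inference) mode:
Training mode: Dropout randomly drops neuron outputs with the given probability p.
Evaluation mode: Dropout does not drop any outputs. Even if Dropout was applied during training, all neurons take part in the computation at evaluation time.
Use model.train() and model.eval() to switch between the two modes.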
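The two modes can be observed on a standalone `nn.Dropout` layer. In this sketch the all-ones input and the seed are arbitrary choices that make the effect easy to read.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(2, 8)

drop.train()                   # training mode: entries are zeroed at random
y_train = drop(x)
print(y_train)                 # each entry is either 0 or 1 / (1 - 0.5) = 2

drop.eval()                    # evaluation mode: dropout acts as the identity
y_eval = drop(x)
print(torch.equal(y_eval, x))  # True
```

Forgetting to call `eval()` before measuring validation accuracy is a common source of noisy, pessimistic results, since the network is then still dropping units.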
net = nn.Sequential(nn.Flatten(),
                    nn.Linear(784, 256),
                    nn.ReLU(),
                    # Add a dropout layer after the first fully connected layer
                    nn.Dropout(0.5),
                    nn.Linear(256, 256),
                    nn.ReLU(),
                    # Add a dropout layer after the second fully connected layer
                    nn.Dropout(0.2),
                    nn.Linear(256, 10))
num_inputs, num_outputs, num_hiddens1, num_hiddens2 = 28*28, 10, 256, 256
dropout1, dropout2 = 0.2, 0.5
class Net(nn.Module):
    def __init__(self, num_inputs, num_outputs, num_hiddens1, num_hiddens2, is_training=True):
        super(Net, self).__init__()
        self.training = is_training
        self.h1 = nn.Linear(num_inputs, num_hiddens1)
        self.h2 = nn.Linear(num_hiddens1, num_hiddens2)
        self.h3 = nn.Linear(num_hiddens2, num_outputs)
        self.Relu = nn.ReLU()

    def forward(self, x):
        x = x.view(-1, num_inputs)  # reshape the input to [batch_size, num_inputs]
        H1 = self.Relu(self.h1(x))
        # Apply dropout only while training
        if self.training:
            H1 = dropout_layer(H1, dropout1)
        H2 = self.Relu(self.h2(H1))
        if self.training:
            H2 = dropout_layer(H2, dropout2)
        out = self.h3(H2)
        return out

net = Net(num_inputs, num_outputs, num_hiddens1, num_hiddens2)
x = torch.rand(size=(1, 28*28), dtype=torch.float32)
def print_layer_outputs(net, x):
    for name, layer in net.named_children():
        x = layer(x)
        print(f"{name} output shape: {x.shape}")

print_layer_outputs(net, x)
4.3 Training and testing
def train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer):
    argmax = lambda x, *args, **kwargs: x.argmax(*args, **kwargs)
    astype = lambda x, *args, **kwargs: x.type(*args, **kwargs)  # convert the data type
    reduce_sum = lambda x, *args, **kwargs: x.sum(*args, **kwargs)  # sum

    # Accumulate sums over n variables
    class Accumulator:
        """For accumulating sums over `n` variables."""
        def __init__(self, n):
            """Defined in :numref:`sec_utils`"""
            self.data = [0.0] * n

        def add(self, *args):
            self.data = [a + float(b) for a, b in zip(self.data, args)]

        def reset(self):
            self.data = [0.0] * len(self.data)

        def __getitem__(self, idx):
            return self.data[idx]

    # Count the correct predictions
    def accuracy(y_hat, y):
        """Compute the number of correct predictions.

        Defined in :numref:`sec_utils`"""
        if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
            y_hat = argmax(y_hat, axis=1)
        cmp = astype(y_hat, y.dtype) == y
        return float(reduce_sum(astype(cmp, y.dtype)))

    # Train for one epoch
    def train_epoch(net, train_iter, loss, trainer):
        if isinstance(net, nn.Module):
            net.train()
        metric_train = Accumulator(3)
        for X, y in train_iter:
            y_hat = net(X)
            l = loss(y_hat, y)
            if isinstance(trainer, torch.optim.Optimizer):
                trainer.zero_grad()
                l.mean().backward()
                trainer.step()
            else:
                l.sum().backward()
                trainer(X.shape[0])
            metric_train.add(float(l.sum()), accuracy(y_hat, y), y.numel())
        # Return the training loss and the training accuracy
        return metric_train[0] / metric_train[2], metric_train[1] / metric_train[2]

    # Compute the loss and accuracy on the validation set
    def evaluate_loss_accuracy(net, loss, data_iter):
        if isinstance(net, torch.nn.Module):
            net.eval()
        metric = Accumulator(3)
        with torch.no_grad():
            for X, y in data_iter:
                y_hat = net(X)  # run the forward pass once and reuse it
                l = loss(y_hat, y)
                metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
        return metric[0] / metric[2], metric[1] / metric[2]

    # Training-set loss for each epoch
    train_loss_all = []
    # Validation-set loss for each epoch
    val_loss_all = []
    # Training-set accuracy for each epoch
    train_acc_all = []
    # Validation-set accuracy for each epoch
    val_acc_all = []
    for epoch in range(num_epochs):
        print("Epoch {}/{}".format(epoch, num_epochs - 1))
        print("-" * 10)
        train_metrics = train_epoch(net, train_iter, loss, trainer)
        print("{} train loss:{:.4f} train acc: {:.4f}".format(epoch, train_metrics[0], train_metrics[1]))
        # print(train_metrics)
        test_metrics = evaluate_loss_accuracy(net, loss, test_iter)
        print("{} val loss:{:.4f} val acc: {:.4f}".format(epoch, test_metrics[0], test_metrics[1]))
        train_loss_all.append(train_metrics[0])
        train_acc_all.append(train_metrics[1])
        val_loss_all.append(test_metrics[0])
        val_acc_all.append(test_metrics[1])
    train_process = pd.DataFrame(data={"epoch": range(num_epochs),
                                       "train_loss_all": train_loss_all,
                                       "val_loss_all": val_loss_all,
                                       "train_acc_all": train_acc_all,
                                       "val_acc_all": val_acc_all})
    return train_process

def matplot_acc_loss(train_process):
    # Plot the training and validation loss and accuracy after each epoch
    plt.figure(figsize=(12, 4))
    plt.subplot(1, 2, 1)
    plt.plot(train_process['epoch'], train_process.train_loss_all, "ro-", label="Train loss")
    plt.plot(train_process['epoch'], train_process.val_loss_all, "bs-", label="Val loss")
    plt.legend()
    plt.xlabel("epoch")
    plt.ylabel("Loss")
    plt.subplot(1, 2, 2)
    plt.plot(train_process['epoch'], train_process.train_acc_all, "ro-", label="Train acc")
    plt.plot(train_process['epoch'], train_process.val_acc_all, "bs-", label="Val acc")
    plt.xlabel("epoch")
    plt.ylabel("acc")
    plt.legend()
    plt.show()
4.4 Complete code
import pandas as pd
import torch
import matplotlib.pyplot as plt
from torch import nn
from d2l import torch as d2l

def dropout_layer(X, dropout):
    assert 0 <= dropout <= 1
    # With dropout == 1 every element is dropped
    if dropout == 1:
        return torch.zeros_like(X)
    # With dropout == 0 every element is kept
    if dropout == 0:
        return X
    mask = (torch.empty_like(X).uniform_(0, 1) > dropout).float()
    return mask * X / (1.0 - dropout)

x = torch.arange(16, dtype=torch.float32).reshape((2, 8))
# print(x)
# print(dropout_layer(x, 0))
# print(dropout_layer(x, 0.5))
# print(dropout_layer(x, 1.0))

num_inputs, num_outputs, num_hiddens1, num_hiddens2 = 28*28, 10, 256, 256
dropout1, dropout2 = 0.2, 0.5

class Net(nn.Module):
    def __init__(self, num_inputs, num_outputs, num_hiddens1, num_hiddens2, is_training=True):
        super(Net, self).__init__()
        self.training = is_training
        self.h1 = nn.Linear(num_inputs, num_hiddens1)
        self.h2 = nn.Linear(num_hiddens1, num_hiddens2)
        self.h3 = nn.Linear(num_hiddens2, num_outputs)
        self.Relu = nn.ReLU()

    def forward(self, x):
        x = x.view(-1, num_inputs)  # reshape the input to [batch_size, num_inputs]
        H1 = self.Relu(self.h1(x))
        if self.training:
            H1 = dropout_layer(H1, dropout1)
        H2 = self.Relu(self.h2(H1))
        if self.training:
            H2 = dropout_layer(H2, dropout2)
        out = self.h3(H2)
        return out

# x = torch.rand(size=(1,28*28),dtype=torch.float32)
# def print_layer_outputs(net, x):
#     for name, layer in net.named_children():
#         x = layer(x)
#         print(f"{name} output shape: {x.shape}")
#
# print_layer_outputs(net, x)

def train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer):
    argmax = lambda x, *args, **kwargs: x.argmax(*args, **kwargs)
    astype = lambda x, *args, **kwargs: x.type(*args, **kwargs)  # convert the data type
    reduce_sum = lambda x, *args, **kwargs: x.sum(*args, **kwargs)  # sum

    # Accumulate sums over n variables
    class Accumulator:
        """For accumulating sums over `n` variables."""
        def __init__(self, n):
            """Defined in :numref:`sec_utils`"""
            self.data = [0.0] * n

        def add(self, *args):
            self.data = [a + float(b) for a, b in zip(self.data, args)]

        def reset(self):
            self.data = [0.0] * len(self.data)

        def __getitem__(self, idx):
            return self.data[idx]

    # Count the correct predictions
    def accuracy(y_hat, y):
        """Compute the number of correct predictions.

        Defined in :numref:`sec_utils`"""
        if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
            y_hat = argmax(y_hat, axis=1)
        cmp = astype(y_hat, y.dtype) == y
        return float(reduce_sum(astype(cmp, y.dtype)))

    # Train for one epoch
    def train_epoch(net, train_iter, loss, trainer):
        if isinstance(net, nn.Module):
            net.train()
        metric_train = Accumulator(3)
        for X, y in train_iter:
            y_hat = net(X)
            l = loss(y_hat, y)
            if isinstance(trainer, torch.optim.Optimizer):
                trainer.zero_grad()
                l.mean().backward()
                trainer.step()
            else:
                l.sum().backward()
                trainer(X.shape[0])
            metric_train.add(float(l.sum()), accuracy(y_hat, y), y.numel())
        # Return the training loss and the training accuracy
        return metric_train[0] / metric_train[2], metric_train[1] / metric_train[2]

    # Compute the loss and accuracy on the validation set
    def evaluate_loss_accuracy(net, loss, data_iter):
        if isinstance(net, torch.nn.Module):
            net.eval()
        metric = Accumulator(3)
        with torch.no_grad():
            for X, y in data_iter:
                y_hat = net(X)  # run the forward pass once and reuse it
                l = loss(y_hat, y)
                metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
        return metric[0] / metric[2], metric[1] / metric[2]

    # Training-set loss for each epoch
    train_loss_all = []
    # Validation-set loss for each epoch
    val_loss_all = []
    # Training-set accuracy for each epoch
    train_acc_all = []
    # Validation-set accuracy for each epoch
    val_acc_all = []
    for epoch in range(num_epochs):
        print("Epoch {}/{}".format(epoch, num_epochs - 1))
        print("-" * 10)
        train_metrics = train_epoch(net, train_iter, loss, trainer)
        print("{} train loss:{:.4f} train acc: {:.4f}".format(epoch, train_metrics[0], train_metrics[1]))
        # print(train_metrics)
        test_metrics = evaluate_loss_accuracy(net, loss, test_iter)
        print("{} val loss:{:.4f} val acc: {:.4f}".format(epoch, test_metrics[0], test_metrics[1]))
        train_loss_all.append(train_metrics[0])
        train_acc_all.append(train_metrics[1])
        val_loss_all.append(test_metrics[0])
        val_acc_all.append(test_metrics[1])
    train_process = pd.DataFrame(data={"epoch": range(num_epochs),
                                       "train_loss_all": train_loss_all,
                                       "val_loss_all": val_loss_all,
                                       "train_acc_all": train_acc_all,
                                       "val_acc_all": val_acc_all})
    return train_process

def matplot_acc_loss(train_process):
    # Plot the training and validation loss and accuracy after each epoch
    plt.figure(figsize=(12, 4))
    plt.subplot(1, 2, 1)
    plt.plot(train_process['epoch'], train_process.train_loss_all, "ro-", label="Train loss")
    plt.plot(train_process['epoch'], train_process.val_loss_all, "bs-", label="Val loss")
    plt.legend()
    plt.xlabel("epoch")
    plt.ylabel("Loss")
    plt.subplot(1, 2, 2)
    plt.plot(train_process['epoch'], train_process.train_acc_all, "ro-", label="Train acc")
    plt.plot(train_process['epoch'], train_process.val_acc_all, "bs-", label="Val acc")
    plt.xlabel("epoch")
    plt.ylabel("acc")
    plt.legend()
    plt.show()

if __name__ == '__main__':
    net = Net(num_inputs, num_outputs, num_hiddens1, num_hiddens2)
    num_epochs, lr, batch_size = 10, 0.5, 256
    train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size=batch_size)
    loss = nn.CrossEntropyLoss(reduction='none')
    trainer = torch.optim.SGD(net.parameters(), lr)
    train_process = train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)
    matplot_acc_loss(train_process)