model.eval(): switches layers such as BatchNorm and Dropout to evaluation behavior (Dropout is disabled, BatchNorm uses its running statistics);
with torch.no_grad(): stops building the autograd graph, so activations are no longer cached and GPU memory is saved;
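A minimal inference sketch combining the two (model, X, and device are assumed to be defined as in the sections below):

model.eval()                      # Dropout off, BatchNorm uses running stats
with torch.no_grad():             # no graph is recorded, activations are not kept
    logits = model(X.to(device))
    pred = logits.argmax(dim=1)   # predicted class indices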
Matrix multiplication:
y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)
y3 = torch.rand_like(y1)
torch.matmul(tensor, tensor.T, out=y3)
Element-wise multiplication:
z1 = tensor * tensor
z2 = tensor.mul(tensor)
z3 = torch.rand_like(tensor)
torch.mul(tensor, tensor, out=z3)
A single-element (1x1) tensor can be converted to a plain Python number:
agg = tensor.sum()
agg_item = agg.item()
A Tensor and a NumPy array can share the same underlying memory; modifying one also modifies the other:
import numpy as np

n = torch.ones(5).numpy()    # Tensor -> NumPy array

n = np.ones(5)
t = torch.from_numpy(n)      # NumPy array -> Tensor
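A quick check of the shared-memory behavior (a sketch; an in-place t.add_(1) would show the same effect in the other direction):

t = torch.ones(5)
n = t.numpy()
np.add(n, 1, out=n)   # modify the NumPy array in place
print(t)              # tensor([2., 2., 2., 2., 2.]) -- the tensor changed as well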
If the dataset is already present in the local root folder, it is read from disk; otherwise, if download=True is specified, it is downloaded from the remote server;
import torch
from torch.utils.data import Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
    target_transform=Lambda(lambda y: torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1))
)
Dataset class: returns one sample given an index;
The data can stay entirely on disk; a sample is loaded only when it is actually accessed;
A custom dataset class must inherit from Dataset and override __init__, __len__, and __getitem__
DataLoader class: batching, shuffling (sampling strategy), multi-process loading, pinned memory, ...
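A minimal sketch of both, with a hypothetical in-memory MyDataset (names and sizes are made up for illustration):

import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, features, labels):
        self.features = features
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Return one (sample, label) pair by index
        return self.features[idx], self.labels[idx]

ds = MyDataset(torch.randn(100, 8), torch.randint(0, 2, (100,)))
loader = DataLoader(ds, batch_size=16, shuffle=True, num_workers=2, pin_memory=True)
for X, y in loader:
    pass  # each iteration yields one shuffled mini-batch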
ToTensor(): converts a PIL Image into a Tensor;
Lambda: converts the integer label y into a 10-dimensional one-hot vector;
Every model and layer inherits from torch.nn.Module
class NeuralNetwork(nn.Module):
The Module must be copied onto the device
model = NeuralNetwork().to(device)
The input data must also be copied onto the device
X = torch.rand(1, 28, 28, device=device)
Do not call Module.forward directly; using the Module(input) syntax ensures that the pre- and post-forward hooks are run
logits = model(X)
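For reference, the NeuralNetwork used above might look like this (a sketch in the style of the FashionMNIST tutorial; the exact layer sizes are an assumption):

from torch import nn

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),   # one logit per class
        )

    def forward(self, x):
        x = self.flatten(x)       # (N, 28, 28) -> (N, 784)
        return self.linear_relu_stack(x)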
model.parameters(): the trainable parameters;
model.named_parameters(): the trainable parameters, together with their names;
state_dict: contains both the trainable parameters and the non-trainable buffers;
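A quick way to inspect the three views (a sketch, assuming the model above):

for name, p in model.named_parameters():
    print(name, p.shape, p.requires_grad)   # named trainable parameters
print(len(list(model.parameters())))        # the same tensors, without names
print(model.state_dict().keys())            # parameters plus buffers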
By subclassing Function you can define a custom forward and backward; inputs or outputs can be stashed in ctx:
from torch.autograd import Function

class Exp(Function):
    @staticmethod
    def forward(ctx, i):
        result = i.exp()
        ctx.save_for_backward(result)
        return result

    @staticmethod
    def backward(ctx, grad_output):
        result, = ctx.saved_tensors
        return grad_output * result

# Use it by calling the apply method:
output = Exp.apply(input)
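A custom Function can be sanity-checked against numerical gradients with torch.autograd.gradcheck (a sketch; gradcheck wants double precision):

x = torch.randn(3, dtype=torch.double, requires_grad=True)
print(torch.autograd.gradcheck(Exp.apply, (x,)))   # True if backward matches the numerical Jacobian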
Building the computation graph:
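The snippets below assume a small graph like this (the names x, y, w, b, z, loss match what the prints use; the setup mirrors the autograd tutorial):

import torch

x = torch.ones(5)    # input
y = torch.zeros(3)   # target
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)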
Key Tensor attributes: grad, grad_fn, is_leaf, requires_grad
Tensor.grad_fn is the Function that will compute the gradient during backward:
print(f"Gradient function for z = {z.grad_fn}") print(f"Gradient function for loss = {loss.grad_fn}")# Output: Gradient function for z = <AddBackward0 object at 0x7f5e9fb64e20> Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x7f5e99b11b40>
Note that backward accumulates (adds) gradients into Tensor.grad; this is exactly what the chain rule needs where contributions must be summed, and it also makes gradient accumulation across steps possible;
A tensor's grad is accumulated only when is_leaf==True && requires_grad==True
If requires_grad==True but is_leaf==False, the tensor propagates gradients back to the tensors it was computed from, but its own .grad stays empty;
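A small illustration of these two rules (accessing .grad on a non-leaf also prints a UserWarning):

a = torch.tensor(2.0, requires_grad=True)   # leaf with requires_grad=True -> accumulates .grad
b = a * 3                                   # non-leaf intermediate
c = b ** 2
c.backward()
print(a.is_leaf, a.grad)   # True tensor(36.)   (dc/da = 2*b * 3 = 36)
print(b.is_leaf, b.grad)   # False None         (gradient only flows through b)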
For inference only, wrap the code in "with torch.no_grad()" so that no computation graph is built (benefits: forward is slightly faster, and activations are not saved into ctx, which saves GPU memory)
with torch.no_grad():
    z = torch.matmul(x, w) + b
print(z.requires_grad)

# Output: False
In some training setups, certain parameters must be frozen (excluded from weight updates); simply set their requires_grad=False by hand.
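For example (a hypothetical fine-tuning sketch; linear_relu_stack[-1] refers to the last layer of the NeuralNetwork sketch above):

for param in model.parameters():
    param.requires_grad = False                      # freeze everything
for param in model.linear_relu_stack[-1].parameters():
    param.requires_grad = True                       # un-freeze only the final Linear layer
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)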
detach() creates a new reference to the same data that is cut off from the original computation graph, so the original graph can be garbage-collected:
z = torch.matmul(x, w) + b
z_det = z.detach()
print(z_det.requires_grad)

# Output: False
The backward DAG is rebuilt from scratch on every forward pass; therefore the graph can change arbitrarily from step to step (e.g., taking different control-flow branches depending on tensor values)
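For instance (a contrived sketch): the branch taken, and hence the recorded graph, may differ on every call:

def forward_dynamic(x, w):
    if x.sum() > 0:              # data-dependent control flow
        return torch.relu(x @ w)
    return x @ w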
The partial derivative of a vector with respect to a vector is the Jacobian matrix:
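In symbols (standard definition; out.backward(v) computes a vector-Jacobian product rather than the full matrix):

\vec{y} = f(\vec{x}), \qquad
J = \frac{\partial \vec{y}}{\partial \vec{x}} =
\begin{pmatrix}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{pmatrix},
\qquad
\texttt{out.backward}(v) \;\text{accumulates}\; v^{\top} J \;\text{into}\; \texttt{inp.grad}.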
The following example demonstrates the vector-Jacobian product, gradient accumulation, and zero_grad:
inp = torch.eye(4, 5, requires_grad=True)
out = (inp + 1).pow(2).t()
# With v = ones_like(out), backward accumulates d(sum((inp+1)^2))/d(inp) = 2*(inp+1):
# 4 on the diagonal of eye(4, 5), 2 everywhere else.
out.backward(torch.ones_like(out), retain_graph=True)
print(f"First call\n{inp.grad}")
out.backward(torch.ones_like(out), retain_graph=True)
print(f"\nSecond call\n{inp.grad}")
inp.grad.zero_()
out.backward(torch.ones_like(out), retain_graph=True)
print(f"\nCall after zeroing gradients\n{inp.grad}")

Output:
First call
tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.]])

Second call
tensor([[8., 4., 4., 4., 4.],
        [4., 8., 4., 4., 4.],
        [4., 4., 8., 4., 4.],
        [4., 4., 4., 8., 4.]])

Call after zeroing gradients
tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.]])
Example of using an optimizer:
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

def train(...):
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()   # reset every Tensor.grad to 0
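Because backward() adds into .grad (as noted above), gradient accumulation over several micro-batches only needs step()/zero_grad() to be delayed (a sketch; accum_steps is a hypothetical hyper-parameter):

accum_steps = 4
optimizer.zero_grad()
for batch, (X, y) in enumerate(dataloader):
    loss = loss_fn(model(X), y) / accum_steps   # scale so the summed grads match one large batch
    loss.backward()                             # grads keep accumulating
    if (batch + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()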
torch.save: serializes a dict with Python's pickle and writes it to a file;
torch.load: reads the file and uses Python's pickle to deserialize the bytes back into a dict;
torch.nn.Module.state_dict: a Python dict whose keys are strings and whose values are Tensors; it contains the learnable parameters and the non-learnable buffers (e.g. the running mean that batch normalization needs);
The optimizer also has a state_dict (learning rate, momentum, etc.)
Saving for inference only (note: model.eval() is mandatory, otherwise dropout and BN will misbehave):
# save:
torch.save(model.state_dict(), PATH)

# load:
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
model.eval()
Saving a checkpoint so that training can be resumed:
# save:
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss,
    ...
}, PATH)

# load:
model = TheModelClass(*args, **kwargs)
optimizer = TheOptimizerClass(*args, **kwargs)

checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']

model.train()
With the state_dict approach, the model must already be constructed before loading (the memory for the parameters is already allocated; only the weights are still random)
For map_location, model.to(device), etc., see: Saving and Loading Models — PyTorch Tutorials 2.3.0+cu121 documentation
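For example, loading a checkpoint saved on GPU onto the CPU, or loading and then moving to a target device (a sketch; PATH and device as above):

# load onto CPU regardless of where the state_dict was saved:
model.load_state_dict(torch.load(PATH, map_location=torch.device('cpu')))

# or map straight to the target device and move the whole model there:
model.load_state_dict(torch.load(PATH, map_location=device))
model.to(device)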
A less common approach (the model does not need to be constructed first):
# save:
torch.save(model, PATH)

# load:
# Model class must be defined somewhere
model = torch.load(PATH)
model.eval()