Table of Contents
- 1. Get the data
- 2. Create Datasets and DataLoaders
- 3. Define a model
- 4. Create training engine functions
- 5. Create a function to save the model
- 6. Train, evaluate and save the model
Going modular means converting Jupyter notebook code into a series of separate Python scripts that offer similar functionality.
We can convert the notebook code from a series of cells into the following Python files:
- data_setup.py - a file to prepare and download data if needed.
- engine.py - a file containing various training functions.
- model_builder.py or model.py - a file to create a PyTorch model.
- train.py - a file to leverage all other files and train a target PyTorch model.
- utils.py - a file dedicated to helpful utility functions.
How the above files are named and laid out will depend on your use case and code requirements. Python scripts are as general purpose as individual notebook cells, meaning you can create one for almost any kind of functionality.
Notebooks are great for quickly exploring ideas and running experiments, but for larger-scale projects you may find Python scripts more reproducible and easier to run.
When you download someone's open-source project, you may be instructed to run code like the following in a terminal/command line to train a model:
python train.py --model MODEL_NAME --batch_size BATCH_SIZE --lr LEARNING_RATE --num_epochs NUM_EPOCHS
Here train.py is the target Python script, which likely contains functions to train a PyTorch model, and --model, --batch_size, --lr and --num_epochs are known as argument flags.
You can set these to whatever values you like: if they're compatible with train.py, they'll work; if not, they'll raise an error.
For example, to train a TinyVGG model for 10 epochs with a batch size of 32 and a learning rate of 0.001:
python train.py --model tinyvgg --batch_size 32 --lr 0.001 --num_epochs 10
Directory structure of the Python scripts:
going_modular/
├── going_modular/
│ ├── data_setup.py
│ ├── engine.py
│ ├── model_builder.py
│ ├── train.py
│ └── utils.py
├── models/
│ ├── 05_going_modular_cell_mode_tinyvgg_model.pth
│ └── 05_going_modular_script_mode_tinyvgg_model.pth
└── data/
    └── pizza_steak_sushi/
        ├── train/
        │   ├── pizza/
        │   │   ├── image01.jpeg
        │   │   └── ...
        │   ├── steak/
        │   └── sushi/
        └── test/
            ├── pizza/
            ├── steak/
            └── sushi/
1. Get the data
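The data is the pizza_steak_sushi image set laid out in the standard ImageFolder format shown in the directory tree above. A minimal download sketch follows; the URL is an assumption based on the original course repository, so substitute your own data source if needed:

import requests
import zipfile
from pathlib import Path

# Setup paths to the data folder
data_path = Path("data/")
image_path = data_path / "pizza_steak_sushi"

if not image_path.is_dir():
    image_path.mkdir(parents=True, exist_ok=True)

    # Download the zipped dataset (assumed URL; adjust to your data source)
    url = "https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip"
    zip_path = data_path / "pizza_steak_sushi.zip"
    zip_path.write_bytes(requests.get(url).content)

    # Unzip into data/pizza_steak_sushi/
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(image_path)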
2. Create Datasets and DataLoaders
(data_setup.py)
"""
Contains functionality for creating PyTorch DataLoaders for
image classification data.
"""
import os

from torchvision import datasets, transforms
from torch.utils.data import DataLoader

NUM_WORKERS = os.cpu_count()

def create_dataloaders(
    train_dir: str,
    test_dir: str,
    transform: transforms.Compose,
    batch_size: int,
    num_workers: int=NUM_WORKERS
):
    """Creates training and testing DataLoaders.

    Takes in a training directory and testing directory path and turns
    them into PyTorch Datasets and then into PyTorch DataLoaders.

    Args:
      train_dir: Path to training directory.
      test_dir: Path to testing directory.
      transform: torchvision transforms to perform on training and testing data.
      batch_size: Number of samples per batch in each of the DataLoaders.
      num_workers: An integer for number of workers per DataLoader.

    Returns:
      A tuple of (train_dataloader, test_dataloader, class_names).
      Where class_names is a list of the target classes.
      Example usage:
        train_dataloader, test_dataloader, class_names = \
            create_dataloaders(train_dir=path/to/train_dir,
                               test_dir=path/to/test_dir,
                               transform=some_transform,
                               batch_size=32,
                               num_workers=4)
    """
    # Use ImageFolder to create dataset(s)
    train_data = datasets.ImageFolder(train_dir, transform=transform)
    test_data = datasets.ImageFolder(test_dir, transform=transform)

    # Get class names
    class_names = train_data.classes

    # Turn images into data loaders
    train_dataloader = DataLoader(
        train_data,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        pin_memory=True,
    )
    test_dataloader = DataLoader(
        test_data,
        batch_size=batch_size,
        shuffle=False,
        num_workers=num_workers,
        pin_memory=True,
    )

    return train_dataloader, test_dataloader, class_names
If we want to create DataLoaders, we can now use the function in data_setup.py like so:
# Import data_setup.py
from going_modular import data_setup

# Create train/test dataloaders and get class names as a list
train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(...)
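For example, a concrete call might look like this (the 64x64 resize transform is an assumption here, matching what train.py uses later):

from torchvision import transforms
from going_modular import data_setup

# A simple transform pipeline (assumed for illustration)
data_transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor()
])

train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(
    train_dir="data/pizza_steak_sushi/train",
    test_dir="data/pizza_steak_sushi/test",
    transform=data_transform,
    batch_size=32
)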
3. Define a model
(model_builder.py)
"""
Contains PyTorch model code to instantiate a TinyVGG model.
"""
import torch
from torch import nn

class TinyVGG(nn.Module):
    """Creates the TinyVGG architecture.

    Replicates the TinyVGG architecture from the CNN explainer website in PyTorch.
    See the original architecture here: https://poloclub.github.io/cnn-explainer/

    Args:
      input_shape: An integer indicating number of input channels.
      hidden_units: An integer indicating number of hidden units between layers.
      output_shape: An integer indicating number of output units.
    """
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int) -> None:
        super().__init__()
        self.conv_block_1 = nn.Sequential(
            nn.Conv2d(in_channels=input_shape,
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=0),
            nn.ReLU(),
            nn.Conv2d(in_channels=hidden_units,
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=0),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.conv_block_2 = nn.Sequential(
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, padding=0),
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, padding=0),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            # Where did this in_features shape come from?
            # It's because each layer of our network compresses and changes the shape of our input data.
            nn.Linear(in_features=hidden_units*13*13,
                      out_features=output_shape)
        )

    def forward(self, x: torch.Tensor):
        x = self.conv_block_1(x)
        x = self.conv_block_2(x)
        x = self.classifier(x)
        return x
        # return self.classifier(self.conv_block_2(self.conv_block_1(x))) # <- leverage the benefits of operator fusion
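To see where the classifier's in_features=hidden_units*13*13 comes from, trace a dummy batch through the two conv blocks: with 64x64 inputs (the size train.py uses later), each unpadded 3x3 conv shrinks the height and width by 2 and each max pool halves them, giving 64 -> 62 -> 60 -> 30 -> 28 -> 26 -> 13. A quick sanity-check sketch, assuming 10 hidden units and 3 classes:

import torch
from going_modular import model_builder

# Instantiate the model (3 colour channels in, 10 hidden units, 3 classes out)
model = model_builder.TinyVGG(input_shape=3, hidden_units=10, output_shape=3)

# Pass a dummy batch of one 64x64 RGB image through the conv blocks only
dummy = torch.randn(1, 3, 64, 64)
features = model.conv_block_2(model.conv_block_1(dummy))
print(features.shape)  # torch.Size([1, 10, 13, 13]) -> flattens to 10*13*13 features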
4. Create training engine functions
- train_step() - takes in a model, a DataLoader, a loss function and an optimizer, and trains the model on the DataLoader.
- test_step() - takes in a model, a DataLoader and a loss function, and evaluates the model on the DataLoader.
- train() - performs train_step() and test_step() together for a given number of epochs and returns a results dictionary.
Since these will form the engine of our model training, we can put them all into a Python script called engine.py:
"""
Contains functions for training and testing a PyTorch model.
"""
import torch

from tqdm.auto import tqdm
from typing import Dict, List, Tuple

def train_step(model: torch.nn.Module,
               dataloader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               optimizer: torch.optim.Optimizer,
               device: torch.device) -> Tuple[float, float]:
    """Trains a PyTorch model for a single epoch.

    Turns a target PyTorch model to training mode and then
    runs through all of the required training steps (forward
    pass, loss calculation, optimizer step).

    Args:
      model: A PyTorch model to be trained.
      dataloader: A DataLoader instance for the model to be trained on.
      loss_fn: A PyTorch loss function to minimize.
      optimizer: A PyTorch optimizer to help minimize the loss function.
      device: A target device to compute on (e.g. "cuda" or "cpu").

    Returns:
      A tuple of training loss and training accuracy metrics.
      In the form (train_loss, train_accuracy). For example:
      (0.1112, 0.8743)
    """
    # Put model in train mode
    model.train()

    # Setup train loss and train accuracy values
    train_loss, train_acc = 0, 0

    # Loop through data loader data batches
    for batch, (X, y) in enumerate(dataloader):
        # Send data to target device
        X, y = X.to(device), y.to(device)

        # 1. Forward pass
        y_pred = model(X)

        # 2. Calculate and accumulate loss
        loss = loss_fn(y_pred, y)
        train_loss += loss.item()

        # 3. Optimizer zero grad
        optimizer.zero_grad()

        # 4. Loss backward
        loss.backward()

        # 5. Optimizer step
        optimizer.step()

        # Calculate and accumulate accuracy metric across all batches
        y_pred_class = torch.argmax(torch.softmax(y_pred, dim=1), dim=1)
        train_acc += (y_pred_class == y).sum().item()/len(y_pred)

    # Adjust metrics to get average loss and accuracy per batch
    train_loss = train_loss / len(dataloader)
    train_acc = train_acc / len(dataloader)
    return train_loss, train_acc

def test_step(model: torch.nn.Module,
              dataloader: torch.utils.data.DataLoader,
              loss_fn: torch.nn.Module,
              device: torch.device) -> Tuple[float, float]:
    """Tests a PyTorch model for a single epoch.

    Turns a target PyTorch model to "eval" mode and then performs
    a forward pass on a testing dataset.

    Args:
      model: A PyTorch model to be tested.
      dataloader: A DataLoader instance for the model to be tested on.
      loss_fn: A PyTorch loss function to calculate loss on the test data.
      device: A target device to compute on (e.g. "cuda" or "cpu").

    Returns:
      A tuple of testing loss and testing accuracy metrics.
      In the form (test_loss, test_accuracy). For example:
      (0.0223, 0.8985)
    """
    # Put model in eval mode
    model.eval()

    # Setup test loss and test accuracy values
    test_loss, test_acc = 0, 0

    # Turn on inference context manager
    with torch.inference_mode():
        # Loop through DataLoader batches
        for batch, (X, y) in enumerate(dataloader):
            # Send data to target device
            X, y = X.to(device), y.to(device)

            # 1. Forward pass
            test_pred_logits = model(X)

            # 2. Calculate and accumulate loss
            loss = loss_fn(test_pred_logits, y)
            test_loss += loss.item()

            # Calculate and accumulate accuracy
            test_pred_labels = test_pred_logits.argmax(dim=1)
            test_acc += ((test_pred_labels == y).sum().item()/len(test_pred_labels))

    # Adjust metrics to get average loss and accuracy per batch
    test_loss = test_loss / len(dataloader)
    test_acc = test_acc / len(dataloader)
    return test_loss, test_acc

def train(model: torch.nn.Module,
          train_dataloader: torch.utils.data.DataLoader,
          test_dataloader: torch.utils.data.DataLoader,
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module,
          epochs: int,
          device: torch.device) -> Dict[str, List]:
    """Trains and tests a PyTorch model.

    Passes a target PyTorch model through train_step() and test_step()
    functions for a number of epochs, training and testing the model
    in the same epoch loop.

    Calculates, prints and stores evaluation metrics throughout.

    Args:
      model: A PyTorch model to be trained and tested.
      train_dataloader: A DataLoader instance for the model to be trained on.
      test_dataloader: A DataLoader instance for the model to be tested on.
      optimizer: A PyTorch optimizer to help minimize the loss function.
      loss_fn: A PyTorch loss function to calculate loss on both datasets.
      epochs: An integer indicating how many epochs to train for.
      device: A target device to compute on (e.g. "cuda" or "cpu").

    Returns:
      A dictionary of training and testing loss as well as training and
      testing accuracy metrics. Each metric has a value in a list for
      each epoch.
      In the form: {train_loss: [...],
                    train_acc: [...],
                    test_loss: [...],
                    test_acc: [...]}
      For example if training for epochs=2:
                   {train_loss: [2.0616, 1.0537],
                    train_acc: [0.3945, 0.3945],
                    test_loss: [1.2641, 1.5706],
                    test_acc: [0.3400, 0.2973]}
    """
    # Create empty results dictionary
    results = {"train_loss": [],
               "train_acc": [],
               "test_loss": [],
               "test_acc": []}

    # Loop through training and testing steps for a number of epochs
    for epoch in tqdm(range(epochs)):
        train_loss, train_acc = train_step(model=model,
                                           dataloader=train_dataloader,
                                           loss_fn=loss_fn,
                                           optimizer=optimizer,
                                           device=device)
        test_loss, test_acc = test_step(model=model,
                                        dataloader=test_dataloader,
                                        loss_fn=loss_fn,
                                        device=device)

        # Print out what's happening
        print(
            f"Epoch: {epoch+1} | "
            f"train_loss: {train_loss:.4f} | "
            f"train_acc: {train_acc:.4f} | "
            f"test_loss: {test_loss:.4f} | "
            f"test_acc: {test_acc:.4f}"
        )

        # Update results dictionary
        results["train_loss"].append(train_loss)
        results["train_acc"].append(train_acc)
        results["test_loss"].append(test_loss)
        results["test_acc"].append(test_acc)

    # Return the filled results at the end of the epochs
    return results
Now that we have the engine.py script, we can import functions from it like so:
# Import engine.py
from going_modular import engine

# Use train() by calling it from engine.py
engine.train(...)
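Since train() returns the results dictionary described in its docstring, capturing the return value lets you inspect or plot the metrics afterwards. A small sketch, assuming the model, DataLoaders, loss function, optimizer and device from the previous sections already exist:

results = engine.train(model=model,
                       train_dataloader=train_dataloader,
                       test_dataloader=test_dataloader,
                       loss_fn=loss_fn,
                       optimizer=optimizer,
                       epochs=5,
                       device=device)

# Each key holds one value per epoch
print(f"Final test accuracy: {results['test_acc'][-1]:.4f}")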
5. Create a function to save the model
(utils.py)
We'll save a save_model() function to a file called utils.py:
"""
Contains various utility functions for PyTorch model training and saving.
"""
import torch
from pathlib import Path

def save_model(model: torch.nn.Module,
               target_dir: str,
               model_name: str):
    """Saves a PyTorch model to a target directory.

    Args:
      model: A target PyTorch model to save.
      target_dir: A directory for saving the model to.
      model_name: A filename for the saved model. Should include
        either ".pth" or ".pt" as the file extension.

    Example usage:
      save_model(model=model_0,
                 target_dir="models",
                 model_name="05_going_modular_tingvgg_model.pth")
    """
    # Create target directory
    target_dir_path = Path(target_dir)
    target_dir_path.mkdir(parents=True,
                          exist_ok=True)

    # Create model save path
    assert model_name.endswith(".pth") or model_name.endswith(".pt"), "model_name should end with '.pt' or '.pth'"
    model_save_path = target_dir_path / model_name

    # Save the model state_dict()
    print(f"[INFO] Saving model to: {model_save_path}")
    torch.save(obj=model.state_dict(),
               f=model_save_path)
Instead of writing it all over again, we can import it and use it like so:
# Import utils.py
from going_modular import utils

# Save a model to file
utils.save_model(model=...,
                 target_dir=...,
                 model_name=...)
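Note that save_model() stores only the model's state_dict() (its learned parameters), not the whole model object, so loading it back requires re-creating the architecture first. A minimal sketch, assuming the same hyperparameters used at save time:

import torch
from going_modular import model_builder

# Re-create the architecture with the same hyperparameters used for training
loaded_model = model_builder.TinyVGG(input_shape=3,
                                     hidden_units=10,
                                     output_shape=3)

# Load the saved parameters back in and switch to inference mode
loaded_model.load_state_dict(
    torch.load("models/05_going_modular_script_mode_tinyvgg_model.pth"))
loaded_model.eval()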
6. Train, evaluate and save the model
(train.py)
We can then train a PyTorch model with a single line of code at the command line:
python train.py
To create train.py, we'll go through the following steps:
- Import the various dependencies, namely torch, os, torchvision.transforms, and data_setup, engine, model_builder and utils from the going_modular directory.
- Note: since train.py will live inside the going_modular directory, we can import the other modules via import ... rather than from going_modular import ....
- Set up various hyperparameters such as batch size, number of epochs, learning rate and number of hidden units (these could be set via Python's argparse in the future).
- Set up the training and testing directories.
- Set up device-agnostic code.
- Create the necessary data transforms.
- Create the DataLoaders using data_setup.py.
- Create the model using model_builder.py.
- Set up the loss function and optimizer.
- Train the model using engine.py.
- Save the model using utils.py.
"""
Trains a PyTorch image classification model using device-agnostic code.
"""import os
import torch
import data_setup, engine, model_builder, utilsfrom torchvision import transforms# Setup hyperparameters
NUM_EPOCHS = 5
BATCH_SIZE = 32
HIDDEN_UNITS = 10
LEARNING_RATE = 0.001# Setup directories
train_dir = "data/pizza_steak_sushi/train"
test_dir = "data/pizza_steak_sushi/test"# Setup target device
device = "cuda" if torch.cuda.is_available() else "cpu"# Create transforms
data_transform = transforms.Compose([transforms.Resize((64, 64)),transforms.ToTensor()
])# Create DataLoaders with help from data_setup.py
train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(train_dir=train_dir,test_dir=test_dir,transform=data_transform,batch_size=BATCH_SIZE
)# Create model with help from model_builder.py
model = model_builder.TinyVGG(input_shape=3,hidden_units=HIDDEN_UNITS,output_shape=len(class_names)
).to(device)# Set loss and optimizer
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(),lr=LEARNING_RATE)# Start training with help from engine.py
engine.train(model=model,train_dataloader=train_dataloader,test_dataloader=test_dataloader,loss_fn=loss_fn,optimizer=optimizer,epochs=NUM_EPOCHS,device=device)# Save the model with help from utils.py
utils.save_model(model=model,target_dir="models",model_name="05_going_modular_script_mode_tinyvgg_model.pth")
The train.py file could be adjusted to accept argument flags via Python's argparse module, which would let us provide different hyperparameter settings as discussed earlier:
python train.py --model MODEL_NAME --batch_size BATCH_SIZE --lr LEARNING_RATE --num_epochs NUM_EPOCHS
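A minimal sketch of what that argparse setup could look like near the top of train.py (only the numeric flags are wired up here; a --model flag would also need dispatch logic to choose between architectures, and model_builder.py currently only defines TinyVGG):

import argparse

# Parse command line argument flags into hyperparameters
parser = argparse.ArgumentParser(description="Trains a PyTorch image classification model.")
parser.add_argument("--batch_size", type=int, default=32, help="number of samples per batch")
parser.add_argument("--lr", type=float, default=0.001, help="learning rate for the optimizer")
parser.add_argument("--num_epochs", type=int, default=5, help="number of epochs to train for")
args = parser.parse_args()

# Use the parsed values in place of the hard-coded constants
NUM_EPOCHS = args.num_epochs
BATCH_SIZE = args.batch_size
LEARNING_RATE = args.lr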