Image Style Transfer Using PyTorch

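1. Introduction

This tutorial explains how to implement the Neural-Style algorithm developed by Leon A. Gatys, Alexander S. Ecker and Matthias Bethge. Neural-Style, or Neural-Transfer, lets you take an image and reproduce it in a new artistic style. The algorithm takes three images (an input image, a content image and a style image) and changes the input so that it resembles the content of the content image and the artistic style of the style image.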

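2. Underlying Principle

The principle is simple: we define two distances, one for the content ($D_C$) and one for the style ($D_S$). $D_C$ measures how different the content is between two images, while $D_S$ measures how different the style is between two images. We then take a third image, the input, and transform it to minimize both its content distance to the content image and its style distance to the style image. Now we can import the necessary packages and begin the neural transfer.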
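Written compactly, and assuming the two distances are combined as a weighted sum, the optimization described above looks for the image $X^{*}$ below. The weights $w_C$ and $w_S$ are notation introduced here for illustration; they play the role of the content_weight and style_weight arguments of run_style_transfer later in this tutorial.

```latex
X^{*} = \arg\min_{X} \; \bigl[ \, w_C \, D_C(X, C) + w_S \, D_S(X, S) \, \bigr]
```

Here $X$ is the input image, $C$ the content image and $S$ the style image.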

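3. Importing Packages and Selecting a Device

Below is a list of the packages needed to implement the neural transfer.

torch, torch.nn, numpy (indispensable packages for neural networks with PyTorch)
torch.optim (efficient gradient descent)
PIL, PIL.Image, matplotlib.pyplot (load and display images)
torchvision.transforms (transform PIL images into tensors)
torchvision.models (train or load pretrained models)
copy (to deep copy the models; system package)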

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from PIL import Image
import matplotlib.pyplot as plt

import torchvision.transforms as transforms
import torchvision.models as models

import copy

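Next, we need to choose which device to run the network on and import the content and style images. Running the neural transfer algorithm on large images takes longer, and it is much faster on a GPU. We can use torch.cuda.is_available() to detect whether a GPU is available. We then set the torch.device that is used throughout the tutorial. The .to(device) method is also used to move tensors or modules to the desired device.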

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.set_default_device(device)
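As a minimal sketch of the .to(device) call mentioned above (the tensor and layer here are hypothetical examples, not part of the tutorial's pipeline), both a tensor and a module can be moved to the selected device before they are used together:

```python
import torch
import torch.nn as nn

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Unless a default device has been set, new tensors are created on the CPU.
# For tensors, .to(device) returns a tensor that lives on `device`.
example_tensor = torch.zeros(1, 3, 4, 4)  # hypothetical 4x4 RGB "image" batch
example_tensor = example_tensor.to(device)

# For modules, .to(device) moves the parameters in place and returns the module.
example_layer = nn.Conv2d(3, 8, kernel_size=3, padding=1).to(device)

# Inputs and parameters must be on the same device for the forward pass to work.
output = example_layer(example_tensor)
print(output.device.type, tuple(output.shape))
```

Keeping a single device object and passing it everywhere avoids the mismatched-device errors that occur when a CPU tensor is fed to a GPU module.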

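4. Loading the Images

Now we will import the style and content images. The original PIL images have values between 0 and 255, but when transformed into torch tensors their values are converted to lie between 0 and 1. The images also need to be resized to have the same dimensions. An important detail to note is that neural networks from the torch library are trained with tensor values ranging from 0 to 1. If you feed the network tensor images with values from 0 to 255, the activated feature maps will be unable to sense the intended content and style. Pretrained networks from the Caffe library, however, are trained with 0 to 255 tensor images.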
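To make the rescaling described above concrete, here is a minimal sketch that uses a hypothetical two-pixel image rather than a real file. It reproduces by hand the conversion that transforms.ToTensor() applies to an 8-bit RGB image: channels are moved first and values are divided by 255.

```python
import torch

# Hypothetical 1x2 RGB image laid out the way PIL stores it:
# height x width x channel, unsigned 8-bit values in [0, 255].
pixels_hwc = torch.tensor([[[0, 128, 255], [255, 0, 64]]], dtype=torch.uint8)

# Move channels first (C x H x W) and rescale from [0, 255] to [0.0, 1.0].
image_chw = pixels_hwc.permute(2, 0, 1).to(torch.float32) / 255.0

# Add the fake batch dimension the network expects: 1 x C x H x W.
batch = image_chw.unsqueeze(0)

print(tuple(batch.shape), batch.min().item(), batch.max().item())
```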

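Note

Download the two images required to run this tutorial, picasso.jpg and dancing.jpg, and add them to a directory named images in your current working directory. The loading code below reads them from ./data/images/neural-style/, so adjust those paths to match wherever you save the files.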

# desired size of the output image
imsize = 512 if torch.cuda.is_available() else 128  # use small size if no GPU

loader = transforms.Compose([
    transforms.Resize(imsize),  # scale imported image
    transforms.ToTensor()])  # transform it into a torch tensor


def image_loader(image_name):
    image = Image.open(image_name)
    # fake batch dimension required to fit network's input dimensions
    image = loader(image).unsqueeze(0)
    return image.to(device, torch.float)


style_img = image_loader("./data/images/neural-style/picasso.jpg")
content_img = image_loader("./data/images/neural-style/dancing.jpg")

assert style_img.size() == content_img.size(), \
    "we need to import style and content images of the same size"

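Now, let's create a function that displays an image by converting a copy of it back to PIL format and showing that copy with plt.imshow. We will display the content and style images to make sure they were imported correctly.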

unloader = transforms.ToPILImage()  # reconvert into PIL image

plt.ion()


def imshow(tensor, title=None):
    image = tensor.cpu().clone()  # we clone the tensor to not do changes on it
    image = image.squeeze(0)      # remove the fake batch dimension
    image = unloader(image)
    plt.imshow(image)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated


plt.figure()
imshow(style_img, title='Style Image')

plt.figure()
imshow(content_img, title='Content Image')

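5. Loss Functions

5.1 Content Loss

The content loss is a function that represents a weighted version of the content distance for an individual layer. The function takes the feature maps $F_{XL}$ of a layer $L$ in a network processing input $X$ and returns the weighted content distance $w_{CL}\cdot D^L_C(X,C)$ between the image $X$ and the content image $C$. The feature maps of the content image ($F_{CL}$) must be known by the function in order to calculate the content distance. We implement this function as a torch module whose constructor takes $F_{CL}$ as input. The distance $\left\| F_{XL} - F_{CL} \right\|^2$ is the mean square error between the two sets of feature maps and can be computed with nn.MSELoss (the code below uses its functional form, F.mse_loss).

We will add this content loss module directly after the convolution layer(s) used to compute the content distance. This way, each time the network is fed an input image, the content losses are computed at the desired layers, and thanks to autograd all the gradients are computed as well. To make the content loss layer transparent, we must define a forward method that computes the content loss and then returns the layer's input unchanged. The computed loss is saved as an attribute of the module (self.loss).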

class ContentLoss(nn.Module):

    def __init__(self, target,):
        super(ContentLoss, self).__init__()
        # we 'detach' the target content from the tree used
        # to dynamically compute the gradient: this is a stated value,
        # not a variable. Otherwise the forward method of the criterion
        # will throw an error.
        self.target = target.detach()

    def forward(self, input):
        self.loss = F.mse_loss(input, self.target)
        return input

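Note

Important detail: although this module is named ContentLoss, it is not a true PyTorch Loss function. If you want to define your content loss as a PyTorch Loss function, you have to create a PyTorch autograd function that recomputes/implements the gradient manually in its backward method.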
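The "transparent layer" behaviour can be checked on toy data. The sketch below repeats the ContentLoss module so that it runs on its own; the all-zeros target and all-ones input are hypothetical feature maps chosen so the numbers are easy to verify by hand.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContentLoss(nn.Module):
    """Same module as in the tutorial, repeated so this sketch is self-contained."""

    def __init__(self, target):
        super().__init__()
        self.target = target.detach()

    def forward(self, input):
        self.loss = F.mse_loss(input, self.target)
        return input


# Hypothetical feature maps: batch of 1, 2 channels, 2x2 spatial size.
target_features = torch.zeros(1, 2, 2, 2)
input_features = torch.ones(1, 2, 2, 2, requires_grad=True)

layer = ContentLoss(target_features)
passed_through = layer(input_features)

# The layer hands its input back untouched and records the loss on the side.
print(passed_through is input_features)  # the very same tensor object
print(layer.loss.item())                 # mean of (1 - 0)^2 over 8 elements

# Autograd can still reach the input through the recorded loss.
layer.loss.backward()
print(input_features.grad[0, 0, 0, 0].item())  # 2 * (1 - 0) / 8
```

Because forward returns its input unchanged, the module can be inserted anywhere in a Sequential model without altering what the following layers receive.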

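5.2 Style Loss

The style loss module is implemented similarly to the content loss module. It acts as a transparent layer in the network and computes the style loss of that layer. To calculate the style loss, we need to compute the gram matrix $G_{XL}$. A gram matrix is the result of multiplying a given matrix by its transpose. In this application the given matrix is a reshaped version of the feature maps $F_{XL}$ of a layer $L$. $F_{XL}$ is reshaped to form $\hat{F}_{XL}$, a $K\times N$ matrix, where $K$ is the number of feature maps at layer $L$ and $N$ is the length of any vectorized feature map $F_{XL}^k$. For example, the first row of $\hat{F}_{XL}$ corresponds to the first vectorized feature map $F_{XL}^1$.

Finally, the gram matrix must be normalized by dividing each element by the total number of elements in $\hat{F}_{XL}$ (that is, $K \times N$ for a batch of one image). This normalization counteracts the fact that $\hat{F}_{XL}$ matrices with a large $N$ dimension yield larger values in the gram matrix. These larger values would cause the first layers (before the pooling layers) to have a larger impact during gradient descent. Style features tend to be in the deeper layers of the network, so this normalization step is crucial.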

def gram_matrix(input):
    a, b, c, d = input.size()  # a=batch size(=1)
    # b=number of feature maps
    # (c,d)=dimensions of a f. map (N=c*d)

    features = input.view(a * b, c * d)  # resize F_XL into \hat F_XL

    G = torch.mm(features, features.t())  # compute the gram product

    # we 'normalize' the values of the gram matrix
    # by dividing by the total number of elements in the feature maps.
    return G.div(a * b * c * d)
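A small worked example may help here. The sketch below repeats gram_matrix so that it is self-contained and applies it to a hypothetical layer output with $K=2$ feature maps of length $N=2$, small enough to check by hand: $\hat{F}\hat{F}^T = [[5, 11], [11, 25]]$, which is then divided by $K \times N = 4$.

```python
import torch


def gram_matrix(input):
    """Same function as in the tutorial, repeated so this sketch is self-contained."""
    a, b, c, d = input.size()
    features = input.view(a * b, c * d)
    G = torch.mm(features, features.t())
    return G.div(a * b * c * d)


# Hypothetical layer output: batch of 1, K=2 feature maps, each 1x2 (so N=2).
# Feature map 1 is [1, 2] and feature map 2 is [3, 4].
feature_maps = torch.tensor([[[[1.0, 2.0]],
                              [[3.0, 4.0]]]])

G = gram_matrix(feature_maps)

# Entry (i, j) is the dot product of feature maps i and j, divided by K*N = 4:
#   [[1*1 + 2*2, 1*3 + 2*4],      [[ 5, 11],            [[1.25, 2.75],
#    [3*1 + 4*2, 3*3 + 4*4]]  =    [11, 25]]  / 4   =    [2.75, 6.25]]
print(G)
```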

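Now the style loss module looks almost exactly like the content loss module. The style distance is also computed as the mean square error, this time between $G_{XL}$ and $G_{SL}$.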

class StyleLoss(nn.Module):

    def __init__(self, target_feature):
        super(StyleLoss, self).__init__()
        self.target = gram_matrix(target_feature).detach()

    def forward(self, input):
        G = gram_matrix(input)
        self.loss = F.mse_loss(G, self.target)
        return input
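One way to build intuition for why the gram matrix works as a style descriptor is to note that it sums over spatial positions and therefore ignores where a feature occurs. The sketch below is an illustration on random, hypothetical feature maps rather than real VGG activations. It repeats gram_matrix and StyleLoss and shows that mirroring the feature maps horizontally leaves the style loss at (numerically) zero, whereas unrelated features do not.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def gram_matrix(input):
    a, b, c, d = input.size()
    features = input.view(a * b, c * d)
    G = torch.mm(features, features.t())
    return G.div(a * b * c * d)


class StyleLoss(nn.Module):
    def __init__(self, target_feature):
        super().__init__()
        self.target = gram_matrix(target_feature).detach()

    def forward(self, input):
        G = gram_matrix(input)
        self.loss = F.mse_loss(G, self.target)
        return input


torch.manual_seed(0)
style_features = torch.rand(1, 4, 8, 8)           # hypothetical style activations
mirrored = torch.flip(style_features, dims=[3])   # same values, flipped left-right
unrelated = torch.rand(1, 4, 8, 8)                # different activations altogether

style_layer = StyleLoss(style_features)

style_layer(mirrored)
loss_mirrored = style_layer.loss.item()

style_layer(unrelated)
loss_unrelated = style_layer.loss.item()

# The mirrored maps have the same channel-to-channel correlations, so their
# gram matrix matches the target up to floating-point rounding.
print(loss_mirrored, loss_unrelated)
```

The content loss, by contrast, compares feature maps position by position, which is why it preserves spatial layout while the style loss does not.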

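6. Importing the Model

Now we need to import a pretrained neural network. We will use a 19-layer VGG network like the one used in the paper.

PyTorch's implementation of VGG is a module divided into two child Sequential modules: features (containing convolution and pooling layers) and classifier (containing fully connected layers). We will use the features module because we need the output of the individual convolution layers to measure content and style loss. Some layers behave differently during training than during evaluation, so we must set the network to evaluation mode using .eval().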

cnn = models.vgg19(pretrained=True).features.eval()

/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning:
The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning:
Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG19_Weights.IMAGENET1K_V1`. You can also use `weights=VGG19_Weights.DEFAULT` to get the most up-to-date weights.
Downloading: "https://download.pytorch.org/models/vgg19-dcbb9e9d.pth" to /var/lib/jenkins/.cache/torch/hub/checkpoints/vgg19-dcbb9e9d.pth
100%|##########| 548M/548M [00:04<00:00, 130MB/s]
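The deprecation warning in the output above points to the weights argument. As a hedged sketch, assuming torchvision 0.13 or later, the same pretrained network could be requested as follows. The helper name load_vgg19_features is hypothetical, and the call is wrapped in a function because invoking it downloads roughly 548 MB of weights on first use.

```python
def load_vgg19_features():
    """Return the convolutional part of a pretrained VGG19 in evaluation mode.

    Hypothetical helper: calling it triggers the weight download on first use.
    torchvision is imported here so that it is only needed when the helper runs.
    """
    import torchvision.models as models

    # IMAGENET1K_V1 is the weight set the warning names as equivalent to
    # the older pretrained=True behaviour.
    weights = models.VGG19_Weights.IMAGENET1K_V1
    return models.vgg19(weights=weights).features.eval()


# Usage (commented out to avoid the download when this sketch is merely read):
# cnn = load_vgg19_features()
```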

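Additionally, VGG networks are trained on images with each channel normalized by mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225]. We will use these values to normalize the image before sending it into the network.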

cnn_normalization_mean = torch.tensor([0.485, 0.456, 0.406])
cnn_normalization_std = torch.tensor([0.229, 0.224, 0.225])


# create a module to normalize input image so we can easily put it in a
# ``nn.Sequential``
class Normalization(nn.Module):
    def __init__(self, mean, std):
        super(Normalization, self).__init__()
        # .view the mean and std to make them [C x 1 x 1] so that they can
        # directly work with image Tensor of shape [B x C x H x W].
        # B is batch size. C is number of channels. H is height and W is width.
        self.mean = torch.tensor(mean).view(-1, 1, 1)
        self.std = torch.tensor(std).view(-1, 1, 1)

    def forward(self, img):
        # normalize ``img``
        return (img - self.mean) / self.std
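The reshaping to [C x 1 x 1] is what lets the per-channel statistics broadcast over a [B x C x H x W] batch. The sketch below demonstrates this on a hypothetical all-white 2x2 image. It is a variant of the module above rather than a copy: it uses clone().detach() instead of torch.tensor(mean), an assumption made here to avoid the copy-construct warning PyTorch emits when torch.tensor is called on an existing tensor.

```python
import torch
import torch.nn as nn


class Normalization(nn.Module):
    """Variant of the tutorial module; clone().detach() is a substitution made here."""

    def __init__(self, mean, std):
        super().__init__()
        # Shape [C] -> [C, 1, 1] so the values broadcast over height and width.
        self.mean = mean.clone().detach().view(-1, 1, 1)
        self.std = std.clone().detach().view(-1, 1, 1)

    def forward(self, img):
        return (img - self.mean) / self.std


mean = torch.tensor([0.485, 0.456, 0.406])
std = torch.tensor([0.229, 0.224, 0.225])
normalize = Normalization(mean, std)

# Hypothetical all-white image: batch of 1, 3 channels, 2x2 pixels, every value 1.0.
white = torch.ones(1, 3, 2, 2)
normalized = normalize(white)

# Every pixel of channel c becomes (1 - mean[c]) / std[c].
expected_per_channel = (1.0 - mean) / std
print(normalized[0, :, 0, 0], expected_per_channel)
```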

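A Sequential module contains an ordered list of child modules. For instance, vgg19.features contains a sequence (Conv2d, ReLU, MaxPool2d, Conv2d, ReLU…) aligned in the right order of depth. We need to add our content loss and style loss layers immediately after the convolution layers whose outputs they measure. To do this we must create a new Sequential module with the content loss and style loss modules inserted at the correct positions.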

# desired depth layers to compute style/content losses :
content_layers_default = ['conv_4']
style_layers_default = ['conv_1', 'conv_2', 'conv_3', 'conv_4', 'conv_5']


def get_style_model_and_losses(cnn, normalization_mean, normalization_std,
                               style_img, content_img,
                               content_layers=content_layers_default,
                               style_layers=style_layers_default):
    # normalization module
    normalization = Normalization(normalization_mean, normalization_std)

    # just in order to have an iterable access to or list of content/style
    # losses
    content_losses = []
    style_losses = []

    # assuming that ``cnn`` is a ``nn.Sequential``, so we make a new ``nn.Sequential``
    # to put in modules that are supposed to be activated sequentially
    model = nn.Sequential(normalization)

    i = 0  # increment every time we see a conv
    for layer in cnn.children():
        if isinstance(layer, nn.Conv2d):
            i += 1
            name = 'conv_{}'.format(i)
        elif isinstance(layer, nn.ReLU):
            name = 'relu_{}'.format(i)
            # The in-place version doesn't play very nicely with the ``ContentLoss``
            # and ``StyleLoss`` we insert below. So we replace with out-of-place
            # ones here.
            layer = nn.ReLU(inplace=False)
        elif isinstance(layer, nn.MaxPool2d):
            name = 'pool_{}'.format(i)
        elif isinstance(layer, nn.BatchNorm2d):
            name = 'bn_{}'.format(i)
        else:
            raise RuntimeError('Unrecognized layer: {}'.format(layer.__class__.__name__))

        model.add_module(name, layer)

        if name in content_layers:
            # add content loss:
            target = model(content_img).detach()
            content_loss = ContentLoss(target)
            model.add_module("content_loss_{}".format(i), content_loss)
            content_losses.append(content_loss)

        if name in style_layers:
            # add style loss:
            target_feature = model(style_img).detach()
            style_loss = StyleLoss(target_feature)
            model.add_module("style_loss_{}".format(i), style_loss)
            style_losses.append(style_loss)

    # now we trim off the layers after the last content and style losses
    for i in range(len(model) - 1, -1, -1):
        if isinstance(model[i], ContentLoss) or isinstance(model[i], StyleLoss):
            break

    model = model[:(i + 1)]

    return model, style_losses, content_losses
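The naming, insertion and trimming logic is easier to follow on a network small enough to read in full. The sketch below is a simplified illustration, not the tutorial's function: tiny_cnn, Probe and insert_after are hypothetical names, and Probe is a pass-through stand-in for ContentLoss/StyleLoss that merely records the shape it sees.

```python
import torch
import torch.nn as nn


class Probe(nn.Module):
    """Hypothetical pass-through layer standing in for ContentLoss/StyleLoss."""

    def forward(self, x):
        self.seen_shape = tuple(x.shape)  # record what reached this depth
        return x


# Hypothetical stand-in for vgg19.features: two conv blocks with one pooling layer.
tiny_cnn = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(2),
    nn.Conv2d(4, 8, kernel_size=3, padding=1), nn.ReLU(inplace=True),
)


def insert_after(cnn, wanted_layers):
    """Copy `cnn` layer by layer, adding a Probe after each layer named in `wanted_layers`."""
    model = nn.Sequential()
    i = 0  # incremented every time a convolution is seen
    for layer in cnn.children():
        if isinstance(layer, nn.Conv2d):
            i += 1
            name = f"conv_{i}"
        elif isinstance(layer, nn.ReLU):
            name = f"relu_{i}"
            layer = nn.ReLU(inplace=False)  # avoid overwriting what a probe just saw
        elif isinstance(layer, nn.MaxPool2d):
            name = f"pool_{i}"
        else:
            raise RuntimeError(f"Unrecognized layer: {layer.__class__.__name__}")
        model.add_module(name, layer)
        if name in wanted_layers:
            model.add_module(f"probe_{i}", Probe())

    # Trim everything after the last probe: deeper layers would only cost compute.
    last = max(idx for idx, module in enumerate(model) if isinstance(module, Probe))
    return model[: last + 1]


model = insert_after(tiny_cnn, ["conv_1"])
layer_names = [name for name, _ in model.named_children()]

model(torch.zeros(1, 3, 8, 8))  # hypothetical 8x8 RGB input
print(layer_names, model.probe_1.seen_shape)
```

With only conv_1 requested, the trimmed model keeps just the first convolution and its probe, which mirrors how the tutorial's function discards the VGG layers beyond the last loss module.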

Next, we select the input image. You can use a copy of the content image or white noise.

input_img = content_img.clone()
# if you want to use white noise instead, uncomment the line below:
# input_img = torch.randn(content_img.data.size())

# add the original input image to the figure:
plt.figure()
imshow(input_img, title='Input Image')

[Figure: Input Image]

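7. Gradient Descent

As Leon Gatys, the author of the algorithm, suggested, we will use the L-BFGS algorithm to run our gradient descent. Unlike training a network, here we want to train the input image in order to minimize the content/style losses. We will create a PyTorch L-BFGS optimizer, optim.LBFGS, and pass our image to it as the tensor to optimize.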

def get_input_optimizer(input_img):
    # this line to show that input is a parameter that requires a gradient
    optimizer = optim.LBFGS([input_img])
    return optimizer
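The pattern of optimizing a tensor rather than network weights can be tried on a toy problem first. In the sketch below, the three-element tensor x and its target are hypothetical stand-ins for the image and the losses. It shows the closure that optim.LBFGS requires: the optimizer may evaluate the loss several times per step, so the loss computation is handed over as a function.

```python
import torch
import torch.optim as optim

# The tensor being optimized plays the role of the input image.
x = torch.zeros(3, requires_grad=True)
target = torch.tensor([1.0, -2.0, 0.5])  # hypothetical optimum

optimizer = optim.LBFGS([x])
evaluations = [0]  # a list so the closure can mutate the counter


def closure():
    # L-BFGS calls this whenever it needs a fresh loss and gradient.
    optimizer.zero_grad()
    loss = ((x - target) ** 2).sum()
    loss.backward()
    evaluations[0] += 1
    return loss


for _ in range(5):
    optimizer.step(closure)

print(x.detach(), evaluations[0])
```

On this simple quadratic the optimizer reaches the target almost immediately. The style transfer loss is far less well behaved, which is why the tutorial runs several hundred closure evaluations.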

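Finally, we must define a function that performs the neural transfer. On each iteration, the network is fed the updated input and computes new losses. We call backward on the weighted sum of the losses from all loss modules to dynamically compute their gradients. The optimizer requires a "closure" function, which re-evaluates the model and returns the loss.

We still have one final constraint to address. The optimization may push the input to values outside the 0 to 1 tensor range of an image. We can address this by clamping the input values to lie between 0 and 1 every time the network is run.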

def run_style_transfer(cnn, normalization_mean, normalization_std,
                       content_img, style_img, input_img, num_steps=300,
                       style_weight=1000000, content_weight=1):
    """Run the style transfer."""
    print('Building the style transfer model..')
    model, style_losses, content_losses = get_style_model_and_losses(cnn,
        normalization_mean, normalization_std, style_img, content_img)

    # We want to optimize the input and not the model parameters so we
    # update all the requires_grad fields accordingly
    input_img.requires_grad_(True)
    # We also put the model in evaluation mode, so that specific layers
    # such as dropout or batch normalization layers behave correctly.
    model.eval()
    model.requires_grad_(False)

    optimizer = get_input_optimizer(input_img)

    print('Optimizing..')
    run = [0]
    while run[0] <= num_steps:

        def closure():
            # correct the values of updated input image
            with torch.no_grad():
                input_img.clamp_(0, 1)

            optimizer.zero_grad()
            model(input_img)
            style_score = 0
            content_score = 0

            for sl in style_losses:
                style_score += sl.loss
            for cl in content_losses:
                content_score += cl.loss

            style_score *= style_weight
            content_score *= content_weight

            loss = style_score + content_score
            loss.backward()

            run[0] += 1
            if run[0] % 50 == 0:
                print("run {}:".format(run))
                print('Style Loss : {:4f} Content Loss: {:4f}'.format(
                    style_score.item(), content_score.item()))
                print()

            return style_score + content_score

        optimizer.step(closure)

    # a last correction...
    with torch.no_grad():
        input_img.clamp_(0, 1)

    return input_img
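The clamping step inside the closure above can be looked at in isolation. The sketch below uses a hypothetical three-value tensor in place of an image. It shows why the call is wrapped in torch.no_grad(): an in-place edit of a leaf tensor that requires gradients is rejected by autograd outside that context.

```python
import torch

# Hypothetical "image" with one value below 0 and one above 1.
input_img = torch.tensor([-0.5, 0.3, 1.7], requires_grad=True)

# Inside no_grad the in-place clamp is a plain value edit, invisible to autograd.
with torch.no_grad():
    input_img.clamp_(0, 1)

print(input_img)  # values now lie in [0, 1]; the tensor still requires grad

# Outside no_grad the same in-place operation on a leaf tensor raises an error.
probe = torch.tensor([2.0], requires_grad=True)
try:
    probe.clamp_(0, 1)
    raised = False
except RuntimeError:
    raised = True

print(raised)
```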

Finally, we can run the algorithm.

output = run_style_transfer(cnn, cnn_normalization_mean, cnn_normalization_std,
                            content_img, style_img, input_img)

plt.figure()
imshow(output, title='Output Image')

plt.ioff()
plt.show()

Building the style transfer model..
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_device.py:77: UserWarning:
To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
Optimizing..
run [50]:
Style Loss : 4.124115 Content Loss: 4.153235

run [100]:
Style Loss : 1.121803 Content Loss: 3.012928

run [150]:
Style Loss : 0.696039 Content Loss: 2.639936

run [200]:
Style Loss : 0.469292 Content Loss: 2.485867

run [250]:
Style Loss : 0.341620 Content Loss: 2.400899

run [300]:
Style Loss : 0.263747 Content Loss: 2.347282

Total running time of the script: (0 minutes 38.249 seconds)

References:

Neural Transfer Using PyTorch — PyTorch Tutorials 2.1.0+cu121 documentation