transforms is a .py file (transforms.py) inside torchvision. This Python file defines many classes and methods, mainly for applying transformations to images.
1. Transforms explained
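from torchvision import transforms  # hold Ctrl and click transforms
This opens the __init__.py file:
from .transforms import *  # hold Ctrl again and click .transforms
from .autoaugment import *
This opens transforms.py. As you can see, transforms is really just one Python file, transforms.py, which you can think of as a toolkit.
Click Structure, or press Alt+7, to see the overall structure of the file.
The shortcut can be looked up under File > Settings > Keymap > Structure.
In plain terms: transforms refers to the file transforms.py, which contains many classes that perform all kinds of operations on images.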
2. The ToTensor class
Let's look at the usage notes in the docstring.
Ctrl+P: shows the parameters a method needs.
"""Convert a ``PIL Image`` or ``numpy.ndarray`` to tensor. This transform does not support torchscript.
# In other words, it converts an image of type PIL Image or numpy.ndarray to a tensor.
# A PIL Image is what Image.open from PIL (Pillow) returns; a numpy.ndarray is what OpenCV's imread returns.

Converts a PIL Image or numpy.ndarray (H x W x C) in the range
[0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]
if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1)
or if the numpy.ndarray has dtype = np.uint8

In the other cases, tensors are returned without scaling.

.. note::
    Because the input image is scaled to [0.0, 1.0], this transformation should not be used when
    transforming target image masks. See the `references`_ for implementing the transforms for image masks.

.. _references: https://github.com/pytorch/vision/tree/main/references/segmentation
"""
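To make the layout change and the scaling described in the docstring concrete, here is a minimal, dependency-free sketch. It is not how torchvision implements ToTensor; `to_chw_float` and `tiny_image` are hypothetical names, and nested lists stand in for a uint8 image.

```python
def to_chw_float(image_hwc):
    """Hypothetical helper: (H, W, C) values in [0, 255] -> (C, H, W) floats in [0.0, 1.0]."""
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    return [
        [[image_hwc[y][x][c] / 255 for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


# A 2 x 2 RGB image: each pixel is [R, G, B] in the range [0, 255].
tiny_image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [51, 102, 153]],
]

chw = to_chw_float(tiny_image)
print(len(chw), len(chw[0]), len(chw[0][0]))  # 3 2 2, i.e. (C, H, W)
print(chw[0])  # red channel: [[1.0, 0.0], [0.0, 0.2]]
```

The channel axis moves to the front and every value is divided by 255, which is why a uint8 image ends up in the range [0.0, 1.0].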
Ⅰ Read an image with PIL's Image (the result is a PIL image), convert it to a tensor with ToTensor, and upload it to TensorBoard with add_image
import cv2 as cv
from PIL import Image
from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms

img_path = "G:/PyCharm/workspace/learning_pytorch/dataset/a/3.jpg"

# An image opened with Image.open is a PIL image
img = Image.open(img_path)
print(type(img))  # <class 'PIL.JpegImagePlugin.JpegImageFile'>

# An image opened with OpenCV's imread is a numpy.ndarray
# img = cv.imread(img_path)
# print(type(img))  # <class 'numpy.ndarray'>

# transforms.ToTensor converts either of them to a Tensor
tensor_trans = transforms.ToTensor()  # create a ToTensor object
tensor_img = tensor_trans(img)  # Ctrl+P shows the required arguments; pass in the image
print(type(tensor_img))  # <class 'torch.Tensor'>
print(tensor_img.shape)  # torch.Size([3, 299, 300])

"""
add_image() requirements:
1. the image type is torch.Tensor, numpy.array, or string/blobname
2. the image layout is (3, H, W); if it differs, declare it with the dataformats parameter
tensor_img clearly meets these requirements, so it can be passed in directly
"""
writer = SummaryWriter("y_log")
writer.add_image("tensor_img", tensor_img)  # the step starts at 0 by default
writer.close()
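In the Terminal, run tensorboard --logdir=y_log --port=2312
Here logdir is the path to the event files and port is the port to serve on.
TensorBoard is opened on port 2312 in this case; if the port argument is omitted, it uses port 6006 by default.
Click the printed link, or copy it into a browser, to open it.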
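Ⅱ Why must the image data passed into a neural network be a tensor?
Open the Python Console, then copy in and run the code above.
Inspecting tensor_img there shows that a tensor carries fields such as grad (the gradient); in other words, the tensor type wraps the extra information a neural network needs.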
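Ⅲ What the __call__ method does
In transforms.py, the ToTensor class defines a __call__ method. Let's look at what that method is for.
class Band:
    def __call__(self, bandname):
        print("call-" + bandname)

    def music_band(self, bandname):
        print("hello-" + bandname)

band = Band()
band("beyond")  # call-beyond
band.music_band("huangjiaju")  # hello-huangjiaju
As the output shows, passing arguments directly to a Band object runs __call__, while any other method runs only when it is named explicitly. __call__ makes the instance callable like a function: band("beyond") is equivalent to band.__call__("beyond"). This is why tensor_trans(img) works on a ToTensor object.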
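The same pattern is what lets a transform be configured once and then applied like a function. A small sketch, with `AddOffset` as a hypothetical class that is not part of torchvision:

```python
class AddOffset:
    """Hypothetical transform: adds a fixed offset to every value in a list."""

    def __init__(self, offset):
        self.offset = offset  # configuration is stored at construction time

    def __call__(self, values):
        return [v + self.offset for v in values]


add_ten = AddOffset(10)                   # configure once, like transforms.ToTensor()
via_call = add_ten([1, 2, 3])             # the instance is used like a function
via_dunder = add_ten.__call__([1, 2, 3])  # the explicit equivalent

print(via_call)           # [11, 12, 13]
print(callable(add_ten))  # True
```

Separating construction (the settings) from the call (the data) is also what makes it possible to put several such objects in a list and apply them one after another.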
3. The ToPILImage class
Let's look at the usage notes in the docstring.
Ctrl+P: shows the parameters a method needs.
"""Convert a tensor or an ndarray to PIL Image. This transform does not support torchscript.
# Converts a tensor or an ndarray to a PIL image

Converts a torch.*Tensor of shape C x H x W or a numpy ndarray of shape
H x W x C to a PIL Image while preserving the value range.

Args:
    mode (`PIL.Image mode`_): color space and pixel depth of input data (optional).
        If ``mode`` is ``None`` (default) there are some assumptions made about the input data:
        - If the input has 4 channels, the ``mode`` is assumed to be ``RGBA``.
        - If the input has 3 channels, the ``mode`` is assumed to be ``RGB``.
        - If the input has 2 channels, the ``mode`` is assumed to be ``LA``.
        - If the input has 1 channel, the ``mode`` is determined by the data type (i.e ``int``, ``float``,
          ``short``).

.. _PIL.Image mode: https://pillow.readthedocs.io/en/latest/handbook/concepts.html#concept-modes
"""
ToPILImage converts an image stored as a tensor or an ndarray into a PIL image.
from torch.utils.tensorboard import SummaryWriter
from PIL import Image
import cv2 as cv
import numpy as np
from torchvision import transforms

img_path = "G:/PyCharm/workspace/learning_pytorch/dataset/a/3.jpg"
img = cv.imread(img_path)
type(img)  # numpy.ndarray

to_pil = transforms.ToPILImage()
# cv.imread returns BGR, while ToPILImage treats 3 channels as RGB, so convert first
PIL_img = to_pil(cv.cvtColor(img, cv.COLOR_BGR2RGB))
type(PIL_img)  # PIL.Image.Image
PIL_img.show()  # display the image with PIL

cv.imshow("img", img)  # display the image with OpenCV
cv.waitKey(0)
cv.destroyAllWindows()
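One thing to watch when mixing the two libraries: OpenCV's imread returns the channels in BGR order, while ToPILImage assumes RGB for a 3-channel input, so the colors come out swapped unless the order is reversed first. A dependency-free sketch of that reordering (`bgr_to_rgb` is a hypothetical helper; nested lists stand in for the ndarray):

```python
def bgr_to_rgb(image_hwc):
    """Hypothetical helper: reverse the channel order of every pixel in an (H, W, C) nested list."""
    return [[pixel[::-1] for pixel in row] for row in image_hwc]


# A 1 x 2 image in BGR order: pure blue, then pure red.
bgr_image = [[[255, 0, 0], [0, 0, 255]]]
rgb_image = bgr_to_rgb(bgr_image)
print(rgb_image)  # [[[0, 0, 255], [255, 0, 0]]]
```

With a real ndarray the usual way to do this is cv.cvtColor(img, cv.COLOR_BGR2RGB) before calling ToPILImage.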
4. The Normalize class
Let's look at the usage notes in the docstring.
Ctrl+P: shows the parameters a method needs.
"""Normalize a tensor image with mean and standard deviation.
# Normalizes a tensor image using a per-channel mean and standard deviation
This transform does not support PIL Image.
Given mean: ``(mean[1],...,mean[n])`` and std: ``(std[1],..,std[n])`` for ``n``
channels, this transform will normalize each channel of the input
``torch.*Tensor`` i.e.,
``output[channel] = (input[channel] - mean[channel]) / std[channel]``

.. note::
    This transform acts out of place, i.e., it does not mutate the input tensor.

Args:
    mean (sequence): Sequence of means for each channel.
    std (sequence): Sequence of standard deviations for each channel.
    inplace(bool,optional): Bool to make this operation in-place.
"""
Requirement: the input must be a tensor. From the docstring:
output[channel] = (input[channel] - mean[channel]) / std[channel]
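As a quick check of the formula, here it is applied to single values with mean 0.5 and std 0.5. This is plain arithmetic, not the torchvision implementation, and `normalize_value` is a hypothetical helper.

```python
def normalize_value(value, mean, std):
    """Apply (value - mean) / std to a single channel value."""
    return (value - mean) / std


mean, std = 0.5, 0.5
results = [round(normalize_value(v, mean, std), 4) for v in (0.0, 0.5255, 1.0)]
print(results)  # [-1.0, 0.051, 1.0]
```

With mean 0.5 and std 0.5, an input range of [0.0, 1.0] is mapped to [-1.0, 1.0].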
from PIL import Image
from torch.utils.tensorboard import SummaryWriter
import cv2 as cv
import numpy as np
from torchvision import transforms

write = SummaryWriter("y_log")
img_path = "dataset/b/6.jpg"
img = cv.imread(img_path)  # note: cv.imread returns the channels in BGR order
print(type(img))  # <class 'numpy.ndarray'>
print(img.size)  # 561375 (= 375 * 499 * 3)
print(img.shape)  # (375, 499, 3)

trans_tensor = transforms.ToTensor()
img_tensor = trans_tensor(img)
print(type(img_tensor))  # <class 'torch.Tensor'>
print(img_tensor[0][0][0])  # tensor(0.5255)

trans_normalize = transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
img_normalize = trans_normalize(img_tensor)
print(img_normalize[0][0][0])  # tensor(0.0510)
# formula: output[channel] = (input[channel] - mean[channel]) / std[channel]
# (0.5255 - 0.5) / 0.5 = 0.051
print(img_normalize.shape)  # torch.Size([3, 375, 499])
# the shape matches the (C, H, W) layout add_image expects, so it can be passed in directly
write.add_image("img_normalize", img_normalize)
write.close()
As before, run tensorboard --logdir=y_log --port=2312 in the Terminal (logdir is the path to the event files; without the port argument, port 6006 is used), then open the printed link in a browser.
5. The Resize class
Let's look at the usage notes in the docstring.
Ctrl+P: shows the parameters a method needs.
"""Resize the input image to the given size.
# Resizes the input image to the given size, i.e. changes the image dimensions
If the image is torch Tensor, it is expected
to have [..., H, W] shape, where ... means an arbitrary number of leading dimensions

.. warning::
    The output image might be different depending on its type: when downsampling, the interpolation of PIL images
    and tensors is slightly different, because PIL applies antialiasing. This may lead to significant differences
    in the performance of a network. Therefore, it is preferable to train and serve a model with the same input
    types. See also below the ``antialias`` parameter, which can help making the output of PIL images and tensors
    closer.

Args:
    size (sequence or int): Desired output size. If size is a sequence like
        (h, w), output size will be matched to this. If size is an int,
        smaller edge of the image will be matched to this number.
        i.e, if height > width, then image will be rescaled to
        (size * height / width, size).
        # Give the target shape as (h, w). If only one int is given, the shorter edge is scaled to
        # that value and the aspect ratio is kept, so the result is not forced to be a square.

        .. note::
            In torchscript mode size as single int is not supported, use a sequence of length 1: ``[size, ]``.
    interpolation (InterpolationMode): Desired interpolation enum defined by
        :class:`torchvision.transforms.InterpolationMode`. Default is ``InterpolationMode.BILINEAR``.
        If input is Tensor, only ``InterpolationMode.NEAREST``, ``InterpolationMode.BILINEAR`` and
        ``InterpolationMode.BICUBIC`` are supported.
        For backward compatibility integer values (e.g. ``PIL.Image.NEAREST``) are still acceptable.
    max_size (int, optional): The maximum allowed for the longer edge of
        the resized image: if the longer edge of the image is greater
        than ``max_size`` after being resized according to ``size``, then
        the image is resized again so that the longer edge is equal to
        ``max_size``. As a result, ``size`` might be overruled, i.e the
        smaller edge may be shorter than ``size``. This is only supported
        if ``size`` is an int (or a sequence of length 1 in torchscript
        mode).
    antialias (bool, optional): antialias flag. If ``img`` is PIL Image, the flag is ignored and anti-alias
        is always used. If ``img`` is Tensor, the flag is False by default and can be set to True for
        ``InterpolationMode.BILINEAR`` only mode. This can help making the output for PIL images and tensors
        closer.

        .. warning::
            There is no autodiff support for ``antialias=True`` option with input ``img`` as Tensor.
"""
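The int form of size and the max_size rule are easier to see with numbers. The sketch below only computes the output shape the docstring describes; `resized_shape` is a hypothetical helper, and truncating to an integer with int() is an assumption about rounding that may differ from the library by a pixel.

```python
def resized_shape(height, width, size, max_size=None):
    """Hypothetical helper: output (height, width) when ``size`` is a single int."""
    short, long = (height, width) if height <= width else (width, height)
    # The shorter edge becomes ``size``; the longer edge keeps the aspect ratio.
    new_short, new_long = size, int(size * long / short)
    if max_size is not None and new_long > max_size:
        # The longer edge is capped, so the shorter edge may end up below ``size``.
        new_short, new_long = int(max_size * new_short / new_long), max_size
    return (new_short, new_long) if height <= width else (new_long, new_short)


print(resized_shape(375, 499, 300))                # (300, 399)
print(resized_shape(499, 375, 300))                # (399, 300)
print(resized_shape(375, 499, 300, max_size=350))  # (263, 350)
```

Passing a tuple such as (300, 300) skips all of this and forces exactly that shape, which changes the aspect ratio when the original image is not square.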
The input is a PIL image: resize it with Resize, then convert it to a tensor with ToTensor and upload it to TensorBoard.
from PIL import Image
from torch.utils.tensorboard import SummaryWriter
import cv2 as cv
import numpy as np
from torchvision import transforms

write = SummaryWriter("y_log")
img_path = "dataset/b/6.jpg"
img = Image.open(img_path)
print(type(img))  # <class 'PIL.JpegImagePlugin.JpegImageFile'>
print(img.size)  # (499, 375), the original image size as (w, h)
trans_resize = transforms.Resize((300, 300))
img_PIL_resize = trans_resize(img)  # resize the image
print(img_PIL_resize)  # <PIL.Image.Image image mode=RGB size=300x300 at 0x1FDDC07C9B0>
# the image is now (300, 300), but it is still a PIL image

# to upload to TensorBoard it must be a tensor or numpy.array, so convert it with ToTensor
trans_tensor = transforms.ToTensor()
img_tensor = trans_tensor(img_PIL_resize)
print(type(img_tensor))  # <class 'torch.Tensor'>
write.add_image("img_PIL_resize", img_tensor)  # the step starts at 0 by default
write.close()
As before, run tensorboard --logdir=y_log --port=2312 in the Terminal (logdir is the path to the event files; without the port argument, port 6006 is used), then open the printed link in a browser.
Compared with the normalized image (img_normalize), the size has clearly changed.
6. The Compose class
Let's look at the usage notes in the docstring.
Ctrl+P: shows the parameters a method needs.
"""Composes several transforms together. This transform does not support torchscript.
# Combines several transforms so they can be used together
Please, see the note below.

Args:
    transforms (list of ``Transform`` objects): list of transforms to compose.

Example:
    >>> transforms.Compose([
    >>>     transforms.CenterCrop(10),  # first center-crop the image
    >>>     transforms.PILToTensor(),  # then convert the image to a tensor
    >>>     transforms.ConvertImageDtype(torch.float),  # then convert to the given dtype, scaling the values if needed
    >>> ])
    # a single Compose applies several transforms to the image in sequence

.. note::
    In order to script the transformations, please use ``torch.nn.Sequential`` as below.

    >>> transforms = torch.nn.Sequential(
    >>>     transforms.CenterCrop(10),
    >>>     transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
    >>> )
    >>> scripted_transforms = torch.jit.script(transforms)

    Make sure to use only scriptable transformations, i.e. that work with ``torch.Tensor``, does not require
    `lambda` functions or ``PIL.Image``.
"""
Put simply, it chains several transform operations together.
from PIL import Image
from torch.utils.tensorboard import SummaryWriter
import cv2 as cv
import numpy as np
from torchvision import transforms

writer = SummaryWriter('y_log')
img_path = "dataset/b/6.jpg"
img = Image.open(img_path)
print(type(img))  # <class 'PIL.JpegImagePlugin.JpegImageFile'>
print(img.size)  # (499, 375), the original image size as (w, h)

# step 1: resize
trans_resize = transforms.Resize((300, 300))
img_PIL_resize = trans_resize(img)  # resize the image
print(img_PIL_resize)  # <PIL.Image.Image image mode=RGB size=300x300 at 0x1FDDC07C9B0>
# the image is now (300, 300), but it is still a PIL image

# step 2: PIL to Tensor
trans_tensor = transforms.ToTensor()

trans_compose = transforms.Compose([trans_resize, trans_tensor])
# every argument to Compose is a transform object, and the output of each one must be a valid input for the next
# trans_resize is a Resize object; its output is a PIL image
# trans_tensor is a ToTensor object; its input is a PIL image and its output is a tensor
img_all = trans_compose(img)
# the final output is a tensor, which is why it can be uploaded to TensorBoard with add_image
writer.add_image("compose_img", img_all)
writer.close()
As before, run tensorboard --logdir=y_log --port=2312 in the Terminal (logdir is the path to the event files; without the port argument, port 6006 is used), then open the printed link in a browser. This operation simply chains Resize and ToTensor into a single step.
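The chaining idea itself fits in a few lines. The sketch below is a hypothetical stand-in rather than torchvision's Compose; `SimpleCompose`, `double` and `add_one` are made-up names, and plain numbers stand in for images.

```python
class SimpleCompose:
    """Hypothetical stand-in for Compose: calls each transform in order."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, value):
        for transform in self.transforms:
            value = transform(value)  # the output of one step is the input of the next
        return value


def double(x):
    return x * 2


def add_one(x):
    return x + 1


first = SimpleCompose([double, add_one])(5)
second = SimpleCompose([add_one, double])(5)
print(first)   # 11, i.e. (5 * 2) + 1
print(second)  # 12, i.e. (5 + 1) * 2
```

The order of the list matters, which is the same reason Resize (PIL in, PIL out) has to come before ToTensor (PIL in, tensor out) in the example above.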
8. The RandomCrop class
Let's look at the usage notes in the docstring.
Ctrl+P: shows the parameters a method needs.
"""Crop the given image at a random location.
If the image is torch Tensor, it is expected
to have [..., H, W] shape, where ... means an arbitrary number of leading dimensions,
but if non-constant padding is used, the input is expected to have at most 2 leading dimensions

Args:
    size (sequence or int): Desired output size of the crop. If size is an
        int instead of sequence like (h, w), a square crop (size, size) is
        made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).
        # Give the crop shape as (h, w); if only one number is given, the crop is a square.
    padding (int or sequence, optional): Optional padding on each border
        of the image. Default is None. If a single int is provided this
        is used to pad all borders. If sequence of length 2 is provided this is the padding
        on left/right and top/bottom respectively. If a sequence of length 4 is provided
        this is the padding for the left, top, right and bottom borders respectively.

        .. note::
            In torchscript mode padding as single int is not supported, use a sequence of
            length 1: ``[padding, ]``.
    pad_if_needed (boolean): It will pad the image if smaller than the
        desired size to avoid raising an exception. Since cropping is done
        after padding, the padding seems to be done at a random offset.
    fill (number or str or tuple): Pixel fill value for constant fill. Default is 0. If a tuple of
        length 3, it is used to fill R, G, B channels respectively.
        This value is only used when the padding_mode is constant.
        Only number is supported for torch Tensor.
        Only int or str or tuple value is supported for PIL Image.
    padding_mode (str): Type of padding. Should be: constant, edge, reflect or symmetric.
        Default is constant.

        - constant: pads with a constant value, this value is specified with fill

        - edge: pads with the last value at the edge of the image.
          If input a 5D torch Tensor, the last 3 dimensions will be padded instead of the last 2

        - reflect: pads with reflection of image without repeating the last value on the edge.
          For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode
          will result in [3, 2, 1, 2, 3, 4, 3, 2]

        - symmetric: pads with reflection of image repeating the last value on the edge.
          For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode
          will result in [2, 1, 1, 2, 3, 4, 4, 3]
"""
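The four padding modes are easiest to tell apart on a short 1-D list. The sketch below reproduces the docstring's two worked examples; `pad_1d` is a hypothetical helper that only handles lists and assumes pad is smaller than the list length.

```python
def pad_1d(values, pad, mode="constant", fill=0):
    """Hypothetical helper: pad a list by ``pad`` elements on both sides (pad < len(values))."""
    if pad == 0:
        return list(values)
    if mode == "constant":
        left, right = [fill] * pad, [fill] * pad
    elif mode == "edge":
        left, right = [values[0]] * pad, [values[-1]] * pad
    elif mode == "reflect":    # mirror without repeating the edge value
        left, right = values[1:pad + 1][::-1], values[-pad - 1:-1][::-1]
    elif mode == "symmetric":  # mirror and repeat the edge value
        left, right = values[:pad][::-1], values[-pad:][::-1]
    else:
        raise ValueError(f"unknown padding mode: {mode}")
    return left + list(values) + right


row = [1, 2, 3, 4]
print(pad_1d(row, 2, "constant"))   # [0, 0, 1, 2, 3, 4, 0, 0]
print(pad_1d(row, 2, "edge"))       # [1, 1, 1, 2, 3, 4, 4, 4]
print(pad_1d(row, 2, "reflect"))    # [3, 2, 1, 2, 3, 4, 3, 2]
print(pad_1d(row, 2, "symmetric"))  # [2, 1, 1, 2, 3, 4, 4, 3]
```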
Put simply, it crops the image at a random position.
from PIL import Image
from torch.utils.tensorboard import SummaryWriter
import cv2 as cv
import numpy as np
from torchvision import transforms

writer = SummaryWriter('y_log')
img_path = "dataset/b/6.jpg"
img = Image.open(img_path)
print(type(img))  # <class 'PIL.JpegImagePlugin.JpegImageFile'>
print(img.size)  # (499, 375), the original image size as (w, h)

# step 1: random crop
trans_random = transforms.RandomCrop((200, 250))  # (h, w)
img_PIL_random = trans_random(img)  # crop at a random position
print(img_PIL_random)  # <PIL.Image.Image image mode=RGB size=250x200 at 0x1FDDC07C9B0>
# PIL reports the size as (w, h); the image is now (h, w) = (200, 250), but it is still a PIL image

# step 2: PIL to Tensor
trans_tensor = transforms.ToTensor()

trans_compose = transforms.Compose([trans_random, trans_tensor])
# every argument to Compose is a transform object, and the output of each one must be a valid input for the next
# trans_random is a RandomCrop object; its output is a PIL image
# trans_tensor is a ToTensor object; its input is a PIL image and its output is a tensor
for i in range(10):
    img_randomcrop = trans_compose(img)
    # the final output is a tensor, which is why it can be uploaded to TensorBoard with add_image
    writer.add_image("img_randomcrop", img_randomcrop, i)
writer.close()
As before, run tensorboard --logdir=y_log --port=2312 in the Terminal (logdir is the path to the event files; without the port argument, port 6006 is used), then open the printed link in a browser.
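Each of the ten steps logged above shows a different region because the crop position is drawn again on every call. A rough sketch of that selection, under the assumption that the top-left corner is chosen uniformly among the positions that fit (`random_crop_box` is a hypothetical helper, not the library's code):

```python
import random


def random_crop_box(height, width, crop_h, crop_w, rng=random):
    """Hypothetical helper: pick a random (top, left, crop_h, crop_w) that fits inside the image."""
    if crop_h > height or crop_w > width:
        raise ValueError("crop size is larger than the image")
    top = rng.randint(0, height - crop_h)  # randint is inclusive on both ends
    left = rng.randint(0, width - crop_w)
    return top, left, crop_h, crop_w


rng = random.Random(0)  # seeded only so the sketch is repeatable
boxes = [random_crop_box(375, 499, 200, 250, rng) for _ in range(3)]
for top, left, h, w in boxes:
    print(f"rows {top}:{top + h}, cols {left}:{left + w}")
```

The crop size stays fixed at (200, 250); only the offset changes from call to call.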
7. The CenterCrop class
Let's look at the usage notes in the docstring.
Ctrl+P: shows the parameters a method needs.
"""Crops the given image at the center.
# Crops the image around its center
If the image is torch Tensor, it is expected
to have [..., H, W] shape, where ... means an arbitrary number of leading dimensions.
If image size is smaller than output size along any edge, image is padded with 0 and then center cropped.

Args:
    size (sequence or int): Desired output size of the crop. If size is an
        int instead of sequence like (h, w), a square crop (size, size) is
        made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).
"""
Put simply, it crops the image around its center.
from PIL import Image
from torch.utils.tensorboard import SummaryWriter
import cv2 as cv
import numpy as np
from torchvision import transforms

writer = SummaryWriter('y_log')
img_path = "dataset/b/6.jpg"
img = Image.open(img_path)
print(type(img))  # <class 'PIL.JpegImagePlugin.JpegImageFile'>
print(img.size)  # (499, 375), the original image size as (w, h)

# step 1: center crop
trans_center = transforms.CenterCrop((200, 250))  # (h, w)
img_PIL_center = trans_center(img)  # crop around the center
print(img_PIL_center)  # <PIL.Image.Image image mode=RGB size=250x200 at 0x1FDDC07C9B0>
# PIL reports the size as (w, h); the image is now (h, w) = (200, 250), but it is still a PIL image

# step 2: PIL to Tensor
trans_tensor = transforms.ToTensor()

trans_compose = transforms.Compose([trans_center, trans_tensor])
# every argument to Compose is a transform object, and the output of each one must be a valid input for the next
# trans_center is a CenterCrop object; its output is a PIL image
# trans_tensor is a ToTensor object; its input is a PIL image and its output is a tensor
img_centercrop = trans_compose(img)
writer.add_image("img_centercrop", img_centercrop)
writer.close()
As before, run tensorboard --logdir=y_log --port=2312 in the Terminal (logdir is the path to the event files; without the port argument, port 6006 is used), then open the printed link in a browser.
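Unlike RandomCrop, the crop window here is fully determined by the image size and the crop size. A small sketch of where the window lands, assuming the crop fits inside the image and using integer division for the offsets (`center_crop_box` is a hypothetical helper; the library's rounding may differ by one pixel):

```python
def center_crop_box(height, width, crop_h, crop_w):
    """Hypothetical helper: (left, upper, right, lower) box of a centered crop that fits the image."""
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2
    return left, top, left + crop_w, top + crop_h


box = center_crop_box(375, 499, 200, 250)
print(box)  # (124, 87, 374, 287)
```

The tuple follows the (left, upper, right, lower) order that PIL's Image.crop expects, so the same box could be passed to img.crop(box) to compare against the CenterCrop result.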