1. How images of different types are opened
Different image data types are opened or created in different ways, as shown in the figure above:
a PIL image is opened with Image.open(), an ndarray image is read with cv.imread(), and a tensor image is obtained by converting a PIL image (or ndarray) with ToTensor().
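A minimal sketch of the three cases (reusing the same ant image that appears later in this post; OpenCV is imported here as cv so it matches the cv.imread() call above):

from PIL import Image
from torchvision import transforms
import cv2 as cv

img_path = "hymenoptera_dataset/train/ants/0013035.jpg"

# PIL image: opened with Image.open()
img_pil = Image.open(img_path)
print(type(img_pil))       # <class 'PIL.JpegImagePlugin.JpegImageFile'>

# numpy ndarray: read with cv.imread(), shape (H, W, 3) in BGR channel order
img_ndarray = cv.imread(img_path)
print(type(img_ndarray))   # <class 'numpy.ndarray'>

# tensor: converted from the PIL image (or the ndarray) with ToTensor()
img_tensor = transforms.ToTensor()(img_pil)
print(type(img_tensor))    # <class 'torch.Tensor'>, shape (3, H, W), values scaled to [0, 1]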
2. Usage of __call__ in Python
Run the following code:
class Person:
    def __call__(self, name):
        print("__call__" + "hello" + name)

    def hello(self, name):
        print("hello" + name)


person = Person()
person("zhangshan")
person.hello("lisi")
The result is as follows:
__call__hellozhangshan
hellolisi
The difference between the two calling styles is clear: with __call__, the instance can be called directly, passing the arguments as if the class itself were a function, while the hello method has to be called on the object with dot notation (person.hello(...)).
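As a quick check, calling the instance directly is just shorthand for invoking its __call__ method explicitly, so the following two lines print the same thing:

person("zhangshan")            # Python implicitly invokes Person.__call__
person.__call__("zhangshan")   # explicit call, same output: __call__hellozhangshan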
3. Learning the ToTensor and Normalize classes
The code is as follows:
from PIL import Image
from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms
writer = SummaryWriter("logs")
img = Image.open("hymenoptera_dataset/train/ants/0013035.jpg")
print(img)
# convert the PIL image to a tensor
trans_totensor = transforms.ToTensor()
img_tensor = trans_totensor(img)
writer.add_image("ToTensor", img_tensor)
# Normalize: normalize the tensor channel by channel
print(img_tensor[0][0][0])
trans_norm = transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
img_norm = trans_norm(img_tensor)
print(img_norm[0][0][0])
writer.close()
The output is as follows:
tensor(0.3137)
tensor(-0.3725)
The results look slightly different from 小土堆's, because we use different images: he uses the PyTorch logo image, while I keep using the ant image.
The normalization formula is as follows:
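Per the torchvision documentation for transforms.Normalize, each channel is transformed as

output[channel] = (input[channel] - mean[channel]) / std[channel]

With mean = std = 0.5 on every channel, this reduces to output = 2 * input - 1, which is what the check below uses.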
Check: 2 * 0.3137 - 1 = -0.3726, which differs from the printed result by 0.0001. The printed tensor values are rounded to four decimal places, so the hand calculation starts from an already rounded input; the tiny mismatch most likely comes from that display rounding rather than from the computation itself.
Open TensorBoard to see how the image differs after Normalize.
The full code is as follows:
from PIL import Image
from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms
writer = SummaryWriter("logs")
img = Image.open("hymenoptera_dataset/train/ants/0013035.jpg")
print(img)
# convert the PIL image to a tensor
trans_totensor = transforms.ToTensor()
img_tensor = trans_totensor(img)
writer.add_image("ToTensor", img_tensor)
# Normalize: normalize the tensor channel by channel
print(img_tensor[0][0][0])
trans_norm = transforms.Normalize([1, 3, 5], [3, 2, 1])
img_norm = trans_norm(img_tensor)
print(img_norm[0][0][0])
writer.add_image("Normalize", img_tensor)
writer.close()
4. Learning the Resize and Compose classes
4.1 The Resize class
# use the Resize class
print(img.size)
trans_resize = transforms.Resize((512, 512))
# img PIL -> resize -> img_resize PIL
img_resize = trans_resize(img)
# img_resize PIL -> totensor -> img_resize tensor
img_resize = trans_totensor(img_resize)
writer.add_image("Resize", img_resize, 0)
print(img_resize)
writer.close()
Open TensorBoard; the result is as follows:
You can see that the size of the image has changed.
4.2 The Compose class
The Compose class chains several transforms together. Its definition in the torchvision source is:
class Compose:
    """Composes several transforms together. This transform does not support torchscript.
    Please, see the note below.

    Args:
        transforms (list of ``Transform`` objects): list of transforms to compose.

    Example:
        >>> transforms.Compose([
        >>>     transforms.CenterCrop(10),
        >>>     transforms.PILToTensor(),
        >>>     transforms.ConvertImageDtype(torch.float),
        >>> ])

    .. note::
        In order to script the transformations, please use ``torch.nn.Sequential`` as below.

        >>> transforms = torch.nn.Sequential(
        >>>     transforms.CenterCrop(10),
        >>>     transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
        >>> )
        >>> scripted_transforms = torch.jit.script(transforms)

        Make sure to use only scriptable transformations, i.e. that work with ``torch.Tensor``, does not require
        `lambda` functions or ``PIL.Image``.
    """
The code using Compose is as follows:
# use the Compose class
# trans_resize_2: when Resize is given a single int, the shorter edge is resized to that value and the longer edge is scaled to keep the aspect ratio
trans_resize_2 = transforms.Resize(512)
# PIL -> PIL -> tensor
trans_compose = transforms.Compose([trans_resize_2, trans_totensor])
img_resize_2 = trans_compose(img)
writer.add_image("Resize", img_resize_2, 1)
Open TensorBoard; the result is as follows:
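To make the difference between the two Resize variants concrete, a small sketch (reusing img and trans_totensor from above) can print the resulting tensor shapes:

# fixed (h, w) target: both sides become 512, the aspect ratio is not preserved
print(trans_totensor(transforms.Resize((512, 512))(img)).shape)   # torch.Size([3, 512, 512])

# single int: the shorter edge becomes 512, the longer edge is scaled to keep the aspect ratio
print(trans_totensor(transforms.Resize(512)(img)).shape)          # [3, 512, w] or [3, h, 512], depending on orientation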
5. The RandomCrop class
5.1 Definition of the RandomCrop class
class RandomCrop(torch.nn.Module):
    """Crop the given image at a random location.
    If the image is torch Tensor, it is expected
    to have [..., H, W] shape, where ... means an arbitrary number of leading dimensions,
    but if non-constant padding is used, the input is expected to have at most 2 leading dimensions

    Args:
        size (sequence or int): Desired output size of the crop. If size is an
            int instead of sequence like (h, w), a square crop (size, size) is
            made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).
        padding (int or sequence, optional): Optional padding on each border
            of the image. Default is None. If a single int is provided this
            is used to pad all borders. If sequence of length 2 is provided this is the padding
            on left/right and top/bottom respectively. If a sequence of length 4 is provided
            this is the padding for the left, top, right and bottom borders respectively.

            .. note::
                In torchscript mode padding as single int is not supported, use a sequence of
                length 1: ``[padding, ]``.
        pad_if_needed (boolean): It will pad the image if smaller than the
            desired size to avoid raising an exception. Since cropping is done
            after padding, the padding seems to be done at a random offset.
        fill (number or tuple): Pixel fill value for constant fill. Default is 0. If a tuple of
            length 3, it is used to fill R, G, B channels respectively.
            This value is only used when the padding_mode is constant.
            Only number is supported for torch Tensor.
            Only int or tuple value is supported for PIL Image.
        padding_mode (str): Type of padding. Should be: constant, edge, reflect or symmetric.
            Default is constant.

            - constant: pads with a constant value, this value is specified with fill

            - edge: pads with the last value at the edge of the image.
              If input a 5D torch Tensor, the last 3 dimensions will be padded instead of the last 2

            - reflect: pads with reflection of image without repeating the last value on the edge.
              For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode
              will result in [3, 2, 1, 2, 3, 4, 3, 2]

            - symmetric: pads with reflection of image repeating the last value on the edge.
              For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode
              will result in [2, 1, 1, 2, 3, 4, 4, 3]
    """
The usage code is as follows:
# use RandomCrop: take 10 different random 512x512 crops of the same image
trans_random = transforms.RandomCrop(512)
trans_compose_2 = transforms.Compose([trans_random, trans_totensor])
for i in range(10):
    img_crop = trans_compose_2(img)
    writer.add_image("RandomCrop", img_crop, i)
Open TensorBoard; the result is as follows:
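One more hedged sketch, based on the padding-related arguments in the docstring above: if the requested crop is larger than the image, RandomCrop raises an error unless padding is enabled. The crop size 1000 is just an illustrative value assumed to be larger than both sides of the ant image:

# pad_if_needed pads the image up to the crop size instead of raising an exception;
# padding_mode="reflect" fills the new border by mirroring the image content
trans_random_pad = transforms.RandomCrop(1000, pad_if_needed=True, padding_mode="reflect")
img_crop_pad = trans_totensor(trans_random_pad(img))
print(img_crop_pad.shape)   # torch.Size([3, 1000, 1000])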