PyTorch-Transforms

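This article introduces the Transforms module used for image preprocessing in PyTorch (torchvision.transforms): ToTensor converts a PIL image to a tensor, Normalize normalizes a tensor image, Resize changes the image size, Compose chains several transforms, and RandomCrop crops the image at a random location. The example code shows how to use each transform and visualizes the results with TensorBoard.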


Contents

1. Basics

2. Common Transforms

2.1 Using ToTensor

2.2 Using Normalize

2.3 Using Resize

2.4 Using Compose

2.5 Using RandomCrop


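1. Basics

Image credit: Bilibili uploader 我是土堆

tf.py (how to use the tools in transforms)

To use a specific tool, first read how its class is defined in transforms (what input it expects and what it returns), then create an instance and call it.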

from PIL import Image
from torchvision import transforms

img_path = 'data/train/ants_image/0013035.jpg'
img = Image.open(img_path)
tensor_trans = transforms.ToTensor()
tensor_img = tensor_trans(img)
print(tensor_img)

result:

tensor([[[0.3137, 0.3137, 0.3137,  ..., 0.3176, 0.3098, 0.2980],
         [0.3176, 0.3176, 0.3176,  ..., 0.3176, 0.3098, 0.2980],
         [0.3216, 0.3216, 0.3216,  ..., 0.3137, 0.3098, 0.3020],
         ...,
         [0.3412, 0.3412, 0.3373,  ..., 0.1725, 0.3725, 0.3529],
         [0.3412, 0.3412, 0.3373,  ..., 0.3294, 0.3529, 0.3294],
         [0.3412, 0.3412, 0.3373,  ..., 0.3098, 0.3059, 0.3294]],

        [[0.5922, 0.5922, 0.5922,  ..., 0.5961, 0.5882, 0.5765],
         [0.5961, 0.5961, 0.5961,  ..., 0.5961, 0.5882, 0.5765],
         [0.6000, 0.6000, 0.6000,  ..., 0.5922, 0.5882, 0.5804],
         ...,
         [0.6275, 0.6275, 0.6235,  ..., 0.3608, 0.6196, 0.6157],
         [0.6275, 0.6275, 0.6235,  ..., 0.5765, 0.6275, 0.5961],
         [0.6275, 0.6275, 0.6235,  ..., 0.6275, 0.6235, 0.6314]],

        [[0.9137, 0.9137, 0.9137,  ..., 0.9176, 0.9098, 0.8980],
         [0.9176, 0.9176, 0.9176,  ..., 0.9176, 0.9098, 0.8980],
         [0.9216, 0.9216, 0.9216,  ..., 0.9137, 0.9098, 0.9020],
         ...,
         [0.9294, 0.9294, 0.9255,  ..., 0.5529, 0.9216, 0.8941],
         [0.9294, 0.9294, 0.9255,  ..., 0.8863, 1.0000, 0.9137],
         [0.9294, 0.9294, 0.9255,  ..., 0.9490, 0.9804, 0.9137]]])

Process finished with exit code 0

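The tensor data type wraps the image data together with the attributes that backpropagation relies on, such as requires_grad, grad, and grad_fn.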
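To make the remark about backpropagation attributes concrete, here is a minimal sketch that is not part of the original tutorial. It uses a small hand-made tensor instead of the ant image (an assumption made only to keep the example self-contained) and shows how requires_grad, grad_fn, and grad behave.

```python
import torch

# A leaf tensor that asks autograd to track operations on it.
x = torch.tensor([2.0, 3.0], requires_grad=True)

# y = x0^2 + x1^2, so dy/dx = 2 * x.
y = (x ** 2).sum()

print(x.requires_grad)        # True
print(y.grad_fn is not None)  # True: y remembers the operation that produced it
print(x.grad)                 # None until backward() is called

y.backward()
print(x.grad)                 # tensor([4., 6.])
```

A tensor produced by ToTensor() has requires_grad set to False by default, which is what you normally want for input images.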

tf.py

from PIL import Image
from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms

writer = SummaryWriter('logs')
img_path = 'data/train/ants_image/0013035.jpg'
img = Image.open(img_path)
tensor_trans = transforms.ToTensor()
tensor_img = tensor_trans(img)
writer.add_image('tensor_img', tensor_img)
writer.close()
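The script above logs a CHW float tensor, which is the layout add_image expects by default. As a hedged sketch (the temporary log directory and the random images are assumptions made so the example runs without any files on disk), the following shows that default next to the dataformats argument, which is needed when the image is an HWC ndarray such as one read by OpenCV.

```python
import glob
import os
import tempfile

import numpy as np
import torch
from torch.utils.tensorboard import SummaryWriter

# Hypothetical log directory; a temporary folder keeps the sketch self-contained.
log_dir = tempfile.mkdtemp(prefix="tb_demo_")
writer = SummaryWriter(log_dir)

# Float tensor in CHW layout with values in [0, 1], like ToTensor() output.
chw_img = torch.rand(3, 64, 64)
writer.add_image("chw_tensor", chw_img, global_step=0)

# uint8 ndarray in HWC layout; the layout must be stated explicitly.
hwc_img = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
writer.add_image("hwc_ndarray", hwc_img, global_step=0, dataformats="HWC")

writer.close()

# SummaryWriter stores its data in event files inside the log directory.
event_files = glob.glob(os.path.join(log_dir, "events.out.tfevents.*"))
print(len(event_files) > 0)
```

To view the images logged by the tutorial script, run tensorboard --logdir=logs in a terminal and open the address it prints.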

2. Common Transforms

Return types of some image-related calls:

Image.open() -- PIL Image

ToTensor() -- tensor

cv2.imread() / np.array() -- numpy.ndarray

2.1 Using ToTensor

Purpose: Convert a PIL Image or numpy.ndarray to a tensor.

useful_tf.py

from PIL import Image
from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms

writer = SummaryWriter('logs')
img = Image.open('data/train/ants_image/5650366_e22b7e1065.jpg')
trans_tensor = transforms.ToTensor()
img_tensor = trans_tensor(img)
writer.add_image('ToTensor', img_tensor)
writer.close()

2.2 Using Normalize

Purpose: Normalize a tensor image with mean and standard deviation.

useful_tf.py

from PIL import Image
from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms

writer = SummaryWriter('logs')
img = Image.open('data/train/ants_image/5650366_e22b7e1065.jpg')

# ToTensor
trans_tensor = transforms.ToTensor()
img_tensor = trans_tensor(img)
writer.add_image('ToTensor', img_tensor)

# Normalize
# output[channel] = (input[channel] - mean[channel]) / std[channel]
print(img_tensor[0][0][0])
trans_norm = transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
img_norm = trans_norm(img_tensor)
print(img_norm[0][0][0])
writer.add_image('Normalize', img_norm)

writer.close()

result:

tensor(0.3804)
tensor(-0.2392)

 

2.3 Using Resize

Purpose: Resize the input image to the given size.

useful_tf.py

from PIL import Image
from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms

writer = SummaryWriter('logs')
img = Image.open('data/train/ants_image/5650366_e22b7e1065.jpg')

# ToTensor
trans_tensor = transforms.ToTensor()
img_tensor = trans_tensor(img)
writer.add_image('ToTensor', img_tensor)

# Resize
print(img.size)
trans_resize = transforms.Resize((512, 512))
# img PIL -> resize -> PIL -> tensor
img_resize = trans_tensor(trans_resize(img))
print(img_resize.shape)  # img_resize is a tensor, so use .shape (PIL images use .size)
writer.add_image('Resize', img_resize)

writer.close()

2.4 Using Compose

Purpose: Compose several transforms together.

useful_tf.py

from PIL import Image
from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms

writer = SummaryWriter('logs')
img = Image.open('data/train/ants_image/5650366_e22b7e1065.jpg')

# ToTensor
trans_tensor = transforms.ToTensor()
img_tensor = trans_tensor(img)
writer.add_image('ToTensor', img_tensor)

# Resize
trans_resize = transforms.Resize((512, 512))
# img PIL -> resize -> PIL -> tensor
img_resize = trans_tensor(trans_resize(img))
writer.add_image('Resize', img_resize)

# Resize2: with a single int, the shorter edge is scaled to 512 and the aspect ratio is kept
trans_resize_2 = transforms.Resize(512)
trans_compose = transforms.Compose([trans_resize_2, trans_tensor])
img_resize_2 = trans_compose(img)
writer.add_image('Resize-2', img_resize_2)

writer.close()

2.5 Using RandomCrop

Purpose: Crop the given image at a random location.

useful_tf.py

from PIL import Image
from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms

writer = SummaryWriter('logs')
img = Image.open('data/train/ants_image/5650366_e22b7e1065.jpg')

# ToTensor
trans_tensor = transforms.ToTensor()
img_tensor = trans_tensor(img)
writer.add_image('ToTensor', img_tensor)

# Resize
trans_resize = transforms.Resize((512, 512))
# img PIL -> resize -> PIL -> tensor
img_resize = trans_tensor(trans_resize(img))
writer.add_image('Resize', img_resize)

# Resize2: with a single int, the shorter edge is scaled to 512 and the aspect ratio is kept
trans_resize_2 = transforms.Resize(512)
trans_compose = transforms.Compose([trans_resize_2, trans_tensor])
img_resize_2 = trans_compose(img)
writer.add_image('Resize-2', img_resize_2)

# RandomCrop
trans_random = transforms.RandomCrop(200)
trans_compose_2 = transforms.Compose([trans_random, trans_tensor])
for i in range(5):
    img_crop = trans_compose_2(img)
    writer.add_image('RandomCrop', img_crop, i)
writer.close()

 

 
