扩散模型从原理到实践第二单元,教程地址
创建一个类别条件扩散模型
- 配置和数据准备
导入torch,DataLoader,扩散模型库diffusers,画图库natplotlib,进度条tqdm
import torch
import torchvision
from torch import nn
from torch.nn import functional as F
from torch.utils.data import DataLoader
from diffusers import DDPMScheduler, UNet2DModel
from matplotlib import pyplot as plt
from tqdm.auto import tqdm
device = 'mps' if torch.backends.mps.is_available() else 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'Using device: {device}')
- first,加载数据集mnist,second 构造dataloader,third 迭代dataloader,将数据放到模型
# Load the dataset
dataset = torchvision.datasets.MNIST(root="mnist/", train=True, download=True, transform=torchvision.transforms.ToTensor())
# Feed it into a dataloader (batch size 8 here just for demo)
train_dataloader = DataLoader(dataset, batch_size=8, shuffle=True)
# View some examples
x, y = next(iter(train_dataloader))
print('Input shape:', x.shape)
print('Labels:', y)
plt.imshow(torchvision.utils.make_grid(x)[0], cmap='Greys');

创建一个以类别为条件的 UNet
我们输入类别这一条件的方法是:
- 创建一个标准的 UNet2DModel,加入一些额外的输入通道
- 通过一个嵌入层,把类别标签映射到一个 (class_emb_size) 形状的学到的向量上
- 把这个信息作为额外通道和原有的输入向量拼接起来,用这行代码:net_input = torch.cat((x, class_cond), 1)
- 把这个 net_input (有 class_emb_size+1 个通道)输入到UNet中得到最终预测
在这个例子中,我把 class_emb_size 设成4,但这其实是可以任意修改的,你可以试试从把它设成1(你可以看看这有没有用)到把它设成 10(正好是类别总数),或者把需要学到的 nn.Embedding 换成简单的对类别进行独热编码(one-hot encoding)。
class ClassConditionedUnet(nn.Module):
def __init__(self, num_classes=10, class_emb_size=4):
super().__init__()
# The embedding layer will map the class label to a vector of size class_emb_size
self.class_emb = nn.Embedding(num_classes, class_emb_size)
# Self.model is an unconditional UNet with extra input channels to accept the conditioning information (the class embedding)
self.model = UNet2DModel(
sample_size=28, # the target image resolution
in_channels=1 + class_emb_size, # Additional input channels for class cond.
out_channels=1, # the number of output channels
layers_per_block=2, # how many ResNet layers to use per UNet block
block_out_channels=(32, 64, 64),
down_block_types=(
"DownBlock2D", # a regular ResNet downsampling block
"AttnDownBlock2D", # a ResNet downsampling block with spatial self-attention
"AttnDownBlock2D",
),
up_block_types=(
"AttnUpBlock2D",
"AttnUpBlock2D", # a ResNet upsampling block with spatial self-attention
"UpBlock2D", # a regular ResNet upsampling block
),
)
# Our forward method now takes the class labels as an additional argument
def forward(self, x, t, class_labels):
# Shape of x:
bs, ch, w, h = x.shape
# class conditioning in right shape to add as additional input channels
class_cond = self.class_emb(class_labels) # Map to embedding dinemsion
class_cond = class_cond.view(bs, class_cond.shape[1], 1, 1).expand(bs, class_cond.shape[1], w, h)
# x is shape (bs, 1, 28, 28) and class_cond is now (bs, 4, 28, 28)
# Net input is now x and class cond concatenated together along dimension 1
net_input = torch.cat((x, class_cond), 1) # (bs, 5, 28, 28)
# Feed this to the unet alongside the timestep and return the prediction
return self.model(net_input, t).sample # (bs, 1, 28, 28)
训练和采样
不同于别的地方使用的prediction = unet(x, t),这里我们使用prediction = unet(x, t, y),在训练时把正确的标签作为第三个输入送到模型中。在推理阶段,我们可以输入任何我们想要的标签,如果一切正常,那模型就会输出与之匹配的图片。y在这里时 MNIST 中的数字标签,值的范围从0到9。
这里的训练循环很像第一单元的例子。我们这里预测的是噪声(而不是像第一单元的去噪图片),以此来匹配 DDPMScheduler 预计的目标。这里我们用 DDPMScheduler 来在训练中加噪声,并在推理时采样用。训练也需要一段时间 —— 如何加速训练也可以是个有趣的小项目。但你也可以跳过运行代码(甚至整节笔记本),因为我们这里纯粹是在讲解思路。
# Create a scheduler
noise_scheduler = DDPMScheduler(num_train_timesteps=1000, beta_schedule='squaredcos_cap_v2')
#@markdown Training loop (10 Epochs):
# Redefining the dataloader to set the batch size higher than the demo of 8
train_dataloader = DataLoader(dataset, batch_size=128, shuffle=True)
# How many runs through the data should we do?
n_epochs = 10
# Our network
net = ClassConditionedUnet().to(device)
# Our loss finction
loss_fn = nn.MSELoss()
# The optimizer
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
# Keeping a record of the losses for later viewing
losses = []
# The training loop
for epoch in range(n_epochs):
for x, y in tqdm(train_dataloader):
# Get some data and prepare the corrupted version
x = x.to(device) * 2 - 1 # Data on the GPU (mapped to (-1, 1))
y = y.to(device)
noise = torch.randn_like(x)
timesteps = torch.randint(0, 999, (x.shape[0],)).long().to(device)
noisy_x = noise_scheduler.add_noise(x, noise, timesteps)
# Get the model prediction
pred = net(noisy_x, timesteps, y) # Note that we pass in the labels y
# Calculate the loss
loss = loss_fn(pred, noise) # How close is the output to the noise
# Backprop and update the params:
opt.zero_grad()
loss.backward()
opt.step()
# Store the loss for later
losses.append(loss.item())
# Print our the average of the last 100 loss values to get an idea of progress:
avg_loss = sum(losses[-100:])/100
print(f'Finished epoch {epoch}. Average of the last 100 loss values: {avg_loss:05f}')
# View the loss curve
plt.plot(losses)

画图
#@markdown Sampling some different digits:
# Prepare random x to start from, plus some desired labels y
x = torch.randn(80, 1, 28, 28).to(device)
y = torch.tensor([[i]*8 for i in range(10)]).flatten().to(device)
# Sampling loop
for i, t in tqdm(enumerate(noise_scheduler.timesteps)):
# Get model pred
with torch.no_grad():
residual = net(x, t, y) # Again, note that we pass in our labels y
# Update sample with step
x = noise_scheduler.step(residual, t, x).prev_sample
# Show the results
fig, ax = plt.subplots(1, 1, figsize=(12, 12))
ax.imshow(torchvision.utils.make_grid(x.detach().cpu().clip(-1, 1), nrow=8)[0], cmap='Greys')

就是这么简单!我们现在已经对要生成的图片有所控制了。
本文介绍了如何在PyTorch中使用类条件扩散模型,如UNet2DModel,结合DDPMScheduler在MNIST数据集上进行训练和样本生成,允许对生成的图像进行类别控制。
570

被折叠的 条评论
为什么被折叠?



