Stable Diffusion Skills You Can't Miss in 2024: Image-to-Image (Part 7)

赵卓不凡

Last modified: 2024-02-06 10:17:08

Category: AIGC. Tags: stable diffusion, computer vision, deep learning, large models

First published: 2024-02-06 10:16:46

Article link: https://blog.youkuaiyun.com/sgzqc/article/details/136052430

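This article explains how to use image-to-image, a variant of the Stable Diffusion pipeline that controls the generated content by mixing an initial seed image with noise. The author provides a code implementation showing how the `prompt_2_img_i2i` function takes an image as input to guide generation, and validates it with an example.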


1. Introduction

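This is the second article on variants of the Stable Diffusion pipeline, and it covers image-to-image generation. Image-to-image (image2image) extends text-to-image and originates from the SDEdit work. The core idea is simple: given an input such as a rough painting made of colored strokes, first add some Gaussian noise to it (that is, run the forward diffusion process) to obtain a noisy image, then denoise that noisy image with the diffusion model. The result is a new image whose structure and layout largely match the input. Image-to-image can produce many striking results.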

Without further ado, let's get started!

2. How It Works

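In the earlier chapters, the prompt_2_img function generated an image starting from random Gaussian noise. But what if we provide an initial seed image to guide the diffusion process? That is exactly how image-to-image works. We mix the initial seed image with some noise (the amount is controlled by a strength parameter) and then run multiple denoising steps, so the output no longer depends on the text conditioning alone. The process is illustrated below:
[Figure: illustration of the image-to-image process]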
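
The noising step can be written down explicitly. The formula below is a sketch in the common DDPM-style notation, which is an assumption on my part and does not appear in the original article: $x_0$ is the (latent of the) seed image, $\bar\alpha_t$ is the cumulative noise-schedule coefficient, and $t_0$ is the starting timestep selected by the strength parameter. The scheduler used in the code later may parameterize the same idea differently (for example with a noise scale $\sigma_t$).

```latex
x_{t_0} = \sqrt{\bar\alpha_{t_0}}\, x_0 + \sqrt{1 - \bar\alpha_{t_0}}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)
```

A larger strength picks a later $t_0$, so $\bar\alpha_{t_0}$ is smaller and less of $x_0$ survives. Denoising then runs from $t_0$ back to $0$ instead of starting from pure noise.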

3. Code Implementation

We now modify the prompt_2_img function defined in the earlier chapters. The new prompt_2_img_i2i function takes two additional parameters:

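init_img: an Image object that serves as the seed image
strength: a value between 0 and 1 that controls how much noise is added to the input image. The larger the value, the more noise is added and the more of the original image is destroyed. At strength=1 the input becomes essentially pure random noise, which is equivalent to the plain text-to-image pipeline.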
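
To make the effect of strength concrete, here is a minimal sketch of the index arithmetic used by the implementation in this section. The helper name `strength_to_schedule` is hypothetical and introduced only for illustration; it reproduces the `init_timestep` and `t_start` calculations without touching any model.

```python
def strength_to_schedule(steps: int, strength: float) -> tuple[int, int, int]:
    """Mirror the index arithmetic in prompt_2_img_i2i (hypothetical helper)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    init_timestep = int(steps * strength)    # how many steps' worth of noise to add
    t_start = max(steps - init_timestep, 0)  # schedule index where denoising begins
    num_denoise_steps = steps - t_start      # iterations the loop actually runs
    return init_timestep, t_start, num_denoise_steps


for s in (0.2, 0.5, 0.8, 1.0):
    init_timestep, t_start, n = strength_to_schedule(50, s)
    print(f"strength={s}: start at schedule index {t_start}, run {n} denoising steps")
```

With 50 scheduler steps, a lower strength skips more of the early, high-noise part of the schedule, so the output stays closer to the seed image and the loop also runs fewer iterations.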

The implementation is as follows:

import os
import torch
from tqdm.auto import tqdm

# Assumes text_enc, pil_to_latents, latents_to_pil, unet and scheduler
# are already defined as in the earlier chapters of this series.
def prompt_2_img_i2i(prompts, init_img, neg_prompts=None, g=7.5, seed=100, strength=0.8, steps=50, dim=512, save_int=False):
    # Encode the prompts into text embeddings
    text = text_enc(prompts)
    # Adding an unconditional (or negative) prompt, needed for classifier-free guidance
    if not neg_prompts: uncond = text_enc([""], text.shape[1])
    else: uncond = text_enc(neg_prompts, text.shape[1])
    emb = torch.cat([uncond, text])

    # Setting the seed
    if seed: torch.manual_seed(seed)
    # Setting number of steps in scheduler
    scheduler.set_timesteps(steps)
    # Convert the seed image to latent
    init_latents = pil_to_latents(init_img)
    # Figuring initial time step based on strength
    init_timestep = int(steps * strength)
    timesteps = scheduler.timesteps[-init_timestep]
    timesteps = torch.tensor([timesteps], device="cuda")
    # Adding noise to the latents
    noise = torch.randn(init_latents.shape, generator=None, device="cuda", dtype=init_latents.dtype)
    init_latents = scheduler.add_noise(init_latents, noise, timesteps)
    latents = init_latents

    # Computing the timestep to start the diffusion loop
    t_start = max(steps - init_timestep, 0)
    timesteps = scheduler.timesteps[t_start:].to("cuda")
    # Iterating through the remaining steps
    for i, ts in enumerate(tqdm(timesteps)):
        # Duplicate the latents (unconditional + text branch) and scale them for the scheduler
        inp = scheduler.scale_model_input(torch.cat([latents] * 2), ts)
        # Predicting noise residual using U-Net
        with torch.no_grad(): u, t = unet(inp, ts, encoder_hidden_states=emb).sample.chunk(2)
        # Performing classifier-free guidance
        pred = u + g * (t - u)
        # Stepping the latents to the previous (less noisy) timestep
        latents = scheduler.step(pred, ts, latents).prev_sample
        # Saving intermediate images
        if save_int:
            os.makedirs('steps', exist_ok=True)
            latents_to_pil(latents)[0].save(f'steps/{i:04}.jpeg')

    # Decoding the final latents into a list of PIL images (3x512x512 each)
    return latents_to_pil(latents)
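
The guidance line `pred = u + g*(t-u)` in the loop above combines the unconditional and text-conditioned noise predictions. The toy sketch below illustrates that arithmetic on plain Python lists rather than tensors; the helper name `apply_guidance` and the sample numbers are made up for illustration only.

```python
def apply_guidance(uncond, text, g):
    """Classifier-free guidance on two equal-length lists of floats (toy helper)."""
    return [u + g * (t - u) for u, t in zip(uncond, text)]


uncond_pred = [0.0, 1.0]  # stand-in for the unconditional prediction u
text_pred = [1.0, 3.0]    # stand-in for the text-conditioned prediction t

print(apply_guidance(uncond_pred, text_pred, 0.0))  # g=0 ignores the prompt
print(apply_guidance(uncond_pred, text_pred, 1.0))  # g=1 uses the text prediction as-is
print(apply_guidance(uncond_pred, text_pred, 7.5))  # g>1 extrapolates past it
```

The default g=7.5 therefore pushes the prediction well beyond the text-conditioned one, in the direction that separates it from the unconditional one.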

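Note that in the code above we no longer start from pure random noise. Instead, the strength parameter determines how much noise is added to the seed image and how many steps the diffusion loop runs. Let's load an initial image and run it through prompt_2_img_i2i to see the result.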
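
To get a feel for how much of the seed image survives at a given strength, the sketch below estimates the signal coefficient at the starting timestep. It rests on several assumptions that are not taken from the article: a DDPM-style formulation, a scaled-linear beta schedule from 0.00085 to 0.012 over 1000 training steps, and evenly spaced inference timesteps. The function names are hypothetical, and a real scheduler may space timesteps or scale noise differently.

```python
import math


def alphas_cumprod(num_train_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Cumulative product of (1 - beta) for an assumed scaled-linear schedule."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    out, running = [], 1.0
    for i in range(num_train_steps):
        beta = (lo + (hi - lo) * i / (num_train_steps - 1)) ** 2
        running *= 1.0 - beta
        out.append(running)
    return out


def signal_fraction(strength, steps=50, num_train_steps=1000):
    """sqrt(alpha_bar) at the timestep where denoising starts (1.0 means no noise)."""
    init_timestep = int(steps * strength)
    if init_timestep == 0:
        return 1.0  # nothing is noised, nothing is denoised
    t_start = max(steps - init_timestep, 0)
    # Evenly spaced training timesteps, highest noise first (an assumption)
    timesteps = [round(i * (num_train_steps - 1) / (steps - 1)) for i in range(steps)][::-1]
    return math.sqrt(alphas_cumprod(num_train_steps)[timesteps[t_start]])


for s in (0.2, 0.5, 0.8, 1.0):
    print(f"strength={s}: signal coefficient about {signal_fraction(s):.3f}")
```

Under these assumptions the coefficient shrinks steadily as strength grows, which matches the observation that strength=1 behaves like plain text-to-image.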

4. Validation

Next, we load the test image and run the code above:

from PIL import Image
import matplotlib.pyplot as plt

# Load the seed image and resize it to the 512x512 resolution the model expects
p = '1664665907257-noauth.png'
image = Image.open(p).convert('RGB').resize((512, 512))
prompt = ["Wolf howling at the moon, photorealistic 4K"]
images = prompt_2_img_i2i(prompts=prompt, init_img=image)
# Show the seed image and the generated image side by side
fig, axs = plt.subplots(1, 2, figsize=(12, 6))
for c, img in enumerate([image, images[0]]):
    axs[c].imshow(img)
    if c == 0: axs[c].set_title("Initial image")
    else: axs[c].set_title("Image 2 Image output")
plt.show()