A Detailed Guide to Image Inpainting and Super-Resolution

Originally published 2025-12-06 00:04:58

License: CC 4.0 BY-SA

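Abstract

This article walks through the image inpainting and super-resolution (upscaling) features of Stable Diffusion WebUI. It covers how they work, the core code paths behind them, and how to use them in practice, so that developers can understand and apply these image-processing tools more effectively.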

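Introduction

Stable Diffusion WebUI is not only an AI image generator; it also ships a rich set of post-processing features, and inpainting and super-resolution are among the most practical of them. Inpainting fills in missing parts of a picture or removes unwanted elements, while super-resolution enlarges a low-resolution picture while preserving visual quality. Both are widely used, for example in photo restoration, art creation, and game asset production.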

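1. Image Inpainting in Detail

1.1 How It Works

Inpainting builds on the generative power of diffusion models. Given the context of the original image and a user-supplied mask, the model generates new content inside the marked region that fits the rest of the picture. In Stable Diffusion WebUI the process involves the following key steps: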

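- Mask handling: the user paints a mask over the region to be repainted
- Conditioning: the mask is merged into the image conditioning passed to the model
- Noise prediction: the model predicts the noise for the masked region
- Content generation: the reverse diffusion process produces the new pixels
- Seamless blending: the generated content is merged with the original so the transition looks natural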
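
The last step, blending, can be illustrated with a toy per-pixel composite. This is only a minimal sketch under the assumption that the mask holds values in [0, 1], with 1 meaning "repaint". `composite` is a hypothetical helper, not a WebUI function, and it works on nested lists of grayscale values rather than real images.

```python
def composite(original, generated, mask):
    """Blend two equally sized grayscale images pixel by pixel.

    A mask value of 1.0 takes the generated pixel, 0.0 keeps the
    original, and values in between mix the two (soft mask edges).
    """
    return [
        [m * g + (1.0 - m) * o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


original = [[10.0, 10.0, 10.0]]
generated = [[90.0, 90.0, 90.0]]
mask = [[0.0, 0.5, 1.0]]  # keep, mix half and half, fully repaint

blended = composite(original, generated, mask)
print(blended)
```

The middle pixel shows why a soft mask edge helps: it lands halfway between the original and the generated value instead of jumping from one to the other.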

1.2 Core Implementation

1.2.1 Mask Processing

The key inpainting conditioning logic lives in modules/processing.py:

def inpainting_image_conditioning(self, source_image, latent_image, image_mask=None, round_image_mask=True):
    self.is_using_inpainting_conditioning = True

    # Normalize the mask input to a float tensor in [0, 1]
    if image_mask is not None:
        if torch.is_tensor(image_mask):
            conditioning_mask = image_mask
        else:
            conditioning_mask = np.array(image_mask.convert("L"))
            conditioning_mask = conditioning_mask.astype(np.float32) / 255.0
            conditioning_mask = torch.from_numpy(conditioning_mask[None, None])

            if round_image_mask:
                # Round the mask to exactly 0.0 or 1.0
                conditioning_mask = torch.round(conditioning_mask)
    else:
        conditioning_mask = source_image.new_ones(1, 1, *source_image.shape[-2:])

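    # Build the conditioning image: blend the source with a copy whose masked
    # area is zeroed, weighted by inpainting_mask_weight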
    conditioning_mask = conditioning_mask.to(device=source_image.device, dtype=source_image.dtype)
    conditioning_image = torch.lerp(
        source_image,
        source_image * (1.0 - conditioning_mask),
        getattr(self, "inpainting_mask_weight", shared.opts.inpainting_mask_weight)
    )

    # Encode the masked image into latent space
    conditioning_image = self.sd_model.get_first_stage_encoding(self.sd_model.encode_first_stage(conditioning_image))

    # Build the final conditioning tensor: downsampled mask + encoded image
    conditioning_mask = torch.nn.functional.interpolate(conditioning_mask, size=latent_image.shape[-2:])
    conditioning_mask = conditioning_mask.expand(conditioning_image.shape[0], -1, -1, -1)
    image_conditioning = torch.cat([conditioning_mask, conditioning_image], dim=1)
    image_conditioning = image_conditioning.to(shared.device).type(self.sd_model.dtype)

    return image_conditioning

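This code shows how the mask is turned into the conditioning input the model expects. The mask is normalized to [0, 1] (and optionally rounded to 0 or 1), the masked area of the source image is blanked according to `inpainting_mask_weight`, the result is encoded into latent space, and the mask, resized to the latent resolution, is concatenated with it along the channel dimension. The mask therefore decides which region is repainted, and the mask weight controls how much of the original content under the mask the model can still see.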
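
The effect of the mask weight is easiest to see on a single pixel. The sketch below reproduces the arithmetic of the `torch.lerp` call in plain Python; `lerp` and `conditioning_pixel` are hypothetical helper names introduced only for this illustration, and the pixel value 0.8 is an arbitrary example.

```python
def lerp(start, end, weight):
    """Same formula as torch.lerp: start + weight * (end - start)."""
    return start + weight * (end - start)


def conditioning_pixel(source, mask, mask_weight):
    """Value of one pixel of the conditioning image before encoding.

    `mask` is 1.0 inside the region to repaint and 0.0 outside it.
    """
    return lerp(source, source * (1.0 - mask), mask_weight)


# Outside the mask the pixel is left unchanged, whatever the weight is.
outside = conditioning_pixel(0.8, mask=0.0, mask_weight=1.0)
# With weight 1.0 a masked pixel is blanked out completely.
full = conditioning_pixel(0.8, mask=1.0, mask_weight=1.0)
# With weight 0.5 half of the original value is still visible to the model.
half = conditioning_pixel(0.8, mask=1.0, mask_weight=0.5)
print(outside, full, half)
```

A weight below 1.0 therefore leaks part of the original content into the conditioning, which tends to keep the repainted area closer to what was there before.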

1.2.2 Mask Blur and Fill Strategy

To get better results, WebUI also blurs the mask before it is used:

if self.mask_blur_x > 0:
    np_mask = np.array(image_mask)
    kernel_size = 2 * int(2.5 * self.mask_blur_x + 0.5) + 1
    np_mask = cv2.GaussianBlur(np_mask, (kernel_size, 1), self.mask_blur_x)
    image_mask = Image.fromarray(np_mask)

if self.mask_blur_y > 0:
    np_mask = np.array(image_mask)
    kernel_size = 2 * int(2.5 * self.mask_blur_y + 0.5) + 1
    np_mask = cv2.GaussianBlur(np_mask, (1, kernel_size), self.mask_blur_y)
    image_mask = Image.fromarray(np_mask)

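Blurring the mask edge with a Gaussian filter, first horizontally and then vertically, avoids a visible seam between the repainted region and the original image and makes the composite look more natural.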
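
The kernel size formula in that snippet can be checked with a few values. The helper below is a hypothetical name wrapping the same expression; it shows that the kernel covers about 2.5 sigma on each side of the center and is always odd, which is what `cv2.GaussianBlur` requires for a non-zero kernel size.

```python
def blur_kernel_size(sigma):
    """Odd kernel width used for a Gaussian blur with the given sigma."""
    return 2 * int(2.5 * sigma + 0.5) + 1


sizes = {sigma: blur_kernel_size(sigma) for sigma in (1, 4, 8)}
for sigma, size in sizes.items():
    print(f"mask blur {sigma} -> kernel {size} px")
```

Because the horizontal and vertical passes use `(kernel_size, 1)` and `(1, kernel_size)` respectively, `mask_blur_x` and `mask_blur_y` can be set to different values to blur more strongly along one axis.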

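1.3 Advanced Inpainting Modes

WebUI offers two inpaint area modes: repainting the whole picture, or cropping to the masked region and repainting that crop at full resolution ("Only masked"). The latter is handled as follows: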

if self.inpaint_full_res:
    self.mask_for_overlay = image_mask
    mask = image_mask.convert('L')
    crop_region = masking.get_crop_region_v2(mask, self.inpaint_full_res_padding)
    if crop_region:
        crop_region = masking.expand_crop_region(crop_region, self.width, self.height, mask.width, mask.height)
        x1, y1, x2, y2 = crop_region
        mask = mask.crop(crop_region)
        image_mask = images.resize_image(2, mask, self.width, self.height)
        self.paste_to = (x1, y1, x2-x1, y2-y1)
        self.extra_generation_params["Inpaint area"] = "Only masked"
        self.extra_generation_params["Masked area padding"] = self.inpaint_full_res_padding

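Note that `inpaint_full_res` corresponds to the "Only masked" option, as the generation parameter written by the code shows. In this mode the region around the mask, plus `inpaint_full_res_padding` pixels of context, is cropped, resized to the full generation resolution, repainted, and later pasted back at the position stored in `paste_to`. The masked area gets the entire resolution budget, which usually yields finer detail for small regions. Repainting the whole picture instead gives the model the full image as context, at the cost of fewer pixels for the masked area.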
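
The crop-and-paste bookkeeping can be sketched on a tiny binary mask. `padded_mask_bbox` is a hypothetical stand-in written for this illustration; it is not the implementation of `masking.get_crop_region_v2`, and the aspect-ratio adjustment done by `masking.expand_crop_region` is left out.

```python
def padded_mask_bbox(mask, padding):
    """Return (x1, y1, x2, y2) around all non-zero mask pixels, or None.

    x2 and y2 are exclusive, and the box is grown by `padding` pixels on
    every side without leaving the image.
    """
    height, width = len(mask), len(mask[0])
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None  # empty mask: nothing to crop
    return (
        max(min(xs) - padding, 0),
        max(min(ys) - padding, 0),
        min(max(xs) + 1 + padding, width),
        min(max(ys) + 1 + padding, height),
    )


mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
box = padded_mask_bbox(mask, padding=1)
x1, y1, x2, y2 = box
paste_to = (x1, y1, x2 - x1, y2 - y1)  # same (x, y, w, h) layout as self.paste_to
print(box, paste_to)
```

A larger padding gives the model more surrounding context but leaves fewer pixels for the masked area itself once the crop is resized.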

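2. Super-Resolution in Detail

2.1 How It Works

Super-resolution converts a low-resolution image into a higher-resolution version while recovering as much lost detail as possible. WebUI supports several upscaling algorithms, from classic interpolation to neural-network models.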

2.2 Core Implementation

The main upscaling logic of the post-processing step is in scripts/postprocessing_upscale.py:

def upscale(self, image, info, upscaler, upscale_mode, upscale_by, max_side_length, upscale_to_width, upscale_to_height, upscale_crop):
    if upscale_mode == 1:
        upscale_by = max(upscale_to_width/image.width, upscale_to_height/image.height)
        info["Postprocess upscale to"] = f"{upscale_to_width}x{upscale_to_height}"
    else:
        info["Postprocess upscale by"] = upscale_by
        if max_side_length != 0 and max(*image.size)*upscale_by > max_side_length:
            upscale_mode = 1
            upscale_crop = False
            upscale_to_width, upscale_to_height = limit_size_by_one_dimention(image.width*upscale_by, image.height*upscale_by, max_side_length)
            upscale_by = max(upscale_to_width/image.width, upscale_to_height/image.height)
            info["Max side length"] = max_side_length

    cache_key = (hash(np.array(image.getdata()).tobytes()), upscaler.name, upscale_mode, upscale_by,  upscale_to_width, upscale_to_height, upscale_crop)
    cached_image = upscale_cache.pop(cache_key, None)

    if cached_image is not None:
        image = cached_image
    else:
        image = upscaler.scaler.upscale(image, upscale_by, upscaler.data_path)

    upscale_cache[cache_key] = image
    if len(upscale_cache) > shared.opts.upscaling_max_images_in_cache:
        upscale_cache.pop(next(iter(upscale_cache), None), None)

    if upscale_mode == 1 and upscale_crop:
        cropped = Image.new("RGB", (upscale_to_width, upscale_to_height))
        cropped.paste(image, box=(upscale_to_width // 2 - image.width // 2, upscale_to_height // 2 - image.height // 2))
        image = cropped
        info["Postprocess crop to"] = f"{image.width}x{image.height}"

    return image

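This function covers the whole upscaling flow: computing the scale factor (from a target size, or from a multiplier capped by `max_side_length`), caching results keyed by image content and settings, and optionally cropping the result to the target size around its center.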
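
The size arithmetic for the "upscale to" mode with cropping can be worked through with concrete numbers. `plan_resize_to` is a hypothetical helper written for this illustration; it assumes the upscaler returns exactly `size * upscale_by`, whereas a real upscaler may round the result.

```python
def plan_resize_to(image_w, image_h, target_w, target_h):
    """Scale factor, upscaled size and paste offset for a cropped "upscale to"."""
    # The larger ratio is used so the upscaled image covers the whole target.
    upscale_by = max(target_w / image_w, target_h / image_h)
    up_w, up_h = round(image_w * upscale_by), round(image_h * upscale_by)
    # Offset used when pasting the upscaled image onto the target canvas;
    # a negative value means that part of the image is cut off.
    offset = (target_w // 2 - up_w // 2, target_h // 2 - up_h // 2)
    return upscale_by, (up_w, up_h), offset


plan = plan_resize_to(512, 768, 1024, 1024)
print(plan)
```

For a 512x768 portrait and a 1024x1024 target, the width ratio of 2 wins, the image becomes 1024x1536, and the vertical offset of -256 trims 256 pixels from both the top and the bottom.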

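2.3 Multiple Upscalers

WebUI supports several kinds of upscalers, including:

- Classic algorithms: Lanczos, Bilinear, Bicubic, and others
- Neural-network models: ESRGAN, SwinIR, LDSR, and others

The upscaler base class is defined in modules/upscaler.py. Its `upscale` method looks like this: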

def upscale(self, img: PIL.Image, scale, selected_model: str = None):
    self.scale = scale
    dest_w = int((img.width * scale) // 8 * 8)
    dest_h = int((img.height * scale) // 8 * 8)

    for i in range(3):
        if img.width >= dest_w and img.height >= dest_h and (i > 0 or scale != 1):
            break

        if shared.state.interrupted:
            break

        shape = (img.width, img.height)

        img = self.do_upscale(img, selected_model)

        if shape == (img.width, img.height):
            break

    if img.width != dest_w or img.height != dest_h:
        img = img.resize((int(dest_w), int(dest_h)), resample=LANCZOS)

    return img

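The loop calls `do_upscale` up to three times until the image reaches the target size, and a final Lanczos resize then snaps it to the exact dimensions. This shared framework lets each upscaler implement its own `do_upscale` method (for example in extensions-builtin/SwinIR/swinir_model.py) while keeping a uniform interface.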
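
The control flow of that loop can be traced without any image library. The class below is a toy stand-in that tracks sizes only; `FixedFactorUpscaler` and its `passes` counter are hypothetical names for this sketch, and the model is assumed to enlarge by a fixed factor on every call.

```python
class FixedFactorUpscaler:
    """Toy upscaler whose "model" always enlarges by `factor` (sizes only)."""

    def __init__(self, factor):
        self.factor = factor
        self.passes = 0

    def do_upscale(self, size):
        # A real subclass would run its model on a PIL image here.
        self.passes += 1
        return (size[0] * self.factor, size[1] * self.factor)

    def upscale(self, size, scale):
        # Target size, rounded down to a multiple of 8.
        dest = (int((size[0] * scale) // 8 * 8), int((size[1] * scale) // 8 * 8))
        for i in range(3):
            if size[0] >= dest[0] and size[1] >= dest[1] and (i > 0 or scale != 1):
                break
            new_size = self.do_upscale(size)
            if new_size == size:
                break  # the model did not change the size, stop trying
            size = new_size
        # The final resize (Lanczos in the original) snaps to the exact target.
        return dest


up = FixedFactorUpscaler(factor=2)
result = up.upscale((500, 300), scale=3)
print(result, up.passes)
```

With a 2x model and a requested 3x scale, two passes overshoot the target and the final resize brings the image back down to 1496x896. With `scale=1` the model still runs once, since the `i > 0 or scale != 1` condition blocks the early exit on the first iteration.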

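3. Use Cases and Best Practices

3.1 Inpainting Use Cases

- Old photo restoration: remove scratches and stains from historical photos
- Object removal: remove unwanted objects or people from a photo
- Art creation: extend the canvas and generate matching content in the blank area
- Composition tuning: adjust the position and proportion of elements in the frame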

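3.2 Super-Resolution Use Cases

- Low-resolution image enhancement: improve how small images look on screen
- Print preparation: enlarge web images to a printable size
- Video frame enhancement: improve the quality of individual frames in a video sequence
- Texture generation: produce high-quality texture maps for games and CG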

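3.3 Best Practices

Mask painting tips:
- Paint the mask with a soft-edged brush for a more natural transition
- For fine repairs, make the masked area slightly larger so the model gets more context
- Use the mask blur parameter to smooth the edges
Parameter tuning tips:
- Lower the denoising strength to stay closer to the original image
- Write a sensible prompt to steer the model toward the content you expect
- Choose a number of sampling steps that balances quality and speed
Upscaling strategy:
- For small enlargements (up to 2x), classic algorithms may be good enough
- For large enlargements, deep-learning models such as ESRGAN are recommended
- Several upscalers can be combined for better results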
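
The denoising-strength tip above can be made concrete with a rough rule of thumb: image-to-image generation starts from a partially noised version of the input, so only about `strength * steps` denoising steps actually modify the image. The helper below is a hypothetical illustration of that rule, not a WebUI function, and the exact mapping depends on the sampler and on WebUI settings.

```python
def approx_denoising_steps(steps, denoising_strength):
    """Rough number of sampling steps that actually change the image."""
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be between 0 and 1")
    return int(denoising_strength * steps)


estimates = {s: approx_denoising_steps(20, s) for s in (0.25, 0.75, 1.0)}
for strength, n in estimates.items():
    print(f"strength {strength}: about {n} of 20 steps")
```

Under this assumption a low strength both preserves more of the original and finishes faster, while a strength near 1.0 behaves almost like generating the masked area from scratch.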