Advanced Text-to-Image Applications

Originally published 2025-12-06 00:05:16

License: CC 4.0 BY-SA

Tags: #ArtificialIntelligence #ComputerVision #AdvancedTextToImageApplications

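Abstract

This article takes a close look at advanced text-to-image techniques in Stable Diffusion WebUI. It covers prompt engineering, conditioning control, and multi-concept composition, and shows how to use WebUI's features to produce high-quality, creative AI artwork.

Introduction

Text-to-image generation is one of Stable Diffusion's core capabilities: it turns a natural-language description into matching visual content. A short text description alone rarely gets the most out of it, though. This article introduces a set of advanced techniques that give you finer control over the generation process and make results more precise and more creative.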

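1. Advanced Prompt Engineering

1.1 Prompt Weight Control

Basic prompt engineering already uses brackets to strengthen or weaken a keyword. WebUI supports several weighting syntaxes:

Parentheses (emphasis): (keyword) multiplies the keyword's weight by 1.1
Square brackets (de-emphasis): [keyword] divides the keyword's weight by 1.1
Explicit weight: (keyword:1.2) or (keyword:2.0) multiplies the weight by the given value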
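
Brackets can be nested, and each level applies its multiplier again, so ((keyword)) works out to 1.1 × 1.1 = 1.21. The arithmetic can be sketched as follows; `nested_weight` is a hypothetical helper written for this illustration, not a WebUI function.

```python
def nested_weight(round_depth=0, square_depth=0, base=1.1):
    """Effective weight of a keyword wrapped in round_depth pairs of ( )
    and square_depth pairs of [ ]. Illustrative helper, not part of WebUI."""
    return base ** round_depth / base ** square_depth


examples = [
    ("(keyword)", nested_weight(round_depth=1)),
    ("((keyword))", nested_weight(round_depth=2)),
    ("[keyword]", nested_weight(square_depth=1)),
    ("[[keyword]]", nested_weight(square_depth=2)),
]

for label, weight in examples:
    print(f"{label:<14} weight = {weight:.3f}")
```

For anything beyond two levels of nesting, an explicit weight such as (keyword:1.3) is usually easier to read than a stack of brackets.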

The file modules/prompt_parser.py shows how these weights are parsed and applied:

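import re

# Splits a prompt on the BREAK keyword (used by parse_prompt_attention below)
re_break = re.compile(r"\s*\bBREAK\b\s*", re.S)

re_attention = re.compile(r"""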
\\\(|
\\\)| 
\\\[|
\\]| 
\\\\|
\\|
\(|
\[|
:\s*([+-]?[.\d]+)\s*\)|
\)|
]|
[^\\()\[\]:]+|
:
""", re.X)

def parse_prompt_attention(text):
    """
    Parses a string with attention tokens and returns a list of pairs: text and its associated weight.
    Accepted tokens are:
      (abc) - increases attention to abc by a multiplier of 1.1
      (abc:3.12) - increases attention to abc by a multiplier of 3.12
      [abc] - decreases attention to abc by a multiplier of 1.1
      \( - literal character '('
      \[ - literal character '['
      \) - literal character ')'
      \] - literal character ']'
      \\ - literal character '\'
      anything else - just text
    """
    
    res = []
    round_brackets = []
    square_brackets = []

    round_bracket_multiplier = 1.1
    square_bracket_multiplier = 1 / 1.1

    def multiply_range(start_position, multiplier):
        for p in range(start_position, len(res)):
            res[p][1] *= multiplier

    for m in re_attention.finditer(text):
        text = m.group(0)
        weight = m.group(1)

        if text.startswith('\\'):
            res.append([text[1:], 1.0])
        elif text == '(':
            round_brackets.append(len(res))
        elif text == '[':
            square_brackets.append(len(res))
        elif weight is not None and round_brackets:
            multiply_range(round_brackets.pop(), float(weight))
        elif text == ')' and round_brackets:
            multiply_range(round_brackets.pop(), round_bracket_multiplier)
        elif text == ']' and square_brackets:
            multiply_range(square_brackets.pop(), square_bracket_multiplier)
        else:
            parts = re.split(re_break, text)
            for i, part in enumerate(parts):
                if i > 0:
                    res.append(["BREAK", -1])
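                res.append([part, 1.0])

    # Brackets left open at the end of the prompt still apply their multiplier
    for pos in round_brackets:
        multiply_range(pos, round_bracket_multiplier)

    for pos in square_brackets:
        multiply_range(pos, square_bracket_multiplier)

    if len(res) == 0:
        res = [["", 1.0]]

    # Merge adjacent runs of text that ended up with identical weights
    i = 0
    while i + 1 < len(res):
        if res[i][1] == res[i + 1][1]:
            res[i][0] += res[i + 1][0]
            res.pop(i + 1)
        else:
            i += 1

    return res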

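This code shows how weight parsing is implemented: a regular expression picks out the weight markers, and the weights of the enclosed text are adjusted accordingly.

1.2 Prompt Segmentation and Scheduling

WebUI can use different prompts at different stages of generation, a technique called prompt scheduling. The syntax is:

[concept1:concept2:transition_point]

Sampling uses concept1 up to the transition point and concept2 after it. The transition_point is either an absolute step number or a fraction of the total steps.

For example:

[mountain:lake:0.25] switches from "mountain" to "lake" after 25% of the steps
[mountain:lake:20] switches from "mountain" to "lake" after step 20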
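
The two forms of the transition point can be resolved to a concrete step as in the sketch below. It is a simplification under stated assumptions: a value containing "." is treated as a fraction of the total steps, anything else as an absolute step, and the Hires. fix pass is ignored. The names `switch_step`, `build_schedule`, and `prompt_at` are hypothetical and do not exist in WebUI.

```python
def switch_step(when, total_steps):
    """Resolve a transition point (given as text) to an absolute step.

    Assumption of this sketch: "0.25" means 25% of total_steps,
    "20" means step 20. The result never exceeds total_steps.
    """
    value = float(when)
    step = value * total_steps if "." in when else value
    return min(total_steps, int(step))


def build_schedule(before, after, when, total_steps):
    """Return [[end_step, prompt], ...]: each prompt is used up to end_step."""
    return [[switch_step(when, total_steps), before], [total_steps, after]]


def prompt_at(schedule, step):
    """Pick the first entry whose end_step has not been passed yet."""
    for end_step, prompt in schedule:
        if step <= end_step:
            return prompt
    return schedule[-1][1]


schedule = build_schedule("mountain", "lake", "0.25", total_steps=20)
print(schedule)
print(prompt_at(schedule, 5), prompt_at(schedule, 6))
```

With 20 steps, the fraction 0.25 resolves to step 5, so steps 1 to 5 use "mountain" and the remaining steps use "lake".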

The same file, modules/prompt_parser.py, implements scheduling. The doctests in the function's docstring show how a prompt expands into [end_step, prompt] pairs (the function body is omitted here):

def get_learned_conditioning_prompt_schedules(prompts, base_steps, hires_steps=None, use_old_scheduling=False):
    """
    >>> g = lambda p: get_learned_conditioning_prompt_schedules([p], 10)[0]
    >>> g("test")
    [[10, 'test']]
    >>> g("a [b:3]")
    [[3, 'a '], [10, 'a b']]
    >>> g("a [b: 3]")
    [[3, 'a '], [10, 'a b']]
    >>> g("a [[[b]]:2]")
    [[2, 'a '], [10, 'a [[b]]']]
    >>> g("[(a:2):3]")
    [[3, ''], [10, '(a:2)']]
    >>> g("a [b : c : 1] d")
    [[1, 'a b  d'], [10, 'a  c  d']]
    >>> g("a[b:[c:d:2]:1]e")
    [[1, 'abe'], [2, 'ace'], [10, 'ade']]
    """

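1.3 Combining Multiple Prompts

WebUI supports combining several independent prompts with the uppercase AND keyword. Each sub-prompt can carry its own weight, written as a trailing :weight (parentheses, by contrast, only emphasize text inside a sub-prompt):

a beautiful landscape AND sunset :1.2 AND mountains

The model is conditioned on each concept separately, and the results are then combined according to the weights.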
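
How such a prompt breaks down into weighted sub-prompts can be sketched as follows. This is an illustration only: `split_subprompts` and both regular expressions are written for this example, and WebUI's own parsing in modules/prompt_parser.py may differ in detail.

```python
import re

# Sub-prompts are separated by the standalone uppercase word AND
RE_AND = re.compile(r"\bAND\b")
# Optional trailing ":weight" at the end of a sub-prompt
RE_WEIGHT = re.compile(r"^(.*?)(?:\s*:\s*([-+]?(?:\d+\.?\d*|\.\d+)))?\s*$", re.S)


def split_subprompts(prompt):
    """Return a list of (text, weight) pairs; the weight defaults to 1.0."""
    result = []
    for part in RE_AND.split(prompt):
        match = RE_WEIGHT.match(part)
        text, weight = match.group(1), match.group(2)
        result.append((text.strip(), float(weight) if weight else 1.0))
    return result


for text, weight in split_subprompts(
    "a beautiful landscape AND sunset :1.2 AND mountains"
):
    print(f"{weight:.1f}  {text}")
```

A weight inside parentheses, as in (sunset:1.2), is left untouched by this split because it is not at the end of the sub-prompt; it is handled later by the attention parser from section 1.1.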

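2. Conditioning Control

2.1 Tuning the CFG Scale

The Classifier-Free Guidance (CFG) Scale controls how closely the generated image follows the prompt. Higher values make the image adhere to the prompt more strictly but can cause oversaturation or unnatural results; lower values give more creative results that may drift away from the prompt.

Suggested starting points:

Photorealistic styles: CFG Scale 7-12
Artistic styles: CFG Scale 5-9
Abstract concepts: CFG Scale 3-6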
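
The effect of the scale is easier to see in the guidance formula itself: at every step the sampler takes the prediction without the prompt and pushes it toward the prediction with the prompt, scaled by the CFG value. The sketch below works on plain Python lists for readability; real samplers operate on tensors, and `apply_cfg` is an illustrative name rather than a WebUI function.

```python
def apply_cfg(uncond, cond, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), per element."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0]  # toy prediction without the prompt
cond = [1.0, 1.0]    # toy prediction with the prompt

for scale in (1.0, 7.0, 12.0):
    print(scale, apply_cfg(uncond, cond, scale))
```

A scale of 1 returns the conditional prediction unchanged. Larger values extrapolate past it, which is why very high settings tend to oversaturate.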

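2.2 Seed and Variation Control

The seed sets the initial state of the random number generator, so the same seed with the same parameters reproduces the same image. Adjusting the variation seed and variation strength introduces subtle changes while keeping the overall composition.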
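
One way to picture variation strength is as a blend between two noise vectors, one from the seed and one from the variation seed. The sketch below assumes a spherical interpolation (slerp) between the two, which is my understanding of how WebUI mixes them; it uses plain Python lists instead of tensors, and `gaussian_noise` is a hypothetical stand-in for the real noise generator.

```python
import math
import random


def gaussian_noise(seed, size):
    """Deterministic stand-in for the initial noise of one image."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def slerp(t, low, high):
    """Spherical interpolation from low (t=0) to high (t=1)."""
    norm_low = math.sqrt(sum(x * x for x in low))
    norm_high = math.sqrt(sum(x * x for x in high))
    dot = sum(a * b for a, b in zip(low, high)) / (norm_low * norm_high)
    dot = max(-1.0, min(1.0, dot))
    if dot > 0.9995:
        # Nearly parallel vectors: fall back to linear interpolation
        return [(1.0 - t) * a + t * b for a, b in zip(low, high)]
    omega = math.acos(dot)
    sin_omega = math.sin(omega)
    w_low = math.sin((1.0 - t) * omega) / sin_omega
    w_high = math.sin(t * omega) / sin_omega
    return [w_low * a + w_high * b for a, b in zip(low, high)]


base = gaussian_noise(seed=1234, size=16)     # from the main seed
variant = gaussian_noise(seed=5678, size=16)  # from the variation seed
blended = slerp(0.2, base, variant)           # variation strength 0.2
```

At strength 0 the result is the original noise, at strength 1 it is the variation seed's noise, and small values in between stay close to the original composition.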

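2.3 Hires. fix

WebUI's Hires. fix feature first generates a low-resolution image, then upscales it and refines the details at high resolution. This avoids the fragmented or distorted results that can appear when generating directly at high resolution.

The process has two stages:

Stage 1: generate the low-resolution image
Stage 2: upscale the image in latent space or pixel space, then refine it with an img2img sampling pass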
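
The sizes involved in the two stages can be worked out with simple arithmetic. The sketch below assumes the target size is the base size multiplied by an upscale factor and that the latent is 8 times smaller than the image per side (the role of opt_f in the code that follows); `hires_sizes` is a hypothetical helper, not a WebUI function.

```python
OPT_F = 8  # assumed ratio between image size and latent size per side


def hires_sizes(width, height, scale, opt_f=OPT_F):
    """Return ((target_w, target_h), (latent_w, latent_h)) for the second pass."""
    target_w = int(width * scale)
    target_h = int(height * scale)
    return (target_w, target_h), (target_w // opt_f, target_h // opt_f)


target, latent = hires_sizes(512, 512, scale=2.0)
print("image:", target, "latent:", latent)
```

A 512×512 first pass with a 2× upscale is therefore refined as a 1024×1024 image, which corresponds to a 128×128 latent.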

The implementation of Hires. fix can be found in modules/processing.py:

def sample_hr_pass(self, samples, decoded_samples, seeds, subseeds, subseed_strength, prompts):
    if shared.state.interrupted:
        return samples

    self.is_hr_pass = True
    target_width = self.hr_upscale_to_x
    target_height = self.hr_upscale_to_y

    def save_intermediate(image, index):
        """Save the image as it was before Hires. fix."""
        if not self.save_samples() or not opts.save_images_before_highres_fix:
            return

        if not isinstance(image, Image.Image):
            image = sd_samplers.sample_to_image(image, index, approximation=0)

        info = create_infotext(self, self.all_prompts, self.all_seeds, self.all_subseeds, [], 
                              iteration=self.iteration, position_in_batch=index)
        images.save_image(image, self.outpath_samples, "", seeds[index], prompts[index], 
                         opts.samples_format, info=info, p=self, suffix="-before-highres-fix")

    # Create the sampler for the Hires. fix pass
    img2img_sampler_name = self.hr_sampler_name or self.sampler_name
    self.sampler = sd_samplers.create_sampler(img2img_sampler_name, self.sd_model)

    if self.latent_scale_mode is not None:
        # Upscale in latent space
        for i in range(samples.shape[0]):
            save_intermediate(samples, i)

        samples = torch.nn.functional.interpolate(samples, 
                                                size=(target_height // opt_f, target_width // opt_f), 
                                                mode=self.latent_scale_mode["mode"], 
                                                antialias=self.latent_scale_mode["antialias"])

        # Set up image conditioning
        if getattr(self, "inpainting_mask_weight", shared.opts.inpainting_mask_weight) < 1.0:
            image_conditioning = self.img2img_image_conditioning(
                decode_first_stage(self.sd_model, samples), samples)
        else:
            image_conditioning = self.txt2img_image_conditioning(samples)
    else:
        # Upscale in pixel space (on the decoded images)
        lowres_samples = torch.clamp((decoded_samples + 1.0) / 2.0, min=0.0, max=1.0)

        batch_images = []
        for i, x_sample in enumerate(lowres_samples):
            x_sample = 255. * np.moveaxis(x_sample.cpu().numpy(), 0, 2)
            x_sample = x_sample.astype(np.uint8)
            image = Image.fromarray(x_sample)

            save_intermediate(image, i)

            # Upscale the image
            image = images.resize_image(0, image, target_width, target_height, 
                                      upscaler_name=self.hr_upscaler)
            image = np.array(image).astype(np.float32) / 255.0
            image = np.moveaxis(image, 2, 0)
            batch_images.append(image)

        decoded_samples = torch.from_numpy(np.array(batch_images))
        decoded_samples = decoded_samples.to(shared.device, dtype=devices.dtype_vae)

        if opts.sd_vae_encode_method != 'Full':
            self.extra_generation_params['VAE Encoder'] = opts.sd_vae_encode_method
        samples = images_tensor_to_samples(decoded_samples, 
                                          approximation_indexes.get(opts.sd_vae_encode_method))

        image_conditioning = self.img2img_image_conditioning(decoded_samples, samples)

    shared.state.nextjob()

    # Crop the samples
    samples = samples[:, :, self.truncate_y//2:samples.shape[2]-(self.truncate_y+1)//2, 
                     self.truncate_x//2:samples.shape[3]-(self.truncate_x+1)//2]

    # Create noise
    self.rng = rng.ImageRNG(samples.shape[1:], self.seeds, subseeds=self.subseeds, 
                           subseed_strength=self.subseed_strength, 
                           seed_resize_from_h=self.seed_resize_from_h, 
                           seed_resize_from_w=self.seed_resize_from_w)
    noise = self.rng.next()

    # Activate extra networks for the Hires. fix pass
    if not self.disable_extra_networks:
        with devices.autocast():
            extra_networks.activate(self, self.hr_extra_network_data)

    # Compute conditioning for the Hires. fix pass
    with devices.autocast():
        self.calculate_hr_conds()

    # Apply token merging for the Hires. fix pass
    sd_models.apply_token_merging(self.sd_model, self.get_token_merging_ratio(for_hr=True))

    # Run script callbacks before the Hires. fix pass
    if self.scripts is not None:
        self.scripts.before_hr(self)
        self.scripts.process_before_every_sampling(
            p=self,
            x=samples,
            noise=noise,
            c=self.hr_c,
            uc=self.hr_uc,
        )

    # Run the Hires. fix sampling pass
    samples = self.sampler.sample_img2img(self, samples, noise, self.hr_c, self.hr_uc, 
                                         steps=self.hr_second_pass_steps or self.steps, 
                                         image_conditioning=image_conditioning)

    # Restore the regular token merging ratio
    sd_models.apply_token_merging(self.sd_model, self.get_token_merging_ratio())

    self.sampler = None
    devices.torch_gc()

    # Decode the final samples
    decoded_samples = decode_latent_batch(self.sd_model, samples, 
                                         target_device=devices.cpu, 
                                         check_for_nans=True)

    self.is_hr_pass = False
    return decoded_samples
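
In the pixel-space branch above, decoded samples in the range [-1, 1] are clamped, scaled to 8-bit pixels for the upscaler, and scaled back to floats afterwards. The per-value arithmetic can be sketched like this; `to_uint8` and `to_unit_float` are hypothetical names that mirror the NumPy expressions in the excerpt for a single value.

```python
def to_uint8(value):
    """Map one decoded sample value from [-1, 1] to an integer in [0, 255]."""
    unit = min(1.0, max(0.0, (value + 1.0) / 2.0))  # same as the clamp step
    return int(255.0 * unit)  # truncates, like astype(np.uint8)


def to_unit_float(pixel):
    """Map an 8-bit pixel back to a float in [0, 1] after upscaling."""
    return pixel / 255.0


for value in (-1.0, 0.0, 1.0, 3.0):
    pixel = to_uint8(value)
    print(value, "->", pixel, "->", round(to_unit_float(pixel), 3))
```

Values outside [-1, 1] are clipped rather than wrapped, so an out-of-range sample becomes pure black or pure white instead of corrupting the pixel.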

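3. Advanced Creative Techniques

3.1 Style Transfer and Blending

Prompt weights and scheduling, used together, can blend several artistic styles. For example:

(a beautiful portrait:1.2), (oil painting style:0.8), (modern digital art:0.5), masterpiece, high quality

This lets the model weigh several sets of style features during generation and produce a distinctive look.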
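
When experimenting with style mixes, it can help to keep the parts and their weights as data and generate the prompt string from them. The helpers below are a small sketch for that purpose; `weighted` and `build_prompt` are hypothetical names, not part of WebUI.

```python
def weighted(text, weight=1.0):
    """Format one prompt part, using (text:weight) only when the weight is not 1."""
    if weight == 1.0:
        return text
    return f"({text}:{weight:g})"


def build_prompt(parts):
    """Join (text, weight) pairs into a comma-separated prompt."""
    return ", ".join(weighted(text, weight) for text, weight in parts)


style_mix = [
    ("a beautiful portrait", 1.2),
    ("oil painting style", 0.8),
    ("modern digital art", 0.5),
    ("masterpiece", 1.0),
    ("high quality", 1.0),
]
print(build_prompt(style_mix))
```

Changing a single number in `style_mix` and regenerating with a fixed seed makes it easy to see how much each style contributes.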

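3.2 Scene Composition Control

Keywords for perspective and lens type are an effective way to steer the composition of the image:

Lens type: wide angle lens, telephoto lens, macro lens, fisheye lens
Perspective: bird's eye view, worm's eye view, dutch angle, over-the-shoulder shot
Composition principles: rule of thirds, golden ratio, symmetrical composition

3.3 Color and Lighting Control

Stating the color scheme and lighting conditions explicitly helps set the overall mood of the image:

Color scheme: warm color palette, cool color palette, monochromatic, complementary colors
Lighting: studio lighting, natural lighting, dramatic lighting, rim lighting, backlighting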

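4. Using Negative Prompts

A negative prompt excludes elements you do not want in the image. A well-chosen negative prompt can noticeably improve output quality.

Commonly used negative prompts include:

Quality: low quality, blurry, pixelated, compressed
Anatomy: deformed, disfigured, mutated hands, extra fingers, missing arms
Composition issues: cropped, out of frame, poorly drawn face
Unwanted elements: text, logo, watermark, signature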
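
The categories above can be kept as reusable groups and combined per image. The sketch below assembles a negative prompt from selected groups and drops duplicates; the group names and `build_negative_prompt` are illustrative choices, not WebUI features.

```python
NEGATIVE_GROUPS = {
    "quality": ["low quality", "blurry", "pixelated", "compressed"],
    "anatomy": ["deformed", "disfigured", "mutated hands", "extra fingers", "missing arms"],
    "composition": ["cropped", "out of frame", "poorly drawn face"],
    "unwanted": ["text", "logo", "watermark", "signature"],
}


def build_negative_prompt(groups, selected):
    """Join the terms of the selected groups, keeping order and skipping repeats."""
    seen = set()
    terms = []
    for name in selected:
        for term in groups[name]:
            if term not in seen:
                seen.add(term)
                terms.append(term)
    return ", ".join(terms)


print(build_negative_prompt(NEGATIVE_GROUPS, ["quality", "unwanted"]))
```

A landscape, for instance, has little use for the anatomy group, so selecting only the relevant groups keeps the negative prompt short.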

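5. Batch Processing and Variable Control

5.1 Batch Generation

Batch count and batch size together let you generate multiple images in one run: batch count is the number of batches run one after another, and batch size is the number of images generated in parallel within each batch. This is useful for exploring different creative directions or for A/B testing.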
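
The total number of images is batch count × batch size. The sketch below lays out one run under the assumption that each image's seed is the starting seed plus its index, which is how I understand WebUI to behave when no variation strength is set; `plan_batches` is a hypothetical helper.

```python
def plan_batches(seed, batch_count, batch_size):
    """Return one list of seeds per batch, assuming seeds increase by 1 per image."""
    return [
        [seed + batch * batch_size + i for i in range(batch_size)]
        for batch in range(batch_count)
    ]


plan = plan_batches(seed=100, batch_count=2, batch_size=3)
for number, seeds in enumerate(plan, start=1):
    print(f"batch {number}: seeds {seeds}")
print("total images:", sum(len(seeds) for seeds in plan))
```

Under that assumption, any image from the run can be regenerated on its own by entering its seed with a batch count and batch size of 1.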

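5.2 Seed Variation

Different seed values produce different variants of the same prompt, which helps in finding the most satisfying result.

5.3 X/Y/Z Plot

WebUI's X/Y/Z plot feature compares parameter combinations systematically, for example different CFG Scale values against different samplers.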
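
Conceptually, such a plot is the Cartesian product of the values on each axis, with one image per cell. The sketch below enumerates the cells for a CFG Scale axis and a sampler axis; `xy_grid` is a hypothetical helper, and the listed values are only examples.

```python
from itertools import product


def xy_grid(cfg_scales, samplers):
    """Return one settings dict per grid cell, row by row (one row per sampler)."""
    return [
        {"sampler": sampler, "cfg_scale": cfg}
        for sampler, cfg in product(samplers, cfg_scales)
    ]


grid = xy_grid(cfg_scales=[5, 7, 9], samplers=["Euler a", "DPM++ 2M"])
for cell in grid:
    print(cell)
print("images to render:", len(grid))
```

The number of images grows multiplicatively with each axis, so a third axis with four values would turn these 6 cells into 24.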

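6. Practical Examples

6.1 Character Design

For character design, a prompt combination like the following can be used. Traits to avoid go in the negative prompt rather than in square brackets, because [text:number] is parsed as prompt scheduling, not as a negative weight.

Prompt:

(character concept art:1.3), fantasy warrior, detailed armor, intricate design, 
(front view:1.1), (sharp focus:1.1), (highly detailed:1.2), digital painting, 
artstation, concept art, smooth, illustration

Negative prompt:

(extra limbs:1.2), (deformed:1.3), (distorted proportions:1.2)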

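6.2 Scene Building

In scene building, prompt scheduling can shift the time of day partway through sampling, here from daytime to magic hour after 60% of the steps: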

(a bustling medieval marketplace:1.2), [daytime:magic hour:0.6], 
crowded streets, merchants, colorful tents, cobblestone roads, 
(high detail:1.1), cinematic lighting, epic composition