KOALA: A Tutorial on a Fast and Efficient Text-to-Image Synthesis Model

祖筱泳

Published 2024-10-11 07:27:51

License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/gitblog_01111/article/details/142839375

Project repository (sdxl-koala, "Compressing SDXL via knowledge-distillation"): https://gitcode.com/gh_mirrors/sd/sdxl-koala

1. Project Overview

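KOALA is a text-to-image synthesis model developed by Youngwan Lee et al. It uses knowledge distillation and model compression to make text-to-image generation more efficient. The work was published at NeurIPS 2024, and its main goal is to reduce inference cost while preserving the quality of the generated images.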

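KOALA compresses the U-Net of Stable Diffusion XL (SDXL) and distills knowledge from SDXL into the smaller network, yielding an efficient text-to-image model. KOALA-Lightning-700M generates a 1024x1024 image in 0.66 seconds on an NVIDIA 4090 GPU, more than 4x faster than SDXL.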
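To put the quoted figures in perspective, the short calculation below derives what they imply. It is only arithmetic on the two numbers stated above (0.66 seconds and "more than 4x"). The back-to-back, one-image-at-a-time throughput is an assumption made for illustration, not a measured result.

```python
# Figures quoted in the text above.
koala_latency_s = 0.66   # KOALA-Lightning-700M, 1024x1024, NVIDIA 4090
min_speedup = 4.0        # "more than 4x faster than SDXL"

# A speedup of at least 4x implies SDXL needs at least this long per image.
implied_sdxl_latency_s = koala_latency_s * min_speedup

# Assumption: images are generated one at a time, back to back.
koala_images_per_minute = 60.0 / koala_latency_s

print(f"Implied SDXL latency: more than {implied_sdxl_latency_s:.2f} s per image")
print(f"KOALA throughput: about {koala_images_per_minute:.0f} images per minute")
```

Actual latency depends on the step count, precision, and hardware, so these values are a rough guide only.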

2. Quick Start

Installing dependencies

First, make sure Python and pip are installed. Then install the required libraries with the following command:

pip install -U diffusers transformers accelerate safetensors
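After installation, a quick check can confirm that the packages are visible to the current Python environment. The sketch below uses only the standard library. The helper name `missing_packages` is hypothetical, and `torch` is added to the list as an assumption because the code in this tutorial imports it.

```python
from importlib import metadata

# Packages from the pip command above, plus torch (assumed, since the tutorial imports it).
REQUIRED = ["diffusers", "transformers", "accelerate", "safetensors", "torch"]


def missing_packages(names):
    """Return the names that have no installed distribution in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


absent = missing_packages(REQUIRED)
if absent:
    print("Missing packages:", ", ".join(absent))
else:
    print("All required packages are installed.")
```

This only checks that the distributions are installed. It does not verify that a CUDA-capable GPU is available.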

Loading the model

Use the following code to load the KOALA-Lightning-700M model and generate an image:

import torch
from diffusers import StableDiffusionXLPipeline, EulerDiscreteScheduler

# Load the model in half precision
pipe = StableDiffusionXLPipeline.from_pretrained("etri-vilab/koala-lightning-700m", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Configure the scheduler with trailing timestep spacing
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")

# Define the prompt and negative prompt
prompt = "Albert Einstein in a surrealist Cyberpunk 2077 world, hyperrealistic"
negative_prompt = '(deformed iris, deformed pupils, deformed nose, deformed mouth), worst quality, low quality, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs'

# Generate the image
image = pipe(prompt=prompt, negative_prompt=negative_prompt, guidance_scale=3.5, num_inference_steps=10).images[0]

# Save the image
image.save("example.png")
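To see how generation time on your own hardware compares with the figure quoted earlier, a small timing helper can wrap the pipeline call. This is a minimal sketch rather than the project's benchmarking method. The `benchmark` function and its `warmup` and `runs` parameters are hypothetical names introduced here. It assumes wall-clock time around the call covers the whole generation, since the call returns decoded images.

```python
import time
from statistics import mean


def benchmark(generate, warmup=1, runs=3):
    """Return the mean wall-clock seconds of generate() over `runs` timed calls."""
    # Warm-up calls are not timed; the first call is often slower than later ones.
    for _ in range(warmup):
        generate()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()
        timings.append(time.perf_counter() - start)
    return mean(timings)


# Hypothetical usage with the pipeline and prompts defined above:
# latency = benchmark(lambda: pipe(prompt=prompt, negative_prompt=negative_prompt,
#                                  guidance_scale=3.5, num_inference_steps=10))
# print(f"Mean latency: {latency:.2f} s per image")
```

The helper accepts any zero-argument callable, so the same function can time other pipelines or settings for comparison.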