The 0.18215 factor in Stable Diffusion

In Stable Diffusion v1.5, the latent is routinely multiplied by 0.18215 when it is initialized, and divided by 0.18215 again right before it is passed to the decoder. Where does 0.18215 come from?
In Stable Diffusion XL the corresponding value is 0.13025.
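
Both values ship with the VAE configs, so they can be read directly. A quick check, assuming a recent diffusers release (which exposes scaling_factor on AutoencoderKL's config) and access to the runwayml/stable-diffusion-v1-5 and stabilityai/sdxl-vae checkpoints:

from diffusers import AutoencoderKL

vae_sd15 = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae"
)
print(vae_sd15.config.scaling_factor)  # 0.18215

vae_sdxl = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae")
print(vae_sdxl.config.scaling_factor)  # 0.13025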
Several contributors answered this question in the diffusers issue "Explanation of the 0.18215 factor in textual_inversion? #437"; their answers are recorded here.

We introduced the scale factor in the latent diffusion paper. The goal was to handle different latent spaces (from different autoencoders, which can be scaled quite differently than images) with similar noise schedules. The scale_factor ensures that the initial latent space on which the diffusion model is operating has approximately unit variance. Hope this helps 😃

Different autoencoders can produce latents with very different scales and distributions when they encode images into latent space. For the diffusion model to operate in a reasonably uniform and stable latent space, the latents have to be standardized; the scale factor ensures that the latent space the diffusion model sees has approximately unit variance, whichever autoencoder produced it.
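
In practice this is just a multiply after encoding and a divide before decoding. A minimal sketch with diffusers' AutoencoderKL (the checkpoint name and the hard-coded 0.18215 are for illustration; pipelines normally read vae.config.scaling_factor instead):

from diffusers import AutoencoderKL
import torch

vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae"
).eval()
scale = 0.18215  # same value as vae.config.scaling_factor for SD v1.5

image = torch.randn(1, 3, 512, 512)  # stand-in for a real image scaled to [-1, 1]
with torch.no_grad():
    # encode, then multiply so the latent handed to the UNet has roughly unit variance
    latent = vae.encode(image).latent_dist.sample() * scale
    # divide to undo the scaling before decoding back to pixel space
    decoded = vae.decode(latent / scale).sample
print(latent.shape)  # torch.Size([1, 4, 64, 64])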

To make sure I’m understanding, it sounds like you arrived at scale_factor = 0.18215 by averaging over a bunch of examples generated by the vae, in order to ensure they have unit variance with the variance taken over all dimensions simultaneously? And scale_factor = 1 / std(z)

The way it is computed: encode a set of samples into latents with the VAE, measure the standard deviation of those latents, and take the scale factor as 1 / std(z), which works out to 0.18215.

Fernando Pérez-García gave a code explanation:

from diffusers import AutoencoderKL
import torch
import torchvision
from torchvision.datasets.utils import download_and_extract_archive
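
Only the opening imports of that snippet are reproduced above. Below is a minimal, self-contained sketch of the same measurement rather than his original code; the ./images folder (ImageFolder layout, i.e. images grouped in subfolders), the 512×512 preprocessing, and the runwayml/stable-diffusion-v1-5 checkpoint are illustrative assumptions.

from diffusers import AutoencoderKL
import torch
import torchvision
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae"
).to(device).eval()

# Resize to 512x512 and map pixel values to [-1, 1], matching the VAE's training setup.
preprocess = transforms.Compose([
    transforms.Resize(512),
    transforms.CenterCrop(512),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
])
dataset = torchvision.datasets.ImageFolder("./images", transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=4)

# Encode every image and collect the sampled latents.
latents = []
with torch.no_grad():
    for images, _ in loader:
        posterior = vae.encode(images.to(device)).latent_dist
        latents.append(posterior.sample().cpu())
all_latents = torch.cat(latents)

# Standard deviation taken over all dimensions at once; the scale factor is its reciprocal.
std = all_latents.flatten().std().item()
print(f"std = {std:.5f}, scale_factor = 1 / std = {1.0 / std:.5f}")

Since scale_factor = 1 / std(z) = 0.18215, the measured std should come out around 1 / 0.18215 ≈ 5.49 on data similar to what the autoencoder was trained on.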