Before reading this article, it is recommended to first read the following two articles:
Training Loss
Training Objective
First, let us review the problem: in the reverse denoising process we cannot obtain $q(\mathbf{x}_{t-1} \vert \mathbf{x}_{t})$, so we defined a model to be learned, $p_\theta(\mathbf{x}_{t-1} \vert \mathbf{x}_t)$, to approximate it; during training, the posterior $q(\mathbf{x}_{t-1} \vert \mathbf{x}_t, \mathbf{x}_0)$ can be used to optimize $p_\theta$.
The question now is: how do we optimize $p_\theta$ to obtain the ideal $\boldsymbol{\mu}_\theta$ and $\boldsymbol{\Sigma}_\theta$? As with a VAE, we can minimize the negative log-likelihood of the model under the real data distribution, i.e., minimize the cross-entropy between $p_{\mathrm{data}} = q(\mathbf{x}_0)$ and $p_\theta(\mathbf{x}_0)$:
$$
\mathcal{L}=\mathbb{E}_{\mathbf{x}_{0} \sim q\left(\mathbf{x}_{0}\right)}\left[-\log p_{\theta}\left(\mathbf{x}_{0}\right)\right] \tag{1}
$$
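As a side illustration (this toy code is not from the original article), Eq. (1) is just the usual maximum-likelihood objective: when $p_\theta$ is a tractable density, for example a 1-D Gaussian with learnable mean and scale, the cross-entropy can be estimated by Monte Carlo over samples from $q(\mathbf{x}_0)$ and minimized directly by gradient descent.

```python
# Toy sketch: minimize E_{x0 ~ q(x0)}[-log p_theta(x0)] when p_theta is a
# tractable 1-D Gaussian. For a diffusion model p_theta(x0) has no closed
# form, which is exactly the problem the rest of this section works around.
import torch

torch.manual_seed(0)
x0 = torch.randn(10_000) * 2.0 + 3.0          # samples from q(x0): N(3, 2^2)

mu = torch.zeros(1, requires_grad=True)        # model parameters theta
log_sigma = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=0.05)

for step in range(500):
    dist = torch.distributions.Normal(mu, log_sigma.exp())
    loss = -dist.log_prob(x0).mean()           # Monte Carlo estimate of Eq. (1)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(mu.item(), log_sigma.exp().item())       # should approach (3.0, 2.0)
```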
However, we have no closed-form expression for $p_\theta(\mathbf{x}_0)$, so the cross-entropy in Eq. (1) cannot be computed directly. Using Eqs. 2-6 of Diffusion Model(2):前向扩散过程和逆向降噪过程, we can carry out some algebra to rewrite the $p_\theta(\mathbf{x}_0)$ in Eq. (1) in terms of known quantities:
$$
\begin{aligned}
\mathcal{L} &=-\mathbb{E}_{q\left(\mathbf{x}_{0}\right)} \log p_{\theta}\left(\mathbf{x}_{0}\right) \\
&=-\mathbb{E}_{q\left(\mathbf{x}_{0}\right)} \log \left(\int p_{\theta}\left(\mathbf{x}_{0: T}\right) d \mathbf{x}_{1: T}\right) \\
&=-\mathbb{E}_{q\left(\mathbf{x}_{0}\right)} \log \left(\int q\left(\mathbf{x}_{1: T} \vert \mathbf{x}_{0}\right) \frac{p_{\theta}\left(\mathbf{x}_{0: T}\right)}{q\left(\mathbf{x}_{1: T} \vert \mathbf{x}_{0}\right)} d \mathbf{x}_{1: T}\right) \\
&=-\mathbb{E}_{q\left(\mathbf{x}_{0}\right)} \log \left(\mathbb{E}_{q\left(\mathbf{x}_{1: T} \vert \mathbf{x}_{0}\right)} \frac{p_{\theta}\left(\mathbf{x}_{0: T}\right)}{q\left(\mathbf{x}_{1: T} \vert \mathbf{x}_{0}\right)}\right) \\
& \leq-\mathbb{E}_{q\left(\mathbf{x}_{0: T}\right)} \log \frac{p_{\theta}\left(\mathbf{x}_{0: T}\right)}{q\left(\mathbf{x}_{1: T} \vert \mathbf{x}_{0}\right)} \\
&=\mathbb{E}_{q\left(\mathbf{x}_{0: T}\right)}\left[\log \frac{q\left(\mathbf{x}_{1: T} \vert \mathbf{x}_{0}\right)}{p_{\theta}\left(\mathbf{x}_{0: T}\right)}\right]=\mathcal{L}_{\mathrm{VLB}}
\end{aligned} \tag{2}
$$
In the expression above, $q(\mathbf{x}_0)$ is the real data distribution and $p_\theta(\mathbf{x}_0)$ is the model. From the fourth line to the fifth line we applied Jensen's inequality $\log \mathbb{E}[f(x)] \geq \mathbb{E}[\log f(x)]$ (equivalently, $-\log \mathbb{E}[f(x)] \leq -\mathbb{E}[\log f(x)]$ after negation) and merged the expectation over $q(\mathbf{x}_0)$ with the expectation over $q(\mathbf{x}_{1:T} \vert \mathbf{x}_0)$ into a single expectation over $q(\mathbf{x}_{0:T})$.
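The direction of this inequality is easy to verify numerically. The short check below (illustrative only, not from the original article) draws samples of a positive random variable and compares $\log\mathbb{E}[X]$ against $\mathbb{E}[\log X]$.

```python
# Numerical sanity check of the Jensen step: for the concave log function,
# log E[X] >= E[log X], hence -log E[X] <= -E[log X] as used above.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.1, 5.0, size=1_000_000)   # any positive random variable

lhs = np.log(x.mean())        # log E[X]
rhs = np.log(x).mean()        # E[log X]
print(lhs, rhs, lhs >= rhs)   # True: log E[X] >= E[log X]
print(-lhs <= -rhs)           # True: -log E[X] <= -E[log X]
```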
To minimize this loss, Eq. (2) tells us we can instead minimize its upper bound $\mathcal{L}_{\mathrm{VLB}}$:
$$
\begin{aligned}
\mathcal{L}_{\mathrm{VLB}} &= \mathbb{E}_{q\left(\mathbf{x}_{0: T}\right)}\left[\log \frac{q\left(\mathbf{x}_{1: T} \vert \mathbf{x}_{0}\right)}{p_{\theta}\left(\mathbf{x}_{0: T}\right)}\right] \\
&= \mathbb{E}_{q}\left[\log \frac{\prod_{t=1}^{T} q\left(\mathbf{x}_{t} \vert \mathbf{x}_{t-1}\right)}{p_{\theta}\left(\mathbf{x}_{T}\right) \prod_{t=1}^{T} p_{\theta}\left(\mathbf{x}_{t-1} \vert \mathbf{x}_{t}\right)}\right] \\
&= \mathbb{E}_{q}\left[-\log p_{\theta}\left(\mathbf{x}_{T}\right)+\sum_{t=1}^{T} \log \frac{q\left(\mathbf{x}_{t} \vert \mathbf{x}_{t-1}\right)}{p_{\theta}\left(\mathbf{x}_{t-1} \vert \mathbf{x}_{t}\right)}\right] \\
&= \mathbb{E}_{q}\left[-\log p_{\theta}\left(\mathbf{x}_{T}\right)+\sum_{t=2}^{T} \log \frac{q\left(\mathbf{x}_{t} \vert \mathbf{x}_{t-1}\right)}{p_{\theta}\left(\mathbf{x}_{t-1} \vert \mathbf{x}_{t}\right)}+\log \frac{q\left(\mathbf{x}_{1} \vert \mathbf{x}_{0}\right)}{p_{\theta}\left(\mathbf{x}_{0} \vert \mathbf{x}_{1}\right)}\right] \\
&= \mathbb{E}_{q}\left[-\log p_{\theta}\left(\mathbf{x}_{T}\right)+\sum_{t=2}^{T} \log \left(\frac{q\left(\mathbf{x}_{t-1} \vert \mathbf{x}_{t}, \mathbf{x}_{0}\right)}{p_{\theta}\left(\mathbf{x}_{t-1} \vert \mathbf{x}_{t}\right)} \cdot \frac{q\left(\mathbf{x}_{t} \vert \mathbf{x}_{0}\right)}{q\left(\mathbf{x}_{t-1} \vert \mathbf{x}_{0}\right)}\right)+\log \frac{q\left(\mathbf{x}_{1} \vert \mathbf{x}_{0}\right)}{p_{\theta}\left(\mathbf{x}_{0} \vert \mathbf{x}_{1}\right)}\right] \\
&= \mathbb{E}_{q}\left[-\log p_{\theta}\left(\mathbf{x}_{T}\right)+\sum_{t=2}^{T} \log \frac{q\left(\mathbf{x}_{t-1} \vert \mathbf{x}_{t}, \mathbf{x}_{0}\right)}{p_{\theta}\left(\mathbf{x}_{t-1} \vert \mathbf{x}_{t}\right)}+\sum_{t=2}^{T} \log \frac{q\left(\mathbf{x}_{t} \vert \mathbf{x}_{0}\right)}{q\left(\mathbf{x}_{t-1} \vert \mathbf{x}_{0}\right)}+\log \frac{q\left(\mathbf{x}_{1} \vert \mathbf{x}_{0}\right)}{p_{\theta}\left(\mathbf{x}_{0} \vert \mathbf{x}_{1}\right)}\right] \\
&= \mathbb{E}_{q}\left[-\log p_{\theta}\left(\mathbf{x}_{T}\right)+\sum_{t=2}^{T} \log \frac{q\left(\mathbf{x}_{t-1} \vert \mathbf{x}_{t}, \mathbf{x}_{0}\right)}{p_{\theta}\left(\mathbf{x}_{t-1} \vert \mathbf{x}_{t}\right)}+\log \frac{q\left(\mathbf{x}_{T} \vert \mathbf{x}_{0}\right)}{q\left(\mathbf{x}_{1} \vert \mathbf{x}_{0}\right)}+\log \frac{q\left(\mathbf{x}_{1} \vert \mathbf{x}_{0}\right)}{p_{\theta}\left(\mathbf{x}_{0} \vert \mathbf{x}_{1}\right)}\right]
\end{aligned} \tag{3}
$$
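Once the expectation over $\mathbf{x}_{t-1}$ is taken, each summand $\log\frac{q(\mathbf{x}_{t-1}\vert\mathbf{x}_t,\mathbf{x}_0)}{p_\theta(\mathbf{x}_{t-1}\vert\mathbf{x}_t)}$ becomes a KL divergence between two Gaussians, which has a closed form. The sketch below is an illustrative assumption of this note, not the article's code; the shapes and names are hypothetical, and it only shows how such a diagonal-Gaussian KL term can be evaluated element-wise.

```python
# Closed-form KL between two diagonal Gaussians, as appears in each
# per-timestep term of L_VLB (posterior q vs. learned reverse model p_theta).
import torch

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians, per element."""
    return 0.5 * (
        logvar_p - logvar_q
        + (logvar_q - logvar_p).exp()
        + (mu_q - mu_p).pow(2) / logvar_p.exp()
        - 1.0
    )

# Hypothetical shapes: a batch of 4 examples, each flattened to 3072 dimensions.
mu_q, logvar_q = torch.randn(4, 3072), torch.zeros(4, 3072)   # posterior q(x_{t-1} | x_t, x_0)
mu_p, logvar_p = torch.randn(4, 3072), torch.zeros(4, 3072)   # model p_theta(x_{t-1} | x_t)

kl_per_example = gaussian_kl(mu_q, logvar_q, mu_p, logvar_p).sum(dim=1)
print(kl_per_example.shape)   # torch.Size([4]): one per-timestep KL term per example
```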