0. Quick Access
Paper reading notes: Denoising Diffusion Implicit Models (1)
Paper reading notes: Denoising Diffusion Implicit Models (2)
Paper reading notes: Denoising Diffusion Implicit Models (3)
Paper reading notes: Denoising Diffusion Implicit Models (4)
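3. The Non-Markovian Forward Noising Process
Unlike the forward noising process in DDPM, the forward process in DDIM is non-Markovian. Following the paper's formulation, it is defined by Equations (1) and (2).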
$$
\begin{equation}
\begin{split}
q_{\sigma}(x_{1:T}|x_0) &:= q_{\sigma}(x_T|x_0)\prod_{t=2}^{T}q_{\sigma}(x_{t-1}|x_t,x_0)
\end{split}
\end{equation}
$$
where
$$
\begin{equation}
\begin{split}
q_{\sigma}(x_T|x_0)&=N(x_T;\sqrt{\alpha_T}x_0,(1-\alpha_T)I)\Leftrightarrow x_T=\sqrt{\alpha_T}\cdot x_0+\sqrt{1-\alpha_T}\cdot z,\quad z\sim N(0,I) \\
q_{\sigma}(x_{t-1}|x_t,x_0)&=N\Bigg(x_{t-1};\underbrace{\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot x_t+\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\bigg]\cdot x_0}_{\text{denoted }\mu(x_t,x_0)},\sigma_t^2 I\Bigg) \\
&=N\Big(x_{t-1};\mu(x_t,x_0),\sigma_t^2 I\Big)
\end{split}
\end{equation}
$$
The figure below illustrates this noising process.
For this process, we first prove the following lemma:
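The factorization in Equations (1) and (2) can also be read as a sampling recipe: draw $x_T$ from $q_{\sigma}(x_T|x_0)$, then walk down from $t=T$ to $t=2$, drawing each $x_{t-1}$ from $q_{\sigma}(x_{t-1}|x_t,x_0)$. The sketch below is only an illustration under stated assumptions: the function names `posterior_mean` and `sample_trajectory`, the linear $\alpha$ schedule, and the constant $\sigma_t=0.1$ are hypothetical choices made here, not taken from the paper.

```python
import numpy as np

def posterior_mean(x_t, x0, alpha_t, alpha_prev, sigma_t):
    """Mean mu(x_t, x_0) of q_sigma(x_{t-1} | x_t, x_0), as in Equation (2)."""
    coef_xt = np.sqrt((1.0 - alpha_prev - sigma_t**2) / (1.0 - alpha_t))
    # sqrt(alpha_t * (1 - alpha_prev - sigma_t^2)) / sqrt(1 - alpha_t) == sqrt(alpha_t) * coef_xt
    coef_x0 = np.sqrt(alpha_prev) - np.sqrt(alpha_t) * coef_xt
    return coef_xt * x_t + coef_x0 * x0

def sample_trajectory(x0, alphas, sigmas, rng):
    """Draw (x_1, ..., x_T) given x_0 following the factorization in Equation (1).

    alphas[t] and sigmas[t] are indexed by t = 1..T; index 0 is unused padding.
    Requires 1 - alphas[t-1] - sigmas[t]**2 >= 0 for every t >= 2.
    """
    T = len(alphas) - 1
    xs = [None] * (T + 1)
    # t = T: x_T = sqrt(alpha_T) * x_0 + sqrt(1 - alpha_T) * z
    z = rng.standard_normal(x0.shape)
    xs[T] = np.sqrt(alphas[T]) * x0 + np.sqrt(1.0 - alphas[T]) * z
    # t = T, ..., 2: x_{t-1} ~ N(mu(x_t, x_0), sigma_t^2 I)
    for t in range(T, 1, -1):
        mean = posterior_mean(xs[t], x0, alphas[t], alphas[t - 1], sigmas[t])
        xs[t - 1] = mean + sigmas[t] * rng.standard_normal(x0.shape)
    return xs[1:]

# Hypothetical schedule, chosen only so that the square roots stay real.
rng = np.random.default_rng(0)
x0 = np.ones(4)
T = 10
alphas = np.concatenate([[1.0], np.linspace(0.9, 0.05, T)])  # alphas[0] is padding
sigmas = np.full(T + 1, 0.1)
trajectory = sample_trajectory(x0, alphas, sigmas, rng)
print(len(trajectory), trajectory[-1].shape)
```

Note that every step conditions on $x_0$ as well as on $x_t$, which is exactly what makes the process non-Markovian.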
$$
\begin{equation}
\begin{split}
\text{Lemma 1}:\quad q_{\sigma}(x_t|x_0)&=N(x_t;\sqrt{\alpha_t}x_0,(1-\alpha_t)I) \\
\Leftrightarrow x_t&=\sqrt{\alpha_t}\cdot x_0+\sqrt{1-\alpha_t}\cdot z
\end{split}
\end{equation}
$$
We prove Lemma 1 by induction on $t$, running downward from $t=T$, in three steps:
- Base case: for $t=T$, $q_{\sigma}(x_T|x_0)$ satisfies $x_T=\sqrt{\alpha_T}\cdot x_0+\sqrt{1-\alpha_T}\cdot z$ by the definition in Equation (2), which agrees with Lemma 1.
- Inductive hypothesis: assume that $q_{\sigma}(x_t|x_0)$ satisfies Lemma 1 for some $t$, i.e. $q_{\sigma}(x_t|x_0)=N\big(x_t;\sqrt{\alpha_t}x_0,(1-\alpha_t)I\big)$.
- Inductive step: we need to show that $q_{\sigma}(x_{t-1}|x_0)$ then also satisfies Lemma 1. There are two ways to prove this.
Method 1:
$q_{\sigma}(x_{t-1}|x_0)$ is a marginal of $q_{\sigma}(x_{t-1},x_t|x_0)$, so it satisfies Equation (4).
$$
\begin{equation}
\begin{split}
q_{\sigma}(x_{t-1}|x_0)&=\int q_{\sigma}(x_{t-1},x_t|x_0)\cdot dx_t\\
&=\int q_{\sigma}(x_t|x_0)\cdot q_{\sigma}(x_{t-1}|x_t,x_0)\cdot dx_t
\end{split}
\end{equation}
$$
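$q_{\sigma}(x_{t-1}|x_0)$ is the distribution of $x_{t-1}$ given $x_0$. Because $q_{\sigma}(x_t|x_0)$ is Gaussian and $q_{\sigma}(x_{t-1}|x_t,x_0)$ is Gaussian with a mean that is linear in $x_t$, this marginal is again Gaussian. Denote its mean and variance by $\mu$ and $\sigma^2$; they are computed in Equations (5) and (6) respectively. The computation is written for a single coordinate, so $x_0^2$ denotes an ordinary scalar square.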
$$
\begin{equation}
\begin{split}
\mu&=E\big(q_{\sigma}(x_{t-1}|x_0)\big)\\
&=\int x_{t-1}\cdot q_{\sigma}(x_{t-1}|x_0)\cdot dx_{t-1} \\
&=\int x_{t-1}\cdot\Bigg(\int q_{\sigma}(x_t|x_0)\cdot q_{\sigma}(x_{t-1}|x_t,x_0)\cdot dx_t\Bigg)\cdot dx_{t-1} \\
&=\iint x_{t-1}\cdot q_{\sigma}(x_t|x_0)\cdot q_{\sigma}(x_{t-1}|x_t,x_0)\cdot dx_t\, dx_{t-1} \\
&=\int\Big(\int x_{t-1}\cdot q_{\sigma}(x_{t-1}|x_t,x_0)\cdot dx_{t-1}\Big)\, q_{\sigma}(x_t|x_0)\cdot dx_t \\
&=\int\mu(x_t,x_0)\cdot q_{\sigma}(x_t|x_0)\cdot dx_t \\
&=E_{x_t\sim q_{\sigma}(x_t|x_0)}\Big(\mu(x_t,x_0)\Big)\\
&=E_{x_t\sim q_{\sigma}(x_t|x_0)}\Bigg(\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot x_t+\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\bigg]\cdot x_0\Bigg) \\
&=\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot\underbrace{E_{x_t\sim q_{\sigma}(x_t|x_0)}(x_t)}_{=\sqrt{\alpha_t}\cdot x_0}+\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\bigg]\cdot x_0 \\
&=\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot\sqrt{\alpha_t}\cdot x_0+\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\bigg]\cdot x_0 \\
&=\sqrt{\alpha_{t-1}}\cdot x_0
\end{split}
\end{equation}
$$
$$
\begin{equation}
\begin{split}
\sigma^2&=Var\big(q_{\sigma}(x_{t-1}|x_0)\big)\\
&=\int(x_{t-1}-\mu)^2\cdot q_{\sigma}(x_{t-1}|x_0)\cdot dx_{t-1} \\
&=\int(x_{t-1}^2-2\mu\cdot x_{t-1}+\mu^2)\cdot\Big(\int q_{\sigma}(x_t|x_0)\cdot q_{\sigma}(x_{t-1}|x_t,x_0)\cdot dx_t\Big)\cdot dx_{t-1} \\
&=\iint(x_{t-1}^2-2\mu\cdot x_{t-1}+\mu^2)\cdot q_{\sigma}(x_t|x_0)\cdot q_{\sigma}(x_{t-1}|x_t,x_0)\cdot dx_t\cdot dx_{t-1} \\
&=\iint x_{t-1}^2\cdot q_{\sigma}(x_t|x_0)\cdot q_{\sigma}(x_{t-1}|x_t,x_0)\cdot dx_t\cdot dx_{t-1}-\iint 2\mu\cdot x_{t-1}\cdot q_{\sigma}(x_t|x_0)\cdot q_{\sigma}(x_{t-1}|x_t,x_0)\cdot dx_t\cdot dx_{t-1}+\iint\mu^2\cdot q_{\sigma}(x_t|x_0)\cdot q_{\sigma}(x_{t-1}|x_t,x_0)\cdot dx_t\cdot dx_{t-1} \\
&=\int\underbrace{\Bigg(\int x_{t-1}^2\cdot q_{\sigma}(x_{t-1}|x_t,x_0)\cdot dx_{t-1}\Bigg)}_{=E(x_{t-1}^2)=\mu(x_t,x_0)^2+\sigma_t^2}\cdot q_{\sigma}(x_t|x_0)\cdot dx_t-2\cdot\mu\int\underbrace{\Bigg(\int x_{t-1}\cdot q_{\sigma}(x_{t-1}|x_t,x_0)\cdot dx_{t-1}\Bigg)}_{=E(x_{t-1})=\mu(x_t,x_0)}\cdot q_{\sigma}(x_t|x_0)\cdot dx_t+\mu^2\cdot\underbrace{\iint q_{\sigma}(x_t|x_0)\cdot q_{\sigma}(x_{t-1}|x_t,x_0)\cdot dx_t\cdot dx_{t-1}}_{=1} \\
&=\int\bigg(\mu(x_t,x_0)^2+\sigma_t^2\bigg)\cdot q_{\sigma}(x_t|x_0)\cdot dx_t-2\cdot\mu\underbrace{\int\mu(x_t,x_0)\cdot q_{\sigma}(x_t|x_0)\cdot dx_t}_{=\mu}+\mu^2 \\
&=\int\mu(x_t,x_0)^2\cdot q_{\sigma}(x_t|x_0)\cdot dx_t+\sigma_t^2\cdot\underbrace{\int q_{\sigma}(x_t|x_0)\cdot dx_t}_{=1}-2\cdot\mu^2+\mu^2\\
&=\int\Bigg(\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot x_t+\underbrace{\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\bigg]\cdot x_0}_{\text{constant in }x_t\text{, call it }A}\Bigg)^2\cdot q_{\sigma}(x_t|x_0)\cdot dx_t+\sigma_t^2-\mu^2 \\
&=\int\Bigg(\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot x_t+A\Bigg)^2\cdot q_{\sigma}(x_t|x_0)\cdot dx_t+\sigma_t^2-\mu^2 \\
&=\int\Bigg(\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}\cdot x_t^2+2\cdot A\cdot\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot x_t+A^2\Bigg)\cdot q_{\sigma}(x_t|x_0)\cdot dx_t+\sigma_t^2-\mu^2 \\
&=\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}\cdot\underbrace{\int x_t^2\cdot q_{\sigma}(x_t|x_0)\cdot dx_t}_{=E_{x_t\sim q_{\sigma}(x_t|x_0)}(x_t^2)=\alpha_t x_0^2+(1-\alpha_t)}+2\cdot A\cdot\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot\underbrace{\int x_t\cdot q_{\sigma}(x_t|x_0)\cdot dx_t}_{=E_{x_t\sim q_{\sigma}(x_t|x_0)}(x_t)=\sqrt{\alpha_t}\cdot x_0}+A^2\cdot\underbrace{\int q_{\sigma}(x_t|x_0)\cdot dx_t}_{=1}+\sigma_t^2-\mu^2 \\
&=\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}\cdot\bigg[\alpha_t x_0^2+(1-\alpha_t)\bigg]+2\cdot\Bigg(\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\bigg]\cdot x_0\Bigg)\cdot\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot\sqrt{\alpha_t}\cdot x_0+\Bigg(\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\bigg]\cdot x_0\Bigg)^2+\sigma_t^2-\mu^2 \\
&=\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}\cdot\bigg[\alpha_t x_0^2+(1-\alpha_t)\bigg]+2\cdot\sqrt{\alpha_{t-1}}\cdot x_0^2\sqrt{\alpha_t}\cdot\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}-\frac{2\cdot\alpha_t\cdot x_0^2\cdot(1-\alpha_{t-1}-\sigma_t^2)}{1-\alpha_t}+x_0^2\cdot\alpha_{t-1}-\frac{2\cdot x_0^2\cdot\sqrt{\alpha_t}\cdot\sqrt{\alpha_{t-1}}\cdot\sqrt{1-\alpha_{t-1}-\sigma_t^2}}{\sqrt{1-\alpha_t}}+\frac{x_0^2\cdot\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}{1-\alpha_t}+\sigma_t^2-\mu^2\\
&=\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}\cdot\bigg[\bcancel{\alpha_t x_0^2}+(1-\alpha_t)-\bcancel{2\cdot\alpha_t\cdot x_0^2}+\bcancel{\alpha_t\cdot x_0^2}\bigg]+2\cdot\sqrt{\alpha_{t-1}}\cdot x_0^2\sqrt{\alpha_t}\cdot\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}+x_0^2\cdot\alpha_{t-1}-\frac{2\cdot x_0^2\cdot\sqrt{\alpha_t}\cdot\sqrt{\alpha_{t-1}}\cdot\sqrt{1-\alpha_{t-1}-\sigma_t^2}}{\sqrt{1-\alpha_t}}+\sigma_t^2-\mu^2\\
&=\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}\cdot(1-\alpha_t)+2\cdot\sqrt{\alpha_{t-1}}\cdot x_0^2\sqrt{\alpha_t}\cdot\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}+\bcancel{x_0^2\cdot\alpha_{t-1}}-\frac{2\cdot x_0^2\cdot\sqrt{\alpha_t}\cdot\sqrt{\alpha_{t-1}}\cdot\sqrt{1-\alpha_{t-1}-\sigma_t^2}}{\sqrt{1-\alpha_t}}+\sigma_t^2-\underbrace{\bcancel{\mu^2}}_{=\alpha_{t-1}\cdot x_0^2}\\
&=\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}\cdot(1-\alpha_t)+\bcancel{2\cdot\sqrt{\alpha_{t-1}}\cdot x_0^2\sqrt{\alpha_t}\cdot\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}}-\bcancel{\frac{2\cdot x_0^2\cdot\sqrt{\alpha_t}\cdot\sqrt{\alpha_{t-1}}\cdot\sqrt{1-\alpha_{t-1}-\sigma_t^2}}{\sqrt{1-\alpha_t}}}+\sigma_t^2 \\
&=\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}\cdot(1-\alpha_t)+\sigma_t^2 \\
&=1-\alpha_{t-1}-\sigma_t^2+\sigma_t^2 \\
&=1-\alpha_{t-1}
\end{split}
\end{equation}
$$
Equations (5) and (6) together give Equation (7), which completes the inductive step and proves Lemma 1.
$$
\begin{equation}
\begin{split}
q_{\sigma}(x_{t-1}|x_0)=N(x_{t-1};\sqrt{\alpha_{t-1}}\cdot x_0,(1-\alpha_{t-1})I)
\end{split}
\end{equation}
$$
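The inductive step can also be sanity-checked numerically. The sketch below is a Monte Carlo check for a single scalar coordinate: it draws $x_t$ from the distribution assumed in the inductive hypothesis, then draws $x_{t-1}$ from $q_{\sigma}(x_{t-1}|x_t,x_0)$, and compares the sample mean and variance with $\sqrt{\alpha_{t-1}}\cdot x_0$ and $1-\alpha_{t-1}$. The function name `check_inductive_step` and the numeric values of $x_0$, $\alpha_t$, $\alpha_{t-1}$, $\sigma_t$ are arbitrary assumptions chosen for illustration.

```python
import numpy as np

def check_inductive_step(x0, alpha_t, alpha_prev, sigma_t, n=200_000, seed=0):
    """Estimate the mean and variance of q_sigma(x_{t-1} | x_0) by sampling."""
    rng = np.random.default_rng(seed)
    # Inductive hypothesis: x_t ~ N(sqrt(alpha_t) * x_0, 1 - alpha_t)
    x_t = np.sqrt(alpha_t) * x0 + np.sqrt(1.0 - alpha_t) * rng.standard_normal(n)
    # Coefficients of mu(x_t, x_0) from Equation (2)
    coef_xt = np.sqrt((1.0 - alpha_prev - sigma_t**2) / (1.0 - alpha_t))
    coef_x0 = np.sqrt(alpha_prev) - np.sqrt(alpha_t) * coef_xt
    # x_{t-1} ~ N(mu(x_t, x_0), sigma_t^2)
    x_prev = coef_xt * x_t + coef_x0 * x0 + sigma_t * rng.standard_normal(n)
    return x_prev.mean(), x_prev.var()

x0, alpha_t, alpha_prev, sigma_t = 1.5, 0.5, 0.7, 0.2
mean, var = check_inductive_step(x0, alpha_t, alpha_prev, sigma_t)
print("sample mean:", mean, "target:", np.sqrt(alpha_prev) * x0)
print("sample var: ", var, "target:", 1.0 - alpha_prev)
```

The sample statistics should agree with the targets up to Monte Carlo error, and the agreement should not depend on the value of $\sigma_t$ as long as $1-\alpha_{t-1}-\sigma_t^2\ge 0$.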
Method 2:
This is the proof given in the paper. It relies on Equations (2.113), (2.114), and (2.115) on page 93 of the book Pattern Recognition and Machine Learning, shown in the figure below.
$$
\begin{equation}
\begin{split}
q_{\sigma}(x_{t}|x_0)&=N(x_{t};\sqrt{\alpha_{t}}\cdot x_0,(1-\alpha_{t})I)\\
&\Updownarrow\\
p(x)&=N(x|\mu,\Lambda^{-1})\qquad(2.113)\\
q_{\sigma}(x_{t-1}|x_t,x_0)&=N\Bigg(x_{t-1};\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot x_t+\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\bigg]\cdot x_0,\sigma_t^2 I\Bigg) \\
&\Updownarrow\\
p(y|x)&=N(y|Ax+b,L^{-1})\qquad(2.114)\\
q_\sigma(x_{t-1}|x_0)&\Leftrightarrow p(y)=N(y|A\mu+b,L^{-1}+A\Lambda^{-1}A^T)\qquad(2.115)
\end{split}
\end{equation}
$$
Matching terms, the quantities in (2.113) and (2.114) are:
$$
\begin{equation}
\begin{split}
\mu&=\sqrt{\alpha_{t}}\cdot x_0\\
\Lambda^{-1}&=1-\alpha_{t}\\
A&=\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\\
b&=\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\bigg]\cdot x_0 \\
L^{-1}&=\sigma_t^2
\end{split}
\end{equation}
$$
The mean and variance of the distribution $q_\sigma(x_{t-1}|x_0)$ are therefore:
$$
\begin{equation}
\begin{split}
E\big(q_\sigma(x_{t-1}|x_0)\big)&=E\big(p(y)\big)\\
&=A\mu+b\\
&=\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot\sqrt{\alpha_{t}}\cdot x_0+\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\bigg]\cdot x_0 \\
&=\bcancel{\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot\sqrt{\alpha_{t}}\cdot x_0}+\sqrt{\alpha_{t-1}}\cdot x_0-\bcancel{\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\cdot x_0}\\
&=\sqrt{\alpha_{t-1}}\cdot x_0 \\
\\
Var\big(q_\sigma(x_{t-1}|x_0)\big)&=Var\big(p(y)\big)\\
&=L^{-1}+A\Lambda^{-1}A^T \\
&=\sigma_t^2+\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot(1-\alpha_{t})\cdot\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}} \\
&=\sigma_t^2+1-\alpha_{t-1}-\sigma_t^2\\
&=1-\alpha_{t-1}
\end{split}
\end{equation}
$$
Q.E.D.
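Method 2 is short enough to evaluate directly in code. The sketch below plugs the matched quantities $\mu$, $\Lambda^{-1}$, $A$, $b$, $L^{-1}$ into the scalar form of (2.115) and returns the resulting mean and variance. The helper names `marginal_of_linear_gaussian` and `ddim_marginal`, and the numeric inputs, are assumptions made for this illustration only.

```python
import numpy as np

def marginal_of_linear_gaussian(mu, lam_inv, A, b, L_inv):
    """Scalar form of PRML (2.115): p(y) = N(A*mu + b, L^{-1} + A * Lambda^{-1} * A)."""
    return A * mu + b, L_inv + A * lam_inv * A

def ddim_marginal(x0, alpha_t, alpha_prev, sigma_t):
    """Mean and variance of q_sigma(x_{t-1} | x_0), assuming Lemma 1 holds at step t."""
    mu = np.sqrt(alpha_t) * x0            # mean of q_sigma(x_t | x_0)
    lam_inv = 1.0 - alpha_t               # variance of q_sigma(x_t | x_0)
    A = np.sqrt((1.0 - alpha_prev - sigma_t**2) / (1.0 - alpha_t))
    b = (np.sqrt(alpha_prev) - np.sqrt(alpha_t) * A) * x0
    L_inv = sigma_t**2                    # variance of q_sigma(x_{t-1} | x_t, x_0)
    return marginal_of_linear_gaussian(mu, lam_inv, A, b, L_inv)

x0, alpha_t, alpha_prev = 1.5, 0.5, 0.7
for sigma_t in (0.0, 0.2, 0.5):
    mean, var = ddim_marginal(x0, alpha_t, alpha_prev, sigma_t)
    print(sigma_t, mean, var)
```

For every admissible $\sigma_t$ the returned pair should match $\sqrt{\alpha_{t-1}}\cdot x_0$ and $1-\alpha_{t-1}$ up to floating-point rounding, which mirrors the cancellation of $\sigma_t^2$ in the derivation above.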