GMM
- One Gaussian distribution per class
- $N(\mu_k,\Sigma_k)$
|  | Supervised | Unsupervised | Semi-supervised |
| --- | --- | --- | --- |
| Objective | $L=\log p(X_l,Y_l\mid\theta)=\sum_{i=1}^l\log p(y_i\mid\theta)p(x_i\mid y_i,\theta)=\sum_{i=1}^l\log\alpha_{y_i}N(x_i\mid\theta_{y_i})$ | $p(x;\theta)=\prod_i^N\sum_{k=1}^K\pi_kN(x_i\mid\mu_k,\Sigma_k)$ | $L=\log p(X_l,Y_l,X_u\mid\theta)=\sum_{i=1}^l\log\alpha_{y_i}N(x_i\mid\theta_{y_i})+\sum_{i=l+1}^m\log\sum_{k=1}^N\alpha_kN(x_i\mid\theta_k)$ |
| E | Solved directly by differentiation (no E-step) | $\gamma(z_{ik})=p(z_{ik}=1\mid x_i)=\frac{\pi_kN(x_i\mid\mu_k,\Sigma_k)}{\sum_{k=1}^K\pi_kN(x_i\mid\mu_k,\Sigma_k)}$ | $\gamma_{ik}=p(y_i=k\mid x_i)=\frac{\alpha_kN(x_i\mid\theta_k)}{\sum_{k=1}^N\alpha_kN(x_i\mid\theta_k)}$ |
| M | $\mu_k=\frac{1}{l_k}\sum_{i\in D_l,\,y_i=k}x_i,\ \Sigma_k=\frac{1}{l_k}\sum_{i\in D_l,\,y_i=k}(x_i-\mu_k)(x_i-\mu_k)^T,\ \alpha_k=\frac{l_k}{l}$ | $\mu_k=\frac{\sum_i\gamma(z_{ik})x_i}{\sum_i\gamma(z_{ik})},\ \pi_k=\frac{\sum_i\gamma(z_{ik})}{N},\ \Sigma_k=\frac{\sum_i\gamma(z_{ik})(x_i-\mu_k)(x_i-\mu_k)^T}{\sum_i\gamma(z_{ik})}$ | $\mu_k=\frac{1}{l_k+\sum_{i=l+1}^m\gamma_{ik}}(\sum_{i\in D_l,\,y_i=k}x_i+\sum_{i=l+1}^m\gamma_{ik}x_i),\ \Sigma_k=\frac{1}{l_k+\sum_{i=l+1}^m\gamma_{ik}}(\sum_{i\in D_l,\,y_i=k}(x_i-\mu_k)(x_i-\mu_k)^T+\sum_{i=l+1}^m\gamma_{ik}(x_i-\mu_k)(x_i-\mu_k)^T),\ \alpha_k=\frac{l_k+\sum_{i=l+1}^m\gamma_{ik}}{m}$ |

Semi-supervised = unsupervised + supervised
Supervised
- Objective: $L=\log p(X_l,Y_l\mid\theta)=\sum_{i=1}^l\log p(y_i\mid\theta)\,p(x_i\mid y_i,\theta)$, where $\theta_i=\{\alpha_i,\mu_i,\Sigma_i\}$
- $=\sum_{i=1}^l\log\alpha_{y_i}N(x_i\mid\theta_{y_i})=\sum_{i=1}^l\left(\log\alpha_{y_i}-\frac{n}{2}\log(2\pi)-\frac{1}{2}\log|\Sigma_{y_i}|-\frac{1}{2}(x_i-\mu_{y_i})^T\Sigma_{y_i}^{-1}(x_i-\mu_{y_i})\right)$
- Differentiate directly to obtain the closed-form estimates (a code sketch follows this list)
- $\mu_k=\frac{1}{l_k}\sum_{i\in D_l,\,y_i=k}x_i,\quad \Sigma_k=\frac{1}{l_k}\sum_{i\in D_l,\,y_i=k}(x_i-\mu_k)(x_i-\mu_k)^T,\quad \alpha_k=\frac{l_k}{l}$
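A minimal sketch of these closed-form supervised estimates, assuming the labeled data is a NumPy array `X` with integer labels `y` (the function and variable names are illustrative, not from the notes):

```python
import numpy as np

def supervised_gmm_mle(X, y, K):
    """Closed-form MLE: one Gaussian per class plus class priors.

    X: (l, n) labeled samples; y: (l,) integer labels in {0, ..., K-1}.
    """
    l, n = X.shape
    alpha, mu, Sigma = np.zeros(K), np.zeros((K, n)), np.zeros((K, n, n))
    for k in range(K):
        Xk = X[y == k]                    # samples of class k
        lk = len(Xk)
        alpha[k] = lk / l                 # alpha_k = l_k / l
        mu[k] = Xk.mean(axis=0)           # mu_k: class mean
        diff = Xk - mu[k]
        Sigma[k] = diff.T @ diff / lk     # Sigma_k: class scatter / l_k
    return alpha, mu, Sigma
```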
Unsupervised
5.2 GMM (Gaussian Mixture Model) and EM
- Probabilistic interpretation: assume there are K clusters, each following a Gaussian distribution; a cluster k is chosen at random with probability $\pi_k$ and a sample point is drawn from its distribution, which generates the observed data
- Likelihood of the N sample points $x$
- $p(x;\theta)=\prod_i^N\sum_{k=1}^K\pi_kN(x_i\mid\mu_k,\Sigma_k)$, where $\sum_k\pi_k=1$, $0\leq\pi_k\leq 1$
- Introduce a latent variable $z$ indicating cluster membership, represented as a K-dimensional one-hot vector
- $p(z_k=1)=\pi_k$
- $p(x_i\mid z)=\prod_k^K N(x_i\mid\mu_k,\Sigma_k)^{z_k}$
- $p(x_i\mid z_k=1)=N(x_i\mid\mu_k,\Sigma_k)$
- $p(x_i)=\sum_z p(x_i\mid z)p(z)=\sum_{k=1}^K\pi_kN(x_i\mid\mu_k,\Sigma_k)$
- Responsibility (can be viewed as how much the k-th cluster explains $x_i$)
- $\gamma(z_{ik})=p(z_{ik}=1\mid x_i)=\frac{p(z_{ik}=1)p(x_i\mid z_k=1)}{\sum_{k=1}^Kp(z_{ik}=1)p(x_i\mid z_k=1)}=\frac{\pi_kN(x_i\mid\mu_k,\Sigma_k)}{\sum_{k=1}^K\pi_kN(x_i\mid\mu_k,\Sigma_k)}$
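A minimal sketch of this responsibility (E-step) computation using `scipy.stats.multivariate_normal`; the parameter arrays `pi`, `mu`, `Sigma` are assumed to be given:

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pi, mu, Sigma):
    """gamma[i, k] = pi_k N(x_i | mu_k, Sigma_k) / sum_j pi_j N(x_i | mu_j, Sigma_j)."""
    N, K = X.shape[0], len(pi)
    weighted = np.zeros((N, K))
    for k in range(K):
        weighted[:, k] = pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
    return weighted / weighted.sum(axis=1, keepdims=True)   # normalize over components
```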
Parameter learning: maximum likelihood estimation via EM
- Maximum likelihood estimation
- Difficulty: the summation sits inside the log, so all parameters are coupled
- Condition satisfied at the maximum of the likelihood: differentiate $\log P(x\mid\theta)$ with respect to $\mu_k$
- $0=-\sum_{i=1}^N\frac{\pi_kN(x_i\mid\mu_k,\Sigma_k)}{\sum_{k=1}^K\pi_kN(x_i\mid\mu_k,\Sigma_k)}\Sigma_k^{-1}(x_i-\mu_k)$
- $\mu_k=\frac{\sum_i\gamma(z_{ik})x_i}{\sum_i\gamma(z_{ik})}$
- $\pi_k=\frac{\sum_i\gamma(z_{ik})}{N}$
- $\Sigma_k=\frac{\sum_i\gamma(z_{ik})(x_i-\mu_k)(x_i-\mu_k)^T}{\sum_i\gamma(z_{ik})}$
- This is not a closed-form solution, since $\gamma(z_{ik})$ itself depends on the parameters → EM
- E: given the current parameter estimates, compute the posterior $\gamma(z_{ik})=E(z_{ik})$
- M: given the posterior $\gamma(z_{ik})$, re-estimate the parameters $\mu_k$, $\pi_k$, $\Sigma_k$ (see the M-step sketch after this list)
- Iterate until convergence to a local optimum of the likelihood
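A minimal sketch of the M-step updates given a responsibility matrix `gamma`, e.g. produced by the `responsibilities` sketch above (names are illustrative). Alternating the two functions until the log-likelihood stops improving gives the full EM loop:

```python
import numpy as np

def m_step(X, gamma):
    """Re-estimate pi_k, mu_k, Sigma_k from responsibilities gamma of shape (N, K)."""
    N, n = X.shape
    K = gamma.shape[1]
    Nk = gamma.sum(axis=0)                    # effective number of points per component
    pi = Nk / N                               # pi_k = N_k / N
    mu = (gamma.T @ X) / Nk[:, None]          # mu_k: responsibility-weighted mean
    Sigma = np.zeros((K, n, n))
    for k in range(K):
        diff = X - mu[k]
        Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]   # weighted scatter
    return pi, mu, Sigma
```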
EM
- General EM
- Objective: maximize the log-likelihood $\log P(X\mid\theta)=\log\sum_z P(x,z\mid\theta)$
- Used for: the log-likelihood of incomplete data
- $Z$ is not observed; only its posterior distribution $P(z\mid x,\theta^{old})$ is available
- Consider the expectation $Q(\theta,\theta^{old})=E_{p(z\mid x,\theta^{old})}[\log P(x,z\mid\theta)]$
- and maximize it: $\theta^{new}=\arg\max_\theta Q(\theta,\theta^{old})$
- E: compute $P(z\mid x,\theta^{old})$
- M: $\theta^{new}=\arg\max_\theta Q(\theta,\theta^{old})$
- Why does this heuristic-looking procedure still maximize the likelihood?
- Maximizing $Q(\theta,\theta^{old})=E_{p(z\mid x,\theta^{old})}[\log P(x,z\mid\theta)]$ also increases $p(x;\theta)$ (see the convergence guarantee below)
- Comparison of complete and incomplete data
- Incomplete data: $\log p(x)=\sum_i\log\sum_z p(x_i\mid z)p(z)=\sum_i\log\sum_{k=1}^K\pi_kN(x_i\mid\mu_k,\Sigma_k)$
- With incomplete data the parameters are coupled and no closed-form solution exists
- Complete data
- $\log p(x,z\mid\theta)=\log p(z\mid\theta)p(x\mid z,\theta)=\sum_i\sum_k z_{ik}\left(\log\pi_k+\log N(x_i\mid\mu_k,\Sigma_k)\right)$
- $E_z[\log p(x,z\mid\theta)]=\sum_i\sum_k E(z_{ik})\left(\log\pi_k+\log N(x_i\mid\mu_k,\Sigma_k)\right)=\sum_i\sum_k\gamma(z_{ik})\left(\log\pi_k+\log N(x_i\mid\mu_k,\Sigma_k)\right)$
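A minimal sketch of evaluating this expected complete-data log-likelihood $Q$ for a GMM; it is the quantity the M-step maximizes over $\theta$. The arrays `gamma`, `pi`, `mu`, `Sigma` are assumed to come from the earlier sketches:

```python
import numpy as np
from scipy.stats import multivariate_normal

def expected_complete_loglik(X, gamma, pi, mu, Sigma):
    """Q = sum_i sum_k gamma_ik * (log pi_k + log N(x_i | mu_k, Sigma_k))."""
    terms = np.zeros_like(gamma)
    for k in range(len(pi)):
        terms[:, k] = np.log(pi[k]) + multivariate_normal.logpdf(X, mean=mu[k], cov=Sigma[k])
    return float((gamma * terms).sum())
```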
EM convergence guarantee
- Goal: maximize $P(x\mid\theta)=\sum_z p(x,z\mid\theta)$
- Optimizing $P(x\mid\theta)$ directly is hard, but optimizing the complete-data likelihood $p(x,z\mid\theta)$ is easy
- Proof
- Decomposition
- For any distribution $q(z)$, the following decomposition holds
- $\ln p(x\mid\theta)=L(q,\theta)+KL(q\|p)$, where $L(q,\theta)=\sum_z q(z)\ln\frac{p(x,z\mid\theta)}{q(z)}$ and $KL(q\|p)=-\sum_z q(z)\ln\frac{p(z\mid x,\theta)}{q(z)}$; since $KL(q\|p)\geq 0$, $L(q,\theta)$ is a lower bound of $\ln p(x\mid\theta)$
- E: maximize $L(q,\theta)$ over $q$ with $\theta$ fixed, which gives $q(z)=P(z\mid x,\theta^{old})$
- M: the lower bound becomes $L(q,\theta)=\sum_z P(z\mid x,\theta^{old})\ln\frac{p(x,z\mid\theta)}{q(z)}=Q(\theta,\theta^{old})+\text{const}$, which is exactly the expectation above; maximize it over $\theta$
- The lower bound is raised, and with it $\ln p(x\mid\theta)$ (a numeric check of the decomposition follows below)
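A minimal numeric sanity check of the decomposition $\ln p(x\mid\theta)=L(q,\theta)+KL(q\|p)$ at a single point of a toy 1-D two-component GMM; all numbers are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

# toy 1-D, 2-component GMM (illustrative parameters)
pi = np.array([0.3, 0.7])
mu = np.array([-1.0, 2.0])
sd = np.array([1.0, 0.5])
x = 0.4

joint = pi * norm.pdf(x, loc=mu, scale=sd)   # p(x, z=k)
px = joint.sum()                             # p(x) = sum_k p(x, z=k)
post = joint / px                            # p(z=k | x)

q = np.array([0.5, 0.5])                     # an arbitrary distribution q(z)
L = np.sum(q * np.log(joint / q))            # lower bound L(q, theta)
KL = -np.sum(q * np.log(post / q))           # KL(q || p(z|x)) >= 0

print(np.log(px), L + KL)                    # the two values coincide
```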
Semi-supervised
- Objective: $L=\log p(X_l,Y_l,X_u\mid\theta)=\sum_{i=1}^l\log p(y_i\mid\theta)p(x_i\mid y_i,\theta)+\sum_{i=l+1}^m\log\left(\sum_{k=1}^N p(y_i=k\mid\theta)p(x_i\mid y_i=k,\theta)\right)$, where $\theta_i=\{\alpha_i,\mu_i,\Sigma_i\}$
- $=\sum_{i=1}^l\log\alpha_{y_i}N(x_i\mid\theta_{y_i})+\sum_{i=l+1}^m\log\sum_{k=1}^N\alpha_kN(x_i\mid\theta_k)=\sum_{i=1}^l\left(\log\alpha_{y_i}-\frac{n}{2}\log(2\pi)-\frac{1}{2}\log|\Sigma_{y_i}|-\frac{1}{2}(x_i-\mu_{y_i})^T\Sigma_{y_i}^{-1}(x_i-\mu_{y_i})\right)+\sum_{i=l+1}^m\log\left(\sum_{k=1}^N\alpha_k\frac{1}{(2\pi)^{n/2}|\Sigma_k|^{1/2}}\exp\left\{-\frac{1}{2}(x_i-\mu_k)^T\Sigma_k^{-1}(x_i-\mu_k)\right\}\right)$
- E: for the unlabeled points, compute $\gamma_{ik}=p(y_i=k\mid x_i)=\frac{\alpha_kN(x_i\mid\theta_k)}{\sum_{k=1}^N\alpha_kN(x_i\mid\theta_k)}$
- M: $\mu_k=\frac{1}{l_k+\sum_{i=l+1}^m\gamma_{ik}}\left(\sum_{i\in D_l,\,y_i=k}x_i+\sum_{i=l+1}^m\gamma_{ik}x_i\right),\quad \Sigma_k=\frac{1}{l_k+\sum_{i=l+1}^m\gamma_{ik}}\left(\sum_{i\in D_l,\,y_i=k}(x_i-\mu_k)(x_i-\mu_k)^T+\sum_{i=l+1}^m\gamma_{ik}(x_i-\mu_k)(x_i-\mu_k)^T\right),\quad \alpha_k=\frac{l_k+\sum_{i=l+1}^m\gamma_{ik}}{m}$ (a code sketch follows)
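A minimal sketch of one semi-supervised EM iteration that combines the labeled class counts with responsibilities on the unlabeled data; `Xl`, `yl`, `Xu` and the other names are illustrative, and the E-step reuses the `responsibilities` sketch above:

```python
import numpy as np

def semi_supervised_em_step(Xl, yl, Xu, alpha, mu, Sigma):
    """One EM iteration for a GMM given labeled (Xl, yl) and unlabeled Xu samples."""
    K, n = len(alpha), Xl.shape[1]
    m = len(Xl) + len(Xu)                                # total number of samples
    gamma = responsibilities(Xu, alpha, mu, Sigma)       # E-step on unlabeled points only
    new_alpha, new_mu, new_Sigma = np.zeros(K), np.zeros((K, n)), np.zeros((K, n, n))
    for k in range(K):
        Xk = Xl[yl == k]                                 # labeled samples of class k
        weight = len(Xk) + gamma[:, k].sum()             # l_k + sum_i gamma_ik
        new_alpha[k] = weight / m
        new_mu[k] = (Xk.sum(axis=0) + gamma[:, k] @ Xu) / weight
        dl, du = Xk - new_mu[k], Xu - new_mu[k]
        new_Sigma[k] = (dl.T @ dl + (gamma[:, k, None] * du).T @ du) / weight
    return new_alpha, new_mu, new_Sigma
```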