Pattern Recognition Homework
1. Show the relationship between a discriminative classifier (such as logistic regression) and the particular class of Gaussian naive Bayes classifiers above: the posterior of the latter takes exactly the form used by logistic regression.
Specializing the more general derivation from Question 2:
The corresponding parameters are:
Quadratic term:
$$v=\left[\frac{\sigma_{11}^2-\sigma_{10}^2}{2\sigma_{11}^2\sigma_{10}^2},\dots,\frac{\sigma_{D1}^2-\sigma_{D0}^2}{2\sigma_{D1}^2\sigma_{D0}^2}\right]$$
Linear term:
$$w=\left[\frac{\sigma_{10}^2\mu_{11}-\sigma_{11}^2\mu_{10}}{\sigma_{11}^2\sigma_{10}^2},\dots,\frac{\sigma_{D0}^2\mu_{D1}-\sigma_{D1}^2\mu_{D0}}{\sigma_{D1}^2\sigma_{D0}^2}\right]$$
Constant term:
$$b=\ln\frac{\pi}{1-\pi}+\sum_i \ln\frac{\sigma_{i0}}{\sigma_{i1}}+\sum_i \frac{\sigma_{i1}^2\mu_{i0}^2-\sigma_{i0}^2\mu_{i1}^2}{2\sigma_{i1}^2\sigma_{i0}^2}$$
where
$$f(x)=\frac{1}{1+\exp\left(-\left(\sum_i v_ix_i^2+w_ix_i+b\right)\right)}$$
Since $\sigma_{i0}=\sigma_{i1}=\sigma_i$, we find $v=0$:
the quadratic term vanishes, and the linear and constant terms become:
Linear term:
$$w=\left[\frac{\mu_{11}-\mu_{10}}{\sigma_{1}^2},\dots,\frac{\mu_{D1}-\mu_{D0}}{\sigma_{D}^2}\right]$$
Constant term:
$$b=\ln\frac{\pi}{1-\pi}+\sum_i \frac{\mu_{i0}^2-\mu_{i1}^2}{2\sigma_{i}^2}$$
$$f(x)=\frac{1}{1+\exp\left(-\left(\sum_i w_ix_i+b\right)\right)}$$
which is exactly the form used by logistic regression.
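The equal-variance case can be checked numerically. The sketch below (all parameter values `pi`, `mu1`, `mu0`, `s` are made up for illustration) compares the posterior computed directly from the generative model with the sigmoid of the linear function $w^\top x+b$ derived above:

```python
import numpy as np

# Made-up illustrative parameters: D = 2 features, shared per-feature
# variance sigma_i for both classes (the case derived above).
pi = 0.5                                  # class prior P(y = 1)
mu1 = np.array([1.0, -0.5])               # class-1 means mu_{i1}
mu0 = np.array([0.2, 0.8])                # class-0 means mu_{i0}
s = np.array([1.3, 0.6])                  # shared std devs sigma_i

def posterior_direct(x):
    """P(y=1|x) from the generative model via Bayes' rule.
    The shared (2*pi*sigma^2)^{-1/2} normalizers cancel, so we drop them."""
    l1 = np.sum(-(x - mu1) ** 2 / (2 * s ** 2)) + np.log(pi)
    l0 = np.sum(-(x - mu0) ** 2 / (2 * s ** 2)) + np.log(1 - pi)
    return np.exp(l1) / (np.exp(l1) + np.exp(l0))

# Linear coefficients from the derivation above
w = (mu1 - mu0) / s ** 2
b = np.log(pi / (1 - pi)) + np.sum((mu0 ** 2 - mu1 ** 2) / (2 * s ** 2))

def posterior_logistic(x):
    """The same posterior as a logistic function of w.x + b."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

x = np.array([0.3, -1.1])
print(posterior_direct(x), posterior_logistic(x))   # the two agree
```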
2. If the features are instead modeled with more general Gaussians, does the posterior still have the logistic regression form?
The generative Gaussian naive Bayes classifier gives:
$$P(y=1\mid X)=\frac{P(X\mid y=1)P(y=1)}{P(X)}=\frac{P(X\mid y=1)P(y=1)}{P(X\mid y=1)P(y=1)+P(X\mid y=0)P(y=0)}$$
$$=\frac{1}{1+\frac{P(X\mid y=0)P(y=0)}{P(X\mid y=1)P(y=1)}}=\frac{1}{1+\exp\left(-\ln\frac{P(X\mid y=1)P(y=1)}{P(X\mid y=0)P(y=0)}\right)}$$
where
$$\ln\frac{P(X\mid y=1)P(y=1)}{P(X\mid y=0)P(y=0)}=\ln\frac{\pi}{1-\pi}+\ln\frac{P(X\mid y=1)}{P(X\mid y=0)}$$
By the naive (conditional independence) assumption, $P(X\mid y=k)=\prod_i P(x_i\mid y=k)$ with $x_i\mid y=k\sim\mathcal N(\mu_{ik},\sigma_{ik}^2)$, so
$$=\ln\frac{\pi}{1-\pi}+\sum_i \ln\frac{(\sqrt{2\pi}\sigma_{i1})^{-1}\exp\left(-(x_i-\mu_{i1})^2/2\sigma_{i1}^2\right)}{(\sqrt{2\pi}\sigma_{i0})^{-1}\exp\left(-(x_i-\mu_{i0})^2/2\sigma_{i0}^2\right)}$$
$$=\ln\frac{\pi}{1-\pi}+\sum_i\left(\ln\frac{\sigma_{i0}}{\sigma_{i1}}+x_i^2\,\frac{\sigma_{i1}^2-\sigma_{i0}^2}{2\sigma_{i1}^2\sigma_{i0}^2}+x_i\,\frac{\sigma_{i0}^2\mu_{i1}-\sigma_{i1}^2\mu_{i0}}{\sigma_{i1}^2\sigma_{i0}^2}+\frac{\sigma_{i1}^2\mu_{i0}^2-\sigma_{i0}^2\mu_{i1}^2}{2\sigma_{i1}^2\sigma_{i0}^2}\right)$$
Therefore,
only when $\sigma_{i1}^2=\sigma_{i0}^2$ does the $x_i^2$ term vanish, and the corresponding form is then exactly logistic regression.
In general, the corresponding parameters are:
Quadratic term:
$$v=\left[\frac{\sigma_{11}^2-\sigma_{10}^2}{2\sigma_{11}^2\sigma_{10}^2},\dots,\frac{\sigma_{D1}^2-\sigma_{D0}^2}{2\sigma_{D1}^2\sigma_{D0}^2}\right]$$
Linear term:
$$w=\left[\frac{\sigma_{10}^2\mu_{11}-\sigma_{11}^2\mu_{10}}{\sigma_{11}^2\sigma_{10}^2},\dots,\frac{\sigma_{D0}^2\mu_{D1}-\sigma_{D1}^2\mu_{D0}}{\sigma_{D1}^2\sigma_{D0}^2}\right]$$
Constant term:
$$b=\ln\frac{\pi}{1-\pi}+\sum_i \ln\frac{\sigma_{i0}}{\sigma_{i1}}+\sum_i \frac{\sigma_{i1}^2\mu_{i0}^2-\sigma_{i0}^2\mu_{i1}^2}{2\sigma_{i1}^2\sigma_{i0}^2}$$
where
$$f(x)=\frac{1}{1+\exp\left(-\left(\sum_i v_ix_i^2+w_ix_i+b\right)\right)}$$
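The general (class-dependent variance) case can likewise be checked numerically. The sketch below (parameter values are made up for illustration) compares the posterior computed directly from the generative model with the sigmoid of the quadratic form $\sum_i v_ix_i^2+w_ix_i+b$:

```python
import numpy as np

# Made-up illustrative parameters: D = 2 features with class-dependent
# variances, so the quadratic term v does not vanish.
pi = 0.6
mu1, mu0 = np.array([1.0, -0.5]), np.array([0.2, 0.8])   # means mu_{i1}, mu_{i0}
s1, s0 = np.array([1.5, 0.7]), np.array([0.9, 1.2])      # std devs sigma_{i1}, sigma_{i0}

def log_gauss(x, mu, s):
    """Elementwise log N(x; mu, s^2)."""
    return -0.5 * np.log(2 * np.pi * s ** 2) - (x - mu) ** 2 / (2 * s ** 2)

def posterior_direct(x):
    """P(y=1|x) from the generative naive Bayes model via Bayes' rule."""
    l1 = np.sum(log_gauss(x, mu1, s1)) + np.log(pi)
    l0 = np.sum(log_gauss(x, mu0, s0)) + np.log(1 - pi)
    return np.exp(l1) / (np.exp(l1) + np.exp(l0))

# Coefficients v, w, b from the derivation above
v = (s1 ** 2 - s0 ** 2) / (2 * s1 ** 2 * s0 ** 2)
w = (s0 ** 2 * mu1 - s1 ** 2 * mu0) / (s1 ** 2 * s0 ** 2)
b = (np.log(pi / (1 - pi))
     + np.sum(np.log(s0 / s1))
     + np.sum((s1 ** 2 * mu0 ** 2 - s0 ** 2 * mu1 ** 2) / (2 * s1 ** 2 * s0 ** 2)))

def posterior_quadratic(x):
    """The same posterior as a logistic function of the quadratic form."""
    return 1.0 / (1.0 + np.exp(-(np.sum(v * x ** 2 + w * x) + b)))

x = np.array([0.3, -1.1])
print(posterior_direct(x), posterior_quadratic(x))   # the two agree
```

So the posterior is still a sigmoid, but of a function that is quadratic in $x_i$; it is logistic regression in $x$ only when $v=0$.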
3. Does a non-naive Gaussian Bayes classifier still have the logistic regression property?
$$P(y=1\mid X)=\frac{P(x_1,x_2\mid y=1)P(y=1)}{P(X)}=\frac{P(x_1,x_2\mid y=1)P(y=1)}{P(x_1,x_2\mid y=1)P(y=1)+P(x_1,x_2\mid y=0)P(y=0)}$$
$$=\frac{1}{1+\frac{P(x_1,x_2\mid y=0)P(y=0)}{P(x_1,x_2\mid y=1)P(y=1)}}=\frac{1}{1+\exp(-e)}$$
where
$$e=\ln\frac{\pi}{1-\pi}+\ln\frac{P(x_1,x_2\mid y=1)}{P(x_1,x_2\mid y=0)}$$
Since the two features follow a bivariate Gaussian in which both classes share $\sigma_1,\sigma_2$ and the correlation $p$, and only the means depend on the class $k$:
$$P(x_1,x_2\mid y=k)=\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-p^2}}\exp\left(-\frac{1}{2(1-p^2)}\left[\frac{(x_1-\mu_{1k})^2}{\sigma_1^2}-\frac{2p(x_1-\mu_{1k})(x_2-\mu_{2k})}{\sigma_1\sigma_2}+\frac{(x_2-\mu_{2k})^2}{\sigma_2^2}\right]\right)$$
Substituting into $e$ and scaling the log-likelihood ratio by $2(1-p^2)\sigma_1^2\sigma_2^2$ gives:
$$2(1-p^2)\sigma_1^2\sigma_2^2\,\ln\frac{P(x_1,x_2\mid y=1)}{P(x_1,x_2\mid y=0)}=x_1^2(\sigma_2^2-\sigma_2^2)+x_2^2(\sigma_1^2-\sigma_1^2)+x_1x_2(-2p\sigma_1\sigma_2+2p\sigma_1\sigma_2)$$
$$+x_1(-2\mu_{10}\sigma_2^2+2p\sigma_1\sigma_2\mu_{20}+2\mu_{11}\sigma_2^2-2p\sigma_1\sigma_2\mu_{21})+x_2(-2\mu_{20}\sigma_1^2+2p\sigma_1\sigma_2\mu_{10}+2\mu_{21}\sigma_1^2-2p\sigma_1\sigma_2\mu_{11})$$
$$+\sigma_2^2(\mu_{10}^2-\mu_{11}^2)+\sigma_1^2(\mu_{20}^2-\mu_{21}^2)+2p\sigma_1\sigma_2(\mu_{11}\mu_{21}-\mu_{10}\mu_{20})$$
Because the two classes share the covariance, the $x_1^2$, $x_2^2$, and $x_1x_2$ terms cancel, leaving:
$$=x_1(-2\mu_{10}\sigma_2^2+2p\sigma_1\sigma_2\mu_{20}+2\mu_{11}\sigma_2^2-2p\sigma_1\sigma_2\mu_{21})+x_2(-2\mu_{20}\sigma_1^2+2p\sigma_1\sigma_2\mu_{10}+2\mu_{21}\sigma_1^2-2p\sigma_1\sigma_2\mu_{11})$$
$$+\sigma_2^2(\mu_{10}^2-\mu_{11}^2)+\sigma_1^2(\mu_{20}^2-\mu_{21}^2)+2p\sigma_1\sigma_2(\mu_{11}\mu_{21}-\mu_{10}\mu_{20})$$
Hence:
$$b=\ln\frac{\pi}{1-\pi}+\frac{\sigma_2^2(\mu_{10}^2-\mu_{11}^2)+\sigma_1^2(\mu_{20}^2-\mu_{21}^2)+2p\sigma_1\sigma_2(\mu_{11}\mu_{21}-\mu_{10}\mu_{20})}{2(1-p^2)\sigma_1^2\sigma_2^2}$$
$$w_1=\frac{-2\mu_{10}\sigma_2^2+2p\sigma_1\sigma_2\mu_{20}+2\mu_{11}\sigma_2^2-2p\sigma_1\sigma_2\mu_{21}}{2(1-p^2)\sigma_1^2\sigma_2^2}$$
$$w_2=\frac{-2\mu_{20}\sigma_1^2+2p\sigma_1\sigma_2\mu_{10}+2\mu_{21}\sigma_1^2-2p\sigma_1\sigma_2\mu_{11}}{2(1-p^2)\sigma_1^2\sigma_2^2}$$
The posterior can then be written as:
$$P(y=1\mid X)=\frac{1}{1+\exp(-(w_1x_1+w_2x_2+b))}$$
Therefore, it still has the logistic regression form. Note that this relies on both classes sharing $\sigma_1,\sigma_2,p$: with class-dependent covariances, the quadratic and cross terms above would not cancel.
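The bivariate case can also be checked numerically. The sketch below (parameter values are made up for illustration) builds the shared covariance matrix from $\sigma_1,\sigma_2,p$, computes the posterior directly via Bayes' rule, and compares it with the sigmoid of $w_1x_1+w_2x_2+b$:

```python
import numpy as np

# Made-up illustrative parameters for the bivariate (non-naive) case.
# Both classes share sigma_1, sigma_2 and correlation p; only means differ.
pi = 0.4
p, s1, s2 = 0.3, 1.2, 0.8
mu_1 = np.array([1.0, -0.5])             # (mu_11, mu_21) for class y = 1
mu_0 = np.array([-0.3, 0.6])             # (mu_10, mu_20) for class y = 0
Sigma = np.array([[s1 ** 2, p * s1 * s2],
                  [p * s1 * s2, s2 ** 2]])
Sinv = np.linalg.inv(Sigma)

def posterior_direct(x):
    """P(y=1|x) via Bayes' rule; the shared normalizer cancels in the ratio."""
    l1 = -0.5 * (x - mu_1) @ Sinv @ (x - mu_1) + np.log(pi)
    l0 = -0.5 * (x - mu_0) @ Sinv @ (x - mu_0) + np.log(1 - pi)
    return np.exp(l1) / (np.exp(l1) + np.exp(l0))

# w_1, w_2, b from the derivation above
den = 2 * (1 - p ** 2) * s1 ** 2 * s2 ** 2
w1 = (-2 * mu_0[0] * s2 ** 2 + 2 * p * s1 * s2 * mu_0[1]
      + 2 * mu_1[0] * s2 ** 2 - 2 * p * s1 * s2 * mu_1[1]) / den
w2 = (-2 * mu_0[1] * s1 ** 2 + 2 * p * s1 * s2 * mu_0[0]
      + 2 * mu_1[1] * s1 ** 2 - 2 * p * s1 * s2 * mu_1[0]) / den
b = np.log(pi / (1 - pi)) + (s2 ** 2 * (mu_0[0] ** 2 - mu_1[0] ** 2)
                             + s1 ** 2 * (mu_0[1] ** 2 - mu_1[1] ** 2)
                             + 2 * p * s1 * s2 * (mu_1[0] * mu_1[1]
                                                  - mu_0[0] * mu_0[1])) / den

def posterior_logistic(x):
    """The same posterior as a logistic function of w1*x1 + w2*x2 + b."""
    return 1.0 / (1.0 + np.exp(-(w1 * x[0] + w2 * x[1] + b)))

x = np.array([0.7, -0.2])
print(posterior_direct(x), posterior_logistic(x))   # the two agree
```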