Pinsker’s inequality and Kullback-Leibler (KL) divergence

Contents

Pinsker’s inequality
Kullback-Leibler (KL) divergence
- Computing the KL divergence in MATLAB
Application of the KL divergence to covert-communication probability derivations

Pinsker’s inequality

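Pinsker’s inequality is a result from information theory that relates two ways of quantifying the difference between probability distributions: the KL divergence and the total variation distance. It is named after the Soviet mathematician Mark Pinsker, who established it in the 1960s.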

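Let $P$ and $Q$ be two probability distributions on the same sample space, with probability density functions $p(x)$ and $q(x)$. Pinsker’s inequality can be written as:

$D_{\text{KL}}(P \parallel Q) \geq \frac{1}{2} \left(\int \left|p(x) - q(x)\right| \, dx\right)^2$

where:

$D_{\text{KL}}(P \parallel Q)$ is the Kullback-Leibler divergence between $P$ and $Q$, which measures how much the two distributions differ.
$p(x)$ and $q(x)$ are the probability density functions of $P$ and $Q$ at the sample point $x$.

Equivalently, the total variation distance $\delta(P, Q) = \frac{1}{2} \int \left|p(x) - q(x)\right| \, dx$ satisfies $\delta(P, Q) \leq \sqrt{\frac{1}{2} D_{\text{KL}}(P \parallel Q)}$. In other words, the square root of half the KL divergence is an upper bound on the total variation distance, which is half the L1 distance between the two densities (not an L2 distance). The inequality is widely used in information theory and statistics to turn a bound on the KL divergence into a bound on the distance between distributions.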
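Pinsker’s inequality in its total variation form, $\delta(P, Q) \leq \sqrt{D_{\text{KL}}(P \parallel Q) / 2}$, is easy to check numerically for discrete distributions, where the integrals become sums. The following is a minimal sketch; the vectors P and Q are illustrative values chosen here, and all entries are assumed to be strictly positive.

```matlab
% Numerical check of Pinsker's inequality for two discrete distributions.
% P and Q are illustrative values; both are assumed strictly positive.
P = [0.3, 0.4, 0.3];
Q = [0.5, 0.2, 0.3];

kl = sum(P .* log(P ./ Q));      % KL divergence D(P||Q) in nats
tv = 0.5 * sum(abs(P - Q));      % total variation distance
bound = sqrt(kl / 2);            % Pinsker upper bound on tv

fprintf('TV = %.4f, sqrt(KL/2) = %.4f\n', tv, bound);
```

For these values the total variation distance is 0.2 and the bound is about 0.249, so the inequality holds with some slack.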

Kullback-Leibler (KL) divergence

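The KL divergence (Kullback-Leibler divergence), also called relative entropy, measures how one probability distribution differs from another. For two probability distributions $P$ and $Q$ with densities $p(x)$ and $q(x)$, it is defined as:

$D_{\text{KL}}(P \parallel Q) = \int p(x) \log\left(\frac{p(x)}{q(x)}\right) \,dx$

The integral is the expectation, taken under $P$, of the log-likelihood ratio $\log\left(p(x)/q(x)\right)$: each point of the sample space contributes the logarithm of the ratio of the two densities, weighted by the density of $P$ at that point. With the natural logarithm the result is in nats. For discrete distributions the integral is replaced by a sum over the outcomes.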
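To make the integral definition concrete, the sketch below evaluates it numerically for two univariate Gaussians and compares the result with the well-known closed form $\log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}$. The parameter values and the integration range $[-12, 12]$ are assumptions made for this example only.

```matlab
% Sketch: KL divergence between two univariate Gaussians, computed by
% numerical integration of the definition and compared with the closed form.
% The parameter values below are illustrative assumptions.
mu1 = 0; s1 = 1;     % P = N(mu1, s1^2)
mu2 = 1; s2 = 2;     % Q = N(mu2, s2^2)

% Log-densities, used so that the ratio p/q never becomes 0/0 in the tails.
logp = @(x) -0.5*log(2*pi*s1^2) - (x - mu1).^2 / (2*s1^2);
logq = @(x) -0.5*log(2*pi*s2^2) - (x - mu2).^2 / (2*s2^2);

% Integrand p(x) * log(p(x)/q(x)); the mass of P outside [-12, 12] is negligible.
integrand = @(x) exp(logp(x)) .* (logp(x) - logq(x));
kl_numeric = integral(integrand, -12, 12);

% Closed form for two univariate Gaussians.
kl_closed = log(s2/s1) + (s1^2 + (mu1 - mu2)^2) / (2*s2^2) - 0.5;

fprintf('numeric = %.6f, closed form = %.6f\n', kl_numeric, kl_closed);
```

Working with log-densities instead of the ratio of densities is a design choice that keeps the integrand finite where both densities underflow.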

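The KL divergence has several important properties:

Non-negativity: $D_{\text{KL}}(P \parallel Q) \geq 0$, with equality if and only if $P$ and $Q$ are equal almost everywhere.
Asymmetry: in general, $D_{\text{KL}}(P \parallel Q) \neq D_{\text{KL}}(Q \parallel P)$. $D_{\text{KL}}(P \parallel Q)$ measures the information lost when $Q$ is used to approximate $P$, which differs from the information lost when $P$ is used to approximate $Q$.
No triangle inequality: $D_{\text{KL}}(P \parallel R) \leq D_{\text{KL}}(P \parallel Q) + D_{\text{KL}}(Q \parallel R)$ does not hold in general. Together with the asymmetry, this means that the KL divergence is not a distance metric in the usual sense.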
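The asymmetry and the failure of the triangle inequality can both be seen on a small example. The sketch below uses three two-outcome distributions picked here for illustration; the helper kl is an anonymous function defined for this example and assumes strictly positive entries.

```matlab
% Sketch: asymmetry and failure of the triangle inequality for KL divergence.
% P, Q, R are illustrative two-outcome distributions with positive entries.
kl = @(a, b) sum(a .* log(a ./ b));   % D(a||b) in nats

P = [0.5, 0.5];
Q = [0.25, 0.75];
R = [0.1, 0.9];

d_pq = kl(P, Q);
d_qp = kl(Q, P);
d_pr = kl(P, R);
d_qr = kl(Q, R);

fprintf('D(P||Q) = %.4f, D(Q||P) = %.4f\n', d_pq, d_qp);
fprintf('D(P||R) = %.4f, D(P||Q) + D(Q||R) = %.4f\n', d_pr, d_pq + d_qr);
```

Here $D_{\text{KL}}(P \parallel Q) \approx 0.1438$ while $D_{\text{KL}}(Q \parallel P) \approx 0.1308$, and $D_{\text{KL}}(P \parallel R) \approx 0.5108$ exceeds $D_{\text{KL}}(P \parallel Q) + D_{\text{KL}}(Q \parallel R) \approx 0.2362$.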

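The KL divergence is widely used in information theory, statistics, and machine learning, for example in variational inference, maximum likelihood estimation, and generative models.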

Computing the KL divergence in MATLAB

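The KL (Kullback-Leibler) divergence measures how one probability distribution differs from another. As far as I know, neither base MATLAB nor the Statistics and Machine Learning Toolbox ships a built-in kldiv function; functions with that name are user contributions (for example on MATLAB File Exchange), and their calling conventions vary. For discrete distributions it is simpler to evaluate the definition directly.

The following simple example computes the KL divergence between two discrete probability distributions from the definition:

% Define two discrete probability distributions (each sums to 1)
P = [0.3, 0.4, 0.3]; % first distribution
Q = [0.5, 0.2, 0.3]; % second distribution
% KL divergence D(P||Q) in nats; assumes all entries of P and Q are positive
kl_divergence = sum(P .* log(P ./ Q));
% Display the result
disp(['KL divergence: ', num2str(kl_divergence)]);

This direct formula needs no toolbox, but it assumes that every entry of P and Q is strictly positive. If you prefer a third-party kldiv implementation, check its documentation for the expected arguments before using it.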
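When the KL divergence is computed by hand from its definition, zero entries need some care. The sketch below adopts the usual conventions that terms with $P(i) = 0$ contribute nothing and that the divergence is infinite when $Q(i) = 0$ at a point where $P(i) > 0$. The input vectors are illustrative, and normalizing them first is a defensive choice made for this example.

```matlab
% Sketch: discrete KL divergence D(P||Q) with explicit handling of zeros.
% The vectors below are illustrative; they are normalized defensively.
P = [0.5, 0.5, 0.0];
Q = [0.25, 0.5, 0.25];
P = P / sum(P);
Q = Q / sum(Q);

support = P > 0;                     % terms with P(i) = 0 contribute 0
if any(Q(support) == 0)
    kl_divergence = Inf;             % P puts mass where Q has none
else
    kl_divergence = sum(P(support) .* log(P(support) ./ Q(support)));
end

disp(['KL divergence: ', num2str(kl_divergence)]);
```

Restricting the sum to the support of P avoids the NaN that 0 * log(0) would otherwise produce in floating-point arithmetic.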

Application of the KL divergence to covert-communication probability derivations

Robust Beamfocusing for FDA-Aided Near-Field Covert Communications With Uncertain Location
2023 IEEE ICC

Let $\left(D_{\mathrm{w}}, \theta_{\mathrm{w}}\right)$ denote the location of Willie. We assume that Willie is synchronized with Alice and has full knowledge of the carrier frequencies and the channel vector $\mathbf{h}^{H}\left(D_{\mathrm{w}}, \theta_{\mathrm{w}}\right)$. This is the worst case for the legitimate nodes and is used to analyze the lower bound of the covert communication performance. The hypothesis test at Willie is given by

$\left\{\begin{array}{l} \mathcal{H}_{0}: y_{\mathrm{w}}^{(n)}=z_{\mathrm{w}}^{(n)}, \\ \mathcal{H}_{1}: y_{\mathrm{w}}^{(n)}=\mathbf{h}_{\mathrm{w}}^{H} \mathbf{w} s^{(n)}+z_{\mathrm{w}}^{(n)}, \end{array}\right. \tag{5}$

where $\mathbf{h}_{\mathrm{w}}^{H}$ is short for $\mathbf{h}^{H}\left(D_{\mathrm{w}}, \theta_{\mathrm{w}}\right)$, and $z_{\mathrm{w}}^{(n)} \sim \mathcal{C N}\left(0, \sigma_{\mathrm{w}}^{2}\right)$ is the AWGN at Willie with noise power $\sigma_{\mathrm{w}}^{2}$. From (5), the probability density functions (PDFs) of $\mathbf{y}_{\mathrm{w}}= \left[y_{\mathrm{w}}^{(1)}, y_{\mathrm{w}}^{(2)}, \ldots, y_{\mathrm{w}}^{(N)}\right]^{T}$ under $\mathcal{H}_{0}$ and $\mathcal{H}_{1}$ can be derived as

$\mathbb{P}_{0} \triangleq \mathbb{P}\left(\mathbf{y}_{\mathrm{w}} \mid \mathcal{H}_{0}\right)=\frac{1}{\pi^{N} \sigma_{\mathrm{w}}^{2 N}} e^{-\frac{\mathbf{y}_{\mathrm{w}}^{H} \mathbf{y}_{\mathrm{w}}}{\sigma_{\mathrm{w}}^{2}}} \tag{6}$

and

$\mathbb{P}_{1} \triangleq \mathbb{P}\left(\mathbf{y}_{\mathrm{w}} \mid \mathcal{H}_{1}\right)=\frac{1}{\pi^{N}\left(\left|\mathbf{h}_{\mathrm{w}}^{H} \mathbf{w}\right|^{2}+\sigma_{\mathrm{w}}^{2}\right)^{N}} e^{-\frac{\mathbf{y}_{\mathrm{w}}^{H} \mathbf{y}_{\mathrm{w}}}{\left|\mathbf{h}_{\mathrm{w}}^{H} \mathbf{w}\right|^{2}+\sigma_{\mathrm{w}}^{2}}} \tag{7}$

respectively. Let $\mathcal{D}_{0}$ and $\mathcal{D}_{1}$ denote the decisions in favor of $\mathcal{H}_{0}$ and $\mathcal{H}_{1}$, respectively. The false alarm and missed detection probabilities are defined as $\mathbb{P}_{F A} \triangleq \mathbb{P}\left(\mathcal{D}_{1} \mid \mathcal{H}_{0}\right)$ and $\mathbb{P}_{M D} \triangleq \mathbb{P}\left(\mathcal{D}_{0} \mid \mathcal{H}_{1}\right)$, respectively. The detection performance of Willie is characterized by the sum of the detection error probabilities $\xi=\mathbb{P}_{F A}+\mathbb{P}_{M D}$. Under optimal detection, $\xi$ is minimized, and the minimum is denoted by $\xi^{*}$. The covertness constraint of the system is then expressed as $\xi^{*} \geq 1-\epsilon$, where $\epsilon \in[0,1]$ is an arbitrarily small constant indicating the level of covertness. A smaller $\epsilon$ corresponds to a stricter covertness requirement. In particular, when $\epsilon=0$, we have $\xi^{*}=1$, which reduces Willie’s detection to a blind guess.

Moreover, the optimal test attains $\xi^{*}=1-\delta\left(\mathbb{P}_{0}, \mathbb{P}_{1}\right)$, where $\delta$ denotes the total variation distance, so Pinsker’s inequality [14], [15] gives $\xi^{*} \geq 1-\sqrt{\frac{\mathcal{D}\left(\mathbb{P}_{1} \| \mathbb{P}_{0}\right)}{2}}$, where $\mathcal{D}\left(\mathbb{P}_{1} \| \mathbb{P}_{0}\right)=\int_{\mathbf{y}} \mathbb{P}_{1} \log \frac{\mathbb{P}_{1}}{\mathbb{P}_{0}} \mathrm{~d} \mathbf{y}$ is the Kullback-Leibler (KL) divergence from $\mathbb{P}_{1}$ to $\mathbb{P}_{0}$. It follows that the original covertness constraint is satisfied as long as $\mathcal{D}\left(\mathbb{P}_{1} \| \mathbb{P}_{0}\right) \leq 2 \epsilon^{2}$. Furthermore, by substituting (6) and (7) into the expression of $\mathcal{D}\left(\mathbb{P}_{1} \| \mathbb{P}_{0}\right)$, we have $\mathcal{D}\left(\mathbb{P}_{1} \| \mathbb{P}_{0}\right)=N \zeta\left(\frac{\left|\mathbf{h}_{\mathrm{w}}^{H} \mathbf{w}\right|^{2}}{\sigma_{\mathrm{w}}^{2}}\right)$, where $\zeta(x)=x-\log (1+x)$ for $x \geq 0$ is a monotonically increasing function w.r.t. $x$. Then the original covertness constraint can be simplified to

$\frac{\left|\mathbf{h}_{\mathrm{w}}^{H} \mathbf{w}\right|^{2}}{\sigma_{\mathrm{w}}^{2}} \leq \zeta^{-1}\left(\frac{2 \epsilon^{2}}{N}\right) \tag{8}$
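The inverse $\zeta^{-1}$ in (8) has no elementary closed form, but because $\zeta$ is monotonically increasing on $x \geq 0$ it can be evaluated with a bracketed root finder. The following is a rough sketch using fzero; the values of epsilon and N and the upper bracket x_hi are assumptions chosen for this example and do not come from the paper.

```matlab
% Sketch: evaluate the right-hand side of (8) numerically.
% epsilon and N below are illustrative assumptions, not values from the paper.
epsilon = 0.1;     % covertness level
N = 100;           % number of samples observed by Willie

zeta = @(x) x - log1p(x);          % zeta(x) = x - log(1 + x), increasing for x >= 0
c = 2 * epsilon^2 / N;             % target value: zeta(x) = 2*epsilon^2/N

% zeta(0) = 0 < c and zeta(x_hi) > c for the x_hi below, so the root is bracketed.
x_hi = c + sqrt(2*c) + 1;
snr_max = fzero(@(x) zeta(x) - c, [0, x_hi]);   % zeta^{-1}(2*epsilon^2/N)

% Consistency check: resulting KL divergence and the Pinsker bound on xi*.
kl = N * zeta(snr_max);
xi_lower = 1 - sqrt(kl / 2);

fprintf('max |h_w^H w|^2 / sigma_w^2 = %.6f\n', snr_max);
fprintf('KL = %.6g, lower bound on xi* = %.4f\n', kl, xi_lower);
```

At the computed threshold the KL divergence equals $2\epsilon^{2}$, so the Pinsker bound on $\xi^{*}$ evaluates to $1-\epsilon$, which matches the covertness constraint with equality.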