文本分类的深度注意图扩散网络笔记

1 Title

Deep Attention Diffusion Graph Neural Networks for Text Classification（Yonghao Liu、Renchu Guan、Fausto Giunchiglia、Yanchun Liang、Xiaoyue Feng）【EMnlp 2021】

2 Conclusion

Text classification is a fundamental task with broad applications in natural language processing. Recently, graph neural networks (GNNs) have attracted much attention due to their powerful representation ability. However, most existing methods for text classification based on GNNs consider only one-hop neighborhoods and low-frequency information within texts, which cannot fully utilize the rich context information of documents. Moreover, these models suffer from over-smoothing issues if many graph layers are stacked. In this paper, a Deep Attention Diffusion Graph Neural Network (DADGNN) model is proposed to learn text representations, bridging the chasm of interaction difficulties between a word and its distant neighbors.

3 Good Sentences

1、Previous shallow learning-based text classification approaches mainly use hand-crafted sparse lexical features, such as bag-of-words (BoW) or n-grams, for representing texts (Li et al., 2020). Since these features are predefined, the models do not take full advantage of the large amount of training data.(The disadvantages of previous methods)
2、 Theoretically, we can capture long-range dependencies between words with a large number of layers. However, a common challenge faced by most GNNs is that performance degrades severely when stacking multiple layers to exploit larger receptive fields. Some researchers attribute this phenomenon to over-smoothing indistinguishable representation of different classes of nodes.（The challenges of GNNs meet and its probably reason）
3、One crucial reason why our model achieves more significant improvements is that the receptive field of the target node is enhanced by attention diffusion, which incorporates more informative messages (i.e., both low-frequency and high-frequency information) in the text.（The reason why this method have an advantage）

问题背景：文本分类是自然语言处理中的基础任务，图神经网络（GNNs）因其强大的表示能力而受到关注。然而，现有的基于GNN的文本分类方法通常只考虑单跳邻域和文本中的低频信息，无法充分利用文档的丰富上下文信息。
现有方法的局限性：
- 受限的感受野：大多数方法只允许图中的词访问直接邻域，无法实现长距离词交互。
- 较浅的层数：当前基于图的模型通常采用较浅的设置，因为它们在两层图中表现最佳，但无法提取超过两跳邻居的信息。
- 非精确的文档级表示：大多数模型使用简单的池化操作（如求和或平均）来获取文档级表示，这会削弱一些关键节点的影响。
- 低通滤波器：现有的基于图的方法主要是固定系数的低通滤波器，主要保留节点特征的共性，忽略了它们之间的差异。
DADGNN模型：为了克服上述限制，提出了DADGNN模型，该模型使用注意力扩散技术扩大每个词的感受野，并解耦GNNs的传播和转换过程以训练更深层的网络。此外，通过计算每个节点的权重来获得精确的文档级表示。

DADGNN有三个主要组成部分：文本图构建、关键组件和图级表示。

文本图构建：

这样构造的图的优点是图是有向的，其转移矩阵就是对称的，

Key Components：

为了获得深层网络中节点的判别特征表示，本文解耦了GNN的传播和转换过程。具体表述为:

与传统GNN不同，对于直接相连的节点对，本文使用公式3和4计算它们之间的注意力权重，并进行归一化处理：

其中 $W_{l}$ 为权重矩阵， $a_l$ 为权重向量，是第 $l$ 层共享的可训练参数。 $A_l$ 是第 $l$ 层的图注意矩阵。另外，σ是ReLU激活函数。

后续可以通过扩散机制计算复杂网络中不直接连接的节点之间的注意力。

根据注意矩阵A，得到图的注意扩散矩阵T如下：，其中ζn是可学习的系数，依赖于所构建的图网络所展示的属性。

如图所示，模型通过一个单层的注意力扩散过程来考虑节点之间的所有路径，从而捕获断开节点的信息。例如(目标节点是“graph”，为简洁起见，删除(a)的不相关边)，，。

在实际应用中，考虑到现实世界网络中小世界现象的特点，即任意两个节点之间的最短路径通常不会太长（最多四或六个跳），

为了进一步提高注意扩散层的表达能力，本文部署了一个多头注意扩散机制。具体来说，先独立计算每个头k的注意力扩散，然后将它们聚合。输出特征表示如下:

其中||是连接操作和 $W_a$ 表示转换维度的权重矩阵

Graph-Level Representation:

在传播模型的第 $l$ 层之后，就可以计算每个文本图上所有节点的最终表示。为了衡量图中每个节点的不同作用，与使用一般池化的基于图的文本分类模型相比，采用了节点级关注机制。具体可以用下式表示:其中， $W_b$ 是可训练的权重矩阵， $\Psi _i$ 表示图中节点 i 的注意力系数。为了获得每个类别的概率，进一步执行。