2. The Role of GraphConv
The multiplication of the adjacency matrix $\mathbf{A}$ with the feature matrix $\mathbf{X}$ in the GraphConv layer is a crucial operation in Graph Convolutional Networks (GCNs). It performs a localized, weighted aggregation of node features from each node's neighbors. Here is a detailed explanation of why this is done and what it accomplishes:
In short: multiplying the adjacency matrix by the node feature matrix in the GraphConv layer performs the key neighbor-aggregation operation of a GCN. It lets each node update its features based on the features of its neighbors, effectively propagating information through the graph and capturing its local structure. Combined with the weight transformation and optional normalization, this operation enables the network to learn meaningful representations of nodes and their relationships.
Purpose of Adjacency Matrix Multiplication
- Neighbor Aggregation:
  - In a graph, a node's features should be influenced by the features of its neighboring nodes. The adjacency matrix $\mathbf{A}$ encodes the connections between nodes, where $\mathbf{A}_{ij}$ is non-zero if there is an edge between node $i$ and node $j$.
  - When we multiply $\mathbf{A}$ with $\mathbf{X}$, each node's feature vector is updated to be a weighted sum of the feature vectors of its neighbors.
- Information Propagation:
  - This operation allows information to propagate through the graph, enabling each node to gather information from its local neighborhood.
  - This is essential for capturing the local structure and feature distribution within the graph.
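To make the aggregation concrete, here is a minimal sketch (the graph and feature values are made up for illustration) showing that `torch.matmul(adj, x)` computes, for each node, the sum of its neighbors' feature vectors:

```python
import torch

# A tiny 3-node graph: edges 0-1 and 1-2 (undirected, no self-loops)
adj = torch.tensor([[0., 1., 0.],
                    [1., 0., 1.],
                    [0., 1., 0.]])

# One 2-dimensional feature vector per node
x = torch.tensor([[1., 0.],
                  [0., 2.],
                  [3., 1.]])

# Each row of y is the sum of the feature vectors of that node's neighbors
y = torch.matmul(adj, x)
print(y)
# tensor([[0., 2.],   # node 0 aggregates node 1
#         [4., 1.],   # node 1 aggregates nodes 0 and 2
#         [0., 2.]])  # node 2 aggregates node 1
```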
Mathematical Interpretation
Let's break down the operations of the GraphConv layer:
- Matrix Multiplication:
  - The first operation is $\mathbf{Y} = \mathbf{A} \cdot \mathbf{X}$, where $\mathbf{Y}$ is the intermediate result, $\mathbf{A}$ is the adjacency matrix, and $\mathbf{X}$ is the input feature matrix.
  - For node $i$, the feature vector $\mathbf{Y}_i$ is computed as $\mathbf{Y}_i = \sum_{j \in \mathcal{N}(i)} \mathbf{A}_{ij} \mathbf{X}_j$, where $\mathcal{N}(i)$ denotes the neighbors of node $i$, including itself if self-loops are added.
- Self-Loop Addition:
  - If `add_self` is `True`, $\mathbf{X}$ is added to $\mathbf{Y}$, which ensures that each node's own features are also included in the aggregation: $\mathbf{Y} = \mathbf{A} \cdot \mathbf{X} + \mathbf{X}$.
  - If `add_self` is `False`, $\mathbf{Y}$ contains only the aggregated neighbor features.
- Weight Transformation:
  - The intermediate result $\mathbf{Y}$ is then transformed by a weight matrix $\mathbf{W}$: $\mathbf{Z} = \mathbf{Y} \cdot \mathbf{W}$.
  - This applies a linear transformation to the aggregated features, which is essential for learning an appropriate feature representation.
- Bias Addition:
  - If a bias term is included, it is added to $\mathbf{Z}$: $\mathbf{Z} = \mathbf{Z} + \mathbf{b}$.
- Normalization:
  - If `normalize_embedding` is `True`, the features are L2-normalized: $\mathbf{Z} = \frac{\mathbf{Z}}{\|\mathbf{Z}\|_2}$. This ensures that the feature vectors have unit length, which can be useful in certain applications.
  - If `normalize_embedding` is `False`, $\mathbf{Z}$ is returned without normalization.
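Putting these steps together, here is a small worked sketch on a made-up 3-node graph (the numbers are arbitrary and only for illustration) computing $\mathbf{Z} = (\mathbf{A}\mathbf{X} + \mathbf{X})\mathbf{W} + \mathbf{b}$ followed by optional L2 normalization:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Made-up 3-node graph and 2-dimensional input features
adj = torch.tensor([[0., 1., 0.],
                    [1., 0., 1.],
                    [0., 1., 0.]])
x = torch.tensor([[1., 0.],
                  [0., 2.],
                  [3., 1.]])

weight = torch.randn(2, 4)   # W: input_dim=2 -> output_dim=4
bias = torch.zeros(4)        # b

y = adj @ x + x              # neighbor aggregation plus self-loop (add_self=True)
z = y @ weight + bias        # weight transformation and bias
z = F.normalize(z, p=2, dim=1)  # optional per-node L2 normalization

print(z.shape)  # torch.Size([3, 4]); each row has unit L2 norm
```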
Example Code Walkthrough
Here is a simplified walkthrough of the GraphConv class:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphConv(nn.Module):
    def __init__(self, input_dim, output_dim, add_self=False, normalize_embedding=False,
                 dropout=0.0, bias=True):
        super(GraphConv, self).__init__()
        self.add_self = add_self
        self.dropout = dropout
        if dropout > 0.001:
            self.dropout_layer = nn.Dropout(p=dropout)
        self.normalize_embedding = normalize_embedding
        self.input_dim = input_dim
        self.output_dim = output_dim
        device = 'cuda' if torch.cuda.is_available() else 'cpu'
        self.weight = nn.Parameter(torch.FloatTensor(input_dim, output_dim)).to(device)
        if bias:
            self.bias = nn.Parameter(torch.FloatTensor(output_dim).to(device))
        else:
            self.bias = None

    def forward(self, x, adj):
        if self.dropout > 0.001:
            x = self.dropout_layer(x)
        # Matrix multiplication with adjacency matrix
        y = torch.matmul(adj, x)
        # Optionally add self-loop
        if self.add_self:
            y += x
        # Linear transformation
        y = torch.matmul(y, self.weight)
        # Add bias if present
        if self.bias is not None:
            y = y + self.bias
        # Normalize if required
        if self.normalize_embedding:
            y = F.normalize(y, p=2, dim=2)
        return y
```
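A quick sanity check of the class above, with made-up batch-first shapes (the `dim=2` normalization in `forward` implies a `(batch, num_nodes, features)` layout). Note that the class as written leaves `weight` and `bias` uninitialized, so this sketch initializes them by hand:

```python
import torch
import torch.nn as nn

# Instantiate the layer defined above: 8-dim inputs, 16-dim outputs
conv = GraphConv(input_dim=8, output_dim=16, add_self=True, normalize_embedding=True)

# The class does not initialize its parameters itself; do it explicitly here
nn.init.xavier_uniform_(conv.weight)
nn.init.zeros_(conv.bias)

# Random placeholder data: a batch containing one graph with 5 nodes
x = torch.randn(1, 5, 8)                      # (batch, num_nodes, input_dim)
adj = torch.randint(0, 2, (1, 5, 5)).float()  # (batch, num_nodes, num_nodes)

out = conv(x, adj)
print(out.shape)  # torch.Size([1, 5, 16]); each node vector has unit L2 norm
```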
2. GCNConv
2.0 code
```python
class GCNConv(MessagePassing):
    r"""The graph convolutional operator from the `"Semi-supervised
    Classification with Graph Convolutional Networks"
    <https://arxiv.org/abs/1609.02907>`_ paper.

    .. math::
        \mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}}
        \mathbf{\hat{D}}^{-1/2} \mathbf{X} \mathbf{\Theta},

    where :math:`\mathbf{\hat{A}} = \mathbf{A} + \mathbf{I}` denotes the
    adjacency matrix with inserted self-loops and
    :math:`\hat{D}_{ii} = \sum_{j=0} \hat{A}_{ij}` its diagonal degree matrix.
    The adjacency matrix can include other values than :obj:`1` representing
    edge weights via the optional :obj:`edge_weight` tensor.

    Its node-wise formulation is given by:

    .. math::
        \mathbf{x}^{\prime}_i = \mathbf{\Theta}^{\top} \sum_{j \in
        \mathcal{N}(i) \cup \{ i \}} \frac{e_{j,i}}{\sqrt{\hat{d}_j
        \hat{d}_i}} \mathbf{x}_j

    with :math:`\hat{d}_i = 1 + \sum_{j \in \mathcal{N}(i)} e_{j,i}`, where
    :math:`e_{j,i}` denotes the edge weight from source node :obj:`j` to target
    node :obj:`i` (default: :obj:`1.0`)

    Args:
        in_channels (int): Size of each input sample, or :obj:`-1` to derive
            the size from the first input(s) to the forward method.
        out_channels (int): Size of each output sample.
        improved (bool, optional): If set to :obj:`True`, the layer computes
            :math:`\mathbf{\hat{A}}` as :math:`\mathbf{A} + 2\mathbf{I}`.
            (default: :obj:`False`)
        cached (bool, optional): If set to :obj:`True`, the layer will cache
            the computation of :math:`\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}}
            \mathbf{\hat{D}}^{-1/2}` on first execution, and will use the
            cached version for further executions.
            This parameter should only be set to :obj:`True` in transductive
            learning scenarios. (default: :obj:`False`)
        add_self_loops (bool, optional): If set to :obj:`False`, will not add
            self-loops to the input graph. By default, self-loops will be added
            in case :obj:`normalize` is set to :obj:`True`, and not added
            otherwise. (default: :obj:`None`)
        normalize (bool, optional): Whether to add self-loops and compute
            symmetric normalization coefficients on-the-fly.
            (default: :obj:`True`)
        bias (bool, optional): If set to :obj:`False`, the layer will not learn
            an additive bias. (default: :obj:`True`)
        **kwargs (optional): Additional arguments of
            :class:`torch_geometric.nn.conv.MessagePassing`.

    Shapes:
        - **input:**
          node features :math:`(|\mathcal{V}|, F_{in})`,
          edge indices :math:`(2, |\mathcal{E}|)`
          or sparse matrix :math:`(|\mathcal{V}|, |\mathcal{V}|)`,
          edge weights :math:`(|\mathcal{E}|)` *(optional)*
        - **output:** node features :math:`(|\mathcal{V}|, F_{out})`
    """
    _cached_edge_index: Optional[OptPairTensor]
    _cached_adj_t: Optional[SparseTensor]

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        improved: bool = False,
        cached: bool = False,
        add_self_loops: Optional[bool] = None,
        normalize: bool = True,
        bias: bool = True,
        **kwargs,
    ):
        kwargs.setdefault('aggr', 'add')
        super().__init__(**kwargs)

        if add_self_loops is None:
            add_self_loops = normalize

        if add_self_loops and not normalize:
            raise ValueError(f"'{self.__class__.__name__}' does not support "
                             f"adding self-loops to the graph when no "
                             f"on-the-fly normalization is applied")

        self.in_channels = in_channels
        self.out_channels = out_channels
        self.improved = improved
        self.cached = cached
        self.add_self_loops = add_self_loops
        self.normalize = normalize

        self._cached_edge_index = None
        self._cached_adj_t = None

        self.lin = Linear(in_channels, out_channels, bias=False,
                          weight_initializer='glorot')

        if bias:
            self.bias = Parameter(torch.empty(out_channels))
        else:
            self.register_parameter('bias', None)

        self.reset_parameters()

    def reset_parameters(self):
        super().reset_parameters()
        self.lin.reset_parameters()
        zeros(self.bias)
        self._cached_edge_index = None
        self._cached_adj_t = None

    def forward(self, x: Tensor, edge_index: Adj,
                edge_weight: OptTensor = None) -> Tensor:

        if isinstance(x, (tuple, list)):
            raise ValueError(f"'{self.__class__.__name__}' received a tuple "
                             f"of node features as input while this layer "
                             f"does not support bipartite message passing. "
                             f"Please try other layers such as 'SAGEConv' or "
                             f"'GraphConv' instead")

        if self.normalize:
            if isinstance(edge_index, Tensor):
                cache = self._cached_edge_index
                if cache is None:
                    edge_index, edge_weight = gcn_norm(  # yapf: disable
                        edge_index, edge_weight, x.size(self.node_dim),
                        self.improved, self.add_self_loops, self.flow, x.dtype)
                    if self.cached:
                        self._cached_edge_index = (edge_index, edge_weight)
                else:
                    edge_index, edge_weight = cache[0], cache[1]

            elif isinstance(edge_index, SparseTensor):
                cache = self._cached_adj_t
                if cache is None:
                    edge_index = gcn_norm(  # yapf: disable
                        edge_index, edge_weight, x.size(self.node_dim),
                        self.improved, self.add_self_loops, self.flow, x.dtype)
                    if self.cached:
                        self._cached_adj_t = edge_index
                else:
                    edge_index = cache

        x = self.lin(x)

        # propagate_type: (x: Tensor, edge_weight: OptTensor)
        out = self.propagate(edge_index, x=x, edge_weight=edge_weight)

        if self.bias is not None:
            out = out + self.bias

        return out

    def message(self, x_j: Tensor, edge_weight: OptTensor) -> Tensor:
        return x_j if edge_weight is None else edge_weight.view(-1, 1) * x_j

    def message_and_aggregate(self, adj_t: Adj, x: Tensor) -> Tensor:
        return spmm(adj_t, x, reduce=self.aggr)
```
2.1
The `GCNConv` class implements the graph convolutional operator described in the paper "Semi-supervised Classification with Graph Convolutional Networks" by Kipf and Welling. This operator is designed to perform convolution operations on graph-structured data.
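For orientation, here is a minimal usage example of the layer as shipped in PyTorch Geometric; the graph below is made up:

```python
import torch
from torch_geometric.nn import GCNConv

# A made-up graph with 3 nodes and 2 undirected edges (stored as 4 directed edges)
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])
x = torch.randn(3, 16)  # 3 nodes, 16 input features

conv = GCNConv(in_channels=16, out_channels=32)
out = conv(x, edge_index)
print(out.shape)  # torch.Size([3, 32])
```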
Attributes and Their Roles
- `in_channels`:
  - Role: Size of each input feature vector.
  - Purpose: Determines the dimensionality of the input node features.
- `out_channels`:
  - Role: Size of each output feature vector.
  - Purpose: Determines the dimensionality of the output node features after the convolution operation.
- `improved`:
  - Role: Indicates whether to use an improved version of the adjacency matrix.
  - Purpose: If `True`, the adjacency matrix is modified to include double self-loops (`A + 2I`), which can improve performance in certain scenarios.
- `cached`:
  - Role: Indicates whether to cache the normalized adjacency matrix.
  - Purpose: Caches the normalization of the adjacency matrix for efficiency, particularly in transductive learning scenarios.
- `add_self_loops`:
  - Role: Indicates whether to add self-loops to the graph.
  - Purpose: Ensures that each node's own features are included in the convolution operation. Self-loops are added by default when normalization is enabled.
- `normalize`:
  - Role: Indicates whether to normalize the adjacency matrix.
  - Purpose: Applies symmetric normalization to the adjacency matrix, which is crucial for the GCN operator to perform correctly.
- `bias`:
  - Role: Indicates whether to include a learnable bias in the layer.
  - Purpose: Adds a bias term to the output of the linear transformation.
- `lin`:
  - Role: Linear transformation applied to the input node features.
  - Purpose: Transforms the input node features to the desired output dimensionality.
- `_cached_edge_index` and `_cached_adj_t`:
  - Role: Cache the normalized adjacency matrix and its corresponding edge index.
  - Purpose: Avoid recomputing the normalization in subsequent forward passes, improving efficiency.
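To connect these flags to actual usage, here is a short configuration sketch for a transductive setting (the channel sizes are arbitrary; the flag semantics follow the docstring above):

```python
from torch_geometric.nn import GCNConv

# Transductive setting (one fixed graph): cache the normalized adjacency
# so the symmetric normalization runs only on the first forward pass.
conv = GCNConv(
    in_channels=16,
    out_channels=32,
    improved=True,   # use A + 2I instead of A + I when adding self-loops
    cached=True,     # reuse the normalized adjacency across forward passes
    normalize=True,  # compute D^{-1/2} (A + I) D^{-1/2} on the fly
    bias=True,       # learn an additive bias
)
```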
Operation Mechanism of `forward`
The `forward` method processes the input graph data and applies the GCN convolution operation. Here is a detailed explanation of each step:
- Check for Tuple Input:
  - If the input `x` is a tuple or list, an error is raised because this layer does not support bipartite message passing.
- Normalization:
  - If `normalize` is `True`, the adjacency matrix (represented by `edge_index`) and the edge weights are normalized.
  - If `edge_index` is a tensor, the cache is checked first. If nothing is cached, the normalized adjacency matrix is computed with `gcn_norm` and stored if `cached` is `True`.
  - If `edge_index` is a `SparseTensor`, a similar caching mechanism is applied.
- Linear Transformation:
  - Applies the linear transformation to the input features `x` using `self.lin(x)`.
- Message Passing:
  - Calls `self.propagate` to perform message passing. This aggregates messages from neighboring nodes according to the normalized adjacency matrix and edge weights (a simplified sketch of this aggregation follows this list).
  - The `message` method computes the messages to be passed to each node. If edge weights are provided, they are used to scale the messages.
- Bias Addition:
  - If a bias term is included (`self.bias` is not `None`), it is added to the output features.
- Return Output:
  - Returns the final output node features after the convolution operation.
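The sketch below is not the library implementation of `propagate`; it is a minimal re-creation of what `propagate` and `message` amount to for this layer under sum aggregation: gather the (already transformed) features of each edge's source node, scale them by the edge weight, and sum them into the target node.

```python
import torch

def simple_gcn_propagate(edge_index: torch.Tensor,
                         x: torch.Tensor,
                         edge_weight: torch.Tensor) -> torch.Tensor:
    """Toy stand-in for MessagePassing.propagate with 'add' aggregation."""
    src, dst = edge_index                        # source and target node of every edge
    messages = edge_weight.view(-1, 1) * x[src]  # message: x_j scaled by e_{j,i}
    out = torch.zeros_like(x)
    out.index_add_(0, dst, messages)             # sum messages into each target node
    return out

# Made-up example: 3 nodes, 4 directed edges, 2-dimensional (transformed) features
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])
x = torch.randn(3, 2)
edge_weight = torch.ones(4)

print(simple_gcn_propagate(edge_index, x, edge_weight).shape)  # torch.Size([3, 2])
```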
Example Walkthrough of the `forward` Method
Here’s a step-by-step walkthrough with a hypothetical input:
- Inputs:
  - `x`: tensor of shape `(num_nodes, in_channels)`, representing node features.
  - `edge_index`: tensor of shape `(2, num_edges)`, representing the graph's adjacency list.
  - `edge_weight`: optional tensor of shape `(num_edges,)`, representing edge weights.
- Normalization:
  - If normalization is enabled and not cached, `gcn_norm` computes the normalized adjacency matrix and edge weights.
  - For example, `gcn_norm` might convert the adjacency matrix `A` to `D^{-1/2} A D^{-1/2}`, where `D` is the degree matrix (a manual sketch of this computation follows this list).
- Linear Transformation:
  - Applies a linear transformation to `x`, resulting in a tensor of shape `(num_nodes, out_channels)`.
- Message Passing:
  - Calls `self.propagate` with the normalized adjacency matrix and transformed features.
  - The `message` method computes the weighted sum of neighboring node features for each node.
- Bias Addition:
  - Adds the bias term (if present) to the output features.
- Output:
  - Returns the updated node features, which now incorporate information from neighboring nodes.
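As an illustration of the normalization step (a dense, manual computation rather than the library's `gcn_norm`), the symmetric normalization $\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2}$ can be worked out for a tiny made-up graph like this:

```python
import torch

# Made-up 3-node graph: edges 0-1 and 1-2 (undirected)
A = torch.tensor([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])

A_hat = A + torch.eye(3)                  # add self-loops: A_hat = A + I
deg = A_hat.sum(dim=1)                    # degrees of A_hat
D_inv_sqrt = torch.diag(deg.pow(-0.5))    # D^{-1/2}
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetrically normalized adjacency

x = torch.randn(3, 4)                     # node features
theta = torch.randn(4, 8)                 # weight matrix Theta

out = A_norm @ x @ theta                  # X' = D^{-1/2} A_hat D^{-1/2} X Theta
print(out.shape)  # torch.Size([3, 8])
```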
By following these steps, the `GCNConv` class effectively performs a graph convolution operation, updating each node's features based on its neighbors' features in a normalized manner.