MessagePassing is a key base class in pytorch_geometric (PyG), the Python library for graph neural networks. It is used to build message passing graph neural networks: many of the library's layers, such as the graph convolution layer GCNConv and the graph attention layer GATConv, are implemented on top of it, and we can also subclass it to define our own GNN layers.
Let $\mathbf{x}_i^{(k-1)} \in \mathbb{R}^F$ denote the features of node $i$ at layer $(k-1)$, and let $\mathbf{e}_{j,i} \in \mathbb{R}^D$ denote (optional) features of the edge from node $j$ to node $i$. A message passing graph neural network can then be described as:
$$\mathbf{x}^{(k)}_i = \gamma^{(k)} \left( \mathbf{x}_i^{(k-1)}, \bigoplus_{j\in \mathcal{N}(i)} \phi^{(k)} \left(\mathbf{x}_i^{(k-1)}, \mathbf{x}_j^{(k-1)}, \mathbf{e}_{j,i} \right) \right)$$
Here $\bigoplus$ denotes a differentiable, permutation-invariant function such as sum, mean, or max, while $\gamma$ and $\phi$ denote differentiable functions such as MLPs.
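For intuition, here is a tiny sketch of why such an aggregation is permutation invariant: reordering the edges does not change the per-node result. The numbers are made up for illustration; `torch_scatter.scatter` is the same routine PyG's own `aggregate()` delegates to.

```python
import torch
from torch_scatter import scatter

# phi(...) has produced one message per edge; `target` holds each edge's
# target node i (i.e., edge_index_i).
messages = torch.tensor([[1.], [2.], [3.], [4.]])
target = torch.tensor([0, 0, 1, 1])

out = scatter(messages, target, dim=0, reduce='sum')  # [[3.], [7.]]

# Shuffling the edges leaves the aggregated result unchanged.
perm = torch.tensor([3, 1, 2, 0])
assert torch.equal(scatter(messages[perm], target[perm], dim=0, reduce='sum'), out)
```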
In pytorch_geometric's `MessagePassing`, the `message()` method implements the function $\phi$, the `update()` method implements $\gamma$, and the attribute `aggr` defines the aggregation scheme $\bigoplus$.
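To make this mapping concrete, here is a minimal sketch of a custom layer; `ToyConv` and its two linear layers are invented for illustration and are not part of PyG. `message()` plays the role of $\phi$, `update()` the role of $\gamma$, and `aggr='mean'` selects the aggregation $\bigoplus$:

```python
import torch
from torch_geometric.nn import MessagePassing

class ToyConv(MessagePassing):
    def __init__(self, channels):
        super().__init__(aggr='mean')  # the aggregation scheme (the big-oplus)
        self.phi = torch.nn.Linear(2 * channels, channels)
        self.gamma = torch.nn.Linear(2 * channels, channels)

    def forward(self, x, edge_index):
        return self.propagate(edge_index, x=x)

    def message(self, x_i, x_j):
        # phi: computed per edge from the central node i and its neighbor j.
        return self.phi(torch.cat([x_i, x_j], dim=-1))

    def update(self, aggr_out, x):
        # gamma: combines each node's old features with the aggregated messages.
        return self.gamma(torch.cat([x, aggr_out], dim=-1))
```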
- `MessagePassing(aggr="add", flow="source_to_target", node_dim=-2)`: `aggr`, the aggregation scheme, can be `"add"`, `"mean"`, or `"max"`; `flow`, the direction of message passing, is either `"source_to_target"` or `"target_to_source"`; `node_dim` indicates along which axis to propagate.
- `MessagePassing.propagate(edge_index, size=None, **kwargs)`: the entry point that starts message propagation. It takes the edge indices plus any additional data needed for message passing. In PyG, `edge_index` is usually a tensor of type Long with shape `[2, num_edges]` in COO format, i.e., the pair of node indices at the same position forms one edge.
- `MessagePassing.message(...)`: constructs the messages. It can take any argument that was originally passed to `propagate()`. Tensor names may be suffixed with `_i` or `_j`, e.g. `x_i` and `x_j` (by convention, `x_i` denotes the central node that aggregates information and `x_j` the corresponding neighbor). For `flow="source_to_target"`, these values are assigned in `MessagePassing`'s `__collect__` method:

  ```python
  x = ...           # Node features of shape [num_nodes, num_features]
  edge_index = ...  # Edge indices of shape [2, num_edges]

  # For flow="source_to_target", the suffixed tensors are built as follows:
  x_j = x[edge_index[0]]  # Source node features [num_edges, num_features]
  x_i = x[edge_index[1]]  # Target node features [num_edges, num_features]
  edge_index_j = edge_index[0]
  edge_index_i = edge_index[1]
  ```

- `MessagePassing.update(aggr_out, ...)`: updates the embedding of each node; its first argument is the aggregation output. A small runnable demonstration of these hooks follows this list.
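Putting these hooks together, the following runnable smoke test (with a made-up `SumConv` layer and toy graph) shows the default behavior: `message()` simply returns `x_j` and `aggr="add"` scatter-sums it, so each node ends up with the sum of its source neighbors' features:

```python
import torch
from torch_geometric.nn import MessagePassing

class SumConv(MessagePassing):
    def __init__(self):
        super().__init__(aggr='add', flow='source_to_target')

    def forward(self, x, edge_index):
        return self.propagate(edge_index, x=x)

# Edges 0->1, 1->2, 2->2 (a self-loop on node 2).
edge_index = torch.tensor([[0, 1, 2],   # source nodes j
                           [1, 2, 2]])  # target nodes i
x = torch.tensor([[1.], [2.], [4.]])

out = SumConv()(x, edge_index)
print(out)  # node 0: 0 (no incoming edges), node 1: 1, node 2: 2 + 4 = 6
```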
Earlier versions of pytorch_geometric have fewer layers of abstraction and are better suited for understanding the main logic of `MessagePassing`; below is the source code of version 1.5.0.
```python
# 1.5.0
import inspect
from collections import OrderedDict

import torch
from torch_sparse import SparseTensor
from torch_scatter import gather_csr, scatter, segment_csr

msg_aggr_special_args = set([
    'adj_t',
])

msg_special_args = set([
    'edge_index_i',
    'edge_index_j',
    'size_i',
    'size_j',
])

aggr_special_args = set([
    'ptr',
    'index',
    'dim_size',
])

update_special_args = set([])


class MessagePassing(torch.nn.Module):
    r"""Base class for creating message passing layers of the form

    .. math::
        \mathbf{x}_i^{\prime} = \gamma_{\mathbf{\Theta}} \left( \mathbf{x}_i,
        \square_{j \in \mathcal{N}(i)} \, \phi_{\mathbf{\Theta}}
        \left(\mathbf{x}_i, \mathbf{x}_j, \mathbf{e}_{j,i}\right) \right),

    where :math:`\square` denotes a differentiable, permutation invariant
    function, *e.g.*, sum, mean or max, and :math:`\gamma_{\mathbf{\Theta}}`
    and :math:`\phi_{\mathbf{\Theta}}` denote differentiable functions such as
    MLPs.
    See `here <https://pytorch-geometric.readthedocs.io/en/latest/notes/create_gnn.html>`__
    for the accompanying tutorial.

    Args:
        aggr (string, optional): The aggregation scheme to use
            (:obj:`"add"`, :obj:`"mean"`, :obj:`"max"` or :obj:`None`).
            (default: :obj:`"add"`)
        flow (string, optional): The flow direction of message passing
            (:obj:`"source_to_target"` or :obj:`"target_to_source"`).
            (default: :obj:`"source_to_target"`)
        node_dim (int, optional): The axis along which to propagate.
            (default: :obj:`0`)
    """
    def __init__(self, aggr="add", flow="source_to_target", node_dim=0):
        super(MessagePassing, self).__init__()

        self.aggr = aggr
        assert self.aggr in ['add', 'mean', 'max', None]

        self.flow = flow
        assert self.flow in ['source_to_target', 'target_to_source']

        self.node_dim = node_dim
        assert self.node_dim >= 0

        self.__msg_aggr_params__ = inspect.signature(
            self.message_and_aggregate).parameters
        self.__msg_aggr_params__ = OrderedDict(self.__msg_aggr_params__)

        self.__msg_params__ = inspect.signature(self.message).parameters
        self.__msg_params__ = OrderedDict(self.__msg_params__)

        self.__aggr_params__ = inspect.signature(self.aggregate).parameters
        self.__aggr_params__ = OrderedDict(self.__aggr_params__)
        self.__aggr_params__.popitem(last=False)

        self.__update_params__ = inspect.signature(self.update).parameters
        self.__update_params__ = OrderedDict(self.__update_params__)
        self.__update_params__.popitem(last=False)

        msg_aggr_args = set(
            self.__msg_aggr_params__.keys()) - msg_aggr_special_args
        msg_args = set(self.__msg_params__.keys()) - msg_special_args
        aggr_args = set(self.__aggr_params__.keys()) - aggr_special_args
        update_args = set(self.__update_params__.keys()) - update_special_args

        self.__user_args__ = set().union(msg_aggr_args, msg_args, aggr_args,
                                         update_args)

        self.__fuse__ = True

        # Support for GNNExplainer.
        self.__explain__ = False
        self.__edge_mask__ = None

    def __get_mp_type__(self, edge_index):
        if (torch.is_tensor(edge_index) and edge_index.dtype == torch.long
                and edge_index.dim() == 2 and edge_index.size(0) == 2):
            return 'edge_index'
        elif isinstance(edge_index, SparseTensor):
            return 'adj_t'
        else:
            return ValueError(
                ('`MessagePassing.propagate` only supports `torch.LongTensor` '
                 'of shape `[2, num_messages]` or `torch_sparse.SparseTensor` '
                 'for argument :obj:`edge_index`.'))

    def __set_size__(self, size, idx, tensor):
        if not torch.is_tensor(tensor):
            pass
        elif size[idx] is None:
            size[idx] = tensor.size(self.node_dim)
        elif size[idx] != tensor.size(self.node_dim):
            raise ValueError(
                (f'Encountered node tensor with size '
                 f'{tensor.size(self.node_dim)} in dimension {self.node_dim}, '
                 f'but expected size {size[idx]}.'))

    def __collect__(self, edge_index, size, mp_type, kwargs):
        i, j = (0, 1) if self.flow == 'target_to_source' else (1, 0)
        ij = {'_i': i, '_j': j}

        out = {}
        for arg in self.__user_args__:
            if arg[-2:] not in ij.keys():
                out[arg] = kwargs.get(arg, inspect.Parameter.empty)
            else:
                idx = ij[arg[-2:]]
                data = kwargs.get(arg[:-2], inspect.Parameter.empty)

                if data is inspect.Parameter.empty:
                    out[arg] = data
                    continue

                if isinstance(data, tuple) or isinstance(data, list):
                    assert len(data) == 2
                    self.__set_size__(size, 1 - idx, data[1 - idx])
                    data = data[idx]

                if not torch.is_tensor(data):
                    out[arg] = data
                    continue

                self.__set_size__(size, idx, data)

                if mp_type == 'edge_index':
                    out[arg] = data.index_select(self.node_dim,
                                                 edge_index[idx])
                elif mp_type == 'adj_t' and idx == 1:
                    rowptr = edge_index.storage.rowptr()
                    for _ in range(self.node_dim):
                        rowptr = rowptr.unsqueeze(0)
                    out[arg] = gather_csr(data, rowptr)
                elif mp_type == 'adj_t' and idx == 0:
                    col = edge_index.storage.col()
                    out[arg] = data.index_select(self.node_dim, col)

        size[0] = size[1] if size[0] is None else size[0]
        size[1] = size[0] if size[1] is None else size[1]

        if mp_type == 'edge_index':
            out['edge_index_j'] = edge_index[j]
            out['edge_index_i'] = edge_index[i]
            out['index'] = out['edge_index_i']
        elif mp_type == 'adj_t':
            out['adj_t'] = edge_index
            out['edge_index_i'] = edge_index.storage.row()
            out['edge_index_j'] = edge_index.storage.col()
            out['index'] = edge_index.storage.row()
            out['ptr'] = edge_index.storage.rowptr()
            out['edge_attr'] = edge_index.storage.value()

        out['size_j'] = size[j]
        out['size_i'] = size[i]
        out['dim_size'] = out['size_i']

        return out

    def __distribute__(self, params, kwargs):
        out = {}
        for key, param in params.items():
            data = kwargs.get(key, inspect.Parameter.empty)
            if data is inspect.Parameter.empty:
                if param.default is inspect.Parameter.empty:
                    raise TypeError(f'Required parameter {key} is empty.')
                data = param.default
            out[key] = data
        return out

    def propagate(self, edge_index, size=None, **kwargs):
        r"""The initial call to start propagating messages.

        Args:
            adj (Tensor or SparseTensor): A :obj:`torch.LongTensor` or a
                :obj:`torch_sparse.SparseTensor` that defines the underlying
                message propagation.
                :obj:`edge_index` holds the indices of a general (sparse)
                assignment matrix of shape :obj:`[N, M]`.
                If :obj:`edge_index` is of type :obj:`torch.LongTensor`, its
                shape must be defined as :obj:`[2, num_messages]`, where
                messages from nodes in :obj:`edge_index[0]` are sent to
                nodes in :obj:`edge_index[1]`
                (in case :obj:`flow="source_to_target"`).
                If :obj:`edge_index` is of type
                :obj:`torch_sparse.SparseTensor`, its sparse indices
                :obj:`(row, col)` should relate to :obj:`row = edge_index[1]`
                and :obj:`col = edge_index[0]`.
                Hence, the only difference between those formats is that we
                need to input the *transposed* sparse adjacency matrix into
                :func:`propagate`.
            size (list or tuple, optional): The size :obj:`[N, M]` of the
                assignment matrix in case :obj:`edge_index` is a
                :obj:`LongTensor`.
                If set to :obj:`None`, the size will be automatically inferred
                and assumed to be quadratic.
                This argument is ignored in case :obj:`edge_index` is a
                :obj:`torch_sparse.SparseTensor`. (default: :obj:`None`)
            **kwargs: Any additional data which is needed to construct and
                aggregate messages, and to update node embeddings.
        """

        # We need to distinguish between the old `edge_index` format and the
        # new `torch_sparse.SparseTensor` format.
        mp_type = self.__get_mp_type__(edge_index)

        if mp_type == 'adj_t' and self.flow == 'target_to_source':
            raise ValueError(
                ('Flow direction "target_to_source" is invalid for message '
                 'propagation based on `torch_sparse.SparseTensor`. If you '
                 'really want to make use of a reverse message passing flow, '
                 'pass in the transposed sparse tensor to the message passing '
                 'module, e.g., `adj.t()`.'))

        if mp_type == 'edge_index':
            if size is None:
                size = [None, None]
            elif isinstance(size, int):
                size = [size, size]
            elif torch.is_tensor(size):
                size = size.tolist()
            elif isinstance(size, tuple):
                size = list(size)
        elif mp_type == 'adj_t':
            size = list(edge_index.sparse_sizes())[::-1]

        assert isinstance(size, list)
        assert len(size) == 2

        # We collect all arguments used for message passing in `kwargs`.
        kwargs = self.__collect__(edge_index, size, mp_type, kwargs)

        # Try to run `message_and_aggregate` first and see if it succeeds:
        if mp_type == 'adj_t' and self.__fuse__ and not self.__explain__:
            msg_aggr_kwargs = self.__distribute__(self.__msg_aggr_params__,
                                                  kwargs)
            out = self.message_and_aggregate(**msg_aggr_kwargs)
            if out == NotImplemented:
                self.__fuse__ = False

        # Otherwise, run both functions in separation.
        if mp_type == 'edge_index' or not self.__fuse__ or self.__explain__:
            msg_kwargs = self.__distribute__(self.__msg_params__, kwargs)
            out = self.message(**msg_kwargs)

            if self.__explain__:
                edge_mask = self.__edge_mask__.sigmoid()
                if out.size(0) != edge_mask.size(0):
                    loop = edge_mask.new_ones(size[0])
                    edge_mask = torch.cat([edge_mask, loop], dim=0)
                assert out.size(0) == edge_mask.size(0)
                out = out * edge_mask.view(-1, 1)

            aggr_kwargs = self.__distribute__(self.__aggr_params__, kwargs)
            out = self.aggregate(out, **aggr_kwargs)

        update_kwargs = self.__distribute__(self.__update_params__, kwargs)
        out = self.update(out, **update_kwargs)

        return out

    def message(self, x_j):
        r"""Constructs messages from node :math:`j` to node :math:`i`
        in analogy to :math:`\phi_{\mathbf{\Theta}}` for each edge in
        :obj:`edge_index`.
        This function can take any argument as input which was initially
        passed to :meth:`propagate`.
        Furthermore, tensors passed to :meth:`propagate` can be mapped to the
        respective nodes :math:`i` and :math:`j` by appending :obj:`_i` or
        :obj:`_j` to the variable name, *e.g.* :obj:`x_i` and :obj:`x_j`.
        """
        return x_j

    def aggregate(self, inputs, index, ptr=None, dim_size=None):
        r"""Aggregates messages from neighbors as
        :math:`\square_{j \in \mathcal{N}(i)}`.

        Takes in the output of message computation as first argument and any
        argument which was initially passed to :meth:`propagate`.

        By default, this function will delegate its call to scatter functions
        that support "add", "mean" and "max" operations as specified in
        :meth:`__init__` by the :obj:`aggr` argument.
        """
        if ptr is not None:
            for _ in range(self.node_dim):
                ptr = ptr.unsqueeze(0)
            return segment_csr(inputs, ptr, reduce=self.aggr)
        else:
            return scatter(inputs, index, dim=self.node_dim,
                           dim_size=dim_size, reduce=self.aggr)

    def message_and_aggregate(self, adj_t):
        r"""Fuses computations of :func:`message` and :func:`aggregate` into a
        single function.
        If applicable, this saves both time and memory since messages do not
        explicitly need to be materialized.
        This function will only get called in case it is implemented and
        propagation takes place based on a :obj:`torch_sparse.SparseTensor`.
        """
        return NotImplemented

    def update(self, inputs):
        r"""Updates node embeddings in analogy to
        :math:`\gamma_{\mathbf{\Theta}}` for each node
        :math:`i \in \mathcal{V}`.
        Takes in the output of aggregation as first argument and any argument
        which was initially passed to :meth:`propagate`.
        """
        return inputs
```
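One detail worth highlighting in the source above: when `propagate()` receives a `torch_sparse.SparseTensor`, it first tries the fused `message_and_aggregate()` path, which avoids materializing per-edge messages. The following sketch shows how a subclass can implement it with a single sparse-dense matmul; `FusedSumConv` is invented here, but the pattern mirrors the one used by PyG's built-in layers such as `GCNConv`:

```python
import torch
from torch_sparse import SparseTensor, matmul
from torch_geometric.nn import MessagePassing

class FusedSumConv(MessagePassing):
    def __init__(self):
        super().__init__(aggr='add')

    def forward(self, x, adj_t):
        return self.propagate(adj_t, x=x)

    def message(self, x_j):
        # Non-fused fallback, used when propagating with an edge_index tensor.
        return x_j

    def message_and_aggregate(self, adj_t, x):
        # One spmm == gather neighbor features + scatter-"add" in a single step.
        return matmul(adj_t, x, reduce=self.aggr)

# propagate() expects the *transposed* sparse adjacency matrix.
edge_index = torch.tensor([[0, 1, 2], [1, 2, 2]])
adj_t = SparseTensor(row=edge_index[1], col=edge_index[0], sparse_sizes=(3, 3))
x = torch.tensor([[1.], [2.], [4.]])
print(FusedSumConv()(x, adj_t))  # same result as the edge_index-based path
```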
The pytorch_geometric tutorial demonstrates how to implement a GCN layer with `MessagePassing`. Its update rule is given below: neighboring node features are first transformed by a weight matrix $\mathbf{W}$ and then normalized by the degrees of both endpoint nodes; $\mathbf{b}$ is a bias vector.
$$\mathbf{x}^{(k)}_i = \sum_{j\in \mathcal{N}(i) \cup \{i\}} \frac{1}{\sqrt{\deg(i)} \sqrt{\deg(j)}} \cdot \left( \mathbf{W}^T \cdot \mathbf{x}_j^{(k-1)} \right) + \mathbf{b}$$
The formula above can be broken down into the following steps:

1. Add self-loops to the adjacency matrix (if the adjacency matrix may already contain self-loops, remove the existing ones first and then re-add them to avoid duplicates; see the GATConv implementation for an example).
2. Linearly transform the node feature matrix.
3. Compute the normalization coefficients.
4. Normalize the node features.
5. Sum up the neighboring node features, i.e., apply `"add"` aggregation.
6. Apply a final bias vector.
Steps 1-3 can be computed before message passing takes place, while steps 4-5 are handled by the message passing functions. The example code is shown below:
```python
import torch
from torch.nn import Linear, Parameter
from torch_geometric.nn import MessagePassing
from torch_geometric.utils import add_self_loops, degree


class GCNConv(MessagePassing):
    def __init__(self, in_channels, out_channels):
        super().__init__(aggr='add')  # "Add" aggregation (Step 5).
        self.lin = Linear(in_channels, out_channels, bias=False)
        self.bias = Parameter(torch.empty(out_channels))
        self.reset_parameters()

    def reset_parameters(self):
        self.lin.reset_parameters()
        self.bias.data.zero_()

    def forward(self, x, edge_index):
        # x has shape [N, in_channels]
        # edge_index has shape [2, E]

        # Step 1: Add self-loops to the adjacency matrix.
        edge_index, _ = add_self_loops(edge_index, num_nodes=x.size(0))

        # Step 2: Linearly transform node feature matrix.
        x = self.lin(x)

        # Step 3: Compute normalization.
        row, col = edge_index
        deg = degree(col, x.size(0), dtype=x.dtype)
        deg_inv_sqrt = deg.pow(-0.5)
        deg_inv_sqrt[deg_inv_sqrt == float('inf')] = 0
        norm = deg_inv_sqrt[row] * deg_inv_sqrt[col]

        # Step 4-5: Start propagating messages.
        out = self.propagate(edge_index, x=x, norm=norm)

        # Step 6: Apply a final bias vector.
        out = out + self.bias

        return out

    def message(self, x_j, norm):
        # x_j has shape [E, out_channels]

        # Step 4: Normalize node features.
        return norm.view(-1, 1) * x_j
```
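A quick usage example on a made-up toy graph (the shapes are arbitrary):

```python
# Toy usage: 3 nodes with 16 input features each, 4 directed edges.
conv = GCNConv(in_channels=16, out_channels=32)
x = torch.randn(3, 16)                    # [N, in_channels]
edge_index = torch.tensor([[0, 1, 1, 2],  # [2, E]
                           [1, 0, 2, 1]])
out = conv(x, edge_index)
print(out.shape)  # torch.Size([3, 32])
```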
References:
- https://pytorch-geometric.readthedocs.io/en/latest/tutorial/create_gnn.html
- https://github.com/pyg-team/pytorch_geometric/blob/1.5.0/torch_geometric/nn/conv/message_passing.py