[Paper Notes] How Powerful are Graph Neural Networks?

Original paper: [1810.00826] How Powerful are Graph Neural Networks? (arxiv.org)

The English is typed entirely by hand, summarizing and paraphrasing the original paper. Some spelling and grammar mistakes are hard to avoid; if you spot any, corrections in the comments are welcome! This article leans toward personal notes, so read with that in mind!

1. Quick Summary

1.1. Takeaways

        ①Hmm, the mathematical justification really is strong here

        ②The paper just keeps... keeps stating lemmas

1.2. Paper framework diagram

2. Section-by-Section Notes

2.1. Abstract

        ①Even though the emergence of Graph Neural Networks (GNNs) has changed graph representation learning to a large extent, GNNs and their variants are all limited in representational power.

2.2. Introduction

        ①Briefly introduces how a GNN works (aggregating node information from k-hop neighbors and then pooling)

        ②The authors hold the view that ⭐ other graph models are mostly based on extensive experimental trial and error rather than on theoretical understanding

        ③They combine GNNs with the Weisfeiler-Lehman (WL) graph isomorphism test to build a new framework, which relies on multisets

        ④GIN excels at distinguishing graph structures, and at capturing and representing them

heuristics  n. [U] (formal) heuristics; trial-and-error methods of discovery

heuristic  adj. (of teaching or education) heuristic; enabling learners to discover things for themselves

2.3. Preliminaries

(1)Their definition

        ①They define two tasks: node classification with node labels y_{v}, and graph classification with graph labels y_{i}, i\in\{1,2,\ldots,N\}

(2)Other models

        ①The authors present the general form of a GNN's k-th layer:

a_v^{(k)}=\text{AGGREGATE}^{(k)}\left(\left\{h_u^{(k-1)}:u\in\mathcal{N}(v)\right\}\right),\\\quad h_v^{(k)}=\text{COMBINE}^{(k)}\left(h_v^{(k-1)},a_v^{(k)}\right),

where only h_{v}^{(0)} is initialized, to X_{v} (I will skip the remaining details; they are all covered in my GNN notes)
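
As a reading aid, here is a minimal sketch (my own illustration, not the authors' code) of this generic scheme in Python, assuming the graph is given as an adjacency list `neighbors`, every node has at least one neighbor, and AGGREGATE/COMBINE are passed in as functions:

```python
import numpy as np

def message_passing(X, neighbors, aggregate, combine, K):
    """Run K rounds of the generic GNN update above.

    X: (n, d) array of initial features, so h_v^{(0)} = X_v.
    neighbors: dict mapping each node v to a list of its neighbors.
    aggregate / combine: pluggable AGGREGATE^{(k)} / COMBINE^{(k)} functions.
    """
    h = X.astype(float).copy()
    for _ in range(K):
        new_h = np.empty_like(h)
        for v in range(len(h)):
            # a_v^{(k)} = AGGREGATE({h_u^{(k-1)} : u in N(v)})
            a_v = aggregate([h[u] for u in neighbors[v]])
            # h_v^{(k)} = COMBINE(h_v^{(k-1)}, a_v^{(k)})
            new_h[v] = combine(h[v], a_v)
        h = new_h
    return h
```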

        ②In the pooling variant of GraphSAGE, the AGGREGATE function is:

a_v^{(k)}=\text{MAX}\left(\left\{\text{ReLU}\left(W\cdot h_u^{(k-1)}\right),\forall u\in\mathcal{N}(v)\right\}\right)

where MAX is an element-wise max-pooling operator;

W is a learnable weight matrix;

and COMBINE follows as a concatenation plus linear mapping, W\cdot\left[h_{v}^{(k-1)},a_{v}^{(k)}\right]

        ③AGGREGATE and COMBINE are integrated into a single step in GCN:

h_v^{(k)}=\text{ReLU}\left(W\cdot\text{MEAN}\left\{h_u^{(k-1)},\forall u\in\mathcal{N}(v)\cup\{v\}\right\}\right)

        ④Lastly, a READOUT layer produces the graph-level representation for the final prediction:

h_G=\text{READOUT}\big(\big\{h_v^{(K)}\big|v\in G\big\}\big)

where the READOUT function can take different forms (e.g., a simple summation or a more sophisticated graph-level pooling function)
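
Continuing the sketch above, the two concrete variants and a plain sum READOUT might look as follows (these are my own toy implementations of the formulas, with W a random stand-in for the learned weight matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))  # stand-in for the learnable weight matrix

def sage_pool_aggregate(neighbor_feats):
    # GraphSAGE-pool: MAX({ReLU(W h_u) : u in N(v)}), element-wise max
    return np.max([np.maximum(W @ h, 0.0) for h in neighbor_feats], axis=0)

def gcn_update(h_v, neighbor_feats):
    # GCN: ReLU(W . MEAN({h_u : u in N(v) U {v}})) -- AGGREGATE and COMBINE fused
    return np.maximum(W @ np.mean(neighbor_feats + [h_v], axis=0), 0.0)

def sum_readout(H):
    # h_G = READOUT({h_v^{(K)} : v in G}), here a permutation-invariant sum
    return H.sum(axis=0)
```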

(3)Weisfeiler-Lehman (WL) test

        ①WL first aggregates each node's label with the labels of its neighborhood, and then hashes the aggregated labels into new, compressed labels (hash? is that okay?)

        ②Based on the WL test, the WL subtree kernel was proposed to measure similarity between graphs

        ③A subtree of height k rooted at a node corresponds to that node's compressed label at the k-th iteration
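
My own sketch of one WL relabeling iteration, to make the "aggregate then hash" step concrete; a dictionary of signatures stands in for the injective hash:

```python
def wl_iteration(labels, neighbors):
    """One Weisfeiler-Lehman relabeling step.

    labels: dict node -> current label; neighbors: dict node -> list of nodes.
    """
    # Aggregate: each node's own label plus the sorted multiset of neighbor labels.
    signatures = {
        v: (labels[v], tuple(sorted(labels[u] for u in neighbors[v])))
        for v in labels
    }
    # "Hash": injectively map each distinct signature to a fresh compressed label.
    table = {}
    return {v: table.setdefault(sig, len(table)) for v, sig in signatures.items()}
```

Iterating this until the labels stabilize and comparing the multisets of labels of two graphs gives the WL test: differing multisets prove non-isomorphism, while identical ones are inconclusive.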

permutation  n. permutation; arrangement; ordering

2.4. Theoretical framework: overview

        ①The framework overview

        ②Multiset: a 2-tuple X=(S,m), "where S is the underlying set of X that is formed from its distinct elements, and m:S\rightarrow \mathbb{N}_{\geq 1} gives the multiplicity of the elements" (I did not quite understand this sentence; see the example below)
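
To unpack the quoted definition: a multiset is a set that may contain repeated elements; S collects the distinct values, and m records how many times each occurs. Python's collections.Counter is exactly such an (S, m) pair:

```python
from collections import Counter

X = Counter(["a", "a", "b"])  # the multiset {a, a, b}
S = set(X)                    # underlying set S = {"a", "b"}
m = dict(X)                   # multiplicity m: "a" -> 2, "b" -> 1
```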

        ③A maximally powerful GNN must never map two different neighborhoods to the same representation; thus, the aggregation must be injective, i.e., distinct inputs must give distinct outputs so that no structural information is lost (I don't really know why either)

2.5. Building powerful graph neural networks

        ①They state Lemma 2: if a GNN maps two graphs to different embeddings, then the WL graph isomorphism test also decides they are non-isomorphic; in other words, any aggregation-based GNN is at most as powerful as the WL test at distinguishing graphs

        ②Theorem 3 (which I did not understand at all) roughly states that if the aggregation, combination, and graph-level readout functions are all injective, the GNN is exactly as powerful as the WL test

        ③Lemma 4: if the input feature space is countable, then the space of node hidden features h_{v}^{(k)} is also countable

2.5.1. Graph isomorphism network (GIN)

        ①Lemma 5: there exists f:\mathcal{X}\rightarrow\mathbb{R}^{n} such that h(X)=\sum_{x\in X}f(x) is unique for each multiset X\subset \mathcal{X} of bounded size; moreover, any multiset function g can be decomposed as g\left(X\right)=\phi\left(\sum_{x\in X}f(x)\right)

        ②Corollary 6: there exists f such that h(c,X)=(1+\epsilon)\cdot f(c)+\sum_{x\in X}f(x) is unique for each pair (c,X); moreover, any function g over such pairs can be decomposed as g\left(c,X\right)=\varphi\left(\left(1+\epsilon\right)\cdot f(c)+\sum_{x\in X}f(x)\right).

        ③Finally, the update function of GIN can be written as:

h_{v}^{(k)}=\mathrm{MLP}^{(k)}\left(\left(1+\epsilon^{(k)}\right)\cdot h_{v}^{(k-1)}+\sum_{u\in\mathcal{N}(v)}h_{u}^{(k-1)}\right)
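
A minimal sketch of this update with a tiny 2-layer MLP (random stand-in weights; ε fixed rather than learned, which matches the GIN-0 variant discussed later):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))

def mlp(x):
    # stands in for the learnable MLP^{(k)} in the update rule
    return W2 @ np.maximum(W1 @ x, 0.0)

def gin_layer(h, neighbors, eps=0.0):
    # h_v^{(k)} = MLP((1 + eps) * h_v^{(k-1)} + sum_{u in N(v)} h_u^{(k-1)})
    return np.stack([
        mlp((1.0 + eps) * h[v] + sum(h[u] for u in neighbors[v]))
        for v in range(len(h))
    ])
```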

2.5.2. Graph-level readout of GIN

        ①Ranking of the sum, mean and max aggregators by expressive power:

        ②Failure examples in which different nodes v and v' are mapped to the same embedding:

where in (a) all the nodes have identical features, and only sum can distinguish the two structures;

in (b) blue marks the maximum, so max fails to distinguish as well;

likewise in (c). (My guess: the blue node v is itself a node, but its own feature is not considered; only the 1-hop neighborhoods are compared.)

        ③They change the READOUT layer to a concatenation of readouts over all iterations:

h_G=\text{CONCAT}\Big(\text{READOUT}\Big(\Big\{h_v^{(k)}|v\in G\Big\}\Big)\big|k=0,1,\ldots,K\Big)
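
A sketch of this readout under the same assumptions: apply a sum readout at every layer k and concatenate the per-layer results:

```python
import numpy as np

def gin_graph_readout(h_per_layer):
    """h_per_layer: list [h^{(0)}, ..., h^{(K)}] of (n, d) node-embedding arrays.

    Returns h_G = CONCAT over k of a sum READOUT at iteration k.
    """
    return np.concatenate([h_k.sum(axis=0) for h_k in h_per_layer])
```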

2.6. Less powerful but still interesting GNNs

        They design ablation studies of less powerful GNN variants

2.6.1. 1-layer perceptrons are not sufficient

        ①1-layer perceptrons behave like linear mappings, which are far from sufficient for distinguishing multisets

        ②Lemma 7: even when multisets X_{1} and X_{2} are different, a sum over a 1-layer perceptron can give the same result: \sum_{x\in X_1}\text{ReLU}\left(Wx\right)=\sum_{x\in X_2}\text{ReLU}\left(Wx\right)
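
A concrete instance with my own toy numbers: take W = [1]; the multisets X₁ = {1, 3} and X₂ = {2, 2} differ, yet their summed ReLU outputs coincide because ReLU is linear on positive inputs:

```python
import numpy as np

W = np.array([[1.0]])
X1 = [np.array([1.0]), np.array([3.0])]  # multiset {1, 3}
X2 = [np.array([2.0]), np.array([2.0])]  # multiset {2, 2}

s1 = sum(np.maximum(W @ x, 0.0) for x in X1)
s2 = sum(np.maximum(W @ x, 0.0) for x in X2)
print(s1, s2)  # both [4.]: the 1-layer perceptron cannot tell X1 from X2
```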

2.6.2. Structures that confuse mean and max-pooling

        The content of this section was already explained under the figure in 2.5.2.②

2.6.3. Mean learns distributions

        ①Corollary 8: consider h\left ( X \right )=\frac{1}{\left | X \right |}\sum_{x\in X}f\left ( x \right ). Then h\left ( X_{1} \right )=h\left ( X_{2} \right ) if and only if multisets X_{1} and X_{2} have the same distribution

        ②When the statistical and distributional information of a graph matters more than its exact structure, the mean aggregator performs well; but when structure is valued more, the mean aggregator may do worse.

        ③The sum and mean aggregators behave similarly when node features are diverse and rarely repeat

2.6.4. Max-pooling learns sets with distinct elements

        ①The max aggregator focuses on learning the "skeleton" of a graph (the paper's word is "skeleton" rather than "structure"), which gives it some robustness to noise and outliers

        ②For the max function h\left ( X \right )=\max_{x\in X}f\left ( x \right ), h\left ( X_{1} \right )=h\left ( X_{2} \right ) if and only if X_{1} and X_{2} have the same underlying set
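
A toy comparison of the three aggregators (my own numbers): X₂ has the same distribution as X₁, so mean and max both fail on it; X₃ has the same underlying set as X₁, so only max fails:

```python
import numpy as np

X1 = np.array([1.0, 2.0])            # {1, 2}
X2 = np.array([1.0, 1.0, 2.0, 2.0])  # same distribution as X1
X3 = np.array([1.0, 2.0, 2.0])       # same underlying set as X1

for name, agg in [("sum", np.sum), ("mean", np.mean), ("max", np.max)]:
    print(name, agg(X1), agg(X2), agg(X3))
# sum:  3.0  6.0  5.0   -> distinguishes all three multisets
# mean: 1.5  1.5  1.67  -> fails on X1 vs X2 (same distribution)
# max:  2.0  2.0  2.0   -> fails on both (same underlying set)
```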

2.6.5. Remarks on other aggregators

        ①They do not cover the analysis of weighted averages computed via attention, or of LSTM pooling

2.7. Other related work

        ①Most prior GNN work provides little mathematical explanation

        ②An exception: the RKHS of graph kernels (?) can approximate measurable functions in probability

        ③Also, such results can hardly generalize to multiple architectures

2.8. Experiments

(1)Datasets

        ①Datasets: 9 graph classification benchmarks: 4 bioinformatics datasets (MUTAG, PTC, NCI1, PROTEINS) and 5 social network datasets (COLLAB, IMDB-BINARY, IMDB-MULTI, REDDIT-BINARY and REDDIT-MULTI5K)

        ②The social networks lack node features, so the authors set all node feature vectors to be identical for the REDDIT datasets and use one-hot encodings for the others

(2)Models and configurations

        ①They evaluate two variants: GIN-ε, which learns ε by gradient descent, and GIN-0, which fixes ε = 0 and is slightly simpler.

        ②Performances of different variants on different datasets

        ③Validation: 10-fold cross-validation with LIB-SVM

        ④Layers: 5 GNN layers (including the input layer), with a 2-layer MLP in each

        ⑤Normalization: batch normalization on every hidden layer

        ⑥Optimizer: Adam

        ⑦Learning rate: initialized to 0.01 and decayed by a factor of 0.5 every 50 epochs

        ⑧Number of hidden units (a hyperparameter): 16 or 32

        ⑨Batch size: 32 or 128

        ⑩Dropout ratio: 0 or 0.5

        ⑪Epochs: the single epoch with the best averaged cross-validation accuracy across the 10 folds is selected (the full search space is sketched below)
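
Collected in one place, my reading of the setup above (a sketch, not the authors' released configuration):

```python
config = {
    "gnn_layers": 5,           # including the input layer
    "mlp_layers": 2,
    "batch_norm": True,        # applied on every hidden layer
    "optimizer": "Adam",
    "initial_lr": 0.01,        # decayed by a factor of 0.5 every 50 epochs
    "hidden_units": (16, 32),  # tuned per dataset
    "batch_size": (32, 128),
    "dropout": (0.0, 0.5),
}
```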

(3)Baselines

        ①WL subtree kernel

        ②Diffusion-convolutional neural networks (DCNN), PATCHY-SAN (Niepert) and Deep Graph CNN (DGCNN)

        ③Anonymous Walk Embeddings (AWL)

2.8.1. Results

(1)Training set performance

        ①The training set accuracy figure is shown above

        ②The WL subtree kernel always achieves higher training accuracy than the GNNs thanks to its strong discriminative power. However, WL cannot learn how to combine node features, which may limit it on unseen data

(2)Test set performance

        ①Test set classification accuracies

        ②GIN-0 clearly outperforms the other models

2.9. Conclusion

        They establish theoretical foundations for reasoning about the expressive power of GNNs and analyze the performance of popular GNN variants. They then design a provably powerful GNN, named GIN, which achieves more accurate classification. Finally, they suggest that studying the generalization properties of GNNs is a promising direction.

3. Reference List

Xu, K. et al. (2019) 'How Powerful are Graph Neural Networks?', ICLR 2019. doi: https://doi.org/10.48550/arXiv.1810.00826
