Title
题目
Contrastive machine learning reveals species-shared and -specific brain functional architecture
对比式机器学习揭示了跨物种共享和特异性的脑功能结构
01
文献速递介绍
猕猴被广泛作为人类的动物模型,用于研究大脑和行为的关键方面(Goulas 等,2014年)。这可以归因于两种物种在一定程度上可能共享共同的大脑架构(Orban 等,2004年;Rilling 等,2008年),如功能连接组(Margulies 等,2009年),这些连接组反映了大脑区域的协调(Sporns,2011年),并且被认为与大脑功能和行为的出现有关(Shen 等,2017年)。然而,这两种物种在大约2500万年前从它们的共同祖先分化(Kumar 和 Hedges,1998年)。随着物种进化,物种之间的共性在许多方面发生了改变,如大脑皮层的不成比例扩展和轴突的重组(Semendeferi 等,2002年;Krubitzer 和 Kaas,2005年;Van Essen 和 Dierker,2007年),猕猴大脑到人类大脑的转化可能远不仅仅是大脑尺度的变化。因此,定量研究物种间的大脑架构共性与差异,包括功能连接组,可以为大脑功能和行为的保留与进化提供见解(de Schotten 等,2019年),并且对于更好地利用动物模型将知识转化为人类应用,尤其是在科学研究和临床环境中,具有重要意义(Kelly 等,2012年)。
当前的研究通常将一个物种视为“背景”,例如猕猴,并与另一个物种进行对比(Deacon,1990年)。这种对比的范式在某些情况下有效,例如使用对比变分自动编码器(CVAE)(Abid 和 Zou,2019年)捕捉自闭症患者的大脑特征,相对于作为“背景”的正常个体(Aglinskas 等,2022年)。但是,跨物种比较中,物种是独立进化并从共同祖先分化出来的,各自发展出了独特的特征。目前的范式很难确定这些差异的来源,即它们是由于猕猴分支的分化还是人类分支的分化。因此,更合理的做法可能是假设两物种之间有一个“共享”的大脑作为“背景”,并以此为基础对比两物种,这样可以并行地孤立物种特有的特征。此外,将一个物种匹配到“背景”的方法理论上依赖于假设我们已经知道需要匹配的因素(Deacon,1990年)。然而,个体之间存在由内在个体差异甚至方法论伪影引起的差异。这些无关的变化可能会掩盖物种特有或共享变异的识别(Buckner 和 Krienen,2013年)。
为了解决这些问题,我们最近开发了一种基于变分自动编码器(Kingma 和 Welling,2013年)的新算法,称为共享-特异变异自动编码器(SU-VAE),用于对猕猴和人类大脑的静息态功能MRI导出的连接组进行比较。SU-VAE的输入为来自两个物种的未配对的大脑功能连接组,能够将每个物种的“物种特有”变异与“物种共享”变异解耦,而后者最初是未知的,并且需要进行估算。通过这种方式,这些变异被呈现为三组不同的潜在特征,如图1所示。
我们在两个大型神经影像数据集上开发了该方法:人类连接组计划(HCP)数据集(Van Essen 等,2012年)和威斯康星大学麦迪逊分校的猕猴MRI(MUWM)数据集(UW-Madison,2018年)。通过验证人类数据集,解耦后的潜在特征被确认与认知评分相关,例如与语言相关的特征,而与猕猴共享的特征则与感觉运动评分关联更强。接下来,我们识别了物种共享的连接组和各自的物种特有连接组,得到了以前研究的支持。我们进一步将这些解耦后的连接组投射到大脑皮层,从而呈现出一个反映物种连接组如何从彼此分化的梯度。为了进一步解释结果,我们计算了共享连接组和特有连接组的图形指标,以辨别它们在功能连接组进化中的潜在角色。我们发现,将人类特有的连接引入物种共享连接组中,而非猕猴特有的连接,更能增强网络效率。最后,我们探索了可能的基因调控机制,这些机制可能与在Allen Institute提供的全脑基因表达数据集(Shen 等,2012年)中,产生人类特有功能连接组的进化压力有关。我们识别了一组富集于“轴突引导”的基因。我们通过在一个独立的大规模人类数据集——中国人类连接组计划(CHCP)数据集(Ge 等,2023年)上复制结果,验证了我们方法的鲁棒性和泛化能力。
Abstract
摘要
A deep comparative analysis of brain functional connectome across species in primates has the potential to yield valuable insights for both scientific and clinical applications. However, the interspecies commonality and differences are inherently entangled with each other and with other irrelevant factors. Here we develop a novel contrastive machine learning method, called shared-unique variation autoencoder (SU-VAE), to allow disentanglement of the species-shared and species-specific functional connectome variation between macaque and human brains on large-scale resting-state fMRI datasets. The method was validated by confirming that human-specific features are differentially related to cognitive scores, while features shared with macaque better capture sensorimotor ones. The projection of disentangled connectomes to the cortex revealed a gradient that reflected species divergence. In contrast to macaque, the introduction of human-specific connectomes to the shared ones enhanced network efficiency. We identified genes enriched on 'axon guidance' that could be related to the human-specific connectomes.
对灵长类动物大脑功能连接组的跨物种深度比较分析有望为科学研究和临床应用提供宝贵的见解。然而,跨物种的共性与差异本质上交织在一起,并且与其他无关因素相互混杂。为此,我们开发了一种新颖的对比式机器学习方法,称为共享-特异变异自动编码器(Shared-Unique Variation Autoencoder, SU-VAE),用于在大规模静息态功能磁共振成像(fMRI)数据集上解耦猕猴和人类大脑功能连接组的物种共享和物种特异性变异。通过验证发现,人类特异性特征与认知评分的关系不同,而与猕猴共享的特征更能捕捉感觉运动功能特征。解耦后的功能连接组投射到大脑皮层,揭示了一种反映物种分化的梯度。与猕猴相比,人类特异性功能连接组的引入增强了共享连接组的网络效率。我们还鉴定了在人类特异性功能连接组中富集的与“轴突引导”相关的基因,这可能与人类特异性功能连接组有关。
Method
方法
Finding representations for the task is fundamental in machine learning. A factor is characterized as 'disentangled' when any intervention on it results in a specific change in the generated data (Bengio et al., 2013). Recently, much work has focused on learning disentangled representations with VAEs (Higgins et al., 2016; Locatello et al., 2019b; Tschannen et al., 2018), in which each latent feature learns one semantically meaningful factor of variation while remaining invariant to other factors. Disentangled representation learning has been proposed as an approach to learning general representations even in the absence of, or with limited, supervision (Liu et al., 2022). Although there is no widely accepted definition of disentangled representations yet, the main intention is to separate the main factors of variation that are present in the provided data distribution (Higgins et al., 2018; Locatello et al., 2019a). However, these methods mainly focus on disentangling the 'factors' of objects and cannot cope with the separation of exclusive and shared content. There are also works focusing on disentangling the latent features between two domains, which mainly seek to transfer a classifier or to map an image to a different distribution (Liu et al., 2018; Lin et al., 2019; Ding et al., 2020). These studies demonstrate the potential of using unpaired data for domain-specific feature extraction and inter-domain feature transformation, which are also among the abilities required of a model for the problem we are studying. Apart from natural images, disentangled representation learning has a wide range of applications in medical imaging, such as the spatial decomposition network (SDNet) (Chartsias et al., 2019), which factorizes 2D medical images into spatial anatomical factors and non-spatial modality factors. Thermos et al. (2021) proposed a generative model that learns to combine anatomical factors from different input images, re-entangling them with the desired imaging modality (e.g., MRI) to create plausible new cardiac images with specific characteristics. These feature-disentanglement methods, which focus on specific fields and exploit synthesis and related means, also suggest that there is great potential for learning latent embeddings by combining and reconstructing different features. However, these methods require additional supervised learning (Lin et al., 2019) or additional human guidance (Ding et al., 2020; Chartsias et al., 2019). Moreover, they need to know in advance the features to be learned in each image dataset, such as smile and non-smile features in a sketch-and-photograph dataset (Liu et al., 2018). These methods can be applied to conditional cross-domain image synthesis (Thermos et al., 2021) and translation, but they are not well suited to our problem, which must be completely unsupervised and must disentangle the shared and specific features of the two datasets without knowing any visible features of either.
在机器学习中,找到任务的表示是至关重要的。当对某个因素进行干预时,能够导致生成数据发生特定变化,这种因素被称为“解耦的”(Bengio等人,2013)。近年来,许多研究集中在使用变分自编码器(VAE)进行解耦表示学习(Higgins等人,2016;Locatello等人,2019b;Tschannen等人,2018),在这种方法中,每个潜在特征学习一个语义上有意义的变化因素,同时对其他因素保持不变。解耦表示学习被提议作为一种方法,用于在没有或有限监督的情况下学习通用表示(Liu等人,2022)。尽管目前没有广泛接受的解耦表示定义,但主要目标是分离提供的数据分布中存在的主要变化因素(Higgins等人,2018;Locatello等人,2019a)。然而,这些方法主要专注于解耦物体的“因素”,无法处理独有内容与共享内容的分离。也有一些研究集中于解耦两个领域之间的潜在特征,主要用于迁移分类器或将图像映射到不同的分布(Liu等人,2018;Lin等人,2019;Ding等人,2020)。这些研究展示了使用未配对数据进行领域特定特征提取和跨领域特征转换的潜力,这也是我们研究问题所需的模型能力之一。除了自然图像,解耦表示学习在医学影像中也有广泛应用,例如空间分解网络(SDNet)(Chartsias等人,2019),该网络将二维医学图像分解为空间解剖因子和非空间模态因子。Thermos等人(2021)提出了一种生成模型,通过结合来自不同输入图像的解剖因子,并将其重新交织到所需的成像模态(例如MRI),以创建具有特定特征的可信新心脏图像。这些通过专注于特定领域并利用合成等手段进行特征解耦的方法,也为我们提供了灵感,表明通过结合和重建不同特征来学习潜在嵌入具有巨大潜力。然而,这些方法需要额外的监督学习(Lin等人,2019)或额外的人类指导(Ding等人,2020;Chartsias等人,2019)。同时,它们需要预先明确知道每种图像数据集中需要学习的特征,例如在草图和真实图像数据集中微笑和非微笑的特征(Liu等人,2018)。这些方法可以应用于条件跨领域图像合成(Thermos等人,2021)和转换。然而,它们并不适合我们的任务,因为我们需要完全无监督地解耦两个数据集的共享特征和特有特征,而不需要了解两个数据集的任何可见特征。
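为直观理解上述"共享-特异"解耦的核心机制,下面给出一个极简的 numpy 示意:两个编码器分别把同一物种的连接组编码为"物种共享"和"物种特有"的潜在高斯分布,经 VAE 重参数化采样后拼接,由解码器重建输入。其中线性编码器、各维度大小与权重初始化均为示意性假设,并非论文的实际网络架构。

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W, b):
    """Toy linear 'encoder' producing mean and log-variance of a latent Gaussian."""
    h = x @ W + b
    d = h.shape[-1] // 2
    return h[..., :d], h[..., d:]          # mu, log_var

def reparameterize(mu, log_var):
    """VAE reparameterization trick: z = mu + sigma * eps."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

n_edges, d_lat = 100, 8                     # flattened-connectome size, latent size (illustrative)
x_human = rng.standard_normal((4, n_edges)) # 4 toy 'human connectomes'

# Two of the three encoders: species-shared and human-specific
# (the macaque-specific encoder is analogous).
W_sh, b_sh = rng.standard_normal((n_edges, 2 * d_lat)) * 0.1, np.zeros(2 * d_lat)
W_hu, b_hu = rng.standard_normal((n_edges, 2 * d_lat)) * 0.1, np.zeros(2 * d_lat)
W_dec = rng.standard_normal((2 * d_lat, n_edges)) * 0.1  # decoder sees [shared, specific]

z_shared = reparameterize(*encode(x_human, W_sh, b_sh))
z_unique = reparameterize(*encode(x_human, W_hu, b_hu))
x_recon = np.concatenate([z_shared, z_unique], axis=-1) @ W_dec

# KL divergence of the latent Gaussian from N(0, I), as in a standard VAE loss.
mu, log_var = encode(x_human, W_sh, b_sh)
kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

print(x_recon.shape, z_shared.shape, kl.shape)
```

训练时,重建误差加 KL 正则共同优化;解耦的关键在于两物种的解码器共用"共享"潜变量、各自独享"特有"潜变量,此处从略。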
Results
结果
To vividly demonstrate the effectiveness of the proposed model, we initially conducted validation on a synthetic dataset (Supplementary Section 1.2). The synthetic data consists of two types of data, each superimposing a shared 'background' (a sunflower) with its own unique 'foreground' (the digits '0' and '1', respectively). The experimental results show that SU-VAE successfully disentangled the shared and specific features of the two types of data (Supplementary Fig. S1 & Fig. S3).

Subsequently, large-scale human and macaque brain functional connectome data were used to train SU-VAE. Since interhemispheric dissimilarity is not of interest here, we have 970 human connectome samples and 880 macaque ones, of which 800 samples from each species were used as the training set, while the remaining data were used as the testing set. As shown in Fig. 2, the unpaired data first pass through step 1 to obtain the reconstruction of the species-shared brain functional connectome. Then, the three encoders in step 2 respectively learn the latent representations of the macaque-specific, human-specific, and species-shared brain functional connectomes. Finally, the validation results on the test set and the out-of-domain dataset show the superior performance of SU-VAE.
为了生动地展示所提出模型的有效性,我们首先在一个合成数据集上进行了验证(补充部分1.2)。合成数据由两种类型的数据组成,这些数据被一个共享的“背景”(向日葵)和它们各自独特的“前景”(分别为数字‘0’和‘1’)叠加。实验结果表明,SU-VAE成功地将两种数据类型的共享特征和特定特征分离开来(补充图S1和图S3)。
随后,使用大规模的人类和猕猴脑功能连接组数据训练SU-VAE。由于我们不关注大脑半球间的差异,我们使用了970个人类连接组样本和880个猕猴样本,其中800个人类脑和猕猴脑数据被用作训练集,其余的数据用于测试集。如图2所示,未配对的数据首先通过第1步,获得物种共享的大脑功能连接组重建。然后,第2步中的三个编码器分别学习猕猴特有、人类特有和物种共享的大脑功能连接组的潜在表示。最后,测试集和域外数据集的验证结果表明,SU-VAE具有优异的性能。
Figure
图
Fig. 1. Data flow and the framework of SU-VAE aiming at disentangling ‘species-shared’ and ‘species-specific’ functional connectomes. On the left is the construction of functional connectomes of the macaque brain and the human brain on a specific atlas (Brodmann areas). On the right is a schematic diagram of the SU-VAE model.
图1. SU-VAE框架的数据流,旨在解耦“物种共享”和“物种特有”功能连接组。左侧展示了基于特定大脑图谱(Brodmann区域)构建猕猴大脑和人类大脑的功能连接组。右侧是SU-VAE模型的示意图。
Fig. 2. The detailed architecture of SU-VAE. Step 1 and step 2 are marked by blue and red panels, respectively. Only class 𝑦 is shown in step 2. The '∼' in step 2 means the processes for 𝑥 are similar to those for 𝑦.
图 2. SU-VAE的详细架构。步骤1和步骤2分别由蓝色和红色面板标记。步骤2中仅显示类别 𝑦。步骤2中的“∼”表示 𝑥 的处理过程与 𝑦 类似。
Fig. 3. (a) Box plots of the RSA results. The horizontal white line represents the mean and the vertical red line represents a 95% confidence interval. The upper and lower bounds of the boxes represent the third and first quartiles. Stars (*𝑝 < 0.05, **𝑝 < 0.01, ***𝑝 < 0.001) on the 𝑥-axis indicate the 𝑝-value of the paired t-test on the sampling results of the similarity analysis for each corresponding 'shared' and 'specific' representation of the indicators. (b) Average summary of RSA results for the three major categories of behavioral indicators in (a). Abbreviations: EMPE: Episodic Memory, EF: Executive Function/Inhibition, FI: Fluid Intelligence, LRD: Language/Reading Decoding, PS: Processing Speed, SO: Spatial Orientation, SA: Sustained Attention, VEM: Verbal Episodic Memory, WM: Working Memory, CFC: Cognition Fluid Composite, CECC: Cognition Early Childhood Composite, CTCS: Cognition Total Composite Score, CCC: Cognition Crystallized Composite, ER_CRT: Emotion Recognition ER40_CRT, ER_ANG: Emotion Recognition ER40ANG, ER_FEAR: ER40FEAR, ER_HAP: ER40HAP, ER_SAD: ER40SAD, ENDU: Endurance, LOCO: Locomotion, DEX: Dexterity, STR: Strength, ODOR: Olfaction, TAST: Taste.
图 3. (a) RSA结果的箱形图。水平白线表示均值,垂直红线表示95%置信区间。箱体的上界和下界分别表示第三四分位数和第一四分位数。x 轴上的星号(*𝑝 < 0.05,**𝑝 < 0.01,***𝑝 < 0.001)表示对每个相应"共享"和"特定"表示的相似性分析采样结果进行配对t检验所得的𝑝值。 (b) (a)中三大行为指标类别的RSA结果的平均总结。缩写词:EMPE: 情景记忆,EF: 执行功能/抑制,FI: 流体智力,LRD: 语言/阅读解码,PS: 处理速度,SO: 空间定向,SA: 持续注意力,VEM: 言语情景记忆,WM: 工作记忆,CFC: 流体认知复合,CECC: 儿童早期认知复合,CTCS: 总认知复合评分,CCC: 晶体化认知复合,ER_CRT: 情绪识别 ER40_CRT,ER_ANG: 情绪识别 ER40ANG,ER_FEAR: 情绪识别 ER40FEAR,ER_HAP: 情绪识别 ER40HAP,ER_SAD: 情绪识别 ER40SAD,ENDU: 耐力,LOCO: 运动,DEX: 灵活性,STR: 力量,ODOR: 嗅觉,TAST: 味觉。
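图3所用的表示相似性分析(RSA)的基本做法,可以用如下示意代码说明:先分别为潜在特征和行为评分构建被试间的表示差异矩阵(RDM),再对两个RDM的上三角做相关。此处数据为随机生成,相关度量(Pearson)与论文的具体实现可能不同,仅为原理示意。

```python
import numpy as np

rng = np.random.default_rng(1)

def rdm(features):
    """Representational dissimilarity matrix: 1 - Pearson correlation between subjects."""
    return 1.0 - np.corrcoef(features)

def rsa(features_a, features_b):
    """Correlate the upper triangles of two RDMs (Pearson here; Spearman is also common)."""
    iu = np.triu_indices(features_a.shape[0], k=1)
    a, b = rdm(features_a)[iu], rdm(features_b)[iu]
    return float(np.corrcoef(a, b)[0, 1])

n_subj = 30
latent = rng.standard_normal((n_subj, 8))      # stand-in for per-subject latent features
# Toy behavioral scores partially driven by the latent features, plus noise:
behavior = latent[:, :3] + 0.5 * rng.standard_normal((n_subj, 3))
unrelated = rng.standard_normal((n_subj, 3))   # scores with no relation to the latents

print(round(rsa(latent, behavior), 3), round(rsa(latent, unrelated), 3))
```

论文中对采样得到的多组相似性结果做配对t检验,以比较"共享"与"特有"表示对各行为指标的解释力。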
Fig. 4. (a) Average matrix form of inputs and significant shared and specific connections as outputs. (b) Display of specific and shared connections against the surface of the brain as the background.
图4. (a) 输入的平均矩阵形式以及作为输出的显著共享连接和特定连接。 (b) 将特定连接和共享连接显示在大脑表面上,作为背景。
Fig. 5. T-SNE results of latent feature comparison among methods on BA data. (a) T-SNE result of SU-VAE. (b) T-SNE result of CVAE. (c) T-SNE result of DIG. (d) T-SNE result of DID.
图5. 在BA数据上不同方法的潜在特征比较的T-SNE结果。 (a) SU-VAE的T-SNE结果。 (b) CVAE的T-SNE结果。 (c) DIG的T-SNE结果。 (d) DID的T-SNE结果。
Fig. 6. Comparison of latent embeddings of different models with representation similarity analysis. (a)–(e) are the results on BA data trained by SU-VAE, CVAE, DMR, DID, and DIG, respectively. Because SU-VAE and CVAE are probabilistic models based on the VAE, we sampled the latent embeddings of each subject in these two models 6 times, while DID and DIG are GAN-based models and DMR is a linear model, so each subject in these three models corresponds to only one latent embedding.
图6. 不同模型的潜在嵌入比较及表示相似性分析。 (a) (b) (c) (d) (e) 是在BA数据上通过SU-VAE、CVAE、DMR、DID和DIG训练得到的结果。 由于SU-VAE和CVAE是基于VAE的概率模型,我们对这两个模型中的每个受试者的潜在嵌入进行了6次采样,而DID和DIG是基于GAN的模型,DMR是线性模型,这三个模型中的每个受试者只有一个潜在嵌入。
Fig. 7. The differences in functional connectomes between species are transferred onto the cortical surface space. Individual matching between species was conducted by leveraging the shared features of the trained SU-VAE. Functional connectome differences (human minus macaque) between species were then calculated based on these matched pairs. Principal component analysis and spectral clustering were adopted to highlight the patterns of the species-difference connectomes. The left panel shows the mean species-difference connectome reordered by the clustering ranks (grayscale bars) on the first two PCs. In the middle panel, BAs were placed in the space of the reordered PCs. The crossing highlights the location of the origin, where the species difference is close to zero. The 2D color map in the background was used to color-code the human cortical surface, shown in the right panel. H: human, M: macaque, Vis: visual, Aud: auditory, Sen/Mo: sensory/motor, VAN: ventral attention network, DAN: dorsal attention network, LS: limbic system, FPN: frontoparietal network, DMN: default mode network.
图7. 物种之间功能连接组的差异被转移到皮层表面空间。通过利用训练好的SU-VAE的共享特征进行物种之间的个体匹配,然后基于这些匹配对计算物种之间的功能连接组差异(人类减去猕猴)。采用主成分分析和谱聚类来突出物种差异连接组的模式。左侧面板显示了按照聚类排名(灰度条)重新排序的平均物种差异连接组在前两个主成分上的表现。中间面板中,BAs(布罗德曼区)被放置在重新排序的主成分空间中,十字交叉标出了原点位置,此处物种差异接近于零。右侧面板展示了以2D色彩图为背景对人类皮层表面进行的着色编码。H:人类,M:猕猴,Vis:视觉,Aud:听觉,Sen/Mo:感觉/运动,VAN:腹侧注意网络,DAN:背侧注意网络,LS:边缘系统,FPN:额顶网络,DMN:默认模式网络。
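图7中"将平均物种差异连接组投射到主成分空间"的步骤,可用下面的 numpy 示意实现:对(此处为随机生成的)对称差异矩阵按脑区中心化后,用 SVD 求各脑区在前两个主成分上的坐标;论文随后的谱聚类与皮层着色此处从略,矩阵规模亦为示意性假设。

```python
import numpy as np

rng = np.random.default_rng(2)

n_regions = 40
# Toy stand-in for the mean species-difference connectome (human minus macaque),
# symmetrized as a region-by-region matrix.
diff = rng.standard_normal((n_regions, n_regions))
diff = (diff + diff.T) / 2

# PCA of the region profiles via SVD: each row (region) is one observation.
centered = diff - diff.mean(axis=0, keepdims=True)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
pcs = centered @ Vt[:2].T        # coordinates of each region on the first two PCs

explained = S**2 / np.sum(S**2)  # variance ratio explained by each component
print(pcs.shape, round(float(explained[:2].sum()), 3))
```

每个脑区由此获得一个二维坐标,可直接用于图7中那样的二维色彩编码与聚类排序。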
Fig. 8. Six graph metrics on random, macaque, human and species-shared connectome graphs, displayed as box plots. A paired t-test is used to compare the significance of performance between pairwise networks; *** represents 𝑝 < 0.001, and the 𝑝-values of one-way ANOVA tests on all metrics are less than 0.001.
图8. 在随机、猕猴、人类和物种共享连接组图上的六个图形指标,通过箱形图显示。使用配对t检验比较各对网络性能的显著性,***表示𝑝 < 0.001,且所有指标的单因素方差分析(ANOVA)𝑝值均小于0.001。
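图8涉及的图指标中,网络(全局)效率可按如下方式计算:对二值无向图用 Floyd–Warshall 求所有节点对的最短路,全局效率即最短路长度倒数的均值。下面用随机图模拟"共享"网络与"加入特有连接后"的增广网络;由于加边只会缩短最短路,增广图的效率不会低于原图。图的规模与边密度均为示意性假设,并非论文数据。

```python
import numpy as np

def global_efficiency(adj):
    """Global efficiency of a binary undirected graph: mean of 1/shortest-path-length
    over all node pairs (Floyd-Warshall; fine for small connectome-sized graphs)."""
    n = adj.shape[0]
    dist = np.where(adj > 0, 1.0, np.inf)
    np.fill_diagonal(dist, 0.0)
    for k in range(n):
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    iu = np.triu_indices(n, k=1)
    return float(np.mean(1.0 / dist[iu]))  # 1/inf -> 0 for disconnected pairs

rng = np.random.default_rng(3)
n = 20
shared = (rng.random((n, n)) < 0.15).astype(float)   # toy 'species-shared' graph
shared = np.triu(shared, 1)
shared = shared + shared.T

extra = (rng.random((n, n)) < 0.05).astype(float)    # toy 'species-specific' edges
extra = np.triu(extra, 1)
extra = extra + extra.T
augmented = np.clip(shared + extra, 0, 1)

e_shared, e_aug = global_efficiency(shared), global_efficiency(augmented)
print(round(e_shared, 3), round(e_aug, 3))
# Adding edges can only shorten shortest paths, so e_aug >= e_shared always holds.
```

论文比较的是人类特有连接与猕猴特有连接分别加入共享网络后的效率提升幅度,此处仅示意指标本身的计算。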
Fig. 9. The six genes that were simultaneously enriched on axon guidance in three databases. Their whole-brain gene expression region correlation matrices were used for Pearson correlation with the species-shared and human-unique functional connectivity matrices and compared using paired t-tests; *** represents 𝑝 < 0.001.
图9. 在三个数据库中同时富集于轴突引导的六个基因,其全脑基因表达区域相关矩阵分别与物种共享和人类特有的功能连接矩阵进行皮尔逊相关性分析,并使用配对t检验进行比较,***表示𝑝 < 0.001。
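图9中基因表达与功能连接比较的核心运算,是对两个对称矩阵的上三角元素做 Pearson 相关。下面的示意代码用随机数据演示这一计算;论文中是对每个基因分别计算其表达相关矩阵与"物种共享"及"人类特有"连接矩阵的相关,再用配对t检验比较两组相关值,此处仅示意单次相关的计算,所有数据与维度均为虚构假设。

```python
import numpy as np

rng = np.random.default_rng(4)

def edgewise_corr(mat_a, mat_b):
    """Pearson correlation between the upper triangles of two symmetric matrices."""
    iu = np.triu_indices(mat_a.shape[0], k=1)
    return float(np.corrcoef(mat_a[iu], mat_b[iu])[0, 1])

n_regions = 30
# Toy 'gene expression correlation matrix': correlate random regional expression profiles.
expr = rng.standard_normal((n_regions, 5))
gene_corr = np.corrcoef(expr)

def toy_connectome(seed):
    """Random symmetric matrix standing in for a functional connectivity matrix."""
    m = np.random.default_rng(seed).standard_normal((n_regions, n_regions))
    return (m + m.T) / 2

fc_shared = toy_connectome(10)   # stand-in for the species-shared FC matrix
fc_human = toy_connectome(11)    # stand-in for the human-unique FC matrix

r_shared = edgewise_corr(gene_corr, fc_shared)
r_human = edgewise_corr(gene_corr, fc_human)
print(round(r_shared, 3), round(r_human, 3))
# With random data both correlations hover near zero; in the paper, per-gene
# correlation values for the two matrix types are compared with a paired t-test.
```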