Drug Discovery with Graph Neural Networks, Part 2

Predicting Toxicity

Related Material

  • Jupyter Notebook for the article

  • Drug Discovery with Graph Neural Networks, part 1

  • Introduction to Cheminformatics

  • Deep learning on graphs: successes, challenges, and next steps (article by Prof. Michael Bronstein)

  • Towards Explainable Graph Neural Networks

Table of Contents

  • Introduction

  • Approaching the Problem with Graph Neural Networks

  • Hands-on Part with Deepchem

  • About Me

Introduction

In this article, we will cover another crucial factor that determines whether a drug can pass safety tests: toxicity. In fact, toxicity accounts for 30% of rejected drug candidates, making it one of the most important factors to consider during drug development [1]. Machine learning is very beneficial here, as it can filter out toxic drug candidates early in the drug discovery process.

I will assume that you've read my previous article, which explains some topics and terms that I will be using in this article :) Let's get started!

Approaching the Problem with Graph Neural Networks

The feature engineering part is pretty much the same as in part 1 of the series. To convert a molecular structure into model input, we can either create molecular fingerprints or feed the molecule into a graph neural network as an adjacency matrix plus per-atom feature vectors. These features can be generated automatically by external software such as RDKit or Deepchem, so we don't have to worry much about it.
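To make the adjacency-matrix-plus-feature-vectors representation concrete, here is a minimal hand-written sketch for a toy molecule. The atom list, the bond list, and the three-element vocabulary are assumptions chosen for illustration; a real featurizer in RDKit or Deepchem derives far richer features from the molecule itself.

```python
# Hand-written toy molecule: ethanol (CH3-CH2-OH), heavy atoms only.
# In practice RDKit or Deepchem would derive this from the molecule.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]  # pairs of bonded atom indices


def adjacency_matrix(n_atoms, bonds):
    """Return a symmetric 0/1 matrix; entry [i][j] is 1 when atoms i and j are bonded."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        matrix[i][j] = 1
        matrix[j][i] = 1
    return matrix


def one_hot_features(atoms, vocabulary):
    """Return one feature vector per atom: a one-hot encoding of its element."""
    return [[1 if atom == element else 0 for element in vocabulary] for atom in atoms]


ELEMENTS = ["C", "N", "O"]  # illustrative vocabulary, much smaller than a real one
adjacency = adjacency_matrix(len(atoms), bonds)
features = one_hot_features(atoms, ELEMENTS)

print(adjacency)  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(features)   # [[1, 0, 0], [1, 0, 0], [0, 0, 1]]
```

The adjacency matrix tells the network which atoms exchange information, while each row of the feature matrix describes a single atom.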

Toxicity

The biggest difference is in the machine learning task itself. Toxicity prediction is a classification task, in contrast to solubility prediction, which, as we might recall from the previous article, is a regression task. There are many different toxicity effects, such as carcinogenicity, respiratory toxicity, irritation/corrosion, and others [2]. This makes the challenge slightly more complicated, as we might also have to cope with imbalanced classes.
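One common way to cope with imbalanced classes is to weight each class inversely to its frequency, so that the rare toxic examples count for more in the loss. The sketch below is an assumption about how one might do this, not the method used in the article; the labels are hypothetical, and the formula is the same one scikit-learn uses for class_weight="balanced".

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels for one toxicity task: 1 = toxic, 0 = non-toxic.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
weights = balanced_class_weights(labels)
print(weights)  # {0: 0.625, 1: 2.5}
```

With eight non-toxic and two toxic samples, each toxic sample ends up weighted four times as heavily as a non-toxic one.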

Fortunately, toxicity datasets are often considerably bigger than their solubility counterparts. For example, the Tox21 dataset has ~12k training samples, whereas the Delaney dataset used for solubility prediction has only ~3k training samples. This makes neural network architectures a more promising approach, as they can capture more hidden information.

Tox21 Dataset

The Tox21 dataset was created as a challenge for researchers to develop machine learning models that achieve the highest performance on the given data. It contains 12 distinct labels, each indicating a different toxicity effect. Overall, the dataset has 12,060 training samples and 647 test samples.

The winning approach for this challenge was DeepTox [3], a deep learning pipeline that uses chemical descriptors to predict the toxicity classes. This strongly suggests that deep learning is the most effective approach and that graph neural networks have the potential to achieve even higher performance.

Hands-on Part with Deepchem

A Colab notebook that you can run yourself is here.

First, we import the necessary libraries. Nothing new here: we will be using Deepchem to train a GNN model on Tox21 data. The GraphConvModel is an architecture created by Duvenaud et al. It uses a modified version of the fingerprint algorithm that makes it differentiable (so we can do gradient updates). It is one of the first GNN architectures designed to handle molecular structures as graphs.

# Importing required libraries and their utilities
import numpy as np
np.random.seed(123)
import tensorflow as tf
tf.random.set_seed(123)
import deepchem as dc
from deepchem.molnet import load_tox21
from deepchem.models.graph_models import GraphConvModel
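To give some intuition for the differentiable-fingerprint idea behind GraphConvModel, here is a heavily simplified sketch of a single graph convolution step in plain Python. It is an illustration under stated assumptions, not Deepchem's implementation: the function name, the weight values, and the plain sum readout are all hypothetical, and the real architecture differs in its details.

```python
def graph_conv_step(adjacency, features, weights):
    """One simplified message-passing step.

    Each atom sums its own feature vector with those of its bonded
    neighbours, multiplies the result by a weight matrix, and applies ReLU.
    """
    n_atoms = len(features)
    n_in = len(features[0])
    n_out = len(weights[0])
    updated = []
    for i in range(n_atoms):
        # Aggregate: the atom itself plus every bonded neighbour.
        pooled = list(features[i])
        for j in range(n_atoms):
            if adjacency[i][j]:
                pooled = [p + f for p, f in zip(pooled, features[j])]
        # Transform: linear map followed by ReLU.
        row = []
        for k in range(n_out):
            value = sum(pooled[d] * weights[d][k] for d in range(n_in))
            row.append(max(0.0, value))
        updated.append(row)
    return updated


adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]     # three atoms in a chain
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # toy per-atom features
weights = [[0.5, -1.0], [1.0, 2.0]]               # hypothetical learned weights

hidden = graph_conv_step(adjacency, features, weights)
# Readout: summing over atoms gives one vector for the whole molecule.
fingerprint = [sum(column) for column in zip(*hidden)]
print(hidden)       # [[1.0, 0.0], [2.0, 0.0], [1.5, 1.0]]
print(fingerprint)  # [4.5, 1.0]
```

Because every operation here is a sum, a product, or a ReLU, gradients can flow back into the weights, which is what lets the fingerprint be learned rather than fixed.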

Deepchem contains a convenient API that loads Tox21 for us: the load_tox21 function. We choose GraphConv as the featurizer; it creates chemical descriptors (i.e. features) that match the input requirements of our model. As this is a classification task, the ROC AUC score will be used as the metric.

# Tox21 is a part of the Deepchem library,
# so we can conveniently download it using the load_tox21 function
tox21_tasks, tox21_datasets, transformers = load_tox21(featurizer='GraphConv')
train_dataset, valid_dataset, test_dataset = tox21_datasets

# Define metric for the model
metric = dc.metrics.Metric(dc.metrics.roc_auc_score, np.mean, mode="classification")
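For intuition about the metric defined above, ROC AUC can be read as the probability that a randomly chosen toxic molecule receives a higher score than a randomly chosen non-toxic one, and np.mean then averages that value over the tasks. The sketch below computes this by brute force in plain Python; the labels and scores are hypothetical, and the pairwise loop is for illustration only (library implementations are much faster).

```python
def roc_auc(y_true, y_score):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative counts as half a correct pair.
    """
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and predicted probabilities for two tasks.
task_labels = [[0, 0, 1, 1], [0, 1, 0, 1]]
task_scores = [[0.1, 0.4, 0.35, 0.8], [0.2, 0.9, 0.6, 0.7]]

per_task = [roc_auc(t, s) for t, s in zip(task_labels, task_scores)]
mean_auc = sum(per_task) / len(per_task)
print(per_task)  # [0.75, 1.0]
print(mean_auc)  # 0.875
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect one, which makes the metric less sensitive to class imbalance than plain accuracy.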

The beauty of Deepchem is that its models use a Keras-like API. We can train the model with the .fit function. We pass len(tox21_tasks), the number of labels (12 in this case), as the model's first argument. This sets the output size of the final layer to 12. We use a batch size of 32 to speed up computation, and set mode='classification' to specify that the model is used for a classification task. The model takes several minutes to train on a Google Colab notebook.

# Define and fit the model
model = GraphConvModel(len(tox21_tasks), batch_size=32, mode='classification')
print("Fitting the model")
model.fit(train_dataset, nb_epoch=10)

After training is complete, we can evaluate the model. Again, nothing difficult here: we can still use the Keras-like API for that part. The ROC AUC scores are obtained with the .evaluate function.

print("Evaluating model with ROC AUC")
train_scores = model.evaluate(train_dataset, [metric], transformers)
valid_scores = model.evaluate(valid_dataset, [metric], transformers)
print("Train scores")
print(train_scores)
print("Validation scores")
print(valid_scores)

In my case, the train ROC AUC score was higher than the validation ROC AUC score. This might indicate that the model is overfitting to some molecules.
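A quick way to keep an eye on this is to compute the gap between the two scores. The sketch below assumes that the score dictionaries are keyed by "mean-roc_auc_score"; that key name, the numbers, and the 0.05 threshold are all assumptions made for illustration and are not results from the article.

```python
def generalization_gap(train_scores, valid_scores, key="mean-roc_auc_score"):
    """Return the train score minus the validation score for one metric key."""
    return train_scores[key] - valid_scores[key]


# Hypothetical numbers for illustration only, not results from the article.
train_scores = {"mean-roc_auc_score": 0.96}
valid_scores = {"mean-roc_auc_score": 0.82}

gap = generalization_gap(train_scores, valid_scores)
if gap > 0.05:  # arbitrary threshold chosen for this sketch
    print(f"Possible overfitting: train exceeds validation by {gap:.2f}")
```

If the gap is large, training for fewer epochs or adding regularisation are typical things to try.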


You can do much more with Deepchem than that. It contains several different GNN models that are as easy to use as the one in this tutorial. I highly suggest looking at their tutorials. For the toxicity task, they have gathered several examples that run with different models. You can find them here.

Thank you for reading the article, I hope it was useful for you!

About Me

I am an MSc Artificial Intelligence student at the University of Amsterdam. In my spare time, you can find me fiddling with data or debugging my deep learning model (I swear it worked!). I also like hiking :)

Here are my social media profiles, if you want to stay in touch with my latest articles and other useful content:

  • Medium

  • LinkedIn

  • GitHub

  • Personal Website

Translated from: https://towardsdatascience.com/drug-discovery-with-graph-neural-networks-part-2-b1b8d60180c4
