Chest CT Scan Machine Learning in 5 Minutes

This post provides an overview of chest CT scan machine learning organized by clinical goal, data representation, task, and model.

A chest CT scan is a grayscale 3-dimensional medical image that depicts the chest, including the heart and lungs. CT scans are used for the diagnosis and monitoring of many different conditions including cancer, fractures, and infections.

Clinical Goal

The clinical goal refers to the medical abnormality that is the focus of the study. The following figure illustrates some example abnormalities, shown as 2D axial slices through the CT volume:

[Figure: example abnormalities shown as 2D axial slices. Image credits: Radiology Assistant, pneumonia; Kalpana Bansal, nodules; pulmonarychronicles, honeycombing; Radiopaedia.org, emphysema; TES.com, atelectasis; ResearchGate.]

Many CT machine learning papers focus on lung nodules.

Other recent work has looked at pneumonia (lung infection), emphysema (a kind of lung damage that can be caused by smoking), lung cancer, or pneumothorax (air outside of the lungs rather than inside the lungs).

I have been focused on multiple abnormality prediction, in which the model predicts 83 different abnormal findings simultaneously.

Data

There are several different ways to represent CT data in a machine learning model, illustrated in this figure:

[Figure: ways to represent CT data in a machine learning model. Image by Author.]

3D representations include a whole CT volume, which is roughly 1000 x 512 x 512 voxels, and a 3D patch, which can be large (e.g. half or a quarter of a whole volume) or small (e.g. 32 x 32 x 32 voxels).
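
As a rough illustration of the whole-volume and 3D-patch representations, the sketch below uses NumPy on a small random array standing in for a CT volume. The array shape, the helper name `extract_patch_3d`, and the patch location are assumptions made for illustration; a real volume would be roughly 1000 x 512 x 512.

```python
import numpy as np

def extract_patch_3d(volume, corner, size):
    """Return a cubic sub-volume of side `size` starting at `corner` = (z, y, x)."""
    z, y, x = corner
    patch = volume[z:z + size, y:y + size, x:x + size]
    if patch.shape != (size, size, size):
        raise ValueError("patch extends beyond the volume")
    return patch

# Stand-in for a CT volume, ordered (slices, height, width).
# The shape is deliberately small so the example runs quickly.
rng = np.random.default_rng(0)
volume = rng.normal(size=(100, 128, 128)).astype(np.float32)

# A small 3D patch and a "large patch" made of half the slices.
small_patch = extract_patch_3d(volume, corner=(10, 20, 30), size=32)
half_volume = volume[: volume.shape[0] // 2]

print(small_patch.shape)  # (32, 32, 32)
print(half_volume.shape)  # (50, 128, 128)
```

Patches like these cut memory use considerably compared with a whole volume, at the cost of losing context from the rest of the scan.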

2.5D representations make use of different perpendicular planes.

  • The axial plane is horizontal like a belt, the coronal plane is vertical like a headband or old-style headphones, and the sagittal plane is vertical like the plane of a bow and arrow in front of an archer.
  • If we take one axial slice, one sagittal slice, and one coronal slice, and stack them up into a 3-channel image, then we have a 2.5D slice representation.
  • If this is done with small patches, e.g. 32 x 32 pixels, then we have a 2.5D patch representation.
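
The 2.5D patch idea can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: the volume is indexed as (z, y, x), the helper name `patch_2p5d` is hypothetical, and the channel order (axial, coronal, sagittal) is an arbitrary choice. Stacking full slices the same way would require the volume to have the same extent along all three axes.

```python
import numpy as np

def patch_2p5d(volume, center, size):
    """Stack axial, coronal, and sagittal patches through `center` into 3 channels.

    `volume` is indexed (z, y, x); `size` is assumed even and the patch is
    assumed to lie fully inside the volume.
    """
    z, y, x = center
    h = size // 2
    axial = volume[z, y - h:y + h, x - h:x + h]      # fixed z: horizontal plane
    coronal = volume[z - h:z + h, y, x - h:x + h]    # fixed y: front-to-back position
    sagittal = volume[z - h:z + h, y - h:y + h, x]   # fixed x: left-to-right position
    return np.stack([axial, coronal, sagittal], axis=0)

# Toy cubic volume with a unique value per voxel.
volume = np.arange(64 * 64 * 64, dtype=np.float32).reshape(64, 64, 64)
patch = patch_2p5d(volume, center=(32, 32, 32), size=32)

print(patch.shape)  # (3, 32, 32)
```

All three channels pass through the same center voxel, so the result can be fed to a model that expects a 3-channel 2D image.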

Finally, 2D representations are also used. This could be a full slice (e.g. 512 x 512), or a 2D patch (e.g. 16 x 16, 32 x 32, 48 x 48). These 2D slices or patches are usually from the axial view.

Task

There are many different tasks in chest CT machine learning. The following figure illustrates a few of them:

[Figure: example chest CT machine learning tasks. Image by Author; sub-images from Yan et al. 2018 (DeepLesion) and Jiang et al. 2019.]

Binary classification involves assigning a 1 or 0 to the CT representation, for the presence (1) or absence (0) of an abnormality.

Multi-class classification is for mutually exclusive categories, like different clinical subtypes of interstitial lung disease. In this case the model assigns 1 to exactly one category and 0 to all the others.

Multi-label classification is for non-mutually-exclusive categories, like atelectasis (collapsed lung tissue), cardiomegaly (enlarged heart), and mass. A CT scan might have some, all, or none of these findings, and the model determines which ones if any are present.
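
One common way to turn raw model outputs into these two kinds of prediction is a softmax for multi-class and an independent sigmoid per category for multi-label. The sketch below is an assumption about how such a model's final layer might work, not a description of any particular study; the scores are made-up numbers and the 0.5 threshold is an arbitrary choice.

```python
import math

def softmax(scores):
    """Convert scores to probabilities that sum to 1 (mutually exclusive classes)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(score):
    """Convert one score to an independent probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical raw outputs for three categories.
scores = [2.0, -1.0, 0.5]

# Multi-class: exactly one category gets a 1.
multiclass_probs = softmax(scores)
multiclass_pred = [0] * len(scores)
multiclass_pred[multiclass_probs.index(max(multiclass_probs))] = 1

# Multi-label: each category is decided on its own.
multilabel_probs = [sigmoid(s) for s in scores]
multilabel_pred = [1 if p >= 0.5 else 0 for p in multilabel_probs]

print(multiclass_pred)  # [1, 0, 0]
print(multilabel_pred)  # [1, 0, 1]
```

Binary classification is the single-category case of the multi-label setup: one score, one sigmoid, one threshold.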

Object detection involves predicting the coordinates of bounding boxes around abnormalities of interest.

Segmentation involves labeling every pixel, which is conceptually like “tracing the outlines of abnormalities and coloring them in.”

Different labels are needed to train these models. “Presence or absence” labels for abnormalities are needed to train classification models, e.g. [atelectasis = 0, cardiomegaly = 1, mass = 0]. Bounding box labels are needed to train an object detection model. Segmentation masks (traced and filled-in outlines) are needed to train a segmentation model. Of these, only “presence or absence” labels scale to tens of thousands of CT scans, provided they are extracted automatically from free-text radiology reports (e.g. the RAD-ChestCT data set of 36,316 CTs). Segmentation masks are the most time-consuming to obtain because they must be drawn manually on each slice; thus, segmentation studies typically use on the order of 100–1,000 CT scans.
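
The three label types carry decreasing amounts of detail, which the following NumPy sketch tries to make concrete: a segmentation mask determines a bounding box, and a bounding box determines a presence label, but not the other way around. The toy mask, its shape, and the (z, y, x) box format are all assumptions chosen for illustration.

```python
import numpy as np

# Toy segmentation mask for one abnormality, ordered (slices, height, width).
# True marks voxels that belong to the abnormality.
mask = np.zeros((8, 16, 16), dtype=bool)
mask[2:5, 4:9, 6:12] = True

# Classification label: is the abnormality present anywhere in the scan?
presence = int(mask.any())

# Detection label: the tightest box around the masked voxels,
# written as (z_min, y_min, x_min, z_max, y_max, x_max), inclusive.
zs, ys, xs = np.nonzero(mask)
bbox = (int(zs.min()), int(ys.min()), int(xs.min()),
        int(zs.max()), int(ys.max()), int(xs.max()))

# Segmentation label: the mask itself, one value per voxel.
n_labeled_voxels = int(mask.sum())

print(presence)          # 1
print(bbox)              # (2, 4, 6, 4, 8, 11)
print(n_labeled_voxels)  # 90
```

A single number per scan is far cheaper to collect than one value per voxel, which is consistent with the data set sizes quoted above.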

Model

Convolutional neural networks are the most popular machine learning model used on CT data. For a 5-minute intro to CNNs, see this article.

  • 3D CNNs are used for whole CT volumes or 3D patches.
  • 2D CNNs are used for 2.5D representations (3 channels, axial/coronal/sagittal), in the same way that 2D CNNs can take a 3-channel RGB image as input (3 channels, red/green/blue).
  • 2D CNNs are used for 2D slices or 2D patches.

Some CNNs combine 2D and 3D convolutions. CNNs can also be “pretrained,” which typically means first training the CNN on a natural image dataset like ImageNet and then refining the CNN’s weights on the CT data.

Here is an example architecture in which a pretrained 2D CNN (ResNet18) is applied to groups of 3 adjacent slices, followed by 3D convolution:

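[Figure: example architecture with a pretrained 2D CNN (ResNet18) applied to groups of 3 adjacent slices, followed by 3D convolution. Image by Author.]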
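
The input side of such an architecture can be sketched with NumPy: the volume is rearranged so that every group of 3 adjacent slices looks like one 3-channel image, which is the input shape a 2D CNN pretrained on RGB images expects. The post does not say whether the groups overlap or how leftover slices are handled, so the non-overlapping grouping and the dropping of leftovers below are assumptions, as is the helper name `group_adjacent_slices`.

```python
import numpy as np

def group_adjacent_slices(volume, group_size=3):
    """Reshape (slices, H, W) into (groups, group_size, H, W).

    Groups are non-overlapping; slices left over at the end are dropped.
    """
    n_slices, height, width = volume.shape
    n_groups = n_slices // group_size
    trimmed = volume[: n_groups * group_size]
    return trimmed.reshape(n_groups, group_size, height, width)

# Toy volume in which every slice is filled with its own index.
volume = np.zeros((20, 64, 64), dtype=np.float32)
for i in range(volume.shape[0]):
    volume[i] = i

groups = group_adjacent_slices(volume)

print(groups.shape)         # (6, 3, 64, 64)
print(groups[1, :, 0, 0])   # [3. 4. 5.]
```

Each of the 6 groups can then be passed through the 2D CNN like a batch of RGB images, and the resulting per-group feature maps can be stacked along the slice direction for the 3D convolution stage.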

Interstitial Lung Disease Classification Examples

The following table includes several example studies focused on interstitial lung disease, organized by clinical goal, data, task, and model.

  • Clinical goal: these papers are all focused on interstitial lung disease. The exact classes used differ between studies. Some studies focus on clinical groupings like idiopathic pulmonary fibrosis or idiopathic non-specific interstitial pneumonia (e.g. Wang et al. 2019 and Walsh et al. 2018). Other studies focus on lung patterns like reticulation or honeycombing (e.g. Anthimopoulos et al. 2016 and Gao et al. 2016).
  • Data: the data sets consist of 100–1,200 CTs because all of these studies rely on manual labeling of patches, slices, or pixels, which is very time-consuming. The upside of doing patch, slice, or pixel-level classification is that it provides localization information in addition to diagnostic information.
  • Task: the tasks are mostly multi-class classification, in which each patch or slice is assigned to exactly one class out of multiple possible classes.
  • Model: some of the studies use custom CNN architectures, like Wang et al. 2019 and Gao et al. 2018, whereas other studies adapt existing CNN architectures like ResNet and AlexNet.

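[Table: example interstitial lung disease studies organized by clinical goal, data, task, and model; shown as an image in the original post.]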

Additional Reading

  • For a longer, more in-depth article on this topic, see Automatic Interpretation of Chest CT Scans with Machine Learning
  • For an article about machine learning on chest x-rays, which are 2D rather than 3D medical images of the chest, see Automated Chest X-Ray Interpretation
  • For more info about CNNs, see Convolutional Neural Networks in 5 minutes and How Computers See: Intro to Convolutional Neural Networks
  • For more details about segmentation tasks, see Segmentation: U-Net, Mask R-CNN, and Medical Applications
  • For more details about classification tasks, see Multi-label vs. Multi-class Classification: Sigmoid vs. Softmax

Originally published at http://glassboxmedicine.com on August 4, 2020.

Translated from: https://towardsdatascience.com/chest-ct-scan-machine-learning-in-5-minutes-ae7613192fdc
