精读 Generating Mammography Reports from Multi-view Mammograms with BERT

精读(非常推荐) Generating Mammography Reports from Multi-view Mammograms with BERT(上)

这里的作者有个叫 Ilya 的吓坏我了

1. Abstract

Writing mammography reports can be errorprone and time-consuming for radiologists. In this paper we propose a method to generate mammography reports given four images, corresponding to the four views used in screening mammography. To the best of our knowledge our work represents the first attempt to generate the mammography report using deep-learning. We propose an encoder-decoder model that includes an EfficientNet-based encoder and a Transformerbased decoder. We demonstrate that the Transformer-based attention mechanism can combine visual and semantic information to localize salient regions on the input mammograms and generate a visually interpretable report. The conducted experiments, including an evaluation by a certified radiologist, show the effectiveness of the proposed method. Our code is available at
代码: https://github.com/sberbank-ai-lab/mammo2text.

在这里插入图片描述

2. Introduction

Breast cancer represents a global healthcare problem (Glo, 2016). Increasing numbers of new cases and deaths are observed in both developed and less developed countries, only partially attributable to the increasing population age. Serial screening with mammography is the most effective method to detect early stage disease and decrease mortality. The goal of screening is to detect breast cancers when still curable to decrease breast cancer-specific mortality (Duffy et al., 2020).
初衷是在可治愈的前提下,减少死亡率

The European Society of Breast Imaging (EUSOBI) together with 30 national breast radiology bodies recommend that only qualified radiologists should be involved in screening programs. (Sardanelli et al., 2017).As the amount of organized breast screening programs grows across the world, the burden on radiologists increases with it. In National screening programs such as in Holland or Sweden, radiologists may need to read 100 radiology images per hour (Abbey et al., 2020). With a growing number of screening programs , we need more trained radiologists and new technologies that can make their workflow more effective. Since one of the most time consuming procedures in radiology is writing medical-imaging reports, we explore the potential for deep-learning to automatically generate diagnostic reports of screening mammograms.
提出由于工作负担导致,智能生成报告的背景
The rapid evolution of deep learning and artificial intelligence technologies enables them to be used as a strong tool for providing clinical decision-making support to the medical community. While many problems in the area of medical imaging and text analysis have been addressed effectively, there is no known approach to generating clinical reports for mammography studies. There are various reasons for this, such as the requirements regarding the accuracy, completeness and diagnostic relevance of the clinical information contained in the report. In this article, we present a framework (Figure1) that takes mammograms as an input, automatically generates mammography reports, and visualizes the attention of the model to provide the interpretability of the process.
We use an encoder-decoder architecture, where the encoder extracts visual features and the decoder generates reports. We adopt a convolutional neural network, specifically EfficientNet (M Tan, 2019), to extract visual features of the four images, corresponding to the four views used in screening mammography.

引文:
Tan, M., & Le, Q. (2019, May). Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning (pp. 6105-6114). PMLR.
EfficientNet (M Tan, 2019) 实际上是一种构建视觉模型网络的范式,😐 为什么要使用这样的视觉模型?如何更好的构建起来一个更好的混合视觉模型,如何组合参数,这里之所以使用这个是不是因为,医学图像并 不同于 自然图像,尤其是钼靶图像这样的有精确化的钙化点,会不会是就是它 重新思考 构建视觉模型结构的原因。

😐 这里实际上,作者解释了对于乳腺钼靶这样的高分辨率图像,使用 EfficientNet B0 可以效率更高!
We use a deep multi-view (N Wu, 2019) CNN based on EfficientNet B0 (M Tan, 2019). We chose EfficientNet B0 because it is relatively lightweight and fits in GPU memory when using high resolution images. We have one EfficientNet instance for all views (R-CC, L-CC, R-MLO, L-MLO), i.e. model weights are shared. The first convolutional layer is replaced to accept a one-channel image. The last fully-connected layer of EfficientNet is discarded. Outputs from all four views are averaged by channels and one fully connected layer is added.


For language modeling, we utilize BERT (Devlin et al., 2018), inserting an additional attention sub-layer to perform multi-head attention over the regional feature embeddings produced by the encoder.

We modify the Transformerbased attention mechanism (Vaswani et al., 2017) such that it attends to the visual information on four mammography views and previously generated words. We use the attention scores to build visually interpretable image-text attention mappings.

In addition to that, we conduct a series of indepth quantitative and qualitative experiments with the help of an experienced radiologist to demonstrate the clinical validity of our approach. We compare the predictions of our models with the ground truth to understand where the models make mistakes and demonstrate that our best model successfully describes different parts of the breast, and detects pathological regions and abnormalities. We evaluate the image-text attention mappings to demonstrate the interpretability of our model. As far as we are aware, our work represents the first attempt to generate the mammography report using deep-learning.

重点看看那个attention map,和视觉模型的比例,从论文上看效果非常好

To summarize, we make the following contributions in this paper:
  1. We propose a novel framework for mammography report generation using EfficientNet in the
    encoder and BERT in the decoder.
  2. We demonstrate that the Transformer-based attention mechanism can combine visual and textual
    information to localize salient regions on the input mammograms and generate a visually interpretable
    report.
  3. We conduct doctor evaluation and extensive experiments with automatic metrics to show the effectiveness of the proposed framework.
  4. We conduct a qualitative analysis including interpretation of image-text attention mappings to demonstrate how the model is able to generate mammography reports in a meaningful way.

3. Related work

The task of image captioning is creating a model that given a previously unseen query image generates a caption that is both grammatically and semantically correct. The main approaches to image captioning are retrieval-based, template-based and novel caption generation.

方法汇总
  1. Retrieval-based, 检索式(Retrieval-based): 这种方法通过在一个预先定义的数据库中搜索最匹配当前图像的描述来工作。数据库中的描述是由人类创建的,针对不同的图像。当给定一个新图像时,系统会尝试找到与之最相似的图像(或图像集),然后将找到的图像的描述作为新图像的描述。这种方法的优点是生成的描述文本质量较高,因为所有的描述都是人类编写的。但是,它的缺点是难以扩展到新的、未见过的图像,而且在数据库中找到精确匹配的图像可能很困难。
  2. Template-based 模板式(Template-based): 这种方法使用预定义的模板来生成描述,模板中包含可变的插槽,这些插槽可以根据图像的内容动态填充。例如,模板可以是“这是一张关于[对象]的照片,在[场景]中”,其中“[对象]”和“[场景]”会根据图像识别的结果填充。模板方法的优点是易于实现和理解,而且生成的文本通常语法正确。然而,它的缺点是生成的描述可能缺乏多样性和创造性,因为所有的描述都是基于固定模板生成的。
  3. Novel caption generation. 新颖描述生成(Novel Caption Generation): 这种方法使用深度学习模型,如卷积神经网络(CNN)和循环神经网络(RNN)或Transformer模型,直接从图像中生成新颖的描述。这种方法不依赖于预定义的模板或数据库,而是通过学习大量图像和其对应描述的数据集,使模型能够学会如何根据图像的内容生成描述。新颖描述生成方法的优点是能够创造出多样化且丰富的描述,而且可以应用于未见过的图像。然而,这种方法的挑战在于需要大量的标注数据来训练模型,且模型的训练计算成本较高。

In retrieval-based methods (Hodosh et al., 2013), (Ordonez et al., 2011) candidate captions for query images are selected from a pool of existing captions based on some measure of similarity. The downside of this approach is the inability to generate novel image-specific captions.

In template-based methods (Farhadi et al., 2010), (Kulkarni et al., 2013), (Li et al., 2011) image captions are generated by filling the blanks in fixed templates. These methods can generate grammatically and semantically correct novel captions not present in the training set but cannot generate variable-length captions.

Novel caption generation methods (Xu et al., 2015), (Yao et al., 2017), (You et al., 2016) use a representation of the query image as an input for a language model responsible for generating the captions. This approach follows the encoder-decoder architecture first applied to machine translation tasks (Cho et al., 2014).

To generate an image caption, a representation of the image must first be constructed either via generating handcrafted features or extracting such features automatically, for example using deep neural networks. Examples of hand-crafted features are local binary patterns (Ojala et al., 2002), scaleinvariant keypoints (Lowe, 2004), or histograms of oriented gradients (Dalal and Triggs, 2005). Automatic feature extraction from images is commonly used by applying convolutional neural networks (CNN) (LeCun et al., 1998) to the query image. These features may be further enhanced, for example by using a spatial Transformer (Pedersoli et al.,2017).

A sub-field of image captioning is diagnostic captioning (DC). Diagnostic captioning is automatic generation of diagnostic text based on a set of medical images of a patient. DC systems can increase the speed of producing a report for experienced physicians and decrease the number of diagnostic errors for inexperienced doctors (for a recent survey on DC methods see (Pavlopoulos et al., 2021)). The majority of the work in DC is done using encoder-decoder architecture. In addition to evaluation of grammatical and semantical correctness of captions, which is commonly assessed by calculating lexical overlap between generated captions and ground truth (Pavlopoulos et al., 2019), DC quality can be assessed by clinical correctness by conducting clinical experiments with physicians evaluating the generated reports (Zhang et al., 2019), (Liu et al., 2019).

Language models commonly used in DC usually apply recurrent neural networks (RNN) such as LSTM (Hochreiter and Schmidhuber, 1997), see (Vinyals et al., 2015) (Xu et al., 2015), with works using Transformer-based models beginning to appear (Chen et al., 2020) . A common approach in DC is the use of ’visual attention’ that allows the decoder to focus on particular areas of input images when generating the captions (Jing et al., 2017), (Yuan et al., 2019). Such mechanisms also can be used to highlight the regions of interest on the input images adding to the interpretability of the models (Zhang et al., 2017).

(Chen et al., 2020)
Zhihong Chen

(Jing et al., 2017)
Baoyu Jing

(Yuan et al., 2019)
Jianbo Yuan,

We split the dataset into the training, validation and test subsets in the proportion of 91%, 4% and 5% respectively (having 22463, 934 and 1229 cases in each subset). The splits are the same for encoder
这里可能一个case对应多个标签,所以,这里的labels并不是总数的和。
在这里插入图片描述
这里数据集划分的很仔细,值得学习,但是具体如何使用,它应该是使用了一种方法,尽量使得种类平衡。

We use a deep multi-view (N Wu, 2019) CNN based on EfficientNet B0 (M Tan, 2019). We chose EfficientNet B0 because it is relatively lightweight and fits in GPU memory when using high resolution images. We have one EfficientNet instance for all views (R-CC, L-CC, R-MLO, L-MLO), i.e. model weights are shared . The first convolutional layer is replaced to accept a one-channel image. The last fully-connected layer of EfficientNet is discarded. Outputs from all four views are averaged by channels and one fully connected layer is added.

模型的设计和我预想大体的一致,但是也有很多不同,提供了一个很好的思路。
  1. 第一层卷积核并没有使用超大卷积核,而是正常的卷积核,同时,输入通道为1,而不是3
  2. 所有的图片经过同一个视觉编码器进行学习,所谓的分享权重
  3. 所有的output通过平均相加,最终得到输出特征
  4. 去掉了最后一层的全连接层

The encoder is pretrained to predict multilabel targets important for diagnosis in mammography screening, shown in Table 1. The binary targets were extracted with regular expressions from text descriptions of the studies. Targets № 0-4 are typical pathological changes in breasts tissues. During training, the images are cropped and resized to 1350x900 px.

这里的数据集处理的比我好,同时视觉模型的大概肯定也比我自己设计的好,但是预训练的过程现在可以使用更多方法。因为现在有了CLIP,GLoRIA这样的模型进行预训练模型结构。

跳过模型的结构,先看模型的测试部分

我觉得这里展示了一个很好的模型测试范式,这里的random很好的说明了模型的效果例子,同时不同于之前看到的BLEU, METEOR, ROUGE-L 这里还告诉了我们可以使用CIDEr模型。
在这里插入图片描述

评估指标:
  1. BLEU (Bilingual Evaluation Understudy):这是机器翻译质量评估中使用最广泛的指标之一。它通过计算机器生成的翻译和一组人工翻译之间n-grams的重叠来评估翻译的质量。BLEU分数越高,意味着生成的翻译和参考翻译之间的重叠越多,通常认为翻译质量越好。

  2. METEOR (Metric for Evaluation of Translation with Explicit Ordering):它是对BLEU的改进,不仅考虑了单词的精确匹配,还考虑了词形、同义词和词序的匹配。METEOR也会对匹配的单词进行加权,给予不同类型的匹配(如词干匹配或同义词匹配)不同的重要性。

  3. ROUGE-L (Recall-Oriented Understudy for Gisting Evaluation - Longest Common Subsequence):ROUGE主要用于评估自动文本摘要或机器翻译的质量。ROUGE-L的“L”代表最长公共子序列(LCS),它考虑了候选文本和参考文本之间的最长公共子序列。这个度量考虑了候选文本和参考文本中的词序,使用最长公共子序列来评估它们之间的相似度。

  4. CIDEr (Consensus-based Image Description Evaluation):专为评估图像描述任务设计的度量,通过计算候选描述和参考描述集中n-grams的相似性来衡量描述的质量。CIDEr特别强调词汇的独特性,通过TF-IDF统计来增加稀有词汇的权重,以鼓励生成的描述能够反映出图片的特定和独特内容。

在这里插入图片描述

这里更甚使用了具体到病情的评估指标,这种好的思路真是太好了,太值得学习了

在这里插入图片描述

这样的展示图片真的太完美了,太值得学习了,俄罗斯的人工智能搞的是真好

模型的结构在(下)解析,这里跳到结论部分

In this paper we present a first-of-its-kind framework for generating mammography reports given four mammography views using deep-learning. Our model utilizes pretrained models including EfficientNet for visual extraction and BERT for report generation. We demostrate that the Transformerbased attention mechanism that simultaneously attends to four mammography views and text from the report significantly improves the performance. Our method provides a novel perspective for breast screening: generating mammography reports and providing image-text attention mappings, which makes the automatic breast screening process semantically and visually interpretable. The validity of our approach is confirmed by the corresponding doctor evaluation. In the conducted qualitative analysis we demonstrate that our best model successfully detects pathological regions, and describes abnormalities and parts of the breast.

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/news/792347.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

基于单片机的数字万用表设计

**单片机设计介绍,基于单片机的数字万用表设计 文章目录 一 概要二、功能设计设计思路 三、 软件设计原理图 五、 程序六、 文章目录 一 概要 基于单片机的数字万用表设计概要是关于使用单片机技术来实现数字万用表功能的一种设计方案。下面将详细概述该设计的各个…

【性能测试】接口测试各知识第2篇:学习目标,1. 理解接口的概念【附代码文档】

接口测试完整教程(附代码资料)主要内容讲述:接口测试,学习目标学习目标,2. 接口测试课程大纲,3. 接口学完样品,4. 学完课程,学到什么,5. 参考:,1. 理解接口的概念。学习目标,RESTFUL1. 理解接口的概念,2.什么是接口测试…

ChatGPT 的行家指南

原文:An Insider’s Guide to using ChatGPT 译者:飞龙 协议:CC BY-NC-SA 4.0 介绍 你是否厌倦了花费无数小时为你的业务创建内容?从博客文章到社交媒体更新,从电子书内容到电子邮件,这可能是一个耗时的过…

如何保持数据一致性

如何保持数据一致性 数据库和缓存(比如:redis)双写数据一致性问题,是一个跟开发语言无关的公共问题。尤其在高并发的场景下,这个问题变得更加严重。 问题描述: 1.在高并发的场景中,针对同一个…

基于java+SpringBoot+Vue的学生心理咨询评估系统设计与实现

基于javaSpringBootVue的学生心理咨询评估系统设计与实现 开发语言: Java 数据库: MySQL技术: Spring Boot MyBatis工具: IDEA/Eclipse、Navicat、Maven 系统展示 后台展示 用户管理模块:管理员可以查看、添加、编辑和删除用户信息。 试题管理模块&#xff1a…

Qt + VS2017 创建一个简单的图片加载应用程序

简介: 本文介绍了如何使用Qt创建一个简单的图片加载应用程序。该应用程序可以打开图片文件并在界面上显示选定的图片,并保存用户上次选择的图片路径。 1. 创建项目: 首先,在VS中创建一个新的Qt Widgets应用程序项目,并…

LeetCode 1379.找出克隆二叉树中的相同节点:二叉树遍历

【LetMeFly】1379.找出克隆二叉树中的相同节点:二叉树遍历 力扣题目链接:https://leetcode.cn/problems/find-a-corresponding-node-of-a-binary-tree-in-a-clone-of-that-tree/ 给你两棵二叉树,原始树 original 和克隆树 cloned&#xff0…

Golang | Leetcode Golang题解之第3题无重复字符的最长子串

题目: 题解: func lengthOfLongestSubstring(s string) int {// 哈希集合,记录每个字符是否出现过m : map[byte]int{}n : len(s)// 右指针,初始值为 -1,相当于我们在字符串的左边界的左侧,还没有开始移动r…

CMD 命令行进入到电脑硬盘的某个目录的几种方式

本文介绍几种 cmd 命令行进入到电脑硬盘的某个目录的几种方式。 1、在具体文件目录地址栏输入 cmd 回车 这是最快的、最牛的方式,没有之一。 比如:我想进入一个层级很深的文件目录,直接打开在那个目录,把地址栏信息删除清空&am…

FLink学习(三)-DataStream

一、DataStream 1&#xff0c;支持序列化的类型有 基本类型&#xff0c;即 String、Long、Integer、Boolean、Array复合类型&#xff1a;Tuples、POJOs 和 Scala case classes Tuples Flink 自带有 Tuple0 到 Tuple25 类型 Tuple2<String, Integer> person Tuple2.…

【C++进阶】带你手撕红黑树(红与黑的爱恨厮杀)

&#x1fa90;&#x1fa90;&#x1fa90;欢迎来到程序员餐厅&#x1f4ab;&#x1f4ab;&#x1f4ab; 主厨&#xff1a;邪王真眼 主厨的主页&#xff1a;Chef‘s blog 所属专栏&#xff1a;c大冒险 总有光环在陨落&#xff0c;总有新星在闪烁 引言&#xff1a; 之前我们…

vivado 串行矢量格式 (SVF) 文件编程

串行矢量格式 (SVF) 文件编程 注释 &#xff1a; 串行矢量格式 (SVF) 编程在 Versal ™ 器件上不受支持。 对 FPGA 和配置存储器器件进行编程的另一种方法是通过使用串行矢量格式 (SVF) 文件来执行编程。通过 Vivado Design Suite 和 Vivado Lab Edition 生成的 SVF …

八数码(bfs做法)非常详细,适合新手服用

题目描述&#xff1a; 在一个 33 的网格中&#xff0c;1∼8这 8 个数字和一个 x 恰好不重不漏地分布在这 33 的网格中。 例如&#xff1a; 1 2 3 x 4 6 7 5 8在游戏过程中&#xff0c;可以把 x 与其上、下、左、右四个方向之一的数字交换&#xff08;如果存在&#xff09;。 我…

JS-11A/11时间继电器 板前接线 JOSEF约瑟

系列型号&#xff1a; JS-11A/11集成电路时间继电器&#xff1b;JS-11A/12集成电路时间继电器&#xff1b; JS-11A/13集成电路时间继电器&#xff1b;JS-11A/136集成电路时间继电器&#xff1b; JS-11A/137集成电路时间继电器&#xff1b;JS-11A/22集成电路时间继电器&#…

Folder Icons for Mac v1.8 激活版文件夹个性化图标修改软件

Folder Icons for Mac是一款Mac OS平台上的文件夹图标修改软件&#xff0c;同时也是一款非常有意思的系统美化软件。这款软件的主要功能是可以将Mac的默认文件夹图标更改为非常漂亮有趣的个性化图标。 软件下载&#xff1a;Folder Icons for Mac v1.8 激活版 以下是这款软件的一…

亚马逊自动养号软件新手必读:养号过程中的关键注意事项

亚马逊买家号想要养号效果好&#xff0c;需要重点注意以下4点: 1、前期的准备工作:确保账号登陆的环境是安全的&#xff0c;最好就是用的家庭IP。然后是FBA收货 地址&#xff0c;固定的收货地址有利于账号的安全稳定性&#xff0c;还有一个就是确定安全的成交付款方式&#xf…

网络协议栈--数据链路层

目录 对比理解“数据链路层”和“网络层”一、认识以太网1.1 以太网帧格式1.2 认识MAC地址1.3 对比理解MAC地址和IP地址1.4 认识MTU1.5 MTU对IP协议的影响1.6 MTU对UDP协议的影响1.7 MTU对于TCP协议的影响1.8 查看硬件地址和MTU 二、ARP协议2.1 ARP协议的作用2.2 ARP协议的工作…

Springboot传参要求

传参的参数名称必须与Set方法的参数名字相同 &#xff0c;不然会报错。

MacBook安装使用XMind

MacBook安装使用XMind XMind简介 官方地址: https://www.xmind.cn/ XMind 是一个全功能的思维导图和头脑风暴软件,为激发灵感和创意而生。作为一款有效提升工作和生活效率的生产力工具,受到全球百千万用户的青睐。 XMind 是一款非常实用的商业思维导图软件&#xff0c;应用…

我认识的Git-史上最强的版本控制系统

大家好&#xff01; 欢迎大家来一起交流Git使用心得&#xff0c;相信很多同事对Git都很熟悉了&#xff0c;如果下面说的有错误的“知识点”&#xff0c;欢迎批评指正。 初识Git 我认识Git已经很多年了&#xff08;我在有道云笔记里面“Git”文件夹的创建时间是&#xff1a; …