
Title: A 3D Probabilistic Deep Learning System for Detection and Diagnosis of Lung Cancer Using Low-Dose CT Scans. 01 Literature Digest Introduction: Lung cancer is both one of the most common cancers and the leading cause of cancer death…



A 3D Probabilistic Deep Learning System for Detection and Diagnosis of Lung Cancer Using Low-Dose CT Scans








We introduce a new computer-aided detection and diagnosis system for lung cancer screening with low-dose CT scans that produces meaningful probability assessments. Our system is based entirely on 3D convolutional neural networks and achieves state-of-the-art performance for both lung nodule detection and malignancy classification tasks on the publicly available LUNA16 and Kaggle Data Science Bowl challenges. While nodule detection systems are typically designed and optimized on their own, we find that it is important to consider the coupling between detection and diagnosis components. Exploiting this coupling allows us to develop an end-to-end system that has higher and more robust performance and eliminates the need for a nodule detection false positive reduction stage. Furthermore, we characterize model uncertainty in our deep learning systems, a first for lung CT analysis, and show that we can use this to provide well-calibrated classification probabilities for both nodule detection and patient malignancy diagnosis. These calibrated probabilities informed by model uncertainty can be used for subsequent risk-based decision making towards diagnostic interventions or disease treatments, as we demonstrate using a probability-based patient referral strategy to further improve our results.
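Characterizing model uncertainty can be made concrete with Monte Carlo dropout, one of the two techniques the conclusion names. The sketch below is a toy illustration, not the authors' network: a single logistic unit with hypothetical weights stands in for a 3D CNN, dropout stays active at test time, and the predicted probability is the mean over stochastic passes. The function names, weights, dropout rate, and number of passes are all assumptions.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, weights, bias, drop_rate=0.0, rng=None):
    """One pass of a toy logistic 'network' (hypothetical stand-in for a 3D CNN).

    With drop_rate > 0, each input unit is dropped with probability drop_rate and
    the surviving units are scaled by 1 / (1 - drop_rate) (inverted dropout).
    """
    z = bias
    for xi, wi in zip(x, weights):
        if drop_rate > 0.0:
            if rng.random() < drop_rate:
                continue  # unit dropped on this pass
            xi = xi / (1.0 - drop_rate)
        z += wi * xi
    return sigmoid(z)


def mc_dropout_predict(x, weights, bias, drop_rate=0.5, n_samples=100, seed=0):
    """Monte Carlo dropout: keep dropout on at test time and average the passes.

    Returns the mean predicted probability and its variance across passes.
    """
    rng = random.Random(seed)
    samples = [forward(x, weights, bias, drop_rate, rng) for _ in range(n_samples)]
    mean = sum(samples) / n_samples
    var = sum((p - mean) ** 2 for p in samples) / n_samples
    return mean, var


# Hypothetical weights and input features, for illustration only.
weights, bias = [1.5, -2.0, 0.7], -0.2
x = [0.9, 0.1, 0.4]
p_standard = forward(x, weights, bias)  # dropout off: one deterministic pass
p_mc, p_var = mc_dropout_predict(x, weights, bias)
print(f"standard={p_standard:.3f}  mc_mean={p_mc:.3f}  mc_var={p_var:.4f}")
```

The variance across passes gives a per-case uncertainty estimate alongside the averaged probability; with the dropout rate set to 0 the procedure collapses to the standard deterministic prediction.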




In this paper, we introduced a full CADe/CADx system to detect and diagnose lung cancer using low-dose CT scans. Our system uses a cascade of 3D CNNs and achieves state-of-the-art performance on both lung nodule detection and malignancy classification on the publicly available LUNA16 and Kaggle datasets. Moreover, we characterized model uncertainty using Monte Carlo dropout and deep ensembles, and showed that quantifying model uncertainty enables our system to provide calibrated classification probabilities, which makes it reliable for subsequent utility/risk-based decision making towards diagnostic interventions or disease treatments. We demonstrated that performance can be improved further by using these calibrated probabilities to make patient referral decisions. Our CADe/CADx studies demonstrate that CADe and CADx modules should be developed and studied jointly if the goal is to use them as an end-to-end automated diagnostic system.
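Combining deep ensembles with Monte Carlo dropout amounts to averaging stochastic predictions over several independently trained networks. A minimal sketch, assuming equal member weights and toy logistic "members" with hypothetical parameters in place of trained 3D CNNs; the function names, dropout rate, and pass count are illustrative only.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def member_pass(x, weights, bias, drop_rate, rng):
    """One stochastic pass of one ensemble member (toy logistic unit, inverted dropout)."""
    z = bias
    for xi, wi in zip(x, weights):
        if rng.random() < drop_rate:
            continue  # input unit dropped on this pass
        z += wi * xi / (1.0 - drop_rate)
    return sigmoid(z)


def bayesian_ensemble_predict(x, members, drop_rate=0.3, passes_per_member=50, seed=0):
    """Deep ensemble of MC-dropout members: equal-weight average over all members and passes."""
    rng = random.Random(seed)
    probs = [
        member_pass(x, w, b, drop_rate, rng)
        for w, b in members
        for _ in range(passes_per_member)
    ]
    return sum(probs) / len(probs)


# Hypothetical members standing in for independently trained networks
# (for example, the same architecture trained from different random seeds).
members = [([1.2, -0.8], 0.1), ([0.9, -1.1], -0.2), ([1.5, -0.5], 0.0)]
x = [0.6, 0.3]
print(f"ensemble probability: {bayesian_ensemble_predict(x, members):.3f}")
```

With dropout disabled and one pass per member, this reduces to a plain deep ensemble, that is, the mean of the members' deterministic probabilities.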





We evaluate our CADe system on the LUNA16 benchmark and our CADx system on the Kaggle Stage-2 test set. Since CADx directly relies on CADe, the success of the CADx system acts as additional validation of the CADe system and its ability to generalize to an independent dataset. Likewise, the CADx system is trained and validated on the Kaggle Stage-1 dataset but tested on the Kaggle Stage-2 dataset, which is more recent and has different image quality.




Fig. 1. Our overall CAD system diagram. Since CADx performance is so reliant on the quality of the nodule candidates generated by the CADe, both were developed simultaneously to achieve the best overall performance.



Fig. 2. Randomly sampled augmentations of a single nodule, demonstrating the extensive augmentation transforms used during model training.
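The specific augmentation set is not listed in this excerpt. As a minimal sketch, assuming simple label-preserving geometric transforms, random axis flips and an in-plane axis swap can be applied to a 3D patch stored as nested lists in z-y-x order. The function names and the 0.5 probabilities are hypothetical; a real pipeline would likely operate on arrays and add further transforms such as rotations or scaling.

```python
import random


def flip(volume, axis):
    """Mirror a z-y-x nested-list volume along axis 0 (z), 1 (y) or 2 (x)."""
    if axis == 0:
        return volume[::-1]
    if axis == 1:
        return [plane[::-1] for plane in volume]
    return [[row[::-1] for row in plane] for plane in volume]


def swap_yx(volume):
    """Transpose every axial slice, exchanging the y and x axes."""
    return [[list(col) for col in zip(*plane)] for plane in volume]


def random_augment(volume, rng):
    """Apply a random combination of axis flips and an in-plane axis swap."""
    for axis in range(3):
        if rng.random() < 0.5:
            volume = flip(volume, axis)
    if rng.random() < 0.5:
        volume = swap_yx(volume)
    return volume


# A tiny 2 x 2 x 2 "patch" with distinct voxel values.
patch = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
rng = random.Random(42)
for _ in range(3):
    print(random_augment(patch, rng))
```

Because each transform only rearranges voxels, the nodule label stays valid for every augmented copy.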



Fig. 3. Sample nodule segmentations from our CADe segmentation model, sliced through the center of each nodule candidate. First row: input CT scan images from LIDC-IDRI test data. Second row: our corresponding segmentation probabilities. Third row: (spherical) voxelwise labels extracted from the LUNA16 annotations.
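The spherical voxelwise labels in the third row of Fig. 3 are derived from LUNA16 annotations, which give each nodule's center and diameter in millimeters. A minimal sketch of that conversion, assuming the center has already been converted from world coordinates into millimeters relative to voxel (0, 0, 0) and that each voxel is represented by its center: a voxel is labeled 1 when its physical distance to the nodule center is at most half the diameter. The function name and argument layout are hypothetical.

```python
def spherical_mask(shape, center_mm, diameter_mm, spacing_mm):
    """Binary z-y-x mask marking voxels whose centers lie inside a sphere.

    shape       -- (nz, ny, nx) number of voxels per axis
    center_mm   -- (cz, cy, cx) nodule center in mm, relative to voxel (0, 0, 0)
    diameter_mm -- nodule diameter in mm
    spacing_mm  -- (sz, sy, sx) voxel spacing in mm
    """
    nz, ny, nx = shape
    cz, cy, cx = center_mm
    sz, sy, sx = spacing_mm
    r2 = (diameter_mm / 2.0) ** 2
    mask = []
    for k in range(nz):
        dz = k * sz - cz
        plane = []
        for j in range(ny):
            dy = j * sy - cy
            row = []
            for i in range(nx):
                dx = i * sx - cx
                row.append(1 if dz * dz + dy * dy + dx * dx <= r2 else 0)
            plane.append(row)
        mask.append(plane)
    return mask


# A 3 mm nodule centered in a 5 x 5 x 5 grid with isotropic 1 mm voxels.
mask = spherical_mask((5, 5, 5), (2.0, 2.0, 2.0), 3.0, (1.0, 1.0, 1.0))
print(sum(v for plane in mask for row in plane for v in row))  # 19 voxels
```

Working in millimeters rather than voxel indices keeps the label spherical even when the slice spacing differs from the in-plane spacing.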



Fig. 4. Base neural network architecture used for nodule candidate scoring, malignancy ranking, and multiple-instance malignancy classification. The architecture hyperparameters were found through experimentation on the two CADx tasks, namely malignancy ranking and classification.



Fig. 5. Example nodule candidates with CADx malignancy probabilities along with corresponding candidate attention weights. Each row represents candidates from a specific patient. The scores on top of each candidate are the corresponding CADx network attention weights, which sum to 1 and represent how much each candidate contributes relatively to the final CADx score. The estimated patient-level CADx malignancy probability is given at the bottom of each row.

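Attention weights that sum to 1 across a patient's candidates are commonly obtained with a softmax over per-candidate attention scores, followed by an attention-weighted average of candidate features that feeds a patient-level classifier. The sketch below assumes that form of multiple-instance pooling; the attention logits, feature vectors, and output weights are hypothetical values, whereas in the described system they would come from the trained CADx network.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, attention_logits):
    """Return the attention weights and the weighted average of candidate features."""
    weights = softmax(attention_logits)
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, pooled


def patient_probability(pooled, out_weights, out_bias):
    """Logistic read-out of the pooled patient-level feature vector."""
    z = out_bias + sum(w * f for w, f in zip(out_weights, pooled))
    return 1.0 / (1.0 + math.exp(-z))


# Three hypothetical candidates from one patient, each with a 2-D feature vector.
features = [[0.2, 1.0], [1.5, -0.3], [0.1, 0.1]]
attention_logits = [0.5, 2.0, -1.0]
weights, pooled = attention_pool(features, attention_logits)
prob = patient_probability(pooled, out_weights=[1.8, -0.4], out_bias=-1.0)
print([round(w, 3) for w in weights], round(prob, 3))
```

Because the weights are normalized, each one can be read directly as that candidate's relative contribution to the patient-level score, as in the figure.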


Fig. 6. The free-response operating characteristic (FROC) curve for our CADe candidate generation and scoring system on the LUNA16 dataset, with patient-bootstrapped 95% confidence interval, and the same results with our comparison model architecture (3D U-Net).

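An FROC curve plots sensitivity against the average number of false positives per scan, and LUNA16 summarizes it with the competition performance metric (CPM): the mean sensitivity at 1/8, 1/4, 1/2, 1, 2, 4, and 8 false positives per scan. The sketch below assumes candidates have already been matched to annotated nodules (each nodule hit by at most one candidate, duplicates removed beforehand) and reads sensitivity off the step curve rather than interpolating as an official evaluation script may. Function names and example data are hypothetical.

```python
LUNA16_FP_RATES = (0.125, 0.25, 0.5, 1, 2, 4, 8)


def froc_sensitivities(candidates, n_nodules, n_scans, fp_rates=LUNA16_FP_RATES):
    """Sensitivity at each average-false-positives-per-scan operating point.

    candidates -- (score, is_true_positive) pairs pooled over all scans, already
                  matched so that each annotated nodule is hit at most once
    """
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    tps = fps = 0
    curve = []  # (false positives per scan, sensitivity) as the threshold passes each candidate
    for _, is_tp in ranked:
        if is_tp:
            tps += 1
        else:
            fps += 1
        curve.append((fps / n_scans, tps / n_nodules))
    sensitivities = []
    for rate in fp_rates:
        reachable = [sens for fp_per_scan, sens in curve if fp_per_scan <= rate]
        sensitivities.append(max(reachable, default=0.0))
    return sensitivities


def cpm(sensitivities):
    """Competition performance metric: mean sensitivity over the operating points."""
    return sum(sensitivities) / len(sensitivities)


# Hypothetical candidates from 2 scans containing 4 annotated nodules in total.
candidates = [(0.95, True), (0.90, False), (0.85, True), (0.70, False),
              (0.60, True), (0.40, False), (0.30, False), (0.20, True)]
sens = froc_sensitivities(candidates, n_nodules=4, n_scans=2)
print(sens)                 # [0.25, 0.25, 0.5, 0.75, 1.0, 1.0, 1.0]
print(round(cpm(sens), 4))  # 0.6786
```

Normalizing false positives by the number of scans rather than the number of candidates is what distinguishes FROC from an ordinary ROC analysis.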


Fig. 7. CADe FROC on LUNA16 data, breaking down the sensitivity by nodule diameter. CADe sensitivity for the smallest group of nodules (between 3 mm and 5 mm diameter) is significantly worse than the sensitivity for larger nodules (between 5 mm and 30 mm diameter) at the lower false-positive-rate thresholds.



Fig. 8. The receiver operating characteristic (ROC) curve for our CADx system on the Kaggle Stage-2 test set, trained on both LUNA16 and Kaggle Stage-1 data, with patient-bootstrapped 95% confidence interval.

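A patient-bootstrapped 95% confidence interval can be sketched as a percentile bootstrap in which patients (not individual nodules) are resampled with replacement and the ROC AUC is recomputed on each resample. The number of resamples, the percentile method, and the function names are assumptions, and the labels and scores are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def patient_bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC, resampling patients with replacement."""
    if len(set(labels)) < 2:
        raise ValueError("need both benign (0) and malignant (1) patients")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples containing only one class
            stats.append(roc_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[min(int((1 - alpha / 2) * n_boot), n_boot - 1)]
    return lo, hi


# Hypothetical patient labels (1 = malignant) and CADx probabilities.
labels = [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0]
scores = [0.1, 0.3, 0.2, 0.8, 0.35, 0.4, 0.9, 0.6, 0.05, 0.7, 0.5, 0.15]
print(round(roc_auc(labels, scores), 3), patient_bootstrap_ci(labels, scores, n_boot=500))
```

Resampling at the patient level keeps all of a patient's data together, which matters for CADe, where one scan contributes many correlated nodule candidates.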


Fig. 9. Probability calibration curves for the Bayesian approximation and non-Bayesian (standard) variants of the CADe candidate scoring neural networks, with patient-bootstrapped 95% confidence intervals. The estimated nodule probabilities output by the CADe scoring network are well-calibrated when model uncertainty is included.
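Calibration curves of this kind are typically built by binning predicted probabilities and comparing, in each bin, the mean predicted probability with the observed fraction of positives; a well-calibrated model lies close to the diagonal. The sketch below uses equal-width bins and also computes the expected calibration error (ECE) as a one-number summary. The bin count and the ECE summary are assumptions that may not match the paper's exact procedure, and the example data are hypothetical.

```python
def calibration_curve(labels, probs, n_bins=10):
    """Equal-width reliability curve: (mean predicted, observed positive rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[k].append((y, p))
    curve = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            frac_pos = sum(y for y, _ in members) / len(members)
            curve.append((mean_pred, frac_pos, len(members)))
    return curve


def expected_calibration_error(curve):
    """Count-weighted mean absolute gap between predicted and observed rates."""
    total = sum(count for _, _, count in curve)
    return sum(count / total * abs(mean_pred - frac_pos) for mean_pred, frac_pos, count in curve)


# Hypothetical nodule probabilities and ground-truth labels.
probs = [0.1, 0.15, 0.3, 0.35, 0.6, 0.65, 0.9, 0.95]
labels = [0, 0, 0, 1, 1, 0, 1, 1]
for mean_pred, frac_pos, count in calibration_curve(labels, probs, n_bins=5):
    print(f"predicted {mean_pred:.2f}  observed {frac_pos:.2f}  n={count}")
print(f"ECE = {expected_calibration_error(calibration_curve(labels, probs, n_bins=5)):.3f}")
```

Comparing this curve for a single-pass model and for an uncertainty-aware (averaged) model is one way to reproduce the kind of comparison shown in the figure.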



Fig. 10. Probability calibration curves for the Bayesian ensemble and the best non-Bayesian CADx model, with patient-bootstrapped 95% confidence intervals. The estimated malignancy probabilities output by the CADx Bayesian ensemble are well-calibrated when model uncertainty is included.



Fig. 11. The CADe scoring area under the precision-recall curve as a function of referral percentage, for the entropy referral strategy and a random strategy (with 95% confidence interval).
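The entropy referral strategy can be sketched as follows, assuming that the cases with the highest predictive entropy are referred (removed from automated assessment) and the metric is recomputed on the retained cases, with random referral as the baseline. For brevity the sketch scores retained cases with accuracy at a 0.5 threshold, whereas the figures report area under the precision-recall curve (CADe) and ROC AUC (CADx). Names and example values are hypothetical.

```python
import math
import random


def binary_entropy(p, eps=1e-12):
    """Predictive entropy (in nats) of a Bernoulli probability p."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def retain(labels, probs, referred):
    """Keep only the cases whose indices are not in `referred`."""
    kept = [i for i in range(len(probs)) if i not in referred]
    return [labels[i] for i in kept], [probs[i] for i in kept]


def refer_by_entropy(labels, probs, fraction):
    """Refer the most uncertain `fraction` of cases; return the retained cases."""
    n_refer = round(fraction * len(probs))
    ranked = sorted(range(len(probs)), key=lambda i: binary_entropy(probs[i]), reverse=True)
    return retain(labels, probs, set(ranked[:n_refer]))


def refer_randomly(labels, probs, fraction, seed=0):
    """Baseline: refer a random `fraction` of cases."""
    n_refer = round(fraction * len(probs))
    referred = set(random.Random(seed).sample(range(len(probs)), n_refer))
    return retain(labels, probs, referred)


def accuracy(labels, probs, threshold=0.5):
    return sum((p >= threshold) == bool(y) for y, p in zip(labels, probs)) / len(labels)


labels = [1, 0, 1, 0, 1, 0]
probs = [0.95, 0.05, 0.45, 0.55, 0.9, 0.1]
print(accuracy(labels, probs))                            # 4 of 6 correct
print(accuracy(*refer_by_entropy(labels, probs, 1 / 3)))  # 4 of 4 correct after referral
print(accuracy(*refer_randomly(labels, probs, 1 / 3)))    # random referral baseline
```

Here the two errors are also the two least confident predictions, so entropy-based referral removes exactly those cases; the benefit in practice depends on how well the probabilities are calibrated.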



Fig. 12. The CADx area under the ROC curve as a function of referral percentage, for the entropy referral strategy and a random strategy, with 95% confidence interval. Note that the confidence interval is wider than that of the CADe scoring network because the number of patients in the Kaggle test set is smaller than the number of nodules in the LUNA16 dataset.



Fig. 13. Histogram of CADx malignancy probability estimates for benign and malignant patients. Note that the mode near 0 for malignant patients reflects false negatives caused by nodules that the CADe model missed.




TABLE I CADe and CADx results by CADe threshold false positive rate



TABLE II CADx results by CADe threshold used for training and testing





