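A Quick Read: Natural Language Generation Model for Mammography Reports Simulation
This is a paper on report generation and on telling real reports from fake ones; the main thing to look at is how realistic the generated reports are.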
Abstract
Extending the size of labeled corpora of medical reports is a major step towards a successful training of machine learning algorithms. Simulating new text reports is a key solution for reports augmentation, which extends the cohort size. However, text generation in the medical domain is challenging because it needs to preserve both content and style that are typical for real reports, without risking the patients’ privacy. In this paper, we present a conditioned LSTM-RNN architecture for simulating realistic mammography reports. We evaluated the performance by analyzing the characteristics of the simulated reports and classifying them into benign and malignant classes. An average classification AUC was calculated over two distinct test sets. A qualitative analysis was also performed in which a masked radiologist classified 0.75 of the simulated reports as real reports, showing that both the style and content of the simulated reports were similar to real reports. Finally, we compared our RNN-LSTM generative model with Markov Random Fields. The RNN-LSTM provided significantly better and more stable performance than MRFs (p < 0.01, Wilcoxon).
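What I mainly find valuable in this paper is its evaluation.
Because of the way autoregressive models are trained, generated reports easily end up following the same pattern while not capturing the patient's condition accurately. That is why evaluation metrics such as words in sentence and words in a report are so practical and valuable.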
- Metrics
- words in sentence
- words in a report
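
To make the two metrics above concrete, here is a minimal Python sketch of how they could be computed and compared between real and simulated reports. This is not the paper's code: the function names, the naive regex sentence splitter, and the toy report text are all assumptions made for illustration.

```python
import re
from statistics import mean


def split_sentences(report: str) -> list[str]:
    # Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", report.strip())
    return [p for p in parts if p]


def words_per_sentence(report: str) -> list[int]:
    # One word count per sentence, using whitespace tokenization.
    return [len(s.split()) for s in split_sentences(report)]


def words_per_report(report: str) -> int:
    # Total whitespace-separated tokens in the whole report.
    return len(report.split())


def length_stats(reports: list[str]) -> dict[str, float]:
    # Aggregate both metrics over a corpus; an empty corpus yields zeros.
    sent_lengths = [n for r in reports for n in words_per_sentence(r)]
    report_lengths = [words_per_report(r) for r in reports]
    return {
        "mean_words_per_sentence": mean(sent_lengths) if sent_lengths else 0.0,
        "mean_words_per_report": mean(report_lengths) if report_lengths else 0.0,
    }


if __name__ == "__main__":
    # Toy, made-up texts standing in for real and simulated reports.
    real = ["No suspicious mass is seen. Routine screening is recommended."]
    simulated = ["No mass is seen. Screening is recommended."]
    print("real:", length_stats(real))
    print("simulated:", length_stats(simulated))
```

If the simulated corpus gives clearly different averages from the real one, the generator has likely not reproduced the style of real reports. Matching averages alone do not show that the clinical content is right, so these metrics only complement the classification and radiologist checks described in the abstract.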