GAN-based Synthetic Chest X-ray Generation for Training Lung Disease Classification Systems

Type: MA thesis

Status: finished

Date: December 1, 2021 - June 1, 2022

Supervisors: Kai Packhäuser, Florian Thamm, Andreas Maier

Project description

With the rise and ever-growing potential of Deep Learning (DL) techniques in recent years, completely new opportunities have emerged in the field of medical image processing, in particular in fundamental application areas such as image detection and recognition, image segmentation, image registration, and computer-aided diagnosis. However, DL techniques are known to require very large amounts of data to train the neural networks (NNs), which can be a problem when data availability is limited. In recent years, the public release of medical image data has increased and has led to significant advances in the scientific community. For instance, publicly available large-scale chest X-ray datasets have enabled the development of novel systems for automated lung abnormality classification [1, 2]. Recent work, however, has shown that DL techniques can also be used maliciously, e.g., for linkage attacks on public chest X-ray datasets [3]. This constitutes a serious issue in terms of data security and patient privacy, as a potential attacker may leak available information (e.g., age, gender, diseases, and more) about a specific patient present in a public dataset. To alleviate privacy concerns, the question arises whether the exclusive use of synthetically generated images can be a serious alternative for the development of diagnostic algorithms in the medical field.

In this work, we investigate whether synthetically generated chest X-ray images can be used to train a reliable classification system for lung diseases. To this end, we will use different approaches, e.g., [4–6], to synthesize realistic-looking chest X-ray scans from a real data distribution. In doing so, we will focus on ensuring that characteristic disease patterns are preserved in the generated images. For our experiments, we will employ the NIH ChestX-ray14 dataset [7], a collection of 112,120 frontal-view chest X-ray images from 30,805 unique patients, annotated with fourteen text-mined disease labels.
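
Conditional approaches such as [5, 6] need the disease labels of each image as a fixed-length vector. The following is a minimal sketch of such an encoding in plain Python, not the implementation used in the thesis. It assumes the label format of the public ChestX-ray14 release (finding names joined by "|", with "No Finding" for images without any of the fourteen findings); the names CLASSES and encode_labels are hypothetical and chosen only for illustration.

```python
from typing import List

# Assumption: the fourteen finding names as spelled in the public
# ChestX-ray14 label file; they are not listed in this proposal.
CLASSES = [
    "Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass",
    "Nodule", "Pneumonia", "Pneumothorax", "Consolidation", "Edema",
    "Emphysema", "Fibrosis", "Pleural_Thickening", "Hernia",
]
CLASS_INDEX = {name: i for i, name in enumerate(CLASSES)}


def encode_labels(finding_labels: str) -> List[float]:
    """Turn a '|'-separated label string into a 14-dimensional multi-hot vector.

    'No Finding' maps to the all-zero vector. Unknown names raise a KeyError
    so that misspelled labels are not silently dropped.
    """
    vec = [0.0] * len(CLASSES)
    for name in finding_labels.split("|"):
        name = name.strip()
        if name and name != "No Finding":
            vec[CLASS_INDEX[name]] = 1.0
    return vec


if __name__ == "__main__":
    # An image with two findings sets two entries of the vector to 1.0.
    print(encode_labels("Cardiomegaly|Effusion"))
```

In a conditional GAN, a vector of this kind would typically be passed to both generator and discriminator alongside the noise vector or image; how the conditioning is actually wired in is a design decision left to the thesis.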

The Master’s thesis covers the following aspects:

  1. Overview of the current state-of-the-art in DL for the generation of synthetic medical image data.
  2. Building one or more GAN-based image generation networks, which includes:
    • Hyper-parameter tuning
    • Analyzing the performance of the networks
    • Analyzing the realism of the generated images
  3. Evaluating the feasibility of using synthetically generated chest X-ray images for training a lung disease classification system.
  4. Outlining strategies and research directions to enhance the preservation of patient privacy in public datasets (optional).
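
One way to approach the feasibility evaluation in item 3 is to train one classifier on real images and one on synthetic images, and then score both on the same held-out set of real images. The sketch below computes a per-class area under the ROC curve (AUC) for such a comparison in plain Python. The proposal does not name an evaluation metric, so the choice of AUC is an assumption, and the function names roc_auc and per_class_auc are hypothetical.

```python
from typing import List, Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct
    pair. The pairwise loop is quadratic and meant for illustration only.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def per_class_auc(labels: Sequence[Sequence[int]],
                  scores: Sequence[Sequence[float]]) -> List[float]:
    """Compute one AUC per label column for multi-label predictions."""
    n_classes = len(labels[0])
    return [
        roc_auc([row[c] for row in labels], [row[c] for row in scores])
        for c in range(n_classes)
    ]


if __name__ == "__main__":
    # Toy example with two samples per class and one misranked pair.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Calling per_class_auc once with the predictions of the classifier trained on real data and once with those of the classifier trained on synthetic data, both on the same real test images, gives a class-by-class view of how much performance is lost or retained.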

All DL models will be implemented using PyTorch [8].



[1] S. Gündel, S. Grbic, B. Georgescu, S. Liu, A. Maier, and D. Comaniciu, “Learning to Recognize Abnormalities in Chest X-Rays with Location-Aware Dense Networks,” in Iberoamerican Congress on Pattern Recognition, pp. 757–765, Springer, 2018.

[2] S. Gündel, A. A. Setio, F. C. Ghesu, S. Grbic, B. Georgescu, A. Maier, and D. Comaniciu, “Robust classification from noisy labels: Integrating additional knowledge for chest radiography abnormality assessment,” Medical Image Analysis, vol. 72, p. 102087, 2021.

[3] K. Packhäuser, S. Gündel, N. Münster, C. Syben, V. Christlein, and A. Maier, “Is Medical Chest X-ray Data Anonymous?,” arXiv preprint arXiv:2103.08562, 2021.

[4] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative Adversarial Nets,” Advances in Neural Information Processing Systems, vol. 27, 2014.

[5] M. Mirza and S. Osindero, “Conditional Generative Adversarial Nets,” arXiv preprint arXiv:1411.1784, 2014.

[6] A. Odena, C. Olah, and J. Shlens, “Conditional Image Synthesis with Auxiliary Classifier GANs,” in International Conference on Machine Learning, pp. 2642–2651, PMLR, 2017.

[7] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers, “ChestX-ray8: Hospital-Scale Chest X-Ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

[8] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., “PyTorch: An Imperative Style, High-Performance Deep Learning Library,” Advances in Neural Information Processing Systems, vol. 32, pp. 8026–8037, 2019.