This master's thesis explores the training and evaluation of a vision-language model for radiology report generation on large-scale medical datasets. We investigate how different clinical settings influence the quality of generated reports, with a particular focus on improving how generated reports are evaluated.
Tasks:
- Dataset preparation
- Fine-tuning vision-language models
- Comprehensive evaluation
Requirements:
- Experience with PyTorch and training models
- Experience with vision or language models
- (Optional) Experience using SLURM
- (Recommended) Deep Learning / Pattern Recognition lecture
Application (applications that do not follow the requirements below will not be considered):
Please send your CV, transcript of records, and short motivation letter (1 page max) with the subject “Application CXR-Report + your_full_name” to Lukas.Buess@fau.de
Start Date: 01.06.2025 or later
Relevant Literature:
[1] Liu, H., Li, C., Wu, Q., & Lee, Y. J. (2023). Visual instruction tuning. Advances in Neural Information Processing Systems, 36, 34892-34916.
[2] Pellegrini, C., Özsoy, E., Busam, B., Navab, N., & Keicher, M. (2023). RaDialog: A large vision-language model for radiology report generation and conversational assistance. arXiv preprint arXiv:2311.18681.
[3] Johnson, A. E., Pollard, T. J., Berkowitz, S. J., Greenbaum, N. R., Lungren, M. P., Deng, C. Y., … & Horng, S. (2019). MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Scientific data, 6(1), 317.
[4] Ostmeier, S., Xu, J., Chen, Z., Varma, M., Blankemeier, L., Bluethgen, C., … & Delbrouck, J. B. (2024). GREEN: Generative radiology report evaluation and error notation. arXiv preprint arXiv:2405.03595.