Vision-Language Models in Radiology

Type: MA thesis

Status: open

Supervisors: Lukas Buess, Andreas Maier

Tasks:
  1. Extend existing dataset with synthetically generated data
  2. Train multimodal vision-language model
  3. Perform extensive evaluation of the model on public datasets:
    • Investigate and apply suitable evaluation metrics.
    • Research state-of-the-art methods for comparison.
  4. (Optional) Contribute to writing a research paper based on the results.
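As a taste of the evaluation task above: report-generation metrics range from simple surface-level overlap scores (BLEU, ROUGE) to clinically aware measures such as GREEN [3]. A minimal, purely illustrative sketch of a surface-level metric is a token-level F1 between a generated and a reference report; the function name and the example reports below are hypothetical, not part of the project specification:

```python
from collections import Counter

def token_f1(generated: str, reference: str) -> float:
    """Token-level F1 between a generated and a reference report.

    Counter intersection (&) keeps the minimum count per token,
    so repeated words are only credited as often as they appear
    in both texts.
    """
    gen = Counter(generated.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((gen & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical example reports for illustration only.
gen = "mild cardiomegaly no pleural effusion"
ref = "cardiomegaly is mild with no effusion"
score = token_f1(gen, ref)  # 4 shared tokens -> F1 = 8/11 ~ 0.727
```

Such n-gram-style scores reward wording overlap rather than clinical correctness, which is why part of the thesis is to investigate metrics that capture factual agreement between reports.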
Requirements:
  1. Experience with PyTorch.
  2. Experience with training deep learning models.
  3. Ability to attend in-person meetings.
Application (applications missing any of the following will not be considered):
  1. Curriculum Vitae (CV).
  2. Short motivation letter (max. one page).
  3. Transcript of records.

Send your application with the subject “Application VLM Radiology + your full name” to Lukas.Buess@fau.de.

Starting Date:
September 15, 2025, or later
References:
[1] I. E. Hamamci et al., "Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography," Oct. 16, 2024, arXiv:2403.17834. doi: 10.48550/arXiv.2403.17834.
[2] C. Pellegrini, E. Özsoy, B. Busam, B. Wiestler, N. Navab, and M. Keicher, "RaDialog: Large Vision-Language Models for X-Ray Reporting and Dialog-Driven Assistance," in Medical Imaging with Deep Learning, 2025.
[3] S. Ostmeier et al., "GREEN: Generative Radiology Report Evaluation and Error Notation," in Findings of the Association for Computational Linguistics: EMNLP 2024, 2024, pp. 374–390. doi: 10.18653/v1/2024.findings-emnlp.21.