Tasks:
- Extend existing dataset with synthetically generated data
- Train multimodal vision-language model
- Perform extensive evaluation of the model on public datasets:
  - Investigate and apply suitable evaluation metrics.
  - Research state-of-the-art methods for comparison.
- (Optional: Contribute to writing a research paper based on the results.)
Requirements:
- Experience with PyTorch.
- Experience with training deep learning models.
- Ability to attend in-person meetings.
Application (applications that do not follow these requirements will not be considered):
- Curriculum Vitae (CV).
- Short motivation letter (max. one page).
- Transcript of records.
Send your application with the subject “Application VLM Radiology + your full name” to Lukas.Buess@fau.de.
Starting Date:
15.09.2025 or later
References:
[1] I. E. Hamamci et al., "Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography", Oct. 16, 2024, arXiv:2403.17834. doi: 10.48550/arXiv.2403.17834.
[2] C. Pellegrini, E. Özsoy, B. Busam, B. Wiestler, N. Navab, and M. Keicher, "RaDialog: Large Vision-Language Models for X-Ray Reporting and Dialog-Driven Assistance", in Medical Imaging with Deep Learning, 2025.
[3] S. Ostmeier et al., "GREEN: Generative Radiology Report Evaluation and Error Notation", in Findings of the Association for Computational Linguistics: EMNLP 2024, 2024, pp. 374–390. doi: 10.18653/v1/2024.findings-emnlp.21.