Reinforcement Learning from AI Feedback for Radiology Vision–Language Models

Type: MA thesis

Status: open

Supervisors: Lukas Buess, Andreas Maier

This master's thesis investigates how reinforcement learning from AI feedback (RLAIF) can improve radiology report generation with vision–language models. The goal is to train and compare different reward models and to use them to align generated reports with human and clinical preferences.
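
To illustrate the alignment step: one common choice for preference fine-tuning without human annotators (as explored in CheXalign [7]) is Direct Preference Optimization (DPO), where two candidate reports for the same study are ranked by an automatic reward (for example, a GREEN-style scorer [6]) and the policy is nudged toward the preferred one. The snippet below is only a minimal PyTorch sketch of the DPO objective; the function name and the toy inputs are illustrative and not taken from any existing codebase.

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        # Inputs: summed token log-probabilities of the preferred ("chosen")
        # and dispreferred ("rejected") report under the trainable policy
        # and a frozen reference model (one value per report pair).
        policy_logratio = policy_chosen_logps - policy_rejected_logps
        ref_logratio = ref_chosen_logps - ref_rejected_logps
        # Push the policy to prefer the chosen report more strongly than
        # the reference model does (Bradley-Terry / DPO objective).
        return -F.logsigmoid(beta * (policy_logratio - ref_logratio)).mean()

    # Toy usage with random log-probabilities for a batch of 4 report pairs.
    loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))

In practice, the log-probabilities would come from the supervised fine-tuned vision–language model and a frozen copy of it, with preference pairs generated automatically by the reward model rather than by human raters.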

Tasks:

  • Dataset preparation
  • Supervised fine-tuning of vision–language models
  • Human preference alignment using reinforcement learning
  • Comprehensive evaluation

Requirements:

  • Experience with PyTorch and model training
  • Experience with vision or language models
  • (Optional) Experience using SLURM
  • (Recommended) Deep Learning / Pattern Recognition Lecture

Application (applications that do not follow these requirements will not be considered):
Please send your CV, transcript of records, and a short motivation letter (1 page max) with the subject “Application RLAIF-VLM + your_full_name” to Lukas.Buess@fau.de

Start Date: 15.01.2026 or later

Relevant Literature:
[1] Liu, H., Li, C., Wu, Q., & Lee, Y. J. (2023). Visual instruction tuning. Advances in neural information processing systems, 36, 34892-34916.
[2] Johnson, A. E., Pollard, T. J., Berkowitz, S. J., Greenbaum, N. R., Lungren, M. P., Deng, C. Y., … & Horng, S. (2019). MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Scientific data, 6(1), 317.
[3] Pellegrini, C., Özsoy, E., Busam, B., Navab, N., & Keicher, M. (2023). RaDialog: A large vision-language model for radiology report generation and conversational assistance. arXiv preprint arXiv:2311.18681.
[4] Hamamci, I. E., Er, S., Wang, C., Almas, F., Simsek, A. G., Esirgun, S. N., … & Menze, B. (2024). Developing generalist foundation models from a multimodal dataset for 3D computed tomography. arXiv preprint arXiv:2403.17834.
[5] Blankemeier, L., Cohen, J. P., Kumar, A., Van Veen, D., Gardezi, S. J. S., Paschali, M., … & Chaudhari, A. S. (2024). Merlin: A vision language foundation model for 3d computed tomography. Research Square, rs-3.
[6] Ostmeier, S., Xu, J., Chen, Z., Varma, M., Blankemeier, L., Bluethgen, C., … & Delbrouck, J. B. (2024). GREEN: Generative radiology report evaluation and error notation. arXiv preprint arXiv:2405.03595.
[7] Hein, D., Chen, Z., Ostmeier, S., Xu, J., Varma, M., Reis, E. P., … & Chaudhari, A. S. (2025, July). CheXalign: Preference fine-tuning in chest X-ray interpretation models without human feedback. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 27679-27702).