Index
Multimodal Extraction of Lot-Level Metadata from Auction Catalogues using OCR and Vision Language Models
MasterThesis_AlishaMund
Topology-Aware Edge-Map Enhancement of Scanning Electron Microscope Images
Unsupervised Learning for Detection of Rare Driving Scenarios
Curriculum Learning for Medical Vision-Language Models
This master's thesis investigates how curriculum learning strategies can improve the vision-language alignment of medical vision-language models (VLMs). Instead of training on all samples uniformly, the thesis explores curricula that order the training data from easy to hard, from coarse to fine, or from generic to clinically complex cases. The goal is to design and evaluate different curriculum strategies for medical tasks such as radiology report generation or medical visual question answering.
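To make the easy-to-hard idea concrete, the following is a minimal sketch of one possible curriculum sampler, not the method the thesis will use. It assumes a hypothetical difficulty score (here simply the word count of a report) and a square-root pacing schedule; the names `competence`, `curriculum_batch`, and `word_count` are illustrative and not part of any library.

```python
import math
import random


def competence(step, total_steps, initial=0.1):
    """Fraction of the difficulty-sorted data unlocked at a training step.

    Assumption: a square-root pacing schedule that starts at `initial`
    and reaches 1.0 (all data) at `total_steps`.
    """
    progress = min(1.0, step / total_steps)
    return min(1.0, math.sqrt(progress * (1 - initial ** 2) + initial ** 2))


def curriculum_batch(samples, difficulty, step, total_steps, batch_size, rng):
    """Sample a batch from the easiest portion of the data allowed so far."""
    ranked = sorted(samples, key=difficulty)  # easiest samples first
    unlocked = math.ceil(competence(step, total_steps) * len(ranked))
    cutoff = max(batch_size, unlocked)  # always keep enough to fill a batch
    pool = ranked[:cutoff]
    return rng.sample(pool, min(batch_size, len(pool)))


def word_count(report):
    """Hypothetical difficulty proxy: longer reports count as harder."""
    return len(report.split())


if __name__ == "__main__":
    # Toy reports for illustration only, not taken from any dataset.
    reports = [
        "No acute findings.",
        "Mild cardiomegaly without edema.",
        "Small left pleural effusion with adjacent basal atelectasis.",
        "Right upper lobe consolidation, probable pneumonia, follow-up advised.",
    ]
    rng = random.Random(0)
    for step in (0, 50, 100):
        batch = curriculum_batch(reports, word_count, step, 100, 2, rng)
        print(step, round(competence(step, 100), 2), batch)
```

In a real setup the difficulty score could instead come from model loss, label rarity, or clinical complexity, and the sampler would wrap a PyTorch dataset; those choices are exactly what the thesis is meant to compare.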
Tasks:
- Dataset preparation
- VLM finetuning
- Comprehensive evaluation
Requirements:
- Experience with PyTorch and training models
- Experience with vision or language models
- (Optional) Experience using SLURM
- (Recommended) Deep Learning / Pattern Recognition Lecture
Application: (Applications that do not follow the application requirements will not be considered)
Please send your CV, transcript of records, and a short motivation letter (1 page max) with the subject “Application CurriculumVLM + your_full_name” to Lukas.Buess@fau.de
Start Date: 15.01.2026 or later
Relevant Literature:
[1] Johnson, A. E., Pollard, T. J., Berkowitz, S. J., Greenbaum, N. R., Lungren, M. P., Deng, C. Y., … & Horng, S. (2019). MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Scientific Data, 6(1), 317.
[2] Pellegrini, C., Özsoy, E., Busam, B., Navab, N., & Keicher, M. (2023). RaDialog: A large vision-language model for radiology report generation and conversational assistance. arXiv preprint arXiv:2311.18681.
[3] Hamamci, I. E., Er, S., Wang, C., Almas, F., Simsek, A. G., Esirgun, S. N., … & Menze, B. (2024). Developing generalist foundation models from a multimodal dataset for 3D computed tomography. arXiv preprint arXiv:2403.17834.
[4] Ostmeier, S., Xu, J., Chen, Z., Varma, M., Blankemeier, L., Bluethgen, C., … & Delbrouck, J. B. (2024). GREEN: Generative radiology report evaluation and error notation. arXiv preprint arXiv:2405.03595.
[5] Xu, J., Zhang, X., Abderezaei, J., Bauml, J., Boodoo, R., Haghighi, F., … & Delbrouck, J. B. (2025, November). RadEval: A framework for radiology text evaluation. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (pp. 546-557).
[6] Liu, F., Ge, S., & Wu, X. (2021, August). Competence-based multimodal curriculum learning for medical report generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 3001-3012).
[7] Holland, R., Taylor, T. R., Holmes, C., Riedl, S., Mai, J., Patsiamanidi, M., … & PINNACLE Consortium. (2025). Specialized curricula for training vision language models in retinal image analysis. NPJ Digital Medicine, 8(1), 532.
Kinematic Calibration and Reachability-Based Planning for Precision Robotic Needle Insertion in Liver Ablation
Analyzing contrast agent inhomogeneities in the left atrial appendage
