Index
Multimodal Extraction of Lot-Level Metadata from Auction Catalogues using OCR and Vision Language Models
MasterThesis_AlishaMund
Robust Image Registration Algorithms for Reference-Based X-Ray Defect Detection in Non-Destructive Testing
In our 2D X-ray non-destructive testing (NDT) pipeline, we use two inspection strategies: (a) reference-less inspection, which works well on simple parts; and (b) reference-based inspection, which is used for complex, large parts where reference-less methods produce false positives and miss defects. Reference-based detection relies on a high-quality 'golden' image that must be precisely aligned with the test image. However, the current moment-based registration algorithms perform poorly under practical imaging variations, including translation, rotation, and slight non-rigid deformations. Slight changes in perspective, which are common in X-ray setups because the source-detector distance varies, are also handled poorly and leave residual misalignment. These registration failures lead directly to missed critical defects and to false positives.
This thesis will identify and evaluate registration approaches that can handle rigid transformations and slight scale differences while preserving small defect artefacts.
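As a rough illustration of the moment-based baseline mentioned above, the following minimal sketch estimates a translation and an in-plane rotation between a golden image and a test image from their intensity moments. It is not the pipeline's actual implementation: the function names `moments_pose` and `estimate_rigid` and the list-of-lists image representation are assumptions made for this example. Because only a centroid and a principal-axis angle are compared, scale and perspective changes are not modelled at all, and the angle is only defined up to 180 degrees.

```python
import math


def moments_pose(img):
    """Return (cx, cy, theta) of a 2D image given as a list of rows.

    (cx, cy) is the intensity centroid; theta is the principal-axis
    angle in radians, derived from the second-order central moments.
    """
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            m00 += v
            m10 += x * v
            m01 += y * v
    if m00 == 0:
        raise ValueError("image has no intensity mass")
    cx, cy = m10 / m00, m01 / m00

    mu20 = mu02 = mu11 = 0.0
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            dx, dy = x - cx, y - cy
            mu20 += dx * dx * v
            mu02 += dy * dy * v
            mu11 += dx * dy * v
    theta = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    return cx, cy, theta


def estimate_rigid(golden, test):
    """Estimate (dx, dy, dtheta) that moves the golden pose onto the test pose."""
    gx, gy, gt = moments_pose(golden)
    tx, ty, tt = moments_pose(test)
    return tx - gx, ty - gy, tt - gt


def make_bar(width, height, row, col_start, col_end):
    """Hypothetical test pattern: a horizontal bar of ones on a zero background."""
    return [
        [1.0 if (y == row and col_start <= x <= col_end) else 0.0
         for x in range(width)]
        for y in range(height)
    ]


golden = make_bar(10, 10, row=2, col_start=1, col_end=4)
test = make_bar(10, 10, row=5, col_start=3, col_end=6)  # same bar, shifted
dx, dy, dtheta = estimate_rigid(golden, test)
```

A defect that adds or removes intensity in the test image also shifts the moments, which is one reason such global estimates can leave residual misalignment.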
Topology-Aware Edge-Map Enhancement of Scanning Electron Microscope Images
Unsupervised Learning for Detection of Rare Driving Scenarios
Curriculum Learning for Medical Vision-Language Models
This master thesis investigates how curriculum learning strategies can improve vision-language alignment of medical vision-language models. Instead of training on all samples uniformly, the thesis explores curricula that organize training data from easy to hard, coarse to fine, or generic to clinically complex cases. The goal is to design and evaluate different curriculum strategies for medical tasks such as radiology report generation or medical visual question answering.
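To make the easy-to-hard idea concrete, here is a minimal sketch of one possible curriculum sampler. It is an illustration under stated assumptions, not the method the thesis will use: the difficulty score (here simply the report length), the square-root pacing function, and the names `competence` and `curriculum_batches` are all hypothetical choices for this example. Samples are sorted by difficulty, and at each training step batches are drawn only from the easiest fraction of the data that the pacing function currently allows.

```python
import math
import random


def competence(step, total_steps, c0=0.1):
    """Square-root pacing: fraction of the sorted data available at `step`.

    Starts at c0 for step 0 and grows to 1.0 at step == total_steps.
    """
    t = min(max(step / total_steps, 0.0), 1.0)
    return min(1.0, math.sqrt(t * (1.0 - c0 ** 2) + c0 ** 2))


def curriculum_batches(samples, difficulty, total_steps, batch_size, seed=0):
    """Yield one batch per step, drawn from the currently allowed easy subset."""
    rng = random.Random(seed)
    ordered = sorted(samples, key=difficulty)  # easiest first
    for step in range(total_steps):
        allowed = math.ceil(competence(step, total_steps) * len(ordered))
        pool = ordered[:max(batch_size, allowed)]
        yield rng.sample(pool, min(batch_size, len(pool)))


# Hypothetical toy data: report texts, with length as a crude difficulty proxy.
reports = ["a" * n for n in range(1, 11)]
batches = list(curriculum_batches(reports, difficulty=len,
                                  total_steps=4, batch_size=2))
```

Coarse-to-fine or generic-to-complex curricula fit the same skeleton by swapping in a different `difficulty` function, for example one based on label granularity or on the number of findings in a report.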
Tasks:
- Dataset preparation
- VLM finetuning
- Comprehensive evaluation
Requirements:
- Experience with PyTorch and training models
- Experience with vision or language models
- (Optional) Experience using SLURM
- (Recommended) Deep Learning / Pattern Recognition Lecture
Application: (Applications that do not follow the application requirements will not be considered)
Please send your CV, transcript of records, and a short motivation letter (1 page max) with the subject “Application CurriculumVLM + your_full_name” to Lukas.Buess@fau.de
Start Date: 15.01.2026 or later
Relevant Literature:
[1] Johnson, A. E., Pollard, T. J., Berkowitz, S. J., Greenbaum, N. R., Lungren, M. P., Deng, C. Y., … & Horng, S. (2019). MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Scientific data, 6(1), 317.
[2] Pellegrini, C., Özsoy, E., Busam, B., Navab, N., & Keicher, M. (2023). Radialog: A large vision-language model for radiology report generation and conversational assistance. arXiv preprint arXiv:2311.18681.
[3] Hamamci, I. E., Er, S., Wang, C., Almas, F., Simsek, A. G., Esirgun, S. N., … & Menze, B. (2024). Developing generalist foundation models from a multimodal dataset for 3d computed tomography. arXiv preprint arXiv:2403.17834.
[4] Ostmeier, S., Xu, J., Chen, Z., Varma, M., Blankemeier, L., Bluethgen, C., … & Delbrouck, J. B. (2024). Green: Generative radiology report evaluation and error notation. arXiv preprint arXiv:2405.03595.
[5] Xu, J., Zhang, X., Abderezaei, J., Bauml, J., Boodoo, R., Haghighi, F., … & Delbrouck, J. B. (2025, November). RadEval: A framework for radiology text evaluation. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (pp. 546-557).
[6] Liu, F., Ge, S., & Wu, X. (2021, August). Competence-based multimodal curriculum learning for medical report generation. In Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing (volume 1: long papers) (pp. 3001-3012).
[7] Holland, R., Taylor, T. R., Holmes, C., Riedl, S., Mai, J., Patsiamanidi, M., … & PINNACLE consortium. (2025). Specialized curricula for training vision language models in retinal image analysis. NPJ Digital Medicine, 8(1), 532.
Kinematic Calibration and Reachability-Based Planning for Precision Robotic Needle Insertion in Liver Ablation
Analyzing contrast agent inhomogeneities in the left atrial appendage
