Index
Multimodal Extraction of Lot-Level Metadata from Auction Catalogues using OCR and Vision Language Models
Robust Image Registration Algorithms for Reference-Based X-Ray Defect Detection in Non-Destructive Testing
In our 2D X-ray non-destructive testing (NDT) pipeline, we use two inspection
strategies: (a) reference-less inspection, which works well on simple parts; and
(b) reference-based inspection, which is used for complex, large parts where
reference-less methods result in false positives and missed defects. Reference-based detection fundamentally relies on a high-quality "golden" image that must be precisely aligned with the test image for accurate defect detection. However,
current moment-based registration algorithms perform poorly when confronted
with practical imaging variations, including translation, rotation, and slight non-
rigid deformations. Slight changes in perspective (common in X-ray setups due
to varying source-detector distances) are not handled well, resulting in residual
misalignment. These registration failures directly result in critical defects being
missed or false positives being detected.
This thesis will identify and evaluate registration approaches that can handle
rigid transformations and slight scale differences while preserving small defect
artefacts.
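As a minimal illustration of the registration problem described above (not part of the existing pipeline; all names are illustrative), the translation-only case can be solved with phase correlation using nothing but NumPy. Rotation, scale, and perspective changes — the cases the thesis targets — require richer models on top of this baseline.

```python
import numpy as np

def phase_correlation(ref: np.ndarray, test: np.ndarray) -> tuple[int, int]:
    """Estimate the integer (dy, dx) shift such that test ~ roll(ref, (dy, dx)).

    Phase correlation: the normalized cross-power spectrum of two images that
    differ by a pure translation is a complex exponential whose inverse FFT
    is a delta peak located at the shift.
    """
    cross = np.conj(np.fft.fft2(ref)) * np.fft.fft2(test)
    cross /= np.abs(cross) + 1e-12            # keep only the phase
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # shifts beyond half the image size wrap around to negative offsets
    if dy > ref.shape[0] // 2:
        dy -= ref.shape[0]
    if dx > ref.shape[1] // 2:
        dx -= ref.shape[1]
    return int(dy), int(dx)

# Synthetic check: a "golden" image and a circularly translated test image
rng = np.random.default_rng(0)
golden = rng.random((64, 64))
shifted = np.roll(golden, (5, -3), axis=(0, 1))
print(phase_correlation(golden, shifted))   # -> (5, -3)
```

This recovers only rigid translation; extensions such as log-polar / Fourier–Mellin resampling (for rotation and scale) or feature-based homography estimation are the kinds of approaches the thesis would evaluate against the moment-based baseline.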
Topology-Aware Edge-Map Enhancement of Scanning Electron Microscope Images
Unsupervised Learning for Detection of Rare Driving Scenarios
Learning Human-Aligned Evaluation Metrics for Radiology Reports
This master thesis focuses on developing a human-aligned evaluation metric for radiology report generation. Using large-scale medical datasets, you will train and analyze language models to assess report quality in a way that better reflects human and clinical preferences.
Tasks:
- Dataset preparation
- LLM finetuning
- Comprehensive evaluation
Requirements:
- Experience with PyTorch and training models
- Experience with vision or language models
- (Optional) Experience using SLURM
- (Recommended) Deep Learning / Pattern Recognition Lecture
Application: (Applications that do not follow the application requirements will not be considered)
Please send your CV, transcript of records, and short motivation letter (1 page max) with the subject “Application ReportMetric + your_full_name” to Lukas.Buess@fau.de
Start Date: 15.01.2026 or later
Relevant Literature:
[1] Johnson, A. E., Pollard, T. J., Berkowitz, S. J., Greenbaum, N. R., Lungren, M. P., Deng, C. Y., … & Horng, S. (2019). MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Scientific data, 6(1), 317.
[2] Pellegrini, C., Özsoy, E., Busam, B., Navab, N., & Keicher, M. (2023). Radialog: A large vision-language model for radiology report generation and conversational assistance. arXiv preprint arXiv:2311.18681.
[3] Hamamci, I. E., Er, S., Wang, C., Almas, F., Simsek, A. G., Esirgun, S. N., … & Menze, B. (2024). Developing generalist foundation models from a multimodal dataset for 3d computed tomography. arXiv preprint arXiv:2403.17834.
[4] Blankemeier, L., Cohen, J. P., Kumar, A., Van Veen, D., Gardezi, S. J. S., Paschali, M., … & Chaudhari, A. S. (2024). Merlin: A vision language foundation model for 3d computed tomography. Research Square, rs-3.
[5] Ostmeier, S., Xu, J., Chen, Z., Varma, M., Blankemeier, L., Bluethgen, C., … & Delbrouck, J. B. (2024). Green: Generative radiology report evaluation and error notation. arXiv preprint arXiv:2405.03595.
[6] Xu, J., Zhang, X., Abderezaei, J., Bauml, J., Boodoo, R., Haghighi, F., … & Delbrouck, J. B. (2025, November). RadEval: A framework for radiology text evaluation. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (pp. 546-557).
Reinforcement Learning from AI Feedback for Radiology Vision–Language Models
This master thesis investigates how reinforcement learning from AI feedback (RLAIF) can improve radiology report generation with vision–language models. The goal is to train and compare different reward models to align generated reports with human and clinical preferences.
Tasks:
- Dataset preparation
- Supervised finetuning of vision-language models
- Human preference alignment using Reinforcement Learning
- Comprehensive evaluation
Requirements:
- Experience with PyTorch and training models
- Experience with vision or language models
- (Optional) Experience using SLURM
- (Recommended) Deep Learning / Pattern Recognition Lecture
Application: (Applications that do not follow the application requirements will not be considered)
Please send your CV, transcript of records, and short motivation letter (1 page max) with the subject “Application RLAIF-VLM + your_full_name” to Lukas.Buess@fau.de
Start Date: 15.01.2026 or later
Relevant Literature:
[1] Liu, H., Li, C., Wu, Q., & Lee, Y. J. (2023). Visual instruction tuning. Advances in neural information processing systems, 36, 34892-34916.
[2] Johnson, A. E., Pollard, T. J., Berkowitz, S. J., Greenbaum, N. R., Lungren, M. P., Deng, C. Y., … & Horng, S. (2019). MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Scientific data, 6(1), 317.
[3] Pellegrini, C., Özsoy, E., Busam, B., Navab, N., & Keicher, M. (2023). Radialog: A large vision-language model for radiology report generation and conversational assistance. arXiv preprint arXiv:2311.18681.
[4] Hamamci, I. E., Er, S., Wang, C., Almas, F., Simsek, A. G., Esirgun, S. N., … & Menze, B. (2024). Developing generalist foundation models from a multimodal dataset for 3d computed tomography. arXiv preprint arXiv:2403.17834.
[5] Blankemeier, L., Cohen, J. P., Kumar, A., Van Veen, D., Gardezi, S. J. S., Paschali, M., … & Chaudhari, A. S. (2024). Merlin: A vision language foundation model for 3d computed tomography. Research Square, rs-3.
[6] Ostmeier, S., Xu, J., Chen, Z., Varma, M., Blankemeier, L., Bluethgen, C., … & Delbrouck, J. B. (2024). Green: Generative radiology report evaluation and error notation. arXiv preprint arXiv:2405.03595.
[7] Hein, D., Chen, Z., Ostmeier, S., Xu, J., Varma, M., Reis, E. P., … & Chaudhari, A. S. (2025, July). CheXalign: Preference fine-tuning in chest X-ray interpretation models without human feedback. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 27679-27702).
Kinematic Calibration and Reachability-Based Planning for Precision Robotic Needle Insertion in Liver Ablation
Analyzing contrast agent inhomogeneities in the left atrial appendage
