Index
Robust Tampered Text Detection in Document Images Using Multimodal Deep Learning
The goal of this thesis is to develop a high-accuracy deep learning model for detecting tampered text in document images. This includes manipulations such as word replacement, copy-paste edits, and layout-based alterations. The focus is on building a multimodal architecture that combines visual layout features and semantic textual content to improve detection accuracy and robustness across diverse document types and manipulation styles.
Synthetic Data Generation and Deep Learning-Based Object Detection and Segmentation for Interventional Devices in Cardiac and Neurovascular Fluoroscopy
Seminar Herculaneum Papyri
1. The Context
In 79 AD, the eruption of Mount Vesuvius buried the Herculaneum library, carbonizing hundreds of papyrus scrolls. For centuries, they were unreadable, because opening them would turn them to dust. In 2023, the Vesuvius Challenge changed that. By combining high-resolution CT scans with advanced Computer Vision and Machine Learning, a global community of researchers virtually unwrapped and read parts of these scrolls for the first time.
2. The Problem
While the challenge produced winning results, it also produced “Competition Code.” Competition code is written to win, not to be read. It is often highly optimized and experimental but lacks documentation, theoretical explanations, and clean structure. Crucially, there are no academic papers accompanying these repositories. We have the solution, but we are missing the explanation of the methodology and the mathematical foundations.
3. Your Task
Your goal is to bridge the gap between raw code and a reproducible scientific baseline. You will select a specific component of the challenge (e.g., Segmentation, Ink Detection, Flattening), dissect the code, and transform it into a well-understood, documented research tool. You are not just running scripts; you are performing digital archaeology on the software itself.
4. Organization & Logistics
This is a specialized Computer Vision seminar designed for students who want to deep-dive into applied machine learning and software reverse-engineering.
- Format & Timeline: This is a Block Seminar taking place between February and May.
- ECTS Credits: We offer both 5 ECTS and 10 ECTS versions of this seminar depending on the depth of the project and your study requirements.
- Target Audience: We welcome students from the following study courses:
  - Computer Science
  - Artificial Intelligence
  - Medical Engineering
5. How to Apply
To apply for this seminar, please send an email to linda-sophie.schneider@fau.de and thomas.gorges@fau.de with the following:
- Your Transcript of Records.
- A short motivation statement (3–4 sentences) explaining why you want to work on this specific topic.
- ECTS Preference: Please explicitly state whether you require the 5 ECTS or 10 ECTS version for your studies.
Important: We value brevity, so please keep your email short. Very long emails might be ignored.
AI-Driven Structured Reporting for Breast MRI Radiological Reports: Leveraging LLMs for Automated Label Extraction
Analyzing Methods for Efficient Language Model Adaptation with Domain-Specific Selective Layer Expansion
MetaMorph: A Unified Framework with Modular Designs for Joint Affine and Deformable Medical Image Registration
Multimodal Extraction of Lot-Level Metadata from Auction Catalogues using OCR and Vision Language Models
Topology-Aware Edge-Map Enhancement of Scanning Electron Microscope Images
Unsupervised Learning for Detection of Rare Driving Scenarios
Curriculum Learning for Medical Vision-Language Models
This master's thesis investigates how curriculum learning strategies can improve vision-language alignment in medical vision-language models. Instead of training on all samples uniformly, the thesis explores curricula that organize training data from easy to hard, coarse to fine, or generic to clinically complex cases. The goal is to design and evaluate different curriculum strategies for medical tasks such as radiology report generation or medical visual question answering.
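As a rough illustration of the easy-to-hard idea, the following minimal sketch exposes a growing share of difficulty-ranked samples as training progresses. Everything in it is an assumption made for illustration: the function name `curriculum_subset`, the linear pacing schedule, the `start_fraction` parameter, and the use of report length as a stand-in difficulty score are hypothetical and are not prescribed by the thesis topic.

```python
import random


def curriculum_subset(samples, difficulty, epoch, total_epochs, start_fraction=0.3):
    """Return the samples visible at `epoch` under a linear easy-to-hard pacing.

    `difficulty` is any callable mapping a sample to a sortable score
    (lower means easier). The pacing schedule is a hypothetical choice.
    """
    if total_epochs < 1:
        raise ValueError("total_epochs must be at least 1")
    if not 0.0 < start_fraction <= 1.0:
        raise ValueError("start_fraction must be in (0, 1]")

    # Rank all samples from easiest to hardest.
    ranked = sorted(samples, key=difficulty)

    # Training progress in [0, 1]; the last epoch sees the full dataset.
    progress = min(1.0, epoch / max(1, total_epochs - 1))
    fraction = start_fraction + (1.0 - start_fraction) * progress

    # Always keep at least one sample.
    cutoff = max(1, round(fraction * len(ranked)))
    return ranked[:cutoff]


# Toy data (made up): report length serves as a crude difficulty proxy.
reports = [
    "Right lower lobe consolidation with small pleural effusion and atelectasis.",
    "No acute findings.",
    "Mild cardiomegaly without effusion.",
]

for epoch in range(3):
    visible = curriculum_subset(reports, len, epoch, total_epochs=3)
    # Shuffle within the visible pool so batches are not strictly ordered.
    random.Random(epoch).shuffle(visible)
    print(f"epoch {epoch}: training on {len(visible)} of {len(reports)} reports")
```

In an actual fine-tuning setup, the visible subset would typically be wrapped in a dataset or sampler for each epoch, and the difficulty score could come from clinical complexity, label granularity, or model loss rather than text length.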
Tasks:
- Dataset preparation
- VLM finetuning
- Comprehensive evaluation
Requirements:
- Experience with PyTorch and training models
- Experience with vision or language models
- (Optional) Experience using SLURM
- (Recommended) Deep Learning / Pattern Recognition Lecture
Application: (Applications that do not follow the application requirements will not be considered)
Please send your CV, transcript of records, and a short motivation letter (1 page max) with the subject “Application CurriculumVLM + your_full_name” to Lukas.Buess@fau.de
Start Date: 15.01.2026 or later
Relevant Literature:
[1] Johnson, A. E., Pollard, T. J., Berkowitz, S. J., Greenbaum, N. R., Lungren, M. P., Deng, C. Y., … & Horng, S. (2019). MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Scientific Data, 6(1), 317.
[2] Pellegrini, C., Özsoy, E., Busam, B., Navab, N., & Keicher, M. (2023). RaDialog: A large vision-language model for radiology report generation and conversational assistance. arXiv preprint arXiv:2311.18681.
[3] Hamamci, I. E., Er, S., Wang, C., Almas, F., Simsek, A. G., Esirgun, S. N., … & Menze, B. (2024). Developing generalist foundation models from a multimodal dataset for 3D computed tomography. arXiv preprint arXiv:2403.17834.
[4] Ostmeier, S., Xu, J., Chen, Z., Varma, M., Blankemeier, L., Bluethgen, C., … & Delbrouck, J. B. (2024). GREEN: Generative radiology report evaluation and error notation. arXiv preprint arXiv:2405.03595.
[5] Xu, J., Zhang, X., Abderezaei, J., Bauml, J., Boodoo, R., Haghighi, F., … & Delbrouck, J. B. (2025, November). RadEval: A framework for radiology text evaluation. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (pp. 546-557).
[6] Liu, F., Ge, S., & Wu, X. (2021, August). Competence-based multimodal curriculum learning for medical report generation. In Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing (volume 1: long papers) (pp. 3001-3012).
[7] Holland, R., Taylor, T. R., Holmes, C., Riedl, S., Mai, J., Patsiamanidi, M., … & PINNACLE consortium. (2025). Specialized curricula for training vision language models in retinal image analysis. NPJ Digital Medicine, 8(1), 532.