Index

Benchmarking State-of-the-Art Transformers for Handwritten Document Layout Analysis

Vision-Language Models in Radiology

Tasks:
  1. Extend an existing dataset with synthetically generated data.
  2. Train a multimodal vision-language model.
  3. Perform an extensive evaluation of the model on public datasets:
    • Investigate and apply suitable evaluation metrics.
    • Research state-of-the-art methods for comparison.
  4. (Optional) Contribute to writing a research paper based on the results.
Requirements:
  1. Experience with PyTorch.
  2. Experience with training deep learning models.
  3. Ability to attend in-person meetings.
Application (Applications not following these requirements will not be considered):
  1. Curriculum Vitae (CV).
  2. Short motivation letter (max. one page).
  3. Transcript of records.

Send your application with the subject “Application VLM Radiology + your full name” to Lukas.Buess@fau.de.

Starting Date:
15.09.2025 or later

Enhancing Financial QA with Hybrid Retrieval and Semantic Tagging

A Resource-Efficient AC Power Flow Prediction Framework using Physics-Informed GNNs and RL-Based Model Compression

1. Motivation

Modern power grids require accurate, real-time AC power flow prediction to ensure secure and efficient operation. Graph Neural Networks (GNNs) are promising because they can model both the grid’s topology and its nonlinear behavior. However, standard GNNs are often too large for deployment on edge devices, and naïve compression can produce physically infeasible predictions, i.e., voltage states that violate the AC power flow equations. There is therefore a pressing need for compression techniques that preserve physical accuracy.

2. Objective

This project aims to develop a two-phase framework:
1. Physics-Informed GNN: Predict voltage magnitudes and phase angles from power grid snapshots, with the AC power flow equations incorporated into training.
2. RL-Guided Compression: Use reinforcement learning to learn how to prune and quantize the model efficiently while preserving physical feasibility.
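One way the physics-informed phase could use the AC power flow laws is to penalize the power mismatch of a predicted voltage state. The sketch below assumes a known bus admittance matrix Ybus and, for simplicity, treats every bus as a PQ bus with specified active and reactive injections (real grids also contain slack and PV buses with different specified quantities). It computes S_i = V_i · conj(Σ_k Y_ik V_k) from predicted magnitudes and angles and returns the mean squared P/Q mismatch. The function names are hypothetical, and plain Python is used only for readability; in training, the same computation would be written with differentiable tensor operations (e.g. in PyTorch) and combined with a supervised loss.

```python
import cmath


# Hypothetical helper: complex power injected at each bus, S_i = V_i * conj(sum_k Y_ik V_k).
def power_injections(vm, va, ybus):
    """vm: voltage magnitudes (p.u.), va: phase angles (rad), ybus: n x n complex admittance matrix."""
    v = [cmath.rect(m, a) for m, a in zip(vm, va)]  # phasors V_i = |V_i| * e^(j * theta_i)
    n = len(v)
    injections = []
    for i in range(n):
        current = sum(ybus[i][k] * v[k] for k in range(n))  # I_i = sum_k Y_ik V_k
        injections.append(v[i] * current.conjugate())
    return injections


# Hypothetical physics loss: mean squared P/Q mismatch, treating every bus as a PQ bus.
def physics_loss(vm, va, ybus, p_spec, q_spec):
    s = power_injections(vm, va, ybus)
    mismatch = sum((si.real - p) ** 2 + (si.imag - q) ** 2
                   for si, p, q in zip(s, p_spec, q_spec))
    return mismatch / len(s)


# Usage: a two-bus system joined by a single line with impedance z (p.u.).
z = 0.01 + 0.1j
y = 1 / z
ybus = [[y, -y], [-y, y]]
predicted_vm, predicted_va = [1.0, 0.98], [0.0, -0.1]
s = power_injections(predicted_vm, predicted_va, ybus)
# The loss is zero when the specified injections are consistent with the predicted state.
print(physics_loss(predicted_vm, predicted_va, ybus, [x.real for x in s], [x.imag for x in s]))  # → 0.0
```

A predicted state that does not satisfy the specified injections yields a positive loss, which is exactly the signal a compressed model should not be allowed to ignore.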

From Pixels to Structure: Analysis of Lightweight Vision-Language Models for Document OCR and Structured Output Generation

Investigation on Object Detection in Industrial Settings Centered on Extended Reality Platforms Through Generation and Utilization of Synthetic Data from CAD Models

Vision-Language Models for Patient Retrieval in Radiation Therapy


Federated Learning for Medical Vision-Language Models

This thesis investigates how federated learning can be applied to train vision-language models in the medical domain while preserving patient privacy. The work focuses on enabling multi-institutional collaboration without sharing sensitive data, supporting the development of secure and scalable AI solutions for healthcare.
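The thesis does not name an aggregation scheme. As a point of reference, the sketch below shows the server-side step of Federated Averaging (FedAvg), a common baseline in which each institution trains locally and shares only model parameters, which are then averaged with weights proportional to local dataset size. Parameters are represented as plain lists of floats, and all names are hypothetical; a real setup would aggregate model state dictionaries (e.g. PyTorch tensors) and might add secure aggregation or differential privacy on top.

```python
# Hypothetical FedAvg aggregation step: only parameters, never patient data, reach the server.
def fedavg(client_params, client_sizes):
    """client_params: list of dicts {param_name: list[float]}; client_sizes: local sample counts."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    global_params = {}
    for name, values in client_params[0].items():
        # Weighted mean of each parameter entry across clients, weight = n_k / n.
        global_params[name] = [
            sum(params[name][j] * size for params, size in zip(client_params, client_sizes)) / total
            for j in range(len(values))
        ]
    return global_params


# Usage: two hospitals with different amounts of local training data.
hospital_a = {"proj.weight": [1.0, 2.0]}
hospital_b = {"proj.weight": [3.0, 6.0]}
print(fedavg([hospital_a, hospital_b], client_sizes=[100, 300]))  # → {'proj.weight': [2.5, 5.0]}
```

Weighting by sample count keeps a small clinic from pulling the global model as strongly as a large hospital; whether that is desirable for medical vision-language models is one of the questions such a thesis could examine.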

Comparative Study of Traditional and Deep Learning Binarization Methods for Historical Document Processing

The aim of this thesis is to present a fairer and more practical comparison between traditional and deep learning-based binarization methods. Selected methods from both categories will be applied to historical document datasets and evaluated against the ground truth. The work focuses on practical application: it measures each method’s impact on binarization quality and OCR performance, and develops a transparent, usable framework for evaluating and comparing binarization methods. This makes it possible to compare the methods’ results directly in terms of text recognition quality.

Experimental Setup and Resources

Traditional Methods:
Otsu (global thresholding) and Sauvola (local adaptive thresholding).
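Both traditional baselines are short enough to sketch directly. The code below is a minimal pure-Python illustration, assuming 8-bit grayscale images given as lists of rows and outputting 1 for ink and 0 for background (ground-truth masks may use the opposite convention, so they should be aligned before scoring). Otsu picks the single global threshold that maximizes between-class variance; Sauvola computes a per-pixel threshold T = m · (1 + k · (s / R − 1)) from the local mean m and standard deviation s. The values k = 0.5, R = 128, and the 15-pixel window are illustrative defaults, not settings prescribed by the proposal.

```python
import math


def otsu_threshold(image):
    """Global Otsu threshold for an 8-bit grayscale image given as a list of rows."""
    hist = [0] * 256
    for row in image:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    weighted_total = sum(i * count for i, count in enumerate(hist))
    best_t, best_between = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]        # pixels with value <= t (class 0)
        sum0 += t * hist[t]
        w1 = total - w0      # pixels with value > t (class 1)
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (weighted_total - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def otsu_binarize(image):
    t = otsu_threshold(image)
    return [[1 if v <= t else 0 for v in row] for row in image]  # 1 = ink


def sauvola_binarize(image, window=15, k=0.5, r=128.0):
    """Local threshold T = m * (1 + k * (s / r - 1)) over a window x window neighborhood."""
    h, w = len(image), len(image[0])
    half = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [image[yy][xx]
                    for yy in range(max(0, y - half), min(h, y + half + 1))
                    for xx in range(max(0, x - half), min(w, x + half + 1))]
            mean = sum(vals) / len(vals)
            std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            threshold = mean * (1 + k * (std / r - 1))
            out[y][x] = 1 if image[y][x] <= threshold else 0  # 1 = ink
    return out


# Usage: a tiny "page" with dark ink (30) on a light background (200).
page = [[200, 200, 30], [200, 30, 200]]
print(otsu_binarize(page))  # → [[0, 0, 1], [0, 1, 0]]
```

The naive Sauvola loop costs O(h · w · window²); for full-resolution scans, local means and variances are usually computed with integral images instead.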

Deep Learning-Based Models:

  • SAE (Selectional Autoencoders): A compact encoder-decoder model, used to evaluate how well such models learn pixel-level binarization of historical documents.
  • DeepOtsu: A U-Net-based enhancement model followed by Otsu thresholding for final binarization.
  • ROBIN (U-Net variant): A representative of U-Net-based segmentation models, used to assess their performance in direct binary mask prediction.

Datasets

  • DIBCO (2009–2022): Benchmark datasets comprising degraded printed and handwritten documents.
  • HisDB: Historical manuscript datasets with realistic degradation patterns.

Evaluation Metrics:

  • F-measure
  • PSNR (peak signal-to-noise ratio)
  • DRD (distance reciprocal distortion)
  • NRM (negative rate metric)
  • OCR accuracy, for practical evaluation
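Three of the pixel-level metrics follow directly from the confusion counts between a predicted mask and the ground truth. The sketch below assumes binary masks as lists of rows with 1 = text (foreground) and 0 = background; it returns F-measure as a fraction (competition tables usually report it as a percentage), PSNR with C = 1 for {0, 1} images, and NRM as the mean of the false-negative and false-positive rates (lower is better). DRD is deliberately omitted, since it depends on a specific weighted distance matrix and block-based normalization defined by the DIBCO protocol; the function names are hypothetical.

```python
import math


def confusion(pred, gt):
    """Pixel-level TP, FP, FN, TN counts for binary masks (1 = text)."""
    tp = fp = fn = tn = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            if p and g:
                tp += 1
            elif p:
                fp += 1
            elif g:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def f_measure(pred, gt):
    tp, fp, fn, _ = confusion(pred, gt)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def psnr(pred, gt, c=1.0):
    """PSNR = 10 * log10(C^2 / MSE); for binary masks MSE is the fraction of wrong pixels."""
    tp, fp, fn, tn = confusion(pred, gt)
    mse = (fp + fn) / (tp + fp + fn + tn)
    return math.inf if mse == 0 else 10 * math.log10(c ** 2 / mse)


def nrm(pred, gt):
    """Negative rate metric: mean of false-negative and false-positive rates."""
    tp, fp, fn, tn = confusion(pred, gt)
    nr_fn = fn / (fn + tp) if fn + tp else 0.0
    nr_fp = fp / (fp + tn) if fp + tn else 0.0
    return (nr_fn + nr_fp) / 2


# Usage on a four-pixel example: one hit, one miss, one false alarm, one correct background.
gt = [[1, 1, 0, 0]]
pred = [[1, 0, 1, 0]]
print(f_measure(pred, gt), nrm(pred, gt))  # → 0.5 0.5
```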

Milestones
1. Literature Review: Study the binarization techniques listed in the model sections above, focusing on their application to historical documents.
2. Dataset Preparation: Collect and preprocess publicly available datasets. Align ground truth masks and normalize formats for consistent evaluation.
3. Model Implementation and Integration: Implement or adapt the binarization models using PyTorch, and create a single pipeline that includes both the deep learning and the traditional methods.
4. Evaluation and Comparison: Compare results using common metrics (F-measure, PSNR, DRD, NRM) and check the visual outputs against the quantitative results. Then feed the binarized images into an OCR system to evaluate each method’s effect on OCR performance.
5. Analysis and Interpretation: Discuss the strengths and limitations of each technique for different types of degradation.
6. Documentation and Reporting: Compile all documentation (results and analysis) into a report.
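The proposal does not fix how OCR accuracy is measured. A common choice is the character error rate (CER): the edit distance between the OCR output and the ground-truth transcription, divided by the length of the transcription, with accuracy reported as 1 - CER. The sketch below assumes the OCR engine has already produced text for each binarized image; the method names and strings are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (free if characters match)
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Usage with hypothetical OCR outputs for one text line after two binarization methods.
reference = "Anno Domini"
ocr_outputs = {"method_a": "Anno Dornini", "method_b": "Anno Domini"}
for method, text in ocr_outputs.items():
    cer = character_error_rate(reference, text)
    print(f"{method}: CER = {cer:.3f}, accuracy = {1 - cer:.3f}")
```

Because CER is computed on the text rather than the pixels, it can rank methods differently from F-measure or PSNR, which is precisely the practical gap this comparison aims to expose.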

Unsupervised Image Retrieval for Auction Catalogues
