
Radiology Report Generation and Evaluation

This master's thesis explores the training and evaluation of a vision-language model for radiology report generation using large-scale medical datasets. We investigate how different clinical settings influence the quality of the generated reports, with a particular focus on improving how these reports are evaluated.

Tasks:
  • Dataset preparation
  • Finetune vision-language models (see the sketch after this list)
  • Comprehensive evaluation
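
A useful starting point before fine-tuning is a smoke test that prompts an off-the-shelf instruction-tuned vision-language model [1, 2] for a draft report. The following is a minimal, hypothetical sketch assuming the Hugging Face transformers library; the checkpoint, prompt template, and image path are placeholders, not project choices.

import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

ckpt = "llava-hf/llava-1.5-7b-hf"  # placeholder checkpoint, not a project decision
processor = AutoProcessor.from_pretrained(ckpt)
model = LlavaForConditionalGeneration.from_pretrained(
    ckpt, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("chest_xray.png")  # placeholder image path
prompt = "USER: <image>\nWrite a radiology report for this chest X-ray. ASSISTANT:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])

Fine-tuning would replace the generate call with a standard training loop over image-report pairs; the generated reports could then be compared against references, e.g. with GREEN [4].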
Requirements:
  • Experience with PyTorch and training models
  • Experience with vision or language models
  • (Optional) Experience using SLURM
  • (Recommended) Deep Learning / Pattern Recognition Lecture


Application: (Applications that do not follow the application requirements will not be considered)

Please send your CV, transcript of records, and short motivation letter (1 page max) with the subject “Application CXR-Report + your_full_name” to Lukas.Buess@fau.de

Start Date: 01.06.2025 or later


Relevant Literature:

[1] Liu, H., Li, C., Wu, Q., & Lee, Y. J. (2023). Visual instruction tuning. Advances in neural information processing systems, 36, 34892-34916.

[2] Pellegrini, C., Özsoy, E., Busam, B., Navab, N., & Keicher, M. (2023). RaDialog: A large vision-language model for radiology report generation and conversational assistance. arXiv preprint arXiv:2311.18681.

[3] Johnson, A. E., Pollard, T. J., Berkowitz, S. J., Greenbaum, N. R., Lungren, M. P., Deng, C. Y., … & Horng, S. (2019). MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Scientific data, 6(1), 317.

[4] Ostmeier, S., Xu, J., Chen, Z., Varma, M., Blankemeier, L., Bluethgen, C., … & Delbrouck, J. B. (2024). GREEN: Generative radiology report evaluation and error notation. arXiv preprint arXiv:2405.03595.

Exploring Species-level Similarity in Bayesian Stimulus Priors of Artificially Intelligent Agents

Deep Learning-based Classification of Body Regions in Intraoperative X-Ray Images

Diffusion Transformer for CT artifact compensation

Computed Tomography (CT) is one of the most important modalities in modern medical imaging, providing cross-sectional anatomical information crucial for diagnosis, treatment planning, and disease monitoring. Despite its widespread utility, the quality of CT images can be significantly degraded by artifacts arising from physical limitations, patient-related factors, or system imperfections. These artifacts, manifesting as streaks, blurs, or distortions, can obscure critical diagnostic details, potentially leading to misinterpretations and compromising patient care. While traditional iterative reconstruction and early deep learning methods offer partial solutions, they often struggle with complex artifact patterns or may introduce new inconsistencies. Recently, diffusion models have emerged as a powerful generative paradigm, achieving remarkable success in image synthesis and restoration by progressively denoising an image starting from pure noise. Concurrently, Transformer architectures, with their ability to capture long-range dependencies via self-attention, have shown promise across vision tasks. This thesis investigates the Diffusion Transformer for comprehensive CT artifact compensation. By combining the iterative refinement of diffusion models with the global contextual understanding of Transformers, this work aims to develop a robust framework that mitigates a wide range of CT artifacts, thereby enhancing image quality and improving diagnostic reliability. The research covers the design, implementation, and rigorous evaluation of such a model, comparing its performance against existing state-of-the-art techniques.
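
To make the setup concrete, the following is a minimal, hypothetical sketch of a single denoising-diffusion training step in PyTorch: a toy patch-based Transformer predicts the noise added to a clean CT slice, conditioned on the artifact-corrupted slice via channel concatenation. The architecture, image size, noise schedule, and conditioning scheme are illustrative assumptions, not the thesis design.

import torch
import torch.nn as nn

class TinyDiT(nn.Module):
    """Toy Transformer denoiser over image patches (illustrative only)."""
    def __init__(self, img=64, patch=8, dim=256, depth=4, heads=8):
        super().__init__()
        self.patch = patch
        # two input channels: the noisy clean slice and the corrupted conditioning slice
        self.embed = nn.Conv2d(2, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, (img // patch) ** 2, dim))
        self.t_embed = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, patch * patch)  # per-patch noise prediction

    def forward(self, x_t, cond, t):
        b, _, h, w = x_t.shape
        tok = self.embed(torch.cat([x_t, cond], dim=1))   # (B, dim, H/p, W/p)
        tok = tok.flatten(2).transpose(1, 2) + self.pos   # (B, N, dim)
        tok = tok + self.t_embed(t.view(-1, 1).float())[:, None, :]
        tok = self.blocks(tok)
        eps = self.head(tok)                              # (B, N, p*p)
        p = self.patch
        return (eps.view(b, h // p, w // p, p, p)
                   .permute(0, 1, 3, 2, 4)
                   .reshape(b, 1, h, w))                  # fold patches back to an image

# one training step: sample a timestep, add noise, regress the noise (DDPM-style)
model = TinyDiT()
clean = torch.randn(2, 1, 64, 64)    # placeholder clean CT slices
corrupt = torch.randn(2, 1, 64, 64)  # placeholder artifact-corrupted counterparts
t = torch.randint(0, 1000, (2,))
alpha_bar = torch.cos(t / 1000 * torch.pi / 2) ** 2      # illustrative cosine schedule
noise = torch.randn_like(clean)
x_t = alpha_bar.sqrt().view(-1, 1, 1, 1) * clean + (1 - alpha_bar).sqrt().view(-1, 1, 1, 1) * noise
loss = nn.functional.mse_loss(model(x_t, corrupt, t), noise)
loss.backward()

In a full pipeline this step would be wrapped in an optimizer loop over paired clean/corrupted scans, and test-time artifact compensation would run the learned reverse process conditioned on the corrupted scan.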

From Prompt to Command: Adaptation of LLMs for Robotic Task Execution in Manufacturing

Fast heart sound detection using audio fingerprinting

Style-based Handwriting Generation with LCM Diffusion Transformer

Handwriting synthesis has seen remarkable progress with the introduction of GANs and, more recently, diffusion models. Despite these advances, line-level handwriting generation remains challenging because both local character features and global stylistic coherence must be preserved. Models like One-DM [1] and Emuru [2] have set strong baselines, yet they either require extensive training data or struggle to balance content and style fidelity.

Recent studies have shown remarkable results in image generation using Diffusion Transformers [3]. A known drawback of diffusion models, however, is their slow iterative inference. Latent Consistency Models (LCMs) [4] address this problem, producing high-quality images in only a few steps. This work proposes a novel approach that combines the structure-aware reasoning of Transformers with the denoising capabilities of diffusion models and the LCM method, designed specifically for the line-level generation setting (a sampling sketch is given after the task list below).

• Training a Variational Autoencoder [5] for line-based handwriting generation
• Training a Diffusion Transformer combined with the Latent Consistency method using the trained Autoencoder [3, 4]
• Evaluating results on the IAM and CVL datasets
• Extending the model from line-based to paragraph-based generation
The implementation should be done in Python.
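
As a reference for the few-step inference goal, below is a minimal, hypothetical sketch of LCM-style sampling in the VAE latent space [4, 5]: a consistency-distilled denoiser maps any noisy latent directly to a clean-latent estimate, and a handful of predict/re-noise steps replaces the long reverse diffusion. The names denoiser, vae, and style_emb are assumed stand-ins for the components trained in the tasks above, and the schedule is illustrative.

import torch

def alpha_bar(t, T=1000):
    # illustrative cosine noise schedule; the trained model's own schedule would be used
    return torch.cos(torch.tensor(t / T) * torch.pi / 2) ** 2

@torch.no_grad()
def lcm_sample(denoiser, vae, style_emb, shape, steps=(999, 660, 330, 0)):
    z = torch.randn(shape)                        # start from pure noise in latent space
    for i, t in enumerate(steps):
        t_batch = torch.full((shape[0],), t, dtype=torch.long)
        z0 = denoiser(z, t_batch, style_emb)      # direct clean-latent prediction
        if i + 1 < len(steps):
            a = alpha_bar(steps[i + 1])           # re-noise to the next, lower timestep
            z = a.sqrt() * z0 + (1 - a).sqrt() * torch.randn_like(z0)
        else:
            z = z0
    return vae.decode(z)                          # decode the latent to a line image

Four denoiser calls plus one decode replace the hundreds of steps of standard DDPM sampling, which is the practical appeal of the LCM approach for this topic.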

References
[1] Gang Dai, Yifan Zhang, Quhui Ke, Qiangya Guo, and Shuangping Huang. One-shot diffusion mimicker for handwritten text generation. ArXiv, abs/2409.04004, 2024.
[2] Vittorio Pippi, Fabio Quattrini, Silvia Cascianelli, Alessio Tonioni, and Rita Cucchiara. Zero-shot styled text image generation, but make it autoregressive. ArXiv, abs/2503.17074, 2025.
[3] William S. Peebles and Saining Xie. Scalable diffusion models with transformers. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 4172–4182, 2023.
[4] Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, and Hang Zhao. Latent consistency models: Synthesizing high-resolution images with few-step inference. ArXiv, abs/2310.04378, 2023.
[5] Diederik P. Kingma and Max Welling. An introduction to variational autoencoders. Found. Trends Mach. Learn., 12:307–392, 2019.

Depth-Aware Detector Localization in Freehand X-Ray Imaging

Surrogate Model for Physics-Informed Lifetime Prediction in Power Electronics based on Mission Profiles

Evaluating the Effectiveness of Large Language Models for Named Entity Recognition in Specialized Industrial Domains