Index

Enhancing AI Agent Capabilities for Industrial Engineering

Towards Domain Adaptation of Foundational Models in Medical Imaging

Report Generation and Evaluation for 3D CT Scans Using Large Vision-Language Models

Automatically generating reports for various medical imaging modalities has received considerable attention since the advent of large multimodal models (LMMs) [1]. While the quality of generated reports, particularly their diagnostic accuracy, cannot yet match reports written by expert radiologists, it has been shown that even imperfect reports can serve radiologists as a starting point and improve the efficiency of their workflow [2].

This Master's thesis focuses on generating reports for 3D chest CT scans using the CT-RATE dataset [3], which contains over 25,000 scans with matching anonymized reports written by expert personnel. It will also build on RadGenome-Chest CT [4], which segments each report into sentences according to the anatomical regions they reference.

The first part of the thesis focuses on finding and implementing suitable metrics for evaluating the quality of a generated report against reference reports. This remains an active field of research, as currently used metrics do not fully align with human preference. The thesis will employ both traditional metrics based on n-gram overlap, such as BLEU [5], and more recent metrics such as the GREEN score [6], which is based on a fine-tuned LLM.
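As a concrete reference point, BLEU's core computation (modified n-gram precision combined with a brevity penalty) can be sketched in a few lines. This is a simplified, smoothed sentence-level variant for illustration only, not the exact implementation the thesis would use:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Minimal sentence-level BLEU with uniform n-gram weights and
    crude smoothing (real toolkits use more careful smoothing)."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # clipped overlap: each candidate n-gram counts at most as
        # often as it appears in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        log_precisions.append(math.log(max(overlap, 1e-9) / total))
    # brevity penalty punishes candidates shorter than the reference
    bp = min(1.0, math.exp(1 - len(ref) / max(len(cand), 1)))
    return bp * math.exp(sum(log_precisions) / max_n)
```

A perfect match scores 1.0, while a report sharing no n-grams with the reference scores near 0, which illustrates why pure overlap metrics correlate only loosely with clinical correctness.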

The second part will focus on training a report generation model using the architecture of CT-CHAT [3], a vision-language assistant trained on variations of the CT-RATE dataset to reason about chest CT scans. First, a baseline model will be trained solely on the task of recreating variations of the CT-RATE ground-truth reports. Next, inspired by Chain-of-Thought approaches [7], the model will be trained to break report generation down into smaller tasks, such as analyzing one anatomical region at a time, in an attempt to improve report quality.
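The region-by-region decomposition can be sketched as a thin wrapper around the model: each anatomical region is described in turn and the findings are concatenated. The region list and the `describe_region` interface below are hypothetical stand-ins for CT-CHAT calls, not the thesis's actual design:

```python
# Illustrative region list; RadGenome-Chest CT defines the actual regions.
REGIONS = ["lungs", "pleura", "heart", "mediastinum", "bones"]

def generate_report(describe_region, regions=REGIONS):
    """Compose a structured report by querying the model one anatomical
    region at a time, mimicking a Chain-of-Thought-style decomposition.
    `describe_region` is a placeholder for a vision-language model call."""
    findings = [f"{region.capitalize()}: {describe_region(region)}"
                for region in regions]
    return "\n".join(findings)
```

Evaluating such a decomposed report against the sentence-segmented references from RadGenome-Chest CT then becomes a per-region comparison rather than a single whole-report comparison.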

 

[1] L. Guo, A. M. Tahir, D. Zhang, Z. J. Wang, and R. K. Ward, "Automatic Medical Report Generation: Methods and Applications", SIP, vol. 13, no. 1, 2024, doi: 10.1561/116.20240044.

[2] J. N. Acosta et al., "The Impact of AI Assistance on Radiology Reporting: A Pilot Study Using Simulated AI Draft Reports", Dec. 16, 2024, arXiv: arXiv:2412.12042. doi: 10.48550/arXiv.2412.12042.

[3] I. E. Hamamci et al., "Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography", Oct. 16, 2024, arXiv: arXiv:2403.17834. doi: 10.48550/arXiv.2403.17834.

[4] X. Zhang et al., "RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis", Apr. 25, 2024, arXiv: arXiv:2404.16754. doi: 10.48550/arXiv.2404.16754.

[5] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, "BLEU: a method for automatic evaluation of machine translation", in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics – ACL '02, Philadelphia, Pennsylvania: Association for Computational Linguistics, 2002, pp. 311–318. doi: 10.3115/1073083.1073135.

[6] S. Ostmeier et al., "GREEN: Generative Radiology Report Evaluation and Error Notation", in Findings of the Association for Computational Linguistics: EMNLP 2024, 2024, pp. 374–390. doi: 10.18653/v1/2024.findings-emnlp.21.

[7] Y. Jiang et al., "CoMT: Chain-of-Medical-Thought Reduces Hallucination in Medical Report Generation", Feb. 28, 2025, arXiv: arXiv:2406.11451. doi: 10.48550/arXiv.2406.11451.

H2OArmor: A Dynamic Data-driven Leak Detection Framework for Varied Digital Maturity Levels in Water Utilities

In response to the pressing need for advanced leak detection in water distribution networks, this research aims to develop a machine-learning pipeline named H2OArmor. The pipeline is designed to detect leakages with a variety of methods drawing on diverse data sources. Crucially, the outputs of these methods will be combined, ensemble-style, into a confidence score for precise event detection.

H2OArmor's development will be anchored in a robust framework that not only streamlines the implementation of machine learning algorithms but also offers flexibility in onboarding different water utilities. The methodology should include multiple machine learning models contributing to a final, informed decision on identifying leak events at the district metered area (DMA) level. The scope further includes an end-to-end automated ML pipeline that can be deployed at scale with minimal manual intervention.

The thesis encompasses several key work packages:

  1. Framework Implementation: Utilization of a robust ML framework to build the machine learning pipeline, ensuring efficiency and compatibility. Such a framework may either be developed from scratch or assembled from components of a pre-built one.
  2. Development of ML-based Methods: Creation of machine learning methods ensuring accuracy and adaptability.
  3. Automated Onboarding Process: Designing an automated onboarding process for new methods, enhancing the scalability and versatility of H2OArmor as additional techniques are incorporated.
  4. Scoring Mechanism Development: Creation of a scoring mechanism that synthesizes the ensemble opinions of the various methods, providing a unified confidence score for leak detection events.
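The scoring mechanism in work package 4 could, in its simplest form, be a weighted combination of per-method leak probabilities. The sketch below assumes a hypothetical interface; the method names and weighting scheme are invented for illustration, and the thesis would likely learn or calibrate the weights:

```python
def leak_confidence(method_scores, weights=None):
    """Combine per-method leak probabilities (values in [0, 1]) into a
    single confidence score via a weighted average. Weights could, for
    example, reflect each method's historical precision at this utility.
    Hypothetical interface; H2OArmor's actual mechanism is thesis work."""
    if weights is None:
        weights = {name: 1.0 for name in method_scores}  # equal trust
    total_weight = sum(weights[name] for name in method_scores)
    return sum(score * weights[name]
               for name, score in method_scores.items()) / total_weight
```

A utility with richer sensor data would simply contribute more entries to `method_scores`, which is one way the pipeline could adapt to different digital maturity levels.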

H2OArmor aims to revolutionize leak detection in water distribution networks by tailoring its approach to the digital maturity levels of water utilities, ensuring optimal performance and reliability across a spectrum of operational contexts.


Multimodal fusion of pose and visual information for gesture recognition in historical artworks


Gestures in historical artworks can communicate underlying human experiences, offering a broad outlook on past sensory worlds. To explore this domain, we use SensoryArt [1], a dataset of multisensory gestures in historical artworks that comes with human pose estimation keypoints and gesture labels. The goal of the thesis is to classify the gestures of the persons depicted in the paintings. We aim to investigate how additional information on body posture, such as annotated skeleton information, affects model performance.

 

Mandatory Goals:

  • Train a model for multi-label gesture classification on the cropped images with fused ground-truth heatmaps of the SensoryArt dataset, and evaluate on the validation split.
  • Select and train a well-performing keypoint estimation model.
  • Evaluate the end-to-end pipeline on the cropped images: predict the heatmaps first, then classify.
  • Train another model for multi-person gesture classification at image level with fused ground-truth heatmaps of the uncropped images, and evaluate on the validation split.
  • Perform an inference test of the model on original images with machine-generated heatmaps.
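The "fused heatmaps" in the goals above can be illustrated as simple early fusion: keypoint heatmaps are stacked onto the RGB image as extra input channels before the classifier sees them. This is a sketch of one plausible scheme, not the thesis's final design, and the channel counts are assumptions:

```python
import numpy as np

def fuse_inputs(image, keypoint_heatmaps):
    """Stack an RGB crop (3, H, W) with K per-keypoint heatmaps
    (K, H, W) into one (3 + K, H, W) array that a multi-label
    classifier can consume. Illustrative early-fusion scheme."""
    assert image.shape[1:] == keypoint_heatmaps.shape[1:], \
        "image and heatmaps must share spatial dimensions"
    return np.concatenate([image, keypoint_heatmaps], axis=0)
```

Swapping the ground-truth heatmaps for predicted ones at this single point is what turns the setup into the end-to-end pipeline the goals describe.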

 

Optional Goals:

  • Test incorporating additional information on body position not as heatmaps but as skeleton keypoint coordinates/angles.
  • Conduct additional ablations, such as cropping the humans out of the images as square crops.
  • Integrate a multi-label approach into the detection pipeline.
  • Test human pose estimation on artworks using the additionally provided gesture labels.

[1] Zinnen, M., Christlein, V., Maier, A., & Hussian, A. (2024). SensoryArt (1.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10889613

A Comparative Study of Deep Learning Models for Brain Metastases Autosegmentation

CT Field-of-View Extension Dataset Simulation

Create a simulated dataset for the CT field-of-view (FOV) extension task using PYRO-NN

Improving manual annotation of 3D medical segmentation dataset using SAM2

In many medical scenarios, physicians need to annotate objects pixelwise in CT images, whole slide images (WSI), or cellular images. This annotation process often requires significant time and effort, especially for large datasets. A web-based tool capable of automatically segmenting 3D and 2D medical images would therefore be widely welcomed.
EXACT is an existing web-based annotation platform with an established user base. It supports interdisciplinary collaboration and allows both online and offline annotation and analysis of images across various domains. Physicians can annotate images directly through the platform's intuitive and efficient web interface [1].
To enhance the functionality of EXACT, this thesis explores and implements an automatic segmentation plugin and integrates it with the platform. The plugin will enable physicians and researchers to automatically generate high-quality segmentation masks while annotating, and to save these masks for future use. This approach can significantly improve the efficiency of medical image annotation, reduce manual effort, and optimize medical imaging workflows.
A critical aspect of this project is selecting a segmentation model that is both efficient and accurate. I plan to adopt the Segment Anything Model 2 (SAM2), as it has demonstrated robust performance on diverse medical imaging tasks (including CT, WSI, and cellular images) while ensuring segmentation precision and reliability [2].
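Saving generated masks "for future use" implies a compact storage format on the server side. One common choice is run-length encoding of binary masks, sketched below; this is an illustrative format, and EXACT's actual annotation schema may differ:

```python
import numpy as np

def rle_encode(mask):
    """Run-length encode a binary mask (row-major) so model-generated
    segmentations can be stored compactly alongside annotations.
    Returns the first pixel's value and the lengths of alternating runs.
    Illustrative storage format, not EXACT's real schema."""
    flat = np.asarray(mask, dtype=np.uint8).ravel()
    # indices where the value changes, shifted to mark run boundaries
    edges = np.flatnonzero(np.diff(flat)) + 1
    runs = np.diff(np.concatenate([[0], edges, [flat.size]]))
    return {"start_value": int(flat[0]), "runs": runs.tolist()}
```

For 3D CT volumes, applying this per slice keeps each SAM2-propagated mask small enough to ship over the web interface without sending raw pixel arrays.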

[1] Christian Marzahl, Marc Aubreville, Christof A. Bertram, Jennifer Maier, Christian Bergler, Christine Kröger, Jörn Voigt, Katharina Breininger, Robert Klopfleisch, and Andreas Maier. Exact: a collaboration toolset for algorithm-aided annotation of images with annotation version control. Scientific Reports, 11(1):4343, Feb 2021.

[2] Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, and Christoph Feichtenhofer. SAM 2: Segment anything in images and videos. arXiv preprint arXiv:2408.00714, 2024.

Searching for evidence of world models in reinforcement learning agents

Advanced Machine Learning-Based High Demand Forecasting of Household Energy Consumption for Enhancing Grid Operations

This research examines methods for predicting household energy usage to assist in managing peak demand and maintaining grid stability. The focus is on forecasting when energy consumption surpasses certain critical levels, and for how long, allowing for proactive energy management. The study analyzes the impact of various data aggregation techniques on prediction accuracy and explores approaches to handle altered consumption patterns for better forecasting. By evaluating different forecasting models and their effectiveness, the work aims to enhance energy management, promote automation in grid operations, and strengthen data-driven decision-making for a more resilient and efficient power distribution system.
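The forecasting target described above, when consumption exceeds a critical level and for how long, can be made precise as threshold-exceedance events. A minimal extraction routine over an observed (or predicted) consumption series might look like the following; the threshold and series here are illustrative:

```python
def exceedance_events(series, threshold):
    """Extract (start_index, duration) pairs for every maximal run in
    which consumption stays strictly above the critical threshold.
    This is the event structure a high-demand forecaster would predict."""
    events, start = [], None
    for i, value in enumerate(series):
        if value > threshold and start is None:
            start = i                       # an exceedance run begins
        elif value <= threshold and start is not None:
            events.append((start, i - start))  # run ended at i - 1
            start = None
    if start is not None:                   # series ended mid-run
        events.append((start, len(series) - start))
    return events
```

Framing the labels this way lets forecast quality be scored on event onset and duration directly, rather than only on pointwise consumption error.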