Index
Evaluation and optimization of an implicit neural representation framework for markerless tumor tracking during radiotherapy
In the radiotherapy of tumors, a precise definition of the tumor volume is essential in order to keep the radiation exposure of the surrounding tissue as low as possible. For this purpose, a planning CT scan is acquired before treatment and used to define the area to be irradiated, known as the planning target volume (PTV). The PTV is always chosen to be larger than the actual tumor volume to ensure that a sufficiently high dose is applied despite uncertainties such as positioning errors or movement of the tumor volume due to respiration [1]. Especially in the thorax and abdomen, the intrafractional movement due to respiration and physiological changes is very high. To compensate for this, the respiratory motion can be measured using external surrogates and imaging techniques, and its extent can be restricted using special breathing techniques. However, these methods only allow an indirect conclusion about the tumor position. Although it is possible to measure the tumor motion using implanted markers, this is an additional invasive procedure that carries corresponding risks and delays the start of treatment [2]. For the most accurate description of tumor motion, it would be advantageous to automatically segment the tumor on the fluoroscopic x-ray images of the linear accelerator and track its position in real time. Since the low soft-tissue contrast in the fluoroscopy projection images impedes distinguishing the tumor from surrounding structures, tracking the tumor in a synthesized 3D scan volume and then projecting its location onto the 2D x-ray image could improve the segmentation quality. Shao et al. [3] have recently presented such an approach with the dynamic reconstruction and motion estimation (DREME) framework.
During training, they divide the 3D tracking into two separate tasks: first, motion estimation, consisting of a CNN encoder and a B-spline-based interpolant, and second, the reconstruction of a reference CBCT scan from the pre-treatment dynamic CBCT projections using implicit neural representations (INRs). During inference, the network receives the x-ray projections as input to estimate the motion and deform the reference CBCT volume, synthesizing the current real-time CBCT [3]. The goal of this thesis is to re-implement the DREME framework and evaluate its performance on our own dataset of abdominal tumors, since Shao et al. [3] report results only on a digital phantom and a lung dataset. Furthermore, their reported training time of 4 hours is not feasible in the current clinical workflow; therefore, this thesis also aims to explore optimization techniques, e.g. pre-training on the planning CT, to reduce the training time.
The thesis will include the following points:
- Literature review on INR deep learning methods;
- Implementation of the DREME framework for real-time motion tracking;
- Performance evaluation on our fluoroscopy dataset;
- Exploration of different strategies to reduce the training time (e.g. pre-training on the planning CT or other patients, higher parallelization, selective loss computation, etc.).
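To illustrate the motion model described above, the following minimal sketch interpolates a small set of motion coefficients with a uniform cubic B-spline basis, yielding a continuous deformation trace over the breathing cycle. The function names, shapes, and control-point layout are illustrative assumptions, not the actual DREME implementation:

```python
import numpy as np

def cubic_bspline(u):
    """Uniform cubic B-spline basis value at offset u (support on |u| < 2)."""
    u = abs(u)
    if u < 1:
        return (4 - 6 * u**2 + 3 * u**3) / 6
    if u < 2:
        return (2 - u)**3 / 6
    return 0.0

def interpolate_motion(coeffs, t):
    """Interpolate motion coefficients (n_ctrl, n_dof) at normalized time t in [0, 1].

    Each row of `coeffs` is one control point of the breathing trace; in a
    DREME-like setup such coefficients would be predicted by the CNN encoder.
    """
    n_ctrl = coeffs.shape[0]
    positions = np.linspace(0.0, 1.0, n_ctrl)   # control points spread over [0, 1]
    spacing = positions[1] - positions[0]
    weights = np.array([cubic_bspline((t - p) / spacing) for p in positions])
    weights /= weights.sum()                    # restore partition of unity at the borders
    return weights @ coeffs

# Example: 5 control points for a 3-DOF (x, y, z) motion trace.
rng = np.random.default_rng(0)
coeffs = rng.normal(size=(5, 3))
displacement = interpolate_motion(coeffs, t=0.37)
```

Because the basis functions sum to one, a constant set of coefficients reproduces a constant displacement, and the interpolant varies smoothly in time, which is what makes it suitable for modeling quasi-periodic respiratory motion.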
If you are interested in the project, please send your request to: pluvio.stephan@uk-erlangen.de
Prior experience in Python, Deep Learning and PyTorch is required.
Evaluation of SHViT for volumetric Semantic Segmentation in Industrial CT Scans
Industrial computed tomography (iCT) is a widely applied tool in non-destructive testing, material analysis, quality control, and metrology. Semantic segmentation of industrial CT data plays a central role in these applications by enabling quality inspection, material differentiation and part separation [1]. While convolutional neural networks (CNNs) have traditionally performed well in segmentation tasks by capturing local structures, their limited ability to model long-range dependencies poses challenges in complex 3D datasets.
Transformer-based models have recently emerged as promising alternatives. By dividing the input into patches and using self-attention mechanisms, transformers can model global dependencies. However, early vision transformers had difficulties capturing spatial structure and learning from limited data. The Swin Transformer was one of the first models to address these issues by introducing a hierarchical structure and shifted windows, combined with an inductive bias that improves generalization on small datasets [2].
Despite these advances, transformers remain resource-intensive. Newer models such as the Single-Head Vision Transformer (SHViT) aim to reduce computational cost while maintaining performance. SHViT builds on hierarchical designs such as Swin and improves efficiency through a single-head attention module and a memory-efficient macro design [3].
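The shifted-window idea behind Swin-style hierarchical transformers can be illustrated with a minimal 2D sketch (the volumetric case is analogous): partition a feature map into non-overlapping windows, and cyclically shift the map by half a window before the next partition so that subsequent attention layers mix tokens across window borders. The function names and the 2D setting are illustrative assumptions:

```python
import numpy as np

def window_partition(x, win):
    """Split a (H, W, C) feature map into non-overlapping (win, win, C) windows."""
    H, W, C = x.shape
    assert H % win == 0 and W % win == 0, "feature map must be divisible by window size"
    x = x.reshape(H // win, win, W // win, win, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win, win, C)

def shifted_windows(x, win):
    """Cyclically shift the map by half a window before partitioning, so the
    next attention layer attends across the previous window boundaries."""
    shifted = np.roll(x, shift=(-(win // 2), -(win // 2)), axis=(0, 1))
    return window_partition(shifted, win)

x = np.arange(8 * 8 * 1).reshape(8, 8, 1).astype(float)
plain = window_partition(x, 4)     # 4 windows of 4x4 tokens each
shift = shifted_windows(x, 4)      # same windows after the half-window shift
```

Self-attention is then computed independently inside each window, which is what keeps the cost linear in the number of tokens rather than quadratic.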
This thesis focuses on the implementation and evaluation of a volumetric SHViT model for 3D semantic segmentation. The model is tested on a real-world dataset of industrial CT scans of boxed shoes, which includes several segmentation tasks: separating the shoes from their surroundings and identifying individual components such as the insole, outsole, and upper [4]. As is typical for industrial CT data, the dataset is limited in size; yet its structural variability makes it an interesting benchmark for assessing model generalization. The F1-score is used as the primary evaluation metric for this class-imbalanced segmentation dataset. The network is also evaluated in terms of memory and computational resource use.
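A minimal sketch of the per-class F1 computation (equivalent to the Dice score for hard labels), which, unlike plain accuracy, is not dominated by the large background class; the toy labels below are illustrative:

```python
import numpy as np

def per_class_f1(pred, target, n_classes):
    """Per-class F1 score for labeled segmentation volumes.

    F1 = 2*TP / (2*TP + FP + FN), computed one class at a time,
    so small foreground classes are weighted as heavily as background.
    """
    scores = []
    for c in range(n_classes):
        p = pred == c
        t = target == c
        tp = np.sum(p & t)
        denom = p.sum() + t.sum()          # 2*TP + FP + FN
        scores.append(2 * tp / denom if denom > 0 else np.nan)
    return np.array(scores)

# Toy example with three classes (e.g. background, insole, outsole).
pred = np.array([0, 0, 1, 1, 2, 2])
target = np.array([0, 0, 1, 2, 2, 2])
f1 = per_class_f1(pred, target, 3)
```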
The SHViT model will be compared to a CNN-based baseline, evaluating accuracy, robustness, and computational efficiency in the context of 3D industrial segmentation. While the study aims to inform the selection of neural architectures for iCT applications, its conclusions are limited by the use of a single dataset. Nonetheless, SHViT shows potential for broader use in iCT, as it could enable the efficient application of transformer-based models to volumetric segmentation across diverse industrial datasets.
Literature
[1] S. Bellens, P. Guerrero, P. Vandewalle, and W. Dewulf, "Machine learning in industrial X-ray computed tomography – a review," CIRP Journal of Manufacturing Science and Technology, pp. 324–341, 2024.
[2] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows," in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, CA, 2021.
[3] S. Yun and Y. Ro, "SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5756–5767, 2024.
[4] M. Leipert, G. Herl, J. Stebani, S. Zabler, and A. Maier, "Three Step Volumetric Segmentation for Automated Shoe Fitting," e-Journal of Nondestructive Testing, vol. 28, no. 3, 2023.
Exploring RNN-Transducers for Named Entity Recognition in Biomedical Literature
Shallow Networks and AI Explainability in Context of vDCE for Breast MRI
Introduction
Dynamic Contrast-Enhanced MRI (DCE-MRI) is a key tool in breast cancer diagnostics, offering detailed vascular information essential for identifying and evaluating tumors [1]. However, contrast agents used in this process can pose risks, particularly for patients with kidney issues or allergies [2]. Virtual Dynamic Contrast Enhancement (vDCE) provides a promising alternative by generating contrast-enhanced images computationally, removing the need for actual contrast agents [3]. This thesis explores improving vDCE through smaller, more interpretable neural network architectures, including dynamic ones, with a focus on resource efficiency and explainability.
Motivation
Smaller, shallow neural networks offer several advantages, such as:
• Lower Computational Needs: Shallow models require less processing power, making them ideal for limited-resource environments [4].
• Localized Analysis: These models can focus on specific regions, such as individual breast areas, which improves diagnostic accuracy [5].
• Enhanced Transparency: Simpler architectures provide greater clarity in their decision-making process, making results easier for clinicians to interpret [6].
Since insights derived from one breast often do not affect the other, this localized and interpretable approach is particularly well-suited for breast MRI analysis.
Objectives
• Develop and Test Shallow Neural Network Models for vDCE: Design models that balance accuracy with simplicity [4].
• Implement Explainability Tools: Apply LIME and SHAP to make model decisions clearer to clinicians [6].
• Explore the Efficiency-Accuracy Trade-off: Examine how smaller models can maintain diagnostic accuracy while being computationally efficient.
• Explore patch-based approaches.
Methodology
• New Network Architectures: Investigate linear models, dynamic convolutions, hypernetworks, and attention mechanisms to optimize shallow networks.
• Explainability Methods: Apply LIME and SHAP for clearer decision insights. This includes exploring the impact of patch size on capturing spatial context and analyzing the significance of specific input features on the model's decision-making.
• Performance Metrics: Compare shallow models against deeper models for accuracy, efficiency, and clarity. The evaluation will include a metrics-based comparison with state-of-the-art methods [3] and a reader study involving radiologists to assess the clinical relevance and usability of the outputs.
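The perturbation idea underlying model-agnostic explainability methods such as LIME and SHAP can be sketched with a simple occlusion map: mask one patch at a time and record how much the model output drops. This is a simplified stand-in for illustration, not the LIME or SHAP algorithm itself, and the toy model below is an assumption:

```python
import numpy as np

def occlusion_map(model, image, patch=4, baseline=0.0):
    """Perturbation-based importance map: occlude one patch at a time
    and record the drop in the model's scalar output.

    `model` maps a 2D image to a scalar score; a larger drop means the
    occluded patch mattered more for the prediction.
    """
    H, W = image.shape
    ref = model(image)
    heat = np.zeros((H // patch, W // patch))
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            masked = image.copy()
            masked[i:i + patch, j:j + patch] = baseline
            heat[i // patch, j // patch] = ref - model(masked)
    return heat

# Toy model: the score is the mean intensity of the top-left quadrant,
# so only the top-left patch should register as important.
toy = lambda img: img[:4, :4].mean()
img = np.ones((8, 8))
heat = occlusion_map(toy, img, patch=4)
```

Varying `patch` in such an analysis is exactly the kind of patch-size study mentioned above: coarse patches capture more spatial context per perturbation, fine patches localize importance more precisely.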
References
1. Turnbull, L.W. (2009), Dynamic contrast-enhanced MRI in the diagnosis and management of breast cancer. NMR Biomed., 22: 28-39. https://doi.org/10.1002/nbm.1273
2. Andreucci M, Solomon R, Tasanarong A. Side effects of radiographic contrast media: pathogenesis, risk factors, and prevention. Biomed Res Int. 2014;2014:741018. doi: 10.1155/2014/741018. Epub 2014 May 11. PMID: 24895606; PMCID: PMC4034507.
3. Schreiter, Hannes, et al. “Virtual dynamic contrast enhanced breast MRI using 2D U-Net Architectures.” medRxiv (2024): 2024-08.
4. Prinzi, F., Currieri, T., Gaglio, S. et al. Shallow and deep learning classifiers in medical image analysis. Eur Radiol Exp 8, 26 (2024). https://doi.org/10.1186/s41747-024-00428-2
5. van der Velden, B.H.M., Janse, M.H.A., Ragusi, M.A.A. et al. Volumetric breast density estimation on MRI using explainable deep learning regression. Sci Rep 10, 18095 (2020). https://doi.org/10.1038/s41598-020-75167-6
6. Gulum, M.A.; Trombley, C.M.; Kantardzic, M. A Review of Explainable Deep Learning Cancer Detection Models in Medical Imaging. Appl. Sci. 2021, 11, 4573. https://doi.org/10.3390/app11104573
Incremental Learning for Classifying Document Types Based on Content and/or Layout
Investigation of contrastive multimodal methods for the analysis of EEG signals evoked by visual stimuli
Enhancing AI Agent Capabilities for Industrial Engineering
Towards Domain Adaptation of Foundational Models in Medical Imaging
Report Generation and Evaluation for 3D CT Scans Using Large Vision-Language Models
Automatically generating reports for various medical imaging modalities is a topic that has received a lot of attention since the advancements in large multimodal models (LMMs) [1]. While the quality of generated reports, specifically in terms of diagnostic accuracy, cannot yet match reports written by expert radiologists, it has been shown that even imperfect reports can be used by radiologists as a starting point to improve the efficiency of their workflow [2].
This Master Thesis focuses on generating reports for 3D chest CT scans, using the CT-RATE dataset [3], which contains over 25,000 scans with matching anonymized reports written by expert personnel. It will also utilize the work of RadGenome-Chest CT [4], that includes a sentence segmentation of reports based on the anatomical regions they are referencing.
The first part of the thesis focuses on finding and implementing suitable metrics for evaluating the quality of a generated report against reference reports. This continues to be a field of active research, as currently used metrics do not fully align with human preference. In this thesis, both traditional metrics based on n-gram overlap, such as BLEU [5], and more recent metrics such as the GREEN score [6], which is based on a fine-tuned LLM, will be employed.
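The core ingredient of BLEU, clipped n-gram precision, can be sketched as follows; this omits BLEU's brevity penalty and the geometric averaging over n-gram orders, and the toy report sentences are illustrative:

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision: the fraction of candidate n-grams that
    also occur in the reference, where each reference n-gram can only be
    matched as often as it appears there (the 'clipping')."""
    cand = candidate.split()
    ref = reference.split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    if not cand_ngrams:
        return 0.0
    clipped = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
    return clipped / sum(cand_ngrams.values())

cand = "no acute cardiopulmonary findings"
ref = "no acute cardiopulmonary abnormality"
p1 = ngram_precision(cand, ref, 1)   # unigram precision
p2 = ngram_precision(cand, ref, 2)   # bigram precision
```

The example also hints at why n-gram metrics misalign with human preference: "findings" and "abnormality" are penalized equally whether the substitution is clinically harmless or not, which motivates LLM-based metrics such as GREEN.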
The second part will focus on training a report generation model using the architecture of CT-CHAT [3]. CT-CHAT is a model that has been trained on variations of the CT-RATE dataset as a vision-language assistant that can reason about chest CT scans. First, a baseline model will be trained solely on the task of recreating variations of the CT-RATE ground-truth reports. Next, the model will be trained to break the report generation down into smaller tasks, such as analyzing one anatomical region at a time, inspired by Chain-of-Thought approaches [7], in an attempt to improve report quality.
[1] L. Guo, A. M. Tahir, D. Zhang, Z. J. Wang, and R. K. Ward, "Automatic Medical Report Generation: Methods and Applications," SIP, vol. 13, no. 1, 2024, doi: 10.1561/116.20240044.
[2] J. N. Acosta et al., "The Impact of AI Assistance on Radiology Reporting: A Pilot Study Using Simulated AI Draft Reports," Dec. 16, 2024, arXiv: arXiv:2412.12042. doi: 10.48550/arXiv.2412.12042.
[3] I. E. Hamamci et al., "Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography," Oct. 16, 2024, arXiv: arXiv:2403.17834. doi: 10.48550/arXiv.2403.17834.
[4] X. Zhang et al., "RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis," Apr. 25, 2024, arXiv: arXiv:2404.16754. doi: 10.48550/arXiv.2404.16754.
[5] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, "BLEU: a method for automatic evaluation of machine translation," in Proceedings of the 40th Annual Meeting on Association for Computational Linguistics – ACL '02, Philadelphia, Pennsylvania: Association for Computational Linguistics, 2001, p. 311. doi: 10.3115/1073083.1073135.
[6] S. Ostmeier et al., "GREEN: Generative Radiology Report Evaluation and Error Notation," in Findings of the Association for Computational Linguistics: EMNLP 2024, 2024, pp. 374–390. doi: 10.18653/v1/2024.findings-emnlp.21.
[7] J. Wei et al., "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models," in Advances in Neural Information Processing Systems (NeurIPS), 2022.
H2OArmor: A Dynamic Data-driven Leak Detection Framework for Varied Digital Maturity Levels in Water Utilities
In response to the pressing need for advanced leak detection in water distribution networks, this research endeavors to develop a sophisticated machine-learning pipeline named H2OArmor. The pipeline is designed to leverage various methods for detecting leakages by utilizing diverse data sources. Crucially, the ensemble opinions of these methods will be intelligently integrated to generate a confidence score for precise event detection.
H2OArmor’s development will be anchored in a robust framework. This framework not only streamlines the implementation of machine-learning algorithms but also offers flexibility in onboarding different water utilities. The methodology of the thesis should include multiple machine-learning models contributing towards a final informed decision on identifying leak events at the district metered area (DMA) level. Furthermore, the thesis scope includes the implementation of an end-to-end automated ML pipeline that can be deployed at scale with minimal manual intervention.
The thesis encompasses several key work packages:
- Framework Implementation: Utilization of a robust ML framework to build the machine-learning pipeline, ensuring efficiency and compatibility. The framework may either be developed from scratch or assembled from components of a pre-built one.
- Development of ML-based Methods: Creation of machine learning methods ensuring accuracy and adaptability.
- Automated Onboarding Process: Designing an automated onboarding process for new methods, enhancing the scalability and versatility of H2OArmor as additional techniques are incorporated.
- Scoring Mechanism Development: Creation of a scoring mechanism that synthesizes the ensemble opinions of the various methods, providing a unified confidence score for leak detection events.
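A minimal sketch of such a scoring mechanism, fusing per-method leak probabilities into a single confidence score via a weighted average; the detector names and weights are illustrative assumptions, and the thesis would explore more sophisticated fusion rules:

```python
import numpy as np

def ensemble_confidence(probs, weights=None):
    """Fuse per-method leak probabilities into one confidence score.

    `probs` holds each detector's leak probability for one DMA and time
    window; `weights` can encode how much each method is trusted, e.g.
    based on its historical precision. A (weighted) average is the
    simplest possible fusion rule.
    """
    probs = np.asarray(probs, dtype=float)
    if weights is None:
        weights = np.ones_like(probs)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * probs) / np.sum(weights))

# Three hypothetical detectors: flow anomaly, pressure anomaly, acoustic model.
score = ensemble_confidence([0.9, 0.6, 0.3], weights=[2.0, 1.0, 1.0])
```

Keeping the fusion rule separate from the individual detectors is what makes the automated onboarding of new methods straightforward: a new detector only has to emit a probability and a trust weight.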
H2OArmor aims to revolutionize leak detection in water distribution networks by tailoring its approach to the digital maturity levels of water utilities, ensuring optimal performance and reliability across a spectrum of operational contexts.