A Hybrid TransUNet-TransFuse Architectural Framework for Ice Boundary Extraction in Radio-Echo Sounding Data
Evaluation of the TransSounder [1] architecture for direct ice boundary extraction from radio-echo sounding data.
References
[1] Ghosh, R., & Bovolo, F. (2022). TransSounder: A hybrid TransUNet-TransFuse architectural framework for semantic segmentation of radar sounder data. IEEE Transactions on Geoscience and Remote Sensing, 60, 1-13.
A Comparative Study of Transformer-Based Models and CNNs for Semantic Segmentation in Industrial CT Scans
Industrial computed tomography (iCT) is extensively utilized for non-destructive testing, material analysis, quality control, and metrology. In these applications, semantic segmentation is crucial, particularly for material analysis [1]. In recent years, Convolutional Neural Networks (CNNs) have been employed successfully for material segmentation, handling low-quality reconstructions, and performing complex segmentations where local context is vital. However, CNNs often struggle to capture long-range dependencies due to their localized nature.
Recently, transformer architectures have shown superior performance in various segmentation tasks. Unlike CNNs, which rely on filter banks to extract local features, transformers encode image patches into visual tokens and utilize self-attention mechanisms to process the entire input in parallel. This allows transformers to effectively capture long-distance relationships, though they may be less efficient when it comes to preserving fine local details [2]. In the context of three-dimensional (3D) data segmentation, both CNNs and transformers face challenges due to the increased memory requirements and the higher complexity of patterns that arise in 3D space.
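The token-and-attention mechanism described above can be sketched in a few lines of NumPy: an image is split into flattened patches (tokens), and a single attention head computes pairwise weights between all tokens, so every patch can attend to every other patch regardless of spatial distance. The projection matrices here are random stand-ins for learned weights; this is an illustrative sketch, not the implementation of any particular model.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def patchify(image, patch):
    """Split a (H, W) image into flattened non-overlapping patches (visual tokens)."""
    H, W = image.shape
    return (image.reshape(H // patch, patch, W // patch, patch)
                 .transpose(0, 2, 1, 3)
                 .reshape(-1, patch * patch))          # (num_tokens, patch*patch)

def self_attention(tokens, d_k=16, seed=0):
    """Single-head scaled dot-product attention: every token attends to all others,
    which is what gives transformers their global receptive field."""
    rng = np.random.default_rng(seed)
    d_in = tokens.shape[1]
    Wq, Wk, Wv = (rng.standard_normal((d_in, d_k)) / np.sqrt(d_in) for _ in range(3))
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))             # (N, N) pairwise weights
    return attn @ V
```

For a 32x32 image with 8x8 patches this yields 16 tokens, and the attention matrix relates every patch to every other patch in a single step, in contrast to a CNN, which would need many stacked layers to connect distant patches.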
One promising model for addressing these challenges is the Swin Transformer, which incorporates a hierarchical structure with shifted windows, enabling it to capture both local and global dependencies more efficiently [3]. To tackle the limitations of CNNs and transformers individually, hybrid models combining both architectures have been proposed. For instance, Cai et al. introduced a model that combines the Swin Transformer with CNNs for 3D segmentation tasks, taking advantage of each architecture’s strengths [2]. Both CNNs and Swin Transformers are known for their ability to generalize well on smaller datasets, thanks to their inductive bias [3], [4].
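The shifted-window idea from [3] can be illustrated with plain array operations: features are grouped into non-overlapping local windows (within which attention would be computed), and a cyclic shift of half a window between layers lets information cross the previous layer's window borders. The attention computation itself is omitted; this sketch only shows the partitioning and shifting.

```python
import numpy as np

def window_partition(x, win):
    """Split a (H, W, C) feature map into non-overlapping win x win windows."""
    H, W, C = x.shape
    return (x.reshape(H // win, win, W // win, win, C)
             .transpose(0, 2, 1, 3, 4)
             .reshape(-1, win, win, C))                # (num_windows, win, win, C)

def shift_windows(x, win):
    """Cyclically shift the map by win//2 in both spatial axes, so that the next
    layer's windows straddle the borders of the previous layer's windows."""
    return np.roll(x, shift=(-(win // 2), -(win // 2)), axis=(0, 1))
```

Alternating `window_partition(x, win)` and `window_partition(shift_windows(x, win), win)` between layers is what lets the model mix local windowed attention with cross-window information flow at linear cost.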
This thesis focuses on applying a hybrid approach combining the Swin Transformer and CNNs to a complex dataset of iCT scans of shoes, where the objective is to segment the shoes into their individual components, as demonstrated in previous work by Leipert et al. [5]. Despite the dataset’s small size, its high intrinsic variability and the relevance of both local and global dependencies make it an ideal candidate for evaluating segmentation methods. The segmentation will be performed in 3D, highlighting the challenges and opportunities of using these advanced models.
Through a comparative analysis of CNNs, the Swin Transformer, and their combination, this thesis aims to provide insights into the strengths and limitations of each method for 3D semantic segmentation of complex industrial CT datasets. The findings will contribute to improving segmentation techniques for iCT applications, potentially enhancing the accuracy and efficiency of material analysis in industrial contexts. The main limitation of the study is its restriction to a single dataset.
References
[1] S. Bellens, P. Guerrero, P. Vandewalle and W. Dewulf, “Machine learning in industrial X-ray computed tomography – a review,” CIRP Journal of Manufacturing Science and Technology, vol. 51, pp. 324-341, 2024.
[2] Y. Cai, Y. Long, Z. Han, M. Liu, Y. Zheng, W. Yang and L. Chen, “Swin Unet3D: a three-dimensional medical image segmentation network combining vision transformer and convolution,” BMC Medical Informatics and Decision Making, vol. 23, p. 33, 2023.
[3] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin and B. Guo, “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows,” in IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 2021.
[4] X. He, Y. Zhou, J. Zhao, D. Zhang, R. Yao and Y. Xue, “Swin Transformer Embedding UNet for Remote Sensing Image Semantic Segmentation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1-15, 2022.
[5] M. Leipert, G. Herl, J. Stebani, S. Zabler and A. K. Maier, “Three Step Volumetric Segmentation for Automated Shoe Fitting,” in 12th Conference on Industrial Computed Tomography (iCT) 2023, Fürth, Germany, 2023.
Generation of IEC 61131-3 SFCs Conditioned on Textual User Intents and Existing Sequences
3D CT Image Visualization using Blender
Introduction:
This project aims to develop a streamlined pipeline for 3D CT image visualization using Blender and Bioxel Nodes. You will create a step-by-step process to import, process, and render medical imaging data, resulting in high-quality scientific visualizations. This 5 ECTS project will enhance your technical skills and your ability to visualize complex medical data.
Source: https://omoolab.github.io/BioxelNodes/0.1.x/
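As a rough illustration of the “process” step such a pipeline typically needs before rendering, the sketch below applies a window/level transform to a CT volume in Hounsfield units and rescales it to [0, 1]. The window parameters are illustrative assumptions, not values prescribed by Blender or Bioxel Nodes.

```python
import numpy as np

def window_ct(volume_hu, level=300.0, width=1500.0):
    """Window/level a CT volume given in Hounsfield units and rescale to [0, 1],
    a common preprocessing step before handing voxel data to a renderer.
    The default level/width are illustrative, not clinical presets."""
    lo, hi = level - width / 2.0, level + width / 2.0
    vol = np.clip(volume_hu, lo, hi)       # discard values outside the window
    return (vol - lo) / (hi - lo)          # normalize to [0, 1]
```

Choosing the window determines which tissue range dominates the rendered volume; narrow windows emphasize soft-tissue contrast, wide windows keep bone and air visible together.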
Prospective candidates are warmly invited to send their CV and transcript to yipeng.sun@fau.de.
CBCT to CT Translation Using Deep Learning
Neural Network Implementation of Reaction-Diffusion Equations for Tumor Growth Modeling Using Stochastic Differential Equations
Retrieval Augmented Generation for Medical Question Answering
Project Seminar: Reproduce Research Results
In this seminar, students will engage in reproducing state-of-the-art scientific results with two main objectives. Firstly, students will work on projects that are close to current state-of-the-art research, and secondly, they will develop essential competencies in reproducing and critically analyzing scientific results. The projects will be tailored to match each student’s interests in terms of methodology and application, while the task requirements and grading criteria will be standardized across the board. The outcome of this project will contribute to the scientific community by providing a report on the state of reproducibility within the field.
The seminar will begin with a series of lectures. Students will initially evaluate publications from leading conferences in the field, focusing on their reproducibility, to gather comprehensive insights and understand the challenges involved. Typically, the evaluation will concentrate on publications from top-tier international conferences, such as CVPR and MICCAI. The specific conferences of focus may change each semester and will be announced at the start of the semester.
Students will have the option to choose from varying degrees of reproduction effort, ranging from attempting to reproduce a single result from a paper to fully implementing an entire paper. Depending on the complexity of the chosen task, students may analyze one or multiple publications.
Peer feedback and exchanges within small groups will form part of the seminar, although all reproduction efforts and deliverables will be individual work.
If you are interested, please join the first lecture on 16.10.2024 at 8:15 am in lecture hall H4 (Martensstraße 1, 91058 Erlangen).
Course registration opens on October 16, 2024, and will close on October 20, 2024. The StudOn link and password will be shared during the first lecture. Registration will follow a first-come, first-served basis.
Real-World Constrained Parameter Space Analysis for Rigid Head Motion Simulation
Description
In recent years, the application of deep learning techniques to medical image analysis and image quality enhancement has proven valuable. One critical area where deep learning models have shown promising results is patient motion estimation in CT scans [1], [2].
Deep learning models depend heavily on the quality and diversity of the underlying training data, but well-annotated datasets, in which the patient motion throughout the whole scan is known, are sparse. This is typically overcome by generating synthetic data, where motion-free clinical acquisitions are corrupted with simulated patient motion by altering the relevant components of the projection matrices. In the case of head CT scans, the rigid patient motion can be parameterized by a 6DOF trajectory over all acquisition frames. The simulated perturbations are typically drawn as Gaussian motion or, for more complex patterns, modeled with B-splines. However, these simulated patterns often fall short of mimicking real head motion observed in clinical settings, especially because they lack complex spatiotemporal correlations. To provide more realistic training samples, it is necessary to define a real-world constrained parameter space that respects correlations, time dependencies, and anatomical boundaries. This allows neural networks to generalize better to real-world data.
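A minimal sketch of the Gaussian-style simulation described above: each of the six rigid parameters (three translations, three rotations) follows a smoothed Gaussian random walk over the acquisition frames. The amplitudes and smoothing length are illustrative assumptions, not clinically measured values, and B-spline-based patterns would replace the moving-average step.

```python
import numpy as np

def simulate_rigid_motion(n_frames=360, sigma=0.05, kernel=31, seed=0):
    """Draw independent Gaussian increments per frame for the 6 rigid parameters
    (tx, ty, tz, rx, ry, rz), accumulate them into a random walk, and smooth the
    result so the trajectory is jitter-free. Units/amplitudes are illustrative."""
    rng = np.random.default_rng(seed)
    steps = rng.normal(0.0, sigma, size=(n_frames + kernel - 1, 6))
    traj = np.cumsum(steps, axis=0)                    # random walk per parameter
    smooth = np.ones(kernel) / kernel                  # moving-average smoothing
    traj = np.stack([np.convolve(traj[:, i], smooth, mode='valid')
                     for i in range(6)], axis=1)       # (n_frames, 6)
    return traj - traj[0]                              # motion-free at frame 0
```

Each row of the returned array would then be applied to the corresponding projection matrix to produce a motion-corrupted acquisition; the point of the thesis is precisely that such generic trajectories lack the spatiotemporal correlations of real head motion.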
This thesis aims to perform a comprehensive analysis of the parameter space of rigid (6DOF) head motion patterns obtained from measurements with an in-house optical tracking system integrated into a C-arm CT scanner at Siemens Healthineers in Forchheim. By analyzing the spatiotemporal correlations and constraints in the 6DOF parameter space, lower-dimensional underlying structures might be uncovered. Clustering techniques can be incorporated to further reveal sub-manifolds in the 6DOF space and to distinguish different classes of motion, such as breathing or nodding. A Variational Autoencoder (or a similar generative model) should be trained with the goal of providing annotated synthetic datasets with realistic motion patterns.
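One simple starting point for uncovering lower-dimensional structure in such measurements is PCA via the SVD; the sketch below assumes each scan's 6DOF trajectory has already been flattened into one feature vector per row, which is an assumption about preprocessing, not a prescribed step of the thesis.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via SVD: center the samples, project them onto the top principal
    directions, and report the fraction of total variance each direction
    explains. A large leading fraction hints at a low-dimensional sub-manifold."""
    Xc = X - X.mean(axis=0)                        # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T              # (n_samples, n_components)
    explained = (S ** 2) / (S ** 2).sum()          # variance ratio per component
    return scores, explained[:n_components]
```

The low-dimensional scores can then feed clustering (e.g., to separate breathing-like from nodding-like motion), while a VAE would replace this linear projection with a learned nonlinear latent space.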
[1] A. Preuhs et al., “Appearance Learning for Image-Based Motion Estimation in Tomography,” IEEE Transactions on Medical Imaging, vol. 39, no. 11, pp. 3667-3678, Nov. 2020.
[2] Z. Chen, Q. Li and D. Wu, “Estimate and compensate head motion in non-contrast head CT scans using partial angle reconstruction and deep learning,” Medical Physics, vol. 51, pp. 3309-3321, 2024.