Index
Automatic Speech Recognition at Phoneme and Word-Level To Analyze Parkinson’s Disease
A Comparative Study of Transformer-Based Models and CNNs for Semantic Segmentation in Industrial CT Scans
Industrial computed tomography (iCT) is extensively utilized for non-destructive testing, material analysis, quality control, and metrology. In these applications, semantic segmentation is crucial, particularly for material analysis [1]. In recent years, Convolutional Neural Networks (CNNs) have been employed successfully for material segmentation, handling low-quality reconstructions, and performing complex segmentations where local context is vital. However, CNNs often struggle to capture long-range dependencies due to their localized nature.
Recently, transformer architectures have shown superior performance in various segmentation tasks. Unlike CNNs, which rely on filter banks to extract local features, transformers encode image patches into visual tokens and use self-attention to process the entire input in parallel. This allows transformers to capture long-distance relationships effectively, though they tend to be less efficient at preserving fine local details [2]. In the context of three-dimensional (3D) data segmentation, both CNNs and transformers face challenges due to increased memory requirements and the higher complexity of patterns that arise in 3D space.
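To make this contrast concrete, the following minimal PyTorch sketch (illustrative sizes only, not taken from this work) tokenizes a small 3D volume into patches and applies one self-attention layer; every token can attend to every other token, whereas a convolution only mixes information within its local kernel.

```python
# Minimal sketch: tokenizing a 3D volume into patches and applying self-attention.
import torch
import torch.nn as nn

volume = torch.randn(1, 1, 64, 64, 64)             # (batch, channel, D, H, W)

# Patch embedding: non-overlapping 8x8x8 patches -> one 96-dimensional token each.
patch_embed = nn.Conv3d(in_channels=1, out_channels=96, kernel_size=8, stride=8)
tokens = patch_embed(volume).flatten(2).transpose(1, 2)   # (1, 512, 96): 8*8*8 tokens

# Self-attention: every token attends to every other token (global context),
# unlike a CNN layer, which only mixes information inside its local kernel.
attn = nn.MultiheadAttention(embed_dim=96, num_heads=4, batch_first=True)
out, weights = attn(tokens, tokens, tokens)
print(out.shape, weights.shape)                    # (1, 512, 96) and (1, 512, 512)
```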
One promising model for addressing these challenges is the Swin Transformer, which incorporates a hierarchical structure with shifted windows, enabling it to capture both local and global dependencies more efficiently [3]. To tackle the limitations of CNNs and transformers individually, hybrid models combining both architectures have been proposed. For instance, Cai et al. introduced a model that combines the Swin Transformer with CNNs for 3D segmentation tasks, taking advantage of each architecture’s strengths [2]. Both CNNs and Swin Transformers are known for their ability to generalize well on smaller datasets, thanks to their inductive bias [3] [4].
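As a rough illustration of the shifted-window mechanism (a 2D toy example, not the full Swin block), the sketch below partitions a feature map into non-overlapping windows and shows how a cyclic shift lets the next attention layer connect tokens that previously sat in different windows.

```python
# Minimal sketch of the shifted-window idea behind the Swin Transformer (2D case for brevity).
import torch

def window_partition(x, ws):
    """Split a (B, H, W, C) feature map into non-overlapping ws x ws windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)  # (num_windows*B, ws*ws, C)

feat = torch.randn(1, 8, 8, 96)   # toy feature map
ws = 4

# Regular windows: attention is computed independently inside each 4x4 window.
regular = window_partition(feat, ws)                                              # (4, 16, 96)

# Shifted windows: a cyclic shift by ws//2 before partitioning lets the next
# attention layer relate tokens that previously belonged to different windows.
shifted = window_partition(torch.roll(feat, shifts=(-ws // 2, -ws // 2), dims=(1, 2)), ws)
print(regular.shape, shifted.shape)
```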
This thesis focuses on applying a hybrid approach combining the Swin Transformer and CNNs to a complex dataset of iCT scans of shoes, where the objective is to segment the shoes into their individual components, as demonstrated in previous work by Leipert et al. [5]. Despite the dataset’s small size, its high intrinsic variability and the relevance of both local and global dependencies make it an ideal candidate for evaluating segmentation methods. The segmentation will be performed in 3D, highlighting the challenges and opportunities of using these advanced models.
Through a comparative analysis of CNNs, the Swin Transformer, and their combined approaches, this thesis aims to provide insights into the strengths and limitations of each method for 3D semantic segmentation on complex industrial CT datasets. The findings will contribute to improving segmentation techniques for iCT applications, potentially enhancing the accuracy and efficiency of material analysis in industrial contexts. The main limitation of the study is its restriction to a single dataset.
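One way the comparison could be made concrete is to score every architecture with the same per-class Dice metric on held-out volumes; the sketch below is a minimal example on random label volumes, and the class count is an arbitrary assumption rather than a property of the dataset.

```python
# Hedged sketch of the evaluation: per-class Dice score between predicted and
# reference label volumes, applied identically to every compared architecture.
import torch

def dice_per_class(pred, target, num_classes, eps=1e-6):
    """pred, target: integer label volumes of shape (D, H, W)."""
    scores = []
    for c in range(num_classes):
        p, t = (pred == c), (target == c)
        inter = (p & t).sum().float()
        denom = p.sum().float() + t.sum().float()
        scores.append(((2 * inter + eps) / (denom + eps)).item())
    return scores

# Toy example with random labels (4 classes assumed for illustration):
pred = torch.randint(0, 4, (32, 32, 32))
target = torch.randint(0, 4, (32, 32, 32))
print(dice_per_class(pred, target, num_classes=4))
```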
References
[1] S. Bellens, P. Guerrero, P. Vandewalle and W. Dewulf, “Machine learning in industrial X-ray computed tomography – a review,” CIRP Journal of Manufacturing Science and Technology, vol. 51, pp. 324-341, 2024.
[2] Y. Cai, Y. Long, Z. Han, M. Liu, Y. Zheng, W. Yang and L. Chen, “Swin Unet3D: a three-dimensional medical image segmentation network combining vision transformer and convolution,” BMC Medical Informatics and Decision Making, vol. 23, p. 33, 2023.
[3] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin and B. Guo, “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows,” in IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 2021.
[4] X. He, Y. Zhou, J. Zhao, D. Zhang, R. Yao and Y. Xue, “Swin Transformer Embedding UNet for Remote Sensing Image Semantic Segmentation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1-15, 2022.
[5] M. Leipert, G. Herl, J. Stebani, S. Zabler and A. K. Maier, “Three Step Volumetric Segmentation for Automated Shoe Fitting,” in 12th Conference on Industrial Computed Tomography (iCT) 2023, Fürth, Germany, 2023.
EcoScapes: LLM-powered advice for crafting sustainable cities
Climate adaptation is vital for the sustainability, and sometimes the mere survival, of our urban areas [1, chapters TS.C.8 and TS.D.1]. However, small cities often struggle with limited personnel resources and with integrating vast amounts of data from multiple sources into a comprehensive analysis [1, chapter TS.D.1.3]. Moreover, the complexity of the topic can overwhelm administrative staff and local politicians alike. To overcome these challenges, this thesis proposes a multi-layered system combining specialized Large Language Models (LLMs), satellite imagery, and a knowledge base to aid in developing effective climate adaptation strategies.
Initially, the system uses provided location information to request relevant satellite imagery, which can be used by all subsequent components.
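A minimal sketch of this first stage is given below; the imagery endpoint, its parameters, and the requested bands are hypothetical placeholders rather than an existing API.

```python
# Minimal sketch of the imagery-retrieval stage, assuming a hypothetical imagery
# service; the URL, parameters, and band names are placeholders, not a real API.
import requests

def fetch_satellite_tile(lat: float, lon: float, radius_km: float = 2.0) -> bytes:
    """Request imagery around a location so later modules can reuse the same tile."""
    response = requests.get(
        "https://example-imagery-provider.test/api/tiles",   # placeholder URL
        params={"lat": lat, "lon": lon, "radius_km": radius_km, "bands": "rgb,nir"},
        timeout=30,
    )
    response.raise_for_status()
    return response.content

# tile = fetch_satellite_tile(49.59, 11.00)   # example coordinates
```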
The architecture’s modular core encompasses several LLMs and expert systems that examine the satellite data to offer insights on different climate adaptation aspects. Potential functions include, but are not limited to, the identification of heat islands, the detection of areas threatened by flooding, and the assessment of vegetation cover.
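As one illustrative example of such a module (not a component that already exists in this work), vegetation cover can be estimated from the near-infrared and red bands via the NDVI; the band arrays and the threshold below are assumptions.

```python
# Illustrative sketch of one expert module: estimating vegetation cover via NDVI.
import numpy as np

def vegetation_cover(nir: np.ndarray, red: np.ndarray, threshold: float = 0.3) -> float:
    """Fraction of pixels whose NDVI exceeds a (tunable) vegetation threshold."""
    ndvi = (nir - red) / np.clip(nir + red, 1e-6, None)
    return float((ndvi > threshold).mean())

# Toy example with random reflectance values in [0, 1]:
rng = np.random.default_rng(0)
print(vegetation_cover(rng.random((256, 256)), rng.random((256, 256))))
```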
In the last step, the system consolidates the findings from the preceding modules to generate a comprehensive report on the existing situation and recommend potential adaptation strategies.
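A possible shape of this consolidation step is sketched below: the module findings are serialized into a single prompt for a report-writing LLM. The finding keys and wording are purely illustrative assumptions.

```python
# Hedged sketch of the consolidation step: serialize module findings into one prompt.
def build_report_prompt(location: str, findings: dict[str, str]) -> str:
    lines = [f"Location under review: {location}", "Module findings:"]
    lines += [f"- {module}: {summary}" for module, summary in findings.items()]
    lines.append(
        "Write a concise report on the current situation and recommend "
        "suitable climate adaptation strategies, citing the findings above."
    )
    return "\n".join(lines)

prompt = build_report_prompt(
    "Example town",
    {"heat islands": "two dense clusters near the industrial area",
     "vegetation cover": "31% of the analyzed area"},
)
print(prompt)
```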
To assess the system’s performance, we will compare the generated outputs with those of unaltered LLMs and with a model inspired by ChatClimate [2].
References
[1] Pörtner, H.-O., D.C. Roberts, H. Adams, et al. 2022: Technical Summary. [H.-O. Pörtner, D.C. Roberts, E.S. Poloczanska, K. Mintenbeck, M. Tignor, A. Alegría, M. Craig, S. Langsdorf, S. Löschke, V. Möller, A. Okem (eds.)]. In: Climate Change 2022: Impacts, Adaptation, and Vulnerability. Contribution of Working Group II to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [H.-O. Pörtner, D.C. Roberts, M. Tignor, E.S. Poloczanska, K. Mintenbeck, A. Alegría, M. Craig, S. Langsdorf, S. Löschke, V. Möller, A. Okem, B. Rama (eds.)]. Cambridge University Press, Cambridge, UK and New York, NY, USA, pp. 37-118, doi:10.1017/9781009325844.002.
[2] Vaghefi, S.A., Stammbach, D., Muccione, V., et al. ChatClimate: Grounding conversational AI in climate science. Commun Earth Environ 4, 480 (2023). https://doi.org/10.1038/s43247-023-01084-x
Verification of deep learning classifications in test systems for industrial production
Analyzing the influence of writer-dependent features in writer identification using Convolutional Neural Networks
Definition and Implementation of a Prototype Smart Home Interface for a Cloud-Based Energy Management System
Transformers vs. Convolutional Networks for 3D segmentation in industrial CT data
The current state of the art for segmentation in industrial CT is still largely based on CNNs; transformer-based models are used only sparsely.
This project therefore aims to compare the semantic segmentation performance of transformers (which incorporate global context into the segmentation), pure convolutional neural networks (which rely on local context), and combined methods (such as this one: https://doi.org/10.1186/s12911-023-02129-z) on an industrial CT dataset of shoes, as in this study: https://doi.org/10.58286/27736 .
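As a hint at the practical side of working with large CT volumes, the sketch below shows patch-wise (sliding-window) inference, which keeps 3D memory use bounded regardless of the architecture; the stand-in model, patch size, and class count are assumptions.

```python
# Illustrative sketch: patch-wise inference over a large 3D volume so that any of
# the compared models fits into memory. The Conv3d is only a stand-in model.
import torch
import torch.nn as nn

model = nn.Conv3d(1, 4, kernel_size=3, padding=1)   # stand-in for CNN/transformer/hybrid
volume = torch.randn(1, 1, 128, 128, 128)
logits = torch.zeros(1, 4, 128, 128, 128)
ps = 64                                              # assumed patch size

with torch.no_grad():
    for z in range(0, 128, ps):
        for y in range(0, 128, ps):
            for x in range(0, 128, ps):
                patch = volume[..., z:z+ps, y:y+ps, x:x+ps]
                logits[..., z:z+ps, y:y+ps, x:x+ps] = model(patch)

segmentation = logits.argmax(dim=1)                  # (1, 128, 128, 128) label volume
```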
Only available as a Bachelor’s thesis / research project
Developing and Evaluating Image Similarity Metrics for Enhanced Classification Performance in 2D Datasets
Work description
This thesis focuses on the development and evaluation of novel image similarity metrics tailored for 2D datasets, aiming to improve the effectiveness of classification algorithms. By integrating active learning methods, the research seeks to refine these metrics dynamically through iterative feedback and validation. The work involves extensive testing and validation across diverse 2D image datasets, ensuring robustness and applicability in varied scenarios.
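One candidate metric, shown as a hedged sketch below, is the cosine similarity between CNN embeddings of two images; the untrained backbone is only a placeholder for whatever pretrained or actively refined encoder the thesis ends up using.

```python
# Hedged sketch of a candidate image similarity metric: cosine similarity between
# CNN embeddings. The untrained encoder is a placeholder, not the proposed method.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(                     # placeholder feature extractor
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

def image_similarity(a: torch.Tensor, b: torch.Tensor) -> float:
    """Cosine similarity in embedding space; 1.0 means identical direction."""
    with torch.no_grad():
        ea, eb = encoder(a.unsqueeze(0)), encoder(b.unsqueeze(0))
    return F.cosine_similarity(ea, eb).item()

img_a, img_b = torch.rand(3, 64, 64), torch.rand(3, 64, 64)
print(image_similarity(img_a, img_b))
```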
The following questions should be considered:
- What metrics can effectively quantify the variance in a training dataset? (See the sketch after this list.)
- How does the variance within a training set impact the neural network’s ability to generalize to new, unseen data?
- What is the optimal balance of diversity and specificity in a training dataset to maximize neural network performance?
- How can training datasets be curated to include a beneficial level of variance without compromising the quality of the neural network’s output?
- What methodologies can be implemented to systematically adjust the variance in training data and evaluate its impact on NN generalization?
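For the first question above, dataset variance could, for instance, be quantified as dispersion statistics in an embedding space; the sketch below computes two such candidates (total variance and mean pairwise distance) on random embeddings and is an assumption, not a prescribed method.

```python
# Hedged sketch: quantifying training-set variance via embedding-space dispersion.
import torch

def dataset_variance_metrics(embeddings: torch.Tensor) -> dict[str, float]:
    """embeddings: (N, D) tensor, one row per training image."""
    centered = embeddings - embeddings.mean(dim=0, keepdim=True)
    cov_trace = (centered ** 2).sum(dim=1).mean().item()     # total variance
    pairwise = torch.cdist(embeddings, embeddings)            # (N, N) distance matrix
    mean_pairwise = pairwise.sum().item() / (len(embeddings) * (len(embeddings) - 1))
    return {"total_variance": cov_trace, "mean_pairwise_distance": mean_pairwise}

# Toy example on random embeddings:
print(dataset_variance_metrics(torch.randn(100, 32)))
```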
Prerequisites
Applicants should have a solid background in machine learning and deep learning, with strong technical skills in Python and experience with PyTorch. Candidates should also possess the capability to work independently and have a keen interest in exploring the theoretical aspects of neural network training.
For your application, please send your transcript of record.