Index

Layer-wise Analysis of Belief Representations in Transformer Language Models

Deep Learning-Based Defect Detection and Classification in CdTe CT Detector Wafers

Photon-counting CT (PCCT) is a modern technology that achieves higher resolution and lower radiation dose using CdTe (Cadmium Telluride) detectors, which directly convert X-ray photons into electrical signals, unlike conventional CT systems with silicon-based indirect conversion.

At Siemens Healthineers, CdTe detectors are made from processed wafers and inspected using infrared (IR) transmission imaging. A technician uses a software tool to manually label defects such as cracks, tellurium inclusions, grain boundaries, and patterns like “Milky Way” or “Twins.” This inspection process is time-consuming, subjective, and not scalable for high-volume production.

Deep learning models such as ResNet and YOLO [1] have demonstrated strong wafer defect classification and localization performance, particularly on synthetic datasets like WM-811K [2]. However, these methods are primarily designed for structured layouts and silicon wafers, which differ significantly from the CdTe-based wafers used in photon-counting CT detectors. While studies such as Kirschenmann et al. [3] have applied deep learning to CdTe crystals using IR microscopy, their focus has been on crystal characterization rather than surface defect detection in a production setting, which is the focus of this thesis.

To the best of our knowledge, no existing method addresses the automatic classification of wafer surface defects using IR images from real CdTe wafer production, which is the focus of this thesis. The aim is to develop a deep learning model to automatically detect and classify surface defects in IR images of CdTe wafers for faster, more consistent, and scalable inspection. The dataset consists of high-resolution IR images with pixel-wise labeled masks, where defects are annotated using color codes. Each defect class includes at least 1,000 labeled images, covering types such as cracks, tellurium inclusions, and other surface anomalies.

The key steps involved in this work are:

Literature Review A review of deep learning approaches for wafer defect detection will be conducted, with a focus on models that combine classification and localization, such as the ResNet- and YOLO-based framework proposed by Shinde et al. [2]. Relevant work on IR imaging and neural network applications for CdTe materials will also be considered.
Model Design and Implementation A deep learning architecture will be developed to classify and potentially localize surface defects. Preprocessing steps will be designed based on the format and structure of Siemens’ internal dataset.
Evaluation The model will be tested using standard evaluation metrics and compared to known methods. The goal is to assess how well it performs and whether it can be useful in real production environments.

References

[1] Shinde, M. et al. (2023). Wafer Defect Localization and Classification Using Deep Learning Techniques.

[2] WM-811K Dataset. MIT Lincoln Laboratory: Wafer Map Defect Dataset. https://www.ll.mit.edu/r-d/datasets/wm-811k-wafer-map-defect-dataset

[3] Kirschenmann, D. et al. (2023). Employing infrared microscopy in combination with a pre-trained neural network to visualize and analyze the defect distribution in cadmium telluride crystals.

Evaluation of Forecasting Approaches for Time Series Data in the Utility Domain

Time series data is a sequence of data points over time that allows to understand the evolution of a system by analyzing the trends and influencing variables. It serves as the foundation for time series forecasting, where historical patterns, trends, and seasonal variations are analyzed to make informed predictions about future values. Time series forecasting plays a crucial role in the utility sector by enabling accurate demand prediction, optimizing energy distribution, and ensuring efficient resource management, helping providers balance supply and demand while minimizing operational costs and outages.

A significant challenge in time series forecasting lies in the behavioral heterogeneity of the data. Each time series exhibits unique characteristics, making a one-size-fits-all approach inadequate. With respect to the utility sector, this is particularly evident in data from residential, agricultural and industrial zones, where distinct consumption patterns necessitate different forecasting models. This research seeks to address this challenge by determining the best-suited forecasting model that accounts for the unique characteristics of the time series data.

This thesis aims to evaluate time series forecasting approaches for water and heat utility networks and identify the most suitable forecasting models for different time series data with unique demand and usage patterns. The goal is to classify time series data into distinct categories by their characteristics to find and apply the best forecasting model for each. To achieve this, we will leverage statistical methods, Machine Learning (ML), and Deep Learning (DL) techniques, which offer advanced capabilities for handling non-linearities, automatically extracting features, managing large datasets, and capturing complex dependencies. Forecasting approaches are chosen based on their nature of input and the nature in which the data is processed. Popular approaches include statistical methods, frequency-aware techniques [1], machine learning algorithms [2], recurrent neural networks [3], transformer-based architectures [4], and emerging foundation models [5]. The research will involve categorizing time series data, training and testing different ML and DL models, and evaluating their performance based on various metrics to determine the most suitable model for each category.

The anticipated outcome of this thesis is the creation of a robust framework for classifying time series data by their unique characteristics and identifying the most suitable forecasting models for each category. This research aims to:

Classify time series data based on distinct attributes.
Conduct a comprehensive evaluation of various ML and DL models tailored to each data category.
Improve the precision of demand forecasts for water and heat networks through customized prediction models.

By determining models tailored to the specific characteristics of water and heat consumption data, this research proposal aims to identify the most suitable approach for a given time series and evaluate how they perform relative to each other leading to significantly improving the accuracy of future demand predictions, facilitating better resource management and infrastructure planning.

References:

[1] P. C. Young, D. J. Pedregal, and W. Tych, “Dynamic harmonic regression,” J Forecast, vol. 18, no. 6, pp. 369–394, Nov. 1999, doi: 10.1002/(SICI)1099-131X(199911)18:6<369::AID-FOR748>3.0.CO;2-K.

[2] T. Chen and C. Guestrin, “XGBoost: A Scalable Tree Boosting System,” Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol. 13-17-August-2016, pp. 785–794, Mar. 2016, doi: 10.1145/2939672.2939785.

[3] M. Beck et al., “xLSTM: Extended Long Short-Term Memory,” May 2024, Accessed: Jun. 16, 2025. [Online]. Available: https://arxiv.org/pdf/2405.04517

[4] Y. Nie, N. H. Nguyen, P. Sinthong, and J. Kalagnanam, “A Time Series is Worth 64 Words: Long-term Forecasting with Transformers,” 11th International Conference on Learning Representations, ICLR 2023, Nov. 2022, Accessed: Jun. 16, 2025. [Online]. Available: https://arxiv.org/pdf/2211.14730

[5] V. Ekambaram et al., “Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series,” Jan. 2024. Available: https://arxiv.org/pdf/2401.03955

This thesis is part of the “UtilityTwin” project.

Automatic Patient Registration via DL–Based Landmark Detection On MRI Localizer Images

Dynamic Query Routing: Adaptive RAG Systems Leveraging Hallucination Risk and Specialization Affordance

Large Language Models responses are often generic and may contain false information leading to hallucinations. It takes a lot of resources to train LLM with updated data. As such, Individual organizations can use a Retrieval Augmented Generation (RAG) system with LLM to avoid those drawbacks and make it their own. It uses a private knowledge base to pick up necessary information relevant to the user’s query and feed it to LLM to generate detailed and context rich responses. However, RAG system may give rise to conflict between LLM’s prior learned memory and in-context memory due to new resources.

RAG can be optimized based on how the external knowledge base is utilized. Focus on key extraction from query, structured knowledge base, iteration/ recursion in searching relevant information and re-ranking mechanisms based on knowledge density/diversity are some of the strategies that can be used. Among them, there is a strategy that analyzes the user’s query if it is a simple one that requires smaller context or can be answered without external knowledge base, or complex one that requires more context to get accurate answers. This leads to dynamically adapting the retrieval process according to the user’s need and adjusting the context window. It can also refine its performance based on user interaction and feedbacks by prioritizing resources that have previously provided accurate and relevant answers.

This thesis will focus on development of similar approach and work on comparing them with previous similar works and with conventional RAG methods and improve performance where possible. Recent updates of LLMs such as GPT series, BERT or T5 models, which provide strong foundational capabilities for text generation and can be used for fine-tuning key extraction tasks, may be used depending on available computational resources. The research will use a digital copy of one of the course books used in Masters in AI program, as the primary source for external knowledge base, and use query datasets/evaluation datasets (a mix of straightforward queries and conceptual questions) based on the book. Recent similar works include the following:

Jeong, S., Baek, J., & Cho, S. (2024). Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity. https://arxiv.org/pdf/2403.14403v2
Wang, X., Sen, P., & Li, R. (2024). Adaptive Retrieval-Augmented Generation for Conversational Systems. https://arxiv.org/pdf/2407.21712

Improving AFFGANwriting by Exploring Deep Learning Models for Style Encoders and Image Generation of Sentence-Level Handwriting

Handwriting generation is a fundamental task in computer vision and natural language processing, with applications in personalized content generation and so on. The AFFGANwriting model presents a generative framework for synthesizing word-level handwritten images by fusing multistyle features using a GAN-based approach with a VGG-style encoder. However, its scope is limited in two ways:
• It only generates individual word images
• It used a fixed VGG backbone which may not capture style semantics as effectively as more modern alternatives such as CNN and transformer models (e.g. EfficientNet, ResNet, DINO).

With an increasing demand for personalized handwriting synthesis across longer text spans, there’s a clear motivation to explore if advanced backbone models can improve the feature extraction of the style. In addition, there is need to extend the generative capacity from words to full sentences and to interact ideally in a user-friendly interactive system.

Research questions

• Can more recent feature extractors like CNN and transformers (EfficientNet, ResNet, DINO) outperform VGG in capturing style-relevant features for handwriting generation?
• What are the architectural or training modifications required to extend AFFGANwriting from word-level to sentence-level image synthesis?
• How can the model be integrated into an intuitive web application that allows users to select a writing style and input arbitrary text for sentence-level generation?

Goal

To enhance AFFGANwriting’s quality and flexibility in handwriting image generation by:
• Upgrading the style encoder
• Enabling sentence-level synthesis
• Deploying the system as a web app for user interaction