Index
Evaluating Urban Change Detection and Captioning in Remote Sensing
Artificial Data Generation and OCR Processing for Improved Analysis of Jewish Gravestone Inscriptions
Thesis Description
Digital preservation and automated analysis of historical inscriptions are essential for understanding cultural heritage. Jewish gravestone inscriptions provide rich insight into historical, religious, and genealogical records. However, these inscriptions often suffer significant degradation due to age, environmental exposure, and the material composition of the gravestones. The challenge is compounded by inscriptions written in multiple languages, such as German and Hebrew, with unique characters and complex typographical structures [1]. Traditional OCR (Optical Character Recognition) systems frequently struggle to accurately transcribe degraded text or non-standard layouts, especially when faced with limited training data. The scarcity of labeled, high-quality training data is a major bottleneck, making it difficult for OCR models to generalize to unseen inscriptions. Synthetic data generation can help overcome these limitations: it creates realistic but artificial training examples with known ground truth and substantially expands the available datasets, which has the potential to improve OCR performance [2, 3].
The generation of synthetic data can simulate various conditions, such as weathering, unique inscription layouts, and multi-lingual complexities. This capability enables machine learning models to train on data that would otherwise be costly or impossible to obtain. By leveraging advancements in contemporary deep learning, particularly GANs, VAEs, and large language models, it is now possible to create high-fidelity, annotated images that closely mimic real-world data [1, 4, 5]. Such synthetic datasets not only address issues of data scarcity but also introduce controlled variability that can help OCR models better handle complex visual and linguistic scenarios [3, 5].
This research aims to utilize these synthetic data generation methods to advance the transcription of complex Jewish gravestone inscriptions. By combining inpainting techniques and machine-generated text overlays, as discussed in Methodology below, the thesis will provide a framework that enhances current OCR capabilities and ensures more comprehensive and reliable digitization of historical inscriptions [1, 2].
Research Objectives
The primary objective of this study is to develop a pipeline that leverages synthetic data generation and advanced OCR techniques to transcribe inscriptions on Jewish gravestones more effectively. The key research goals are:
- Synthetic Data Generation: Create synthetic gravestone images with realistic inscriptions and reliable ground truth for training and evaluation purposes. This includes using inpainting methods and employing generative models to simulate aging effects, unique typography, and multilingual inscriptions [3, 5].
- OCR Enhancement: Use the synthetic data to build large, diverse training sets that cover varied text styles, fonts, and layouts. A larger and more varied training corpus is expected to improve the OCR model’s generalization, and the known ground truth supports more reliable training and testing.
- Evaluation of Data Synthesis Impact: Analyze how different data synthesis techniques affect OCR performance using metrics such as Levenshtein distance, Character Error Rate (CER), and Word Error Rate (WER), with a focus on improvements in recognizing and transcribing worn, complex inscriptions [1, 5].
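The metrics named in the last objective can be illustrated with a minimal sketch. It assumes that CER and WER are defined as the Levenshtein distance between the reference transcription and the OCR output, divided by the reference length in characters or words. The function names and the Unicode NFC normalization step are illustrative choices, not something the thesis description prescribes.

```python
import unicodedata


def levenshtein(ref, hyp):
    """Minimum number of insertions, deletions, and substitutions turning ref into hyp.

    Works on any two sequences, e.g. strings (characters) or lists of words.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from the reference
                            curr[j - 1] + 1,      # insert h into the reference
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def _normalize(text):
    # Assumption: compare texts in NFC form so that composed and decomposed
    # characters (relevant for German umlauts) count as equal.
    return unicodedata.normalize("NFC", text)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length."""
    reference, hypothesis = _normalize(reference), _normalize(hypothesis)
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = _normalize(reference).split()
    hyp_words = _normalize(hypothesis).split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return levenshtein(ref_words, hyp_words) / len(ref_words)
```

For instance, comparing the reference "Hier ruht" with the OCR output "Hier ruhl" gives a CER of 1/9 (one wrong character out of nine) and a WER of 1/2 (one wrong word out of two). Both rates can exceed 1 when the OCR output is much longer than the reference.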
Transcribing Jewish gravestone inscriptions poses several unique challenges:
- Material and Inscription Variability: The gravestones exhibit different inscription methods (e.g., carved, etched, metalwork), contributing to variability in text legibility [4].
- Environmental and Lighting Conditions: Shadows, reflections, and varying camera angles further hinder the transcription process by altering the appearance of the text [2, 4].
- Multi-lingual Nature: German and Hebrew scripts have distinct characteristics that require specialized language models. Inscriptions often include named entities and unique historical terms that challenge standard OCR models [3, 5].
- Limited Dataset Availability: The number of high-quality, annotated images of gravestones is minimal, necessitating the creation of synthetic data to bolster training datasets [4].
Methodology
- Data Synthesis:
  - Manual Segmentation for Ground Truth Creation: Manually segment the gravestone areas in the images to produce reliable ground truth masks. Manual segmentation is comparatively more accurate than automated segmentation methods for complex and degraded inscriptions.
  - Advanced Inpainting Techniques: Implement GANs and diffusion models to remove existing text from gravestones, creating a clean base image for text overlay.
  - Applying different synthetic data generation methods:
    - Overlaying Synthetic Text: Generate synthetic inscriptions using LLMs tailored for German and Hebrew scripts, and paste the generated text over the inpainted region from the previous step. In addition, apply augmentation techniques (e.g., perspective transformations and shadowing) to simulate real-world aging and engraving. Text that reflects the linguistic and typographical characteristics of the dataset should help OCR systems generalize better to it.
    - Exchanging the text engravings on gravestones in the original dataset: Paste the inscription from one gravestone onto the inpainted area of another gravestone. This keeps the synthetic data realistic, because the text preserves its historical and linguistic characteristics while being placed on a new gravestone’s material and background.
    - Generating gravestones using LLMs: Generate complete gravestone images with Large Language Models, prompted with examples from the original dataset. This introduces entirely new examples that capture varied inscription styles, material textures, and environmental effects. The resulting variability allows a systematic comparison of OCR performance across different types of generated data, providing insight into the impact of dataset diversity on OCR accuracy.
- OCR and Text Transcription: Run OCR and text transcription on the datasets produced by the synthesis methods above. Then analyze how the different data synthesis techniques affect OCR performance by calculating metrics such as Levenshtein distance, Character Error Rate (CER), and Word Error Rate (WER).
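The text-exchange strategy from the data synthesis step can be sketched as a pairing problem: every inpainted background receives the inscription of a different gravestone, and the donor's transcription becomes the ground truth of the new sample. The identifiers, the dictionary layout, and the rotation-based pairing below are assumptions made for illustration, and the actual image compositing is left out.

```python
import random


def exchange_inscriptions(stone_ids, seed=0):
    """Map each gravestone background to a different gravestone as text donor.

    Assumes the ids are unique. The ids are shuffled and each one receives the
    text of its successor in the shuffled order, so no stone keeps its own text.
    """
    ids = list(stone_ids)
    if len(ids) < 2:
        raise ValueError("need at least two gravestones to exchange text")
    rng = random.Random(seed)
    rng.shuffle(ids)
    donors = ids[1:] + ids[:1]  # rotate by one position
    return dict(zip(ids, donors))


def build_samples(transcriptions, seed=0):
    """Create one synthetic sample record per background.

    transcriptions: hypothetical dict mapping stone id -> transcribed text.
    """
    pairing = exchange_inscriptions(sorted(transcriptions), seed=seed)
    return [
        {
            "background": background,            # inpainted image to paste onto
            "text_source": donor,                # stone whose inscription is pasted
            "ground_truth": transcriptions[donor],
        }
        for background, donor in sorted(pairing.items())
    ]
```

Because the ground truth is copied from the donor, every synthetic image comes with an exact transcription that can be compared against the OCR output without additional labeling.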
References
[1] John Hindmarch, Mona Hess, Miroslavas Pavlovskis, Maria Chizhova. Application of Multicriteria Decision Making for the Selection of Sensing Tools for Historical Gravestones. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 42(9):1435–1442, 2020.
[2] Yuliang Liu, Zhang Li, Mingxin Huang, Biao Yang, Wenwen Yu, Chunyuan Li, Xucheng Yin, Cheng-Lin Liu, Lianwen Jin, and Xiang Bai. OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models, 2024.
[3] Keith Man and Javaan Chahl. A Review of Synthetic Image Data and Its Use in Computer Vision. MDPI, 8, 11 2022.
[4] Mandeep Goyal and Qusay H. Mahmoud. A Systematic Review of Synthetic Data Generation Techniques Using Generative AI. MDPI, 13, 09 2024.
[5] Yingzhou Lu, Minjie Shen, Huazheng Wang, Xiao Wang, Capucine van Rechem, Tianfan Fu, and Wenqi Wei. Machine Learning for Synthetic Data Generation: A Review. 2024.
On-Device Training for Face Identification
Pathological Voice Analysis with Selective State Space Models
Identifying predictive brain regions in fMRI data for drug responders vs. non-responders, using a foundation model in comparison to the classical GLM method
[Thesis] Reinforcement Learning for 110 kV Distribution Grid Restoration in Blackout situation
Background
Restoring power in a 110 kV distribution grid after a blackout involves complex, sequential decisions under strict operational constraints (e.g., voltage, frequency). Traditional rule-based approaches lack the flexibility to handle unexpected scenarios and do not incorporate operational experience.
Challenge
To support operators in real time, reinforcement learning must evaluate safe, interpretable actions within milliseconds. Key requirements include integration with a pre-existing grid restoration simulator (including test grid) built into PowerFactory, as well as strict adherence to stability limits.
Tasks
– Model a reinforcement learning environment for the restoration process
– Incorporate operational constraints
– Implement and train an RL agent (e.g., DQN, PPO)
– Evaluate agent performance (success rate, stability violations)
– Visualize and interpret agent decisions for transparency
– (Optional) Integrate safety mechanisms (shielded learning)
– (Optional) Benchmark inference speed for real-time GridAssist
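The first tasks above (environment modeling, operational constraints, shielding) can be illustrated with a deliberately small toy example. It does not interface with PowerFactory: the load values, the single capacity limit standing in for voltage and frequency constraints, the reward definition, and all names are hypothetical. The shield is plain action masking, and a random policy takes the place of a trained DQN or PPO agent.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyRestorationEnv:
    """Toy restoration task: re-energize loads one by one without overloading."""

    loads_mw: tuple = (20.0, 35.0, 15.0, 40.0)  # made-up load sizes
    capacity_mw: float = 80.0                   # made-up stand-in for stability limits
    energized: list = field(default_factory=list)

    def reset(self):
        self.energized = [False] * len(self.loads_mw)
        return tuple(self.energized)

    def served_mw(self):
        return sum(load for load, on in zip(self.loads_mw, self.energized) if on)

    def safe_actions(self):
        """Shield: loads that can be switched in without exceeding the limit."""
        served = self.served_mw()
        return [i for i, on in enumerate(self.energized)
                if not on and served + self.loads_mw[i] <= self.capacity_mw]

    def step(self, action):
        """Return (observation, reward, done) for switching in one load."""
        if action not in self.safe_actions():
            # Constraint violation: penalize and terminate, state is unchanged.
            return tuple(self.energized), -100.0, True
        self.energized[action] = True
        reward = self.loads_mw[action]       # reward = newly served load in MW
        done = not self.safe_actions()       # nothing left that fits
        return tuple(self.energized), reward, done


def run_shielded_random_episode(env, seed=0):
    """Roll out a random policy that only samples from the shielded action set."""
    rng = random.Random(seed)
    env.reset()
    total = 0.0
    while env.safe_actions():
        _, reward, _ = env.step(rng.choice(env.safe_actions()))
        total += reward
    return total
```

Because the policy only samples from `safe_actions()`, the episode return never exceeds the capacity limit, whereas an unshielded agent that picks an overloading action receives the penalty and the episode ends. Counting such terminations over many episodes would give a simple stability-violation statistic for the evaluation task.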
Requirements
– Basic knowledge in electrical engineering
– Understanding of electrical power systems (PowerFactory)
– Advanced programming skills (Python)
– Basic to advanced knowledge of reinforcement learning
– Fluent in English
Start: July 15th
End: Jan 15th
Type: Master research project or thesis
Language: English
Contact: Changhun Kim (changhun.kim@fau.de), Simon Linnert (simon.linnert@fau.de)
Application:
Please apply by email with the subject line “[RL-Restoration Project Application 2025]”.
Include your CV and transcript of records (grade overview). Applications without these documents will not be considered.
In the body of the email, briefly describe (approx. 100 words) why you are interested in this specific project and how your background prepares you for it.
Layer-wise Analysis of Belief Representations in Transformer Language Models
Interpretable Vision Transformers with Attention Maps for Phonological Precision Assessment from MRI
Deep Learning-Based Defect Detection and Classification in CdTe CT Detector Wafers
Photon-counting CT (PCCT) is a modern technology that achieves higher resolution and lower radiation dose using CdTe (cadmium telluride) detectors, which directly convert X-ray photons into electrical signals, unlike conventional CT systems, which convert X-rays indirectly via a scintillator and photodiodes.
At Siemens Healthineers, CdTe detectors are made from processed wafers and inspected using infrared (IR) transmission imaging. A technician uses a software tool to manually label defects such as cracks, tellurium inclusions, grain boundaries, and patterns like “Milky Way” or “Twins.” This inspection process is time-consuming, subjective, and not scalable for high-volume production.
Deep learning models such as ResNet and YOLO [1] have demonstrated strong wafer defect classification and localization performance, particularly on benchmark datasets like WM-811K [2]. However, these methods are primarily designed for structured layouts and silicon wafers, which differ significantly from the CdTe-based wafers used in photon-counting CT detectors. While studies such as Kirschenmann et al. [3] have applied deep learning to CdTe crystals using IR microscopy, their focus has been on crystal characterization rather than surface defect detection in a production setting.
To the best of our knowledge, no existing method addresses the automatic classification of wafer surface defects using IR images from real CdTe wafer production, which is the focus of this thesis. The aim is to develop a deep learning model to automatically detect and classify surface defects in IR images of CdTe wafers for faster, more consistent, and scalable inspection. The dataset consists of high-resolution IR images with pixel-wise labeled masks, where defects are annotated using color codes. Each defect class includes at least 1,000 labeled images, covering types such as cracks, tellurium inclusions, and other surface anomalies.
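Color-coded masks usually have to be converted into integer class maps before training. The sketch below shows one way to do this in plain Python; the palette, the class order, and the `unknown` marker are hypothetical, since the actual color code of the dataset is not given here. In practice the same lookup would typically be applied to image arrays rather than nested lists.

```python
from collections import Counter

# Hypothetical color code; the real annotation palette is not specified.
COLOR_TO_CLASS = {
    (0, 0, 0): 0,      # background
    (255, 0, 0): 1,    # crack
    (0, 255, 0): 2,    # tellurium inclusion
    (0, 0, 255): 3,    # grain boundary
}


def mask_to_class_ids(rgb_mask, color_to_class=COLOR_TO_CLASS, unknown=255):
    """Convert a mask given as rows of (R, G, B) pixels into rows of class ids.

    Colors that are not in the palette are mapped to `unknown` so that
    annotation errors stay visible instead of silently becoming background.
    """
    return [[color_to_class.get(tuple(pixel), unknown) for pixel in row]
            for row in rgb_mask]


def class_pixel_counts(class_ids):
    """Count pixels per class id, e.g. to check class balance."""
    return dict(Counter(value for row in class_ids for value in row))
```

Checking the per-class pixel counts early helps to spot rare defect types and unexpected colors before a model is trained on the masks.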
The key steps involved in this work are:
- Literature Review: A review of deep learning approaches for wafer defect detection will be conducted, with a focus on models that combine classification and localization, such as the ResNet- and YOLO-based framework proposed by Shinde et al. [1]. Relevant work on IR imaging and neural network applications for CdTe materials will also be considered.
- Model Design and Implementation: A deep learning architecture will be developed to classify and potentially localize surface defects. Preprocessing steps will be designed based on the format and structure of Siemens’ internal dataset.
- Evaluation: The model will be tested using standard evaluation metrics and compared with established methods. The goal is to assess how well it performs and whether it is suitable for use in real production environments.
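The evaluation metrics are not specified above. If the model predicts pixel-wise masks, per-class intersection over union (IoU) is one common choice; the following sketch computes it from flattened label sequences. The function name and the convention of returning `None` for absent classes are assumptions made for illustration.

```python
def per_class_iou(pred, target, num_classes):
    """Intersection over union per class for two equally long label sequences.

    Returns a list with one entry per class; the entry is None when the class
    occurs neither in the prediction nor in the target.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            intersection[p] += 1
            union[p] += 1
        else:
            # The pixel belongs to the predicted set of class p
            # and to the target set of class t.
            union[p] += 1
            union[t] += 1
    return [i / u if u else None for i, u in zip(intersection, union)]
```

For the prediction `[0, 1, 1, 2]` and the target `[0, 1, 2, 2]` with three classes, the result is `[1.0, 0.5, 0.5]`: class 0 matches exactly, while classes 1 and 2 each share one pixel out of a union of two.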
References
[1] Shinde, M. et al. (2023). Wafer Defect Localization and Classification Using Deep Learning Techniques.
[2] WM-811K Dataset. MIT Lincoln Laboratory: Wafer Map Defect Dataset. https://www.ll.mit.edu/r-d/datasets/wm-811k-wafer-map-defect-dataset
[3] Kirschenmann, D. et al. (2023). Employing infrared microscopy in combination with a pre-trained neural network to visualize and analyze the defect distribution in cadmium telluride crystals.
Evaluation of Forecasting Approaches for Time Series Data in the Utility Domain
Time series data is a sequence of data points recorded over time that makes it possible to understand the evolution of a system by analyzing trends and influencing variables. It serves as the foundation for time series forecasting, where historical patterns, trends, and seasonal variations are analyzed to make informed predictions about future values. Time series forecasting plays a crucial role in the utility sector by enabling accurate demand prediction, optimizing energy distribution, and ensuring efficient resource management, helping providers balance supply and demand while minimizing operational costs and outages.
A significant challenge in time series forecasting lies in the behavioral heterogeneity of the data. Each time series exhibits unique characteristics, making a one-size-fits-all approach inadequate. With respect to the utility sector, this is particularly evident in data from residential, agricultural and industrial zones, where distinct consumption patterns necessitate different forecasting models. This research seeks to address this challenge by determining the best-suited forecasting model that accounts for the unique characteristics of the time series data.
This thesis aims to evaluate time series forecasting approaches for water and heat utility networks and identify the most suitable forecasting models for different time series data with unique demand and usage patterns. The goal is to classify time series data into distinct categories by their characteristics in order to find and apply the best forecasting model for each. To achieve this, we will leverage statistical methods, Machine Learning (ML), and Deep Learning (DL) techniques, which offer advanced capabilities for handling non-linearities, automatically extracting features, managing large datasets, and capturing complex dependencies. Forecasting approaches are selected according to the kind of input they require and the way they process the data. Popular approaches include statistical methods, frequency-aware techniques [1], machine learning algorithms [2], recurrent neural networks [3], transformer-based architectures [4], and emerging foundation models [5]. The research will involve categorizing time series data, training and testing different ML and DL models, and evaluating their performance based on various metrics to determine the most suitable model for each category.
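How such a categorization might start can be shown with a single, simple characteristic. The sketch below uses the autocorrelation at an assumed seasonal lag as an indicator of seasonality; the lag of 24 (hourly data with a daily cycle), the threshold of 0.5, and the two category names are illustrative assumptions, and a real categorization would combine several characteristics.

```python
from statistics import mean


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mu = mean(series)
    denominator = sum((x - mu) ** 2 for x in series)
    if denominator == 0:
        return 0.0  # constant series: no variation to correlate
    numerator = sum((series[t] - mu) * (series[t - lag] - mu)
                    for t in range(lag, n))
    return numerator / denominator


def categorize(series, season_lag=24, threshold=0.5):
    """Assign a coarse category based on seasonal autocorrelation.

    season_lag and threshold are hypothetical defaults, not tuned values.
    """
    if autocorrelation(series, season_lag) >= threshold:
        return "seasonal"
    return "irregular"
```

A series that repeats the same daily profile has a seasonal autocorrelation close to 1 and is labeled "seasonal", whereas a flat or unstructured series falls below the threshold.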
The anticipated outcome of this thesis is the creation of a robust framework for classifying time series data by their unique characteristics and identifying the most suitable forecasting models for each category. This research aims to:
- Classify time series data based on distinct attributes.
- Conduct a comprehensive evaluation of various ML and DL models tailored to each data category.
- Improve the precision of demand forecasts for water and heat networks through customized prediction models.
By determining models tailored to the specific characteristics of water and heat consumption data, this research aims to identify the most suitable approach for a given time series and to evaluate how the approaches perform relative to each other. This is expected to improve the accuracy of future demand predictions and to facilitate better resource management and infrastructure planning.
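Comparing approaches relative to each other comes down to scoring every candidate on the same hold-out data. The following sketch ranks two simple baselines by mean absolute error (MAE) and also reports the root mean squared error (RMSE). The baselines, the metric choice, and all names are assumptions for illustration; the ML and DL models discussed above would be plugged in as further candidates.

```python
from math import sqrt


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def seasonal_naive(history, horizon, season):
    """Baseline: repeat the last observed season."""
    last_season = history[-season:]
    return [last_season[h % season] for h in range(horizon)]


def mean_forecast(history, horizon):
    """Baseline: predict the historical mean for every step."""
    level = sum(history) / len(history)
    return [level] * horizon


def rank_models(history, actual, season):
    """Score each candidate on the hold-out values, best (lowest MAE) first."""
    horizon = len(actual)
    candidates = {
        "seasonal_naive": seasonal_naive(history, horizon, season),
        "mean": mean_forecast(history, horizon),
    }
    scores = {name: {"mae": mae(actual, fc), "rmse": rmse(actual, fc)}
              for name, fc in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1]["mae"])
```

Running the ranking separately for each category of time series would show whether different categories actually favor different models.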
References:
[1] P. C. Young, D. J. Pedregal, and W. Tych, “Dynamic harmonic regression,” J Forecast, vol. 18, no. 6, pp. 369–394, Nov. 1999, doi: 10.1002/(SICI)1099-131X(199911)18:6<369::AID-FOR748>3.0.CO;2-K.
[2] T. Chen and C. Guestrin, “XGBoost: A Scalable Tree Boosting System,” Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol. 13-17-August-2016, pp. 785–794, Mar. 2016, doi: 10.1145/2939672.2939785.
[3] M. Beck et al., “xLSTM: Extended Long Short-Term Memory,” May 2024, Accessed: Jun. 16, 2025. [Online]. Available: https://arxiv.org/pdf/2405.04517
[4] Y. Nie, N. H. Nguyen, P. Sinthong, and J. Kalagnanam, “A Time Series is Worth 64 Words: Long-term Forecasting with Transformers,” 11th International Conference on Learning Representations, ICLR 2023, Nov. 2022, Accessed: Jun. 16, 2025. [Online]. Available: https://arxiv.org/pdf/2211.14730
[5] V. Ekambaram et al., “Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series,” Jan. 2024. Available: https://arxiv.org/pdf/2401.03955
This thesis is part of the “UtilityTwin” project.