Index
Self-Supervised Dual-Domain Swin Transformer for Sparse-View CT Reconstruction
The resolution of medical images inherently limits their diagnostic value. Obtaining high-resolution images with tomographic modalities such as Computed Tomography (CT) requires high radiation doses, which pose health risks to living subjects.
The main focus of this thesis is to develop a unified deep learning pipeline for enhancing the spatial resolution of low-dose CT scans by refining both the sinogram (projection) domain and the reconstructed image domain. Leveraging the Swin Transformer architecture, the proposed approach aims to generate high-resolution (HR) scans with improved anatomical detail preservation, while significantly reducing radiation dose requirements.
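As a purely illustrative sketch of the dual-domain idea (not the proposed architecture), the following Python skeleton chains a sinogram-domain refinement, an analytic reconstruction step, and an image-domain refinement. Here `refine_sinogram` and `refine_image` are identity placeholders standing in for the learned Swin Transformer networks, and `backproject` is a toy unfiltered parallel-beam backprojection rather than the filtered backprojection a real pipeline would use; all names are hypothetical.

```python
import numpy as np

def refine_sinogram(sinogram):
    """Stand-in for a learned sinogram-domain network (identity here)."""
    return sinogram

def backproject(sinogram):
    """Toy unfiltered backprojection for a parallel-beam geometry:
    smear each projection back across the image at its view angle."""
    n_angles, n_det = sinogram.shape
    img = np.zeros((n_det, n_det))
    centre = (n_det - 1) / 2.0
    ys, xs = np.mgrid[0:n_det, 0:n_det] - centre
    for i in range(n_angles):
        theta = np.pi * i / n_angles
        # Detector coordinate of each pixel under this view
        t = xs * np.cos(theta) + ys * np.sin(theta) + centre
        idx = np.clip(np.round(t).astype(int), 0, n_det - 1)
        img += sinogram[i][idx]
    return img / n_angles

def refine_image(image):
    """Stand-in for a learned image-domain network (identity here)."""
    return image

def dual_domain_pipeline(sparse_sinogram):
    """Sinogram refinement -> reconstruction -> image refinement."""
    return refine_image(backproject(refine_sinogram(sparse_sinogram)))
```

In the actual thesis, both placeholder networks would be trained (e.g. jointly or in stages) so that the image-domain stage can correct streak artifacts that sparse-view reconstruction leaves behind.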
Deep learning-based boundary segmentation for the detection of a retinal biomarker in volume-fused high resolution OCT
Eye diseases such as age-related macular degeneration (AMD), diabetic retinopathy, and glaucoma are among the main causes of vision loss. Detecting these conditions early is critical, and optical coherence tomography (OCT) is one of the main imaging modalities used in ophthalmology for this purpose. This thesis uses high-resolution OCT images acquired at the New England Eye Center, Boston, MA. Existing motion-correction and image-fusion methods are used to generate high-quality volumetric OCT data (Ploner et al., 2024).
Building on this data, this master's thesis develops boundary segmentation for multiple retinal layers, with a specific focus on the anterior boundary of the ellipsoid zone. Additionally, the segmentation will be integrated into a pipeline for automated quantification of a biomarker.
The main tasks are:
● Evaluation of a promising new architecture for boundary segmentation, with particular consideration given to the Vision Transformer (Dosovitskiy et al., 2020)
● Development and evaluation of a method for automated quantification of an eye disease biomarker based on the segmented boundaries
Special attention will be given to the following aspects:
● Label efficiency, achieved either through task-specific pretraining or by utilizing a relevant foundational model, such as those proposed by Morano et al. (2025)
● Utilization of 3D data
The resulting model's predictions will be compared with the ground truth of the held-out test set. In addition, the model will be evaluated against existing U-Net-based boundary regression methods, such as those of He et al. (2019) and Karbole et al. (2024). The evaluation uses common regression metrics such as mean squared error (MSE), mean absolute error (MAE), and root mean squared error (RMSE).
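Assuming boundary predictions are represented as per-A-scan depth values in pixels (an illustrative assumption; the thesis's actual output format may differ), the three metrics can be computed in a few lines of NumPy:

```python
import numpy as np

def boundary_metrics(pred, gt):
    """Compare predicted and ground-truth boundary positions.

    pred, gt: arrays of per-A-scan boundary depths in pixels,
    e.g. shape (n_bscans, n_ascans). This representation is a
    hypothetical example, not the thesis's actual data format.
    """
    err = pred - gt
    mse = np.mean(err ** 2)          # mean squared error
    mae = np.mean(np.abs(err))       # mean absolute error
    rmse = np.sqrt(mse)              # root mean squared error
    return {"MSE": mse, "MAE": mae, "RMSE": rmse}

# Toy example: prediction off by exactly 1 pixel everywhere,
# so all three metrics equal 1.0
gt = np.zeros((2, 4))
pred = np.ones((2, 4))
print(boundary_metrics(pred, gt))
```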
The aim of this thesis is to contribute a model for the segmentation of retinal layer boundaries in OCT images, laying the groundwork for the automated quantification of a biomarker for AMD. This thesis shall provide a step towards earlier diagnosis, better monitoring of disease progression and improved clinical workflows.
References
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2020). An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale. arXiv preprint.
He, Y., Carass, A., Liu, Y., Jedynak, B. M., Solomon, S. D., Saidha, S., Calabresi, P. A., & Prince, J. L. (2019). Fully convolutional boundary regression for retina OCT segmentation. Lecture Notes in Computer Science, 120–128.
Karbole, W., Ploner, S. B., Won, J., Marmalidou, A., Takahashi, H., Waheed, N. K., Fujimoto, J. G., & Maier, A. (2024). 3D deep learning-based boundary regression of an age-related retinal biomarker in high resolution OCT. In Informatik aktuell (pp. 350–355).
Morano, J., Fazekas, B., Sükei, E., Fecso, R., Emre, T., Gumpinger, M., Faustmann, G., Oghbaie, M., Schmidt-Erfurth, U., & Bogunović, H. (2025). MIRAGE: Multimodal foundation model and benchmark for comprehensive retinal OCT image analysis. arXiv preprint.
Ploner, S. B., Won, J., Takahashi, H., Karbole, W., Yaghy, A., Marmalidou, A., Schottenhamml, J., Waheed, N. K., Fujimoto, J. G., & Maier, A. (2024). A reliable, fully-automatic pipeline for 3D motion correction and volume fusion enables investigation of smaller and lower-contrast OCT features [Conference presentation]. Investigative Ophthalmology & Visual Science, 65(7), ARVO E-Abstract 2794904.
Modernizing and Extending miRNexpander: A Web-Based Interface for Network Expansion of Molecular Interactions in Biomedical Research
Deep Learning-Based Breast Cancer Risk Stratification Using Multiple Instance Learning on LDCT Scans
Analysis of Speech Production Assessment of Cochlear Implant Users
PaiChat: A Visual-Language Assistant for Histopathology
Evaluating Urban Change Detection and Captioning in Remote Sensing
Artificial Data Generation and OCR Processing for Improved Analysis of Jewish Gravestone Inscriptions
Thesis Description
Digital preservation and automated analysis of historical inscriptions are essential for understanding cultural heritage. Jewish gravestone inscriptions provide rich insight into historical, religious, and genealogical records. However, these inscriptions often suffer significant degradation due to age, environmental exposure, and the material composition of the gravestones. The challenge is compounded by inscriptions written in multiple languages, such as German and Hebrew, with unique characters and complex typographical structures [1]. Traditional OCR (Optical Character Recognition) systems frequently struggle to accurately transcribe degraded text or non-standard layouts, especially when faced with limited training data. The scarcity of labeled, high-quality training data is a major bottleneck, making it difficult for OCR models to generalize to new or unseen inscriptions. Synthetic data generation plays a crucial role in overcoming these limitations by creating realistic but artificial training examples with known ground truth, significantly expanding datasets and thereby offering the potential to improve OCR performance [2, 3].
The generation of synthetic data can simulate various conditions, such as weathering, unique inscription layouts, and multi-lingual complexities. This capability enables machine learning models to train on data that would otherwise be costly or impossible to obtain. By leveraging advances in contemporary deep learning, particularly generative adversarial networks (GANs), variational autoencoders (VAEs), and large language models (LLMs), it is now possible to create high-fidelity, annotated images that closely mimic real-world data [1, 4, 5]. Such synthetic datasets not only address issues of data scarcity but also introduce controlled variability that can help OCR models better handle complex visual and linguistic scenarios [3, 5].
This research aims to utilize these synthetic data generation methods to advance the transcription of complex Jewish gravestone inscriptions. By combining inpainting techniques and machine-generated text overlays, as discussed in Methodology below, the thesis will provide a framework that enhances current OCR capabilities and ensures more comprehensive and reliable digitization of historical inscriptions [1, 2].
Research Objectives
The primary objective of this study is to develop a pipeline that leverages synthetic data generation and advanced OCR techniques to transcribe inscriptions on Jewish gravestones more effectively. The key research goals are:
- Synthetic Data Generation: Create synthetic gravestone images with realistic inscriptions and reliable ground truth for training and evaluation. This includes using inpainting methods and employing generative models to simulate aging effects, unique typography, and multilingual inscriptions [3, 5].
- OCR Enhancement: Use synthetic data to build large, diverse datasets covering varied text styles, fonts, and layouts. Enlarging the training corpus in this way improves the OCR model's generalization and, consequently, its transcription performance.
- Evaluation of Data Synthesis Impact: Analyze how different data synthesis techniques affect OCR performance using metrics such as Levenshtein distance, Character Error Rate (CER), and Word Error Rate (WER), with particular focus on improvements in recognizing and transcribing worn, complex inscriptions [1, 5].
 
Research Challenges
Transcribing Jewish gravestone inscriptions poses several unique challenges:
- Material and Inscription Variability: The gravestones exhibit different inscription methods (e.g., carved, etched, metalwork), contributing to variability in text legibility [4].
- Environmental and Lighting Conditions: Shadows, reflections, and varying camera angles further hinder the transcription process by altering the appearance of the text [2, 4].
- Multi-lingual Nature: German and Hebrew scripts have distinct characteristics that require specialized language models. Inscriptions often include named entities and unique historical terms that challenge standard OCR models [3, 5].
- Limited Dataset Availability: The number of high-quality, annotated images of gravestones is minimal, necessitating the creation of synthetic data to bolster training datasets [4].
 
Methodology
- Data Synthesis:
- Manual Segmentation for Ground Truth Creation: Manually segment gravestone areas in the images to generate reliable ground-truth masks. For complex, degraded inscriptions, manual segmentation is considerably more accurate than automated methods.
- Advanced Inpainting Techniques: Apply GANs and diffusion models to remove the existing text from gravestone images, creating a clean base image for text overlay.
- Applying different synthetic data generation methods:
  - Overlaying Synthetic Text: Generate synthetic inscriptions using LLMs tailored for German and Hebrew scripts, and paste them over the inpainted regions produced in the previous step. In addition, apply augmentation techniques (e.g., perspective transformations and shadowing) to simulate real-world aging and engraving. Tailoring the generated text to German and Hebrew helps OCR systems generalize to the unique linguistic and typographical characteristics of the dataset.
  - Exchanging text engravings between gravestones: Paste the text corpus of one gravestone onto the inpainted area of another. This keeps the synthetic data realistic: the text preserves its historical and linguistic characteristics while adapting to the new gravestone's material properties and background.
  - Generating gravestones with generative models: Use large generative models, prompted with examples from the original dataset, to produce complete gravestone images. This introduces entirely new samples capturing varied inscription styles, material textures, and environmental effects, and enables a systematic comparison of OCR performance across different types of generated data, providing insight into the impact of dataset diversity on OCR accuracy.
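A minimal NumPy-only sketch of the overlay-and-augmentation step, intended as a crude stand-in for the GAN/diffusion-based pipeline described above. The horizontal shear approximates a perspective transformation, and the linear gradient approximates a cast shadow; the function and parameter names (`overlay_and_age`, `shear`, `shadow_strength`) are illustrative, not part of the thesis.

```python
import numpy as np

def overlay_and_age(background, text_mask, shear=0.2, shadow_strength=0.5):
    """Overlay a binary text mask onto a grayscale background patch,
    then apply a horizontal shear (crude stand-in for a perspective
    transform) and a linear shadow gradient.

    background: float array in [0, 1], shape (H, W)
    text_mask:  binary array, same shape, 1 where engraved text lies
    """
    h, w = background.shape
    out = background.copy()
    out[text_mask > 0] *= 0.3  # darken engraved strokes

    # Shift each row by an offset proportional to its height
    sheared = np.zeros_like(out)
    for y in range(h):
        sheared[y] = np.roll(out[y], int(shear * y))

    # Linear shadow falling off from left to right
    gradient = 1.0 - shadow_strength * np.linspace(0.0, 1.0, w)
    return sheared * gradient[None, :]

# Toy usage on a 4x8 patch with a single-pixel "stroke"
bg = np.ones((4, 8))
mask = np.zeros((4, 8))
mask[2, 3] = 1
augmented = overlay_and_age(bg, mask)
```

In the actual pipeline, such geometric and photometric augmentations would be applied on top of the inpainted gravestone images, with a proper perspective warp (e.g. from an image-processing library) replacing the row shift used here.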
- OCR and Text Transcription: Run OCR and text transcription on the synthetically generated datasets, and analyze how the different data synthesis techniques affect OCR performance using metrics such as Levenshtein distance, Character Error Rate (CER), and Word Error Rate (WER).
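These metrics all derive from edit distance, so they can be sketched from scratch as below; in practice an existing library could be used instead. The function names are illustrative.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert/delete/substitute),
    computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    """Character Error Rate: edit distance over reference length."""
    return levenshtein(ref, hyp) / max(len(ref), 1)

def wer(ref, hyp):
    """Word Error Rate: word-level edit distance over reference word count."""
    ref_words, hyp_words = ref.split(), hyp.split()
    return levenshtein(ref_words, hyp_words) / max(len(ref_words), 1)

print(cer("gravestone", "gravstone"))    # 0.1 (one deletion over 10 chars)
print(wer("in memory of", "in memry of"))  # 1/3 (one wrong word of three)
```

Because `levenshtein` only compares sequence elements, the same routine serves both character-level (CER) and word-level (WER) evaluation.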
 
References
[1] John Hindmarch, Mona Hess, Miroslavas Pavlovskis, Maria Chizhova. Application of Multicriteria Decision Making for the Selection of Sensing Tools for Historical Gravestones. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 42(9):1435–1442, 2020.
[2] Yuliang Liu, Zhang Li, Mingxin Huang, Biao Yang, Wenwen Yu, Chunyuan Li, Xucheng Yin, Cheng-Lin Liu, Lianwen Jin, and Xiang Bai. OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models, 2024.
[3] Keith Man and Javaan Chahl. A Review of Synthetic Image Data and Its Use in Computer Vision. MDPI, 8(11), 2022.
[4] Mandeep Goyal and Qusay H. Mahmoud. A Systematic Review of Synthetic Data Generation Techniques Using Generative AI. MDPI, 13(9), 2024.
[5] Yingzhou Lu, Minjie Shen, Huazheng Wang, Xiao Wang, Capucine van Rechem, Tianfan Fu, and Wenqi Wei. Machine Learning for Synthetic Data Generation: A Review, 2024.