Comparative Study of Traditional and Deep Learning Binarization Methods for Historical Document Processing

The aim of this thesis is to present a fair and practical comparison of traditional and deep learning-based binarization methods. Selected models from both families will be applied to historical document datasets and evaluated against ground truth, with a focus on practical application: measuring their impact on binarization quality and OCR performance, and developing a transparent, reusable framework for evaluating and comparing binarization methods. This makes it possible to compare each method's results directly in terms of text recognition quality.

Experimental Setup and Resources

Traditional Methods:

  • Otsu: global thresholding that maximizes the between-class variance of the gray-level histogram.
  • Sauvola: local adaptive thresholding based on the mean and standard deviation within a sliding window.
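
For concreteness, here is a minimal sketch of the two traditional baselines, assuming scikit-image's threshold_otsu and threshold_sauvola implementations; the window size, k value, and input filename are illustrative choices, not values fixed by this proposal.

import numpy as np
from skimage import io, img_as_float
from skimage.filters import threshold_otsu, threshold_sauvola

def binarize_otsu(gray: np.ndarray) -> np.ndarray:
    # Global Otsu threshold: one value for the whole page; dark pixels are ink.
    t = threshold_otsu(gray)
    return gray <= t

def binarize_sauvola(gray: np.ndarray, window_size: int = 25, k: float = 0.2) -> np.ndarray:
    # Local Sauvola threshold T(x, y) = m(x, y) * (1 + k * (s(x, y) / R - 1)),
    # computed from the mean m and standard deviation s in a sliding window.
    t = threshold_sauvola(gray, window_size=window_size, k=k)
    return gray <= t

gray = img_as_float(io.imread("degraded_page.png", as_gray=True))  # hypothetical input file
mask_otsu, mask_sauvola = binarize_otsu(gray), binarize_sauvola(gray)

Sauvola's window size and k are the main tuning knobs; heavily degraded pages often call for a larger window than clean scans.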

Deep Learning-Based Models:

  • SAE (Selectional Autoencoder): to evaluate how well a compact encoder-decoder model learns pixel-level binarization for historical documents.
  • DeepOtsu: a U-Net-based enhancement model followed by Otsu thresholding for the final binarization.
  • ROBIN (U-Net variant): a representative of U-Net-based segmentation models, used to assess their performance in direct binary mask prediction.

Datasets

  • DIBCO (2009–2022): benchmark datasets comprising printed and handwritten degraded documents.
  • HisDB: historical manuscript datasets with realistic degradation patterns.

Technologies and Tools

Evaluation Metrics:

  • F-measure
  • PSNR (peak signal-to-noise ratio)
  • DRD (distance reciprocal distortion)
  • NRM (negative rate metric)
  • OCR accuracy for practical evaluation
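
To make the metrics concrete, the following is a minimal sketch of the two simplest measures, computed directly on boolean ink masks; DRD and NRM follow the standard DIBCO definitions and are omitted here for brevity.

import numpy as np

def f_measure(pred: np.ndarray, gt: np.ndarray) -> float:
    # F-measure = 2 * precision * recall / (precision + recall),
    # with True marking ink pixels in both prediction and ground truth.
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    precision = tp / (tp + fp + 1e-12)
    recall = tp / (tp + fn + 1e-12)
    return 2 * precision * recall / (precision + recall + 1e-12)

def psnr(pred: np.ndarray, gt: np.ndarray) -> float:
    # PSNR = 10 * log10(1 / MSE) for binary images with dynamic range 1.
    mse = np.mean((pred.astype(float) - gt.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(1.0 / mse)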

Milestones
1. Literature Review: Study the binarization techniques listed in the model sections above, focusing in particular on their application to historical documents.
2. Dataset Preparation: Collect and preprocess the publicly available datasets; align ground-truth masks and normalize formats for consistent evaluation.
3. Model Implementation and Integration: Implement or adapt the binarization models in PyTorch and build a single pipeline covering both the deep learning and the traditional methods (see the sketch after this list).
4. Evaluation and Comparison: Compare results using the common metrics (F-measure, PSNR, DRD, NRM) and relate visual output quality to the quantitative scores. The binarized images are then fed into an OCR system to measure OCR accuracy.
5. Analysis and Interpretation: Discuss the strengths and limitations of each technique for different types of degradation.
6. Documentation and Reporting: Compile all results and analysis into a report.
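
The unified pipeline of milestone 3 could be as simple as one array-in, mask-out interface shared by all methods; the sketch below is illustrative (TorchModelWrapper and run_all are hypothetical names, and the wrapped model is assumed to output one foreground logit per pixel).

from typing import Callable, Dict
import numpy as np
import torch

class TorchModelWrapper:
    # Adapts a PyTorch model producing per-pixel foreground logits onto the same
    # grayscale-array-in, boolean-mask-out signature as the traditional methods.
    def __init__(self, model: torch.nn.Module, threshold: float = 0.5):
        self.model = model.eval()
        self.threshold = threshold

    @torch.no_grad()
    def __call__(self, gray: np.ndarray) -> np.ndarray:
        x = torch.from_numpy(gray).float()[None, None]     # (1, 1, H, W)
        prob = torch.sigmoid(self.model(x))[0, 0].numpy()  # foreground probability
        return prob >= self.threshold

def run_all(methods: Dict[str, Callable[[np.ndarray], np.ndarray]], gray: np.ndarray):
    # Apply every registered method to one page and collect the masks.
    return {name: fn(gray) for name, fn in methods.items()}

With this interface, a registry such as {"otsu": binarize_otsu, "sauvola": binarize_sauvola, "robin": TorchModelWrapper(robin_model)} can be evaluated with a single loop over pages and metrics.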

Unsupervised Image Retrieval for Auction Catalogues


Leveraging data for improved contrastive loss in CXR classification

Scene detection – External

In collaboration with an external partner.

Aphasia Assessment with Speech and Language

Prerequisites:

Deep learning

Pattern Recognition/Pattern Analysis

 

Ideally:

Speech and Language Understanding

 

If you are interested, contact paula.andrea.perez@fau.de with the subject [SAGI-MT] Aphasia Master Thesis.

Attach your transcripts, your CV, and a one-paragraph summary of an idea you have related to aphasia detection with speech and language processing.

Automated Testing of LLM-driven Conversational Systems in the In-Car Domain

[MT: Pratik Raut] Advanced Techniques for Base Station Deployment Planning for Localization

The need for accurate localization of User Equipment (UE) has grown significantly in modern wireless
communication networks. This thesis addresses the problem of optimizing Base Station (BS) placement in
complex environments to enhance localization accuracy. Traditional methods often overlook the impact of
real-world environmental features such as building geometry and user distribution, leading to suboptimal
planning decisions [1].
This research proposes a novel approach that incorporates environmental data and signal propagation
characteristics into the planning process. The methodology involves simulating realistic environments using
raytracing techniques and modeling the network using a GPU-accelerated simulation framework. The goal
is to evaluate localization performance for given layouts and suggest improved deployment strategies [1].
In addition, the work explores a reinforcement learning–based optimization framework, where an intelligent
agent iteratively refines BS positions to minimize localization error. Key factors such as Time-Of-Arrival
(TOA), channel impulse responses, and user positions are leveraged to assess and improve system
performance [1].
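
To illustrate how TOA measurements translate into a position estimate, and hence into the localization error to be minimized, here is a generic linearized least-squares multilateration sketch; it assumes synchronized base stations and line-of-sight ranges, and is not code from the simulation framework described above.

import numpy as np

C = 3e8  # propagation speed (speed of light), m/s

def toa_position_estimate(bs_pos: np.ndarray, toa: np.ndarray) -> np.ndarray:
    # Each TOA gives a range d_i = C * toa_i = ||p - b_i||. Subtracting the
    # first squared-range equation from the others cancels ||p||^2 and leaves
    # a linear system A p = y in the unknown UE position p.
    d = C * toa
    b0, d0 = bs_pos[0], d[0]
    A = 2 * (bs_pos[1:] - b0)
    y = d0**2 - d[1:]**2 + np.sum(bs_pos[1:]**2, axis=1) - np.sum(b0**2)
    p, *_ = np.linalg.lstsq(A, y, rcond=None)
    return p

The mean of ||p_est - p_true|| over simulated user positions is then the scalar objective that both the brute-force search and the RL agent would aim to minimize.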
The outcomes of this thesis include insights into how BS configurations affect localization in urban or
obstructed areas and a systematic framework for data-driven deployment planning.

 

Main Objectives:

  • Analyze the impact of building geometry on localization accuracy in complex deployment scenarios.
  • Compare the performance of brute-force planning methods with a reinforcement learning–based optimization framework for BS placement.

Proposed Steps:

  • Create a 3D building map in Blender to serve as input to Sionna RT, a raytracing framework.
  • Place BSs at all candidate locations and implement raytracing–based propagation simulations within
    a GPU-accelerated framework.
  • Design and train a deep reinforcement learning agent to iteratively refine BS positions, minimizing
    localization error.
  • Benchmark localization performance for both brute-force and RL-optimized BS layouts (a minimal sketch of the brute-force baseline follows this list).
  • Evaluate and compare deployment configurations against evaluation criteria, including positioning accuracy.
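
A minimal sketch of the brute-force baseline referenced above: exhaustively score every k-subset of candidate sites. The toy mean_localization_error below is a placeholder; in the thesis it would be replaced by the raytracing-based simulation and TOA estimation.

from itertools import combinations
import numpy as np

def mean_localization_error(bs_subset: np.ndarray, ue_positions: np.ndarray) -> float:
    # Placeholder objective: error grows with the distance to the nearest BS.
    # The real pipeline would run the GPU-accelerated raytracing simulation here.
    dists = np.linalg.norm(ue_positions[:, None, :] - bs_subset[None, :, :], axis=-1)
    return float(dists.min(axis=1).mean())

def brute_force_plan(candidates: np.ndarray, ue_positions: np.ndarray, k: int):
    # Return the k candidate sites (as indices) with the lowest mean localization error.
    best, best_err = None, np.inf
    for idx in combinations(range(len(candidates)), k):
        err = mean_localization_error(candidates[list(idx)], ue_positions)
        if err < best_err:
            best, best_err = idx, err
    return best, best_err

Since the number of k-subsets grows combinatorially with the candidate count, this baseline also motivates the RL-based alternative for larger scenarios.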

 

Reference
[1] A. Al-Tahmeesschi, J. Talvitie, M. López-Benítez, H. Ahmadi, and L. Ruotsalainen, “Multi-Objective Deep Reinforcement Learning for 5G Base Station Placement to Support Localisation for Future Sustainable Traffic,” in Proc. IEEE 97th Vehicular Technology Conference (VTC2023-Spring), Florence, Italy, Jun. 2023, pp. 1–5.

Generative Modeling of Fluence Maps for Radiotherapy Planning

Multi-Task Deep Learning for Parkinson’s Disease: Classification and Severity Estimation via Smartwatch Data

Dual Domain Swin Transformer for Sparse-View CT Reconstruction

The resolution of medical images inherently limits the diagnostic value of clinical image acquisitions. Obtaining high-resolution images through tomographic imaging modalities like Computed Tomography (CT) requires high radiation doses, which pose health risks to living subjects.

The main focus of this thesis is to develop a unified deep learning pipeline for enhancing the spatial resolution of low-dose CT scans by refining both the sinogram (projection) domain and the reconstructed image domain. Leveraging the Swin Transformer architecture, the proposed approach aims to generate high-resolution (HR) scans with improved anatomical detail preservation, while significantly reducing radiation dose requirements.
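
A schematic (inference-only) sketch of the dual-domain idea, with small convolutional stand-ins where the thesis would place Swin Transformer stages, and scikit-image's iradon as a classical filtered-backprojection bridge between the two domains; all names are illustrative, and end-to-end training would additionally require a differentiable FBP layer.

import numpy as np
import torch
import torch.nn as nn
from skimage.transform import iradon

def make_refiner() -> nn.Module:
    # Tiny stand-in network; a Swin Transformer restoration model would go here.
    return nn.Sequential(
        nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 1, 3, padding=1),
    )

sino_net, image_net = make_refiner(), make_refiner()

def dual_domain_reconstruct(sinogram: np.ndarray, angles_deg: np.ndarray) -> np.ndarray:
    # Stage 1: refine the sparse-view sinogram in the projection domain.
    # Bridge: classical filtered backprojection (non-differentiable here).
    # Stage 2: refine the reconstructed slice in the image domain.
    with torch.no_grad():
        s = torch.from_numpy(sinogram).float()[None, None]
        s = sino_net(s)[0, 0].numpy()
        img = iradon(s, theta=angles_deg, filter_name="ramp")
        x = torch.from_numpy(img).float()[None, None]
        return image_net(x)[0, 0].numpy()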