Index
Improving OCR for Structured Documents using Domain Knowledge
Thesis Description
The digitization of inventory cards is a recurring issue for museums and university collections. These cards hold structured data organized by individual layouts that need to be preserved when digitized. Optical Character Recognition (OCR) can be used for pure text recognition but struggles with structured content: The recognition accuracy decreases due to missing textual context and it lacks interpretation of the structured layout.
The goal of this thesis is to build a human-supported layout analysis for enabling OCR pipelines to convert inventory cards to structured data. The research aims to investigate whether OCR accuracy can be improved by incorporating prior knowledge regarding the structure and content of text fields.
Mandatory Goals:
- Design a UI application with the following capabilities:
  - Card layout definition for template matching of data fields
  - Detection and correction of minor shifts and rotations
  - Run OCR / image extraction and export to structured data (e.g. CSV)
- (Semi-)manually annotate data set for testing and fine-tuning (ca. 100 validation / 500 training set size)
- Fine-tune one OCR pipeline on training samples + evaluate on validation split (baseline)
- Re-train OCR pipeline with additional data type information (int, float, string) + evaluate in comparison to baseline
- Additional approach: Baseline OCR with postprocessing steps
  - Rule-based: ensure data consistency by category-specific rules, e.g. normalizing to default unit for weights (“12” -> “12 g”)
  - LLM: Query ChatGPT / lab-internal LLM with OCR result and expected output data type, request correction of OCR output
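The rule-based postprocessing step can be illustrated with a minimal sketch. It assumes each field carries a category label and that a bare number in a weight field should receive the default unit, as in the “12” -> “12 g” example. The `DEFAULT_UNITS` table, the `length` entry, and the function name `normalize_field` are hypothetical and only serve the illustration.

```python
import re

# Hypothetical default units per field category (assumption, not from the thesis).
DEFAULT_UNITS = {"weight": "g", "length": "mm"}

# Matches text that is only a number, with "." or "," as decimal separator.
_NUMBER_ONLY = re.compile(r"^\s*(\d+(?:[.,]\d+)?)\s*$")


def normalize_field(value: str, category: str) -> str:
    """Append the category's default unit when the OCR text is a bare number."""
    unit = DEFAULT_UNITS.get(category)
    match = _NUMBER_ONLY.match(value)
    if unit is None or match is None:
        # Unknown category, or text that is not a bare number
        # (for example, it already carries a unit): only trim whitespace.
        return value.strip()
    number = match.group(1).replace(",", ".")
    return f"{number} {unit}"
```

Calling `normalize_field("12", "weight")` returns `"12 g"`, while a value that already has a unit is passed through unchanged. Real cards would likely need more rules per category; this only shows the shape of one such rule.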
Optional Goals:
- Add other OCR pipelines for baseline comparison
- Introduce and train for more specific data types: weight, date, currency, dimensions
- Test different feature fusion approaches for incorporating the data type information
- Compare to LLM-based approach to OCR as additional baseline
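Comparing the baseline, the data-type-aware pipeline, and the postprocessing variants requires a common score. The description does not name a metric, so the sketch below assumes the character error rate (CER), a common choice for OCR: the edit distance between prediction and reference divided by the reference length.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def character_error_rate(prediction: str, reference: str) -> float:
    """CER = edit distance / reference length (assumed metric, see lead-in)."""
    if not reference:
        # Empty reference: any predicted character counts as a full error.
        return 0.0 if not prediction else 1.0
    return levenshtein(prediction, reference) / len(reference)
```

For example, an OCR output of `"12 q"` against the reference `"12 g"` has one substitution in four characters, giving a CER of 0.25. Averaging this value over all fields of the validation split would yield one number per pipeline.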
Real-time Path Loss Prediction Using Deep Learning for Smart Meter Communication System
Path Loss and Deep Learning:
Path loss measures the reduction in signal power between a transmitter and receiver. It plays a crucial role in determining the coverage and reliability of communication networks, especially in smart meter wireless communication systems. Therefore, accurate path loss prediction is essential for designing efficient networks and ensuring reliable data transfer. Traditional methods for predicting path loss, such as empirical models [1] and deterministic approaches [2], either generalize poorly across environments or suffer from high computational complexity. In contrast, machine learning and deep learning-based methods [3][4][5][6] offer a promising balance between accuracy and computational efficiency.
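As a small illustration of the quantity itself, the sketch below computes path loss in decibels from transmitted and received power. It also evaluates the free-space model, a simple closed-form baseline that is not one of the models cited above. The function names and the 868 MHz example frequency are assumptions for illustration only.

```python
import math


def path_loss_db(tx_power_w: float, rx_power_w: float) -> float:
    """Path loss in dB as the ratio of transmitted to received power."""
    return 10.0 * math.log10(tx_power_w / rx_power_w)


def free_space_path_loss_db(distance_km: float, frequency_mhz: float) -> float:
    """Free-space path loss between isotropic antennas.

    Uses the standard form with distance in km and frequency in MHz.
    """
    return 20.0 * math.log10(distance_km) + 20.0 * math.log10(frequency_mhz) + 32.44
```

With 1 W transmitted and 1 nW received, `path_loss_db(1.0, 1e-9)` gives about 90 dB. At 1 km and an assumed 868 MHz, the free-space model gives about 91.2 dB, and each doubling of distance adds roughly 6 dB.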
Thesis Outline:
This thesis aims to advance path loss prediction in smart meter systems, focusing specifically on LPWAN [7] communication technologies. While prior research has made significant strides, advancements in technology provide opportunities to improve model accuracy, reduce complexity, and enhance versatility. The primary goal is to develop and implement a robust deep learning model for real-time path loss prediction in a fixed-area network using existing smart meter data. Consequently, the outcomes of this work will be valuable for optimizing network management and enhancing the reliability of communication in smart meter deployments.
The thesis will cover the following aspects:
- Conducting a comprehensive literature review,
- Developing and implementing a machine learning or deep learning model for path loss prediction in a fixed network smart meter scenario, using available meter data,
- Thoroughly evaluating the model’s performance in terms of accuracy, robustness, and real-time prediction capabilities.
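To make the accuracy evaluation concrete, a learned model is usually compared against a simple reference. The sketch below is not the thesis model. It assumes a log-distance path loss baseline, PL(d) = PL0 + 10 n log10(d / d0), fitted by least squares, and reports the root mean squared error. All names and the idea of using distance as the only input are assumptions.

```python
import math


def fit_log_distance(distances_m, losses_db, d0_m=1.0):
    """Least-squares fit of PL(d) = pl0 + 10 * n * log10(d / d0).

    Returns (pl0, n): the loss at the reference distance and the path loss exponent.
    """
    xs = [10.0 * math.log10(d / d0_m) for d in distances_m]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(losses_db) / len(losses_db)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, losses_db))
    exponent = sxy / sxx
    pl0 = mean_y - exponent * mean_x
    return pl0, exponent


def predict_loss_db(distance_m, pl0, exponent, d0_m=1.0):
    """Evaluate the fitted log-distance model at one distance."""
    return pl0 + 10.0 * exponent * math.log10(distance_m / d0_m)


def rmse(predicted, observed):
    """Root mean squared error between predictions and measurements, in dB."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))
```

A deep learning model that uses richer meter data would be expected to reach a lower RMSE than this baseline on held-out measurements; if it does not, the added complexity is hard to justify.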
References:
[1] Singh, Yuvraj. (2012). Comparison of Okumura, Hata and COST-231 Models on the Basis of Path Loss and Signal Strength. International Journal of Computer Applications. 59. 37-41. 10.5120/9594-4216.
[2] M. Ayadi, “A UHF Path Loss Model Using Learning Machine for Heterogeneous Networks,” IEEE Transactions on Antennas and Propagation, 2017.
[3] Y. Zhang, “Path Loss Prediction Based on Machine Learning: Principle, Method, and Data Expansion,” Applied Sciences, 2019.
[4] D. Wu, “Application of artificial neural networks for path loss prediction in railway environments,” in Proceedings of the 2010 5th International ICST Conference on Communications and Networking, Beijing, China, 2010.
[5] I. Popescu, “ANN prediction models for indoor environment,” in Proceedings of the 2006 IEEE International Conference on Wireless and Mobile Computing, Networking and Communications, Montreal, QC, Canada, 2006.
[6] F. Schneider, “Lokalisierung Smarter Zähler – Masterarbeit im Studiengang Informatik, Technische Hochschule Nürnberg” Nürnberg, 2021.
[7] B. S. Chaudhari, LPWAN Technologies for IoT and M2M Applications, 2020.
Evaluating the Performance of GAMs for Predicting Mortality Compared to Traditional Scoring Systems
External Supervision: Mr. Lasse Bohlen (University of Leipzig)
This master’s thesis critically reviews the applicability of Generalized Additive Models (GAMs) for mortality prediction in clinical practice and compares them with established scoring systems. It also evaluates whether GAMs can improve prediction accuracy and thereby support clinical decision-making and subsequent patient care.
SAPS (Simplified Acute Physiology Score) and APACHE (Acute Physiology and Chronic Health Evaluation) are traditional scoring systems used in health care to estimate mortality risk. However, their rigid structure and the assumptions they impose limit the insight they can provide. Such models typically rely on a fixed set of explicit variables and assume linear dependencies, whereas patient data involve many interactions and non-linear relationships [1].
GAMs are a more flexible alternative that can model non-linear relationships and interactions between covariates. They are therefore a useful tool for gaining new insights into patterns in clinical data [2]. This thesis covers the modeling background of GAMs and investigates their predictive capability by applying them to clinical databases.
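The contrast between a linear score and a GAM can be written out explicitly. The formulation below is a generic textbook sketch, not the specific model of this thesis. It assumes a binary mortality outcome with probability p, a logit link, and d covariates x_1, ..., x_d; the pairwise terms f_{jk} are optional and stand for the interactions mentioned above.

```latex
\begin{align}
\operatorname{logit}(p) &= \log\frac{p}{1-p} \\
\text{linear model:}\quad \operatorname{logit}(p) &= \beta_0 + \sum_{j=1}^{d} \beta_j x_j \\
\text{GAM:}\quad \operatorname{logit}(p) &= \beta_0 + \sum_{j=1}^{d} f_j(x_j) + \sum_{j<k} f_{jk}(x_j, x_k)
\end{align}
```

Each f_j is a smooth function learned from the data, so the effect of a single covariate can be plotted and inspected on its own. Setting f_j(x_j) = \beta_j x_j and dropping the pairwise terms recovers the linear model as a special case.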
Research Objectives:
- To investigate whether GAMs predict clinical mortality more accurately than the traditional scoring systems.
- To discuss, within this cross-sectional study, how GAMs differ from the traditional scoring systems in statistical accuracy and clinical applicability.
- To offer principles for applying GAMs in the clinical setting.
Methodology:
The methodology consists of building GAM-based models on clinical databases and comparing them with the existing predictive scoring systems. Model evaluation will use the area under the receiver operating characteristic curve (AUC-ROC) as the statistical metric [3]. The study will employ the MIMIC-III clinical dataset [4].
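For readers unfamiliar with the metric, the AUC-ROC equals the probability that a randomly chosen positive case (here, a patient who died) receives a higher predicted risk than a randomly chosen negative case. The sketch below computes it directly from that definition. It is a plain reference implementation for small samples, assuming labels coded as 0 and 1; the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75. The pairwise loop is quadratic in the sample size, so a dataset of the scale of MIMIC-III would normally be scored with a library routine instead.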
Anticipated Impact:
By demonstrating the utility of GAMs for mortality prediction, this study seeks to support better and more personalized patient care. The findings are also intended to inform subsequent research and the integration of more advanced prognostic models in healthcare.
References:
[1] Reza Sadeghi, Tanvi Banerjee, and William Romine. “Early hospital mortality prediction using vital signals”. In: Smart Health 9-10 (2018). CHASE 2018 Special Issue, pp. 265–274. ISSN: 2352-6483. DOI: https://doi.org/10.1016/j.smhl.2018.07.001. URL: https://www.sciencedirect.com/science/article/pii/S2352648318300357.
[2] Shima Moslehi et al. “Interpretable generalized neural additive models for mortality prediction of COVID-19 hospitalized patients in Hamadan, Iran”. In: BMC Med Res Methodol 22.1 (2022), p. 339. DOI: 10.1186/s12874-022-01827-y.
[3] Shangping Zhao et al. “Improving Mortality Risk Prediction with Routine Clinical Data: A Practical Machine Learning Model Based on eICU Patients.” In: Int J Gen Med 16 (2023). PMID: 37525648; PMCID: PMC10387249, pp. 3151–3161. DOI: 10.2147/IJGM.S391423.
[4] Rui Liu et al. “Predicting in-hospital mortality for MIMIC-III patients: A nomogram combined with SOFA score”. In: Medicine (Baltimore) 101.42 (2022), e31251. DOI: 10.1097/MD.0000000000031251.
Speech Emotion Recognition Demo
If you are interested, please send an email with your transcripts to paula.andrea.perez@fau.de with the subject SERDemo LME. The students should have knowledge of Python coding and GUI toolkits such as PyQt.
SwinU: A Swin Transformer-Based Model for CT Image Restoration
Survey on Image Segmentation and Concise Introductions to DeepMedic
Multicenter Study of Brain Metastases Autosegmentation
Image-to-Image Translation Using Latent Diffusion Models
Automated Configuration of U-Net Architecture for Medical Image Segmentation
In this research, we will explore how different U-Net hyperparameters affect segmentation performance across multiple medical segmentation datasets. Combining the MONAI and nnU-Net frameworks, the study will investigate how to adjust the relevant U-Net hyperparameters to optimize the model’s performance on different medical image segmentation tasks. Specifically, we will analyze the impact of hyperparameters such as network depth, convolution kernel size, learning rate, and data augmentation strategy on the segmentation performance of the U-Net architecture, and validate the effectiveness of each hyperparameter setting experimentally. Ultimately, through systematic experiments, we aim to provide a more efficient and more generalizable U-Net configuration scheme for medical image segmentation tasks.
The purpose of this study is to explore and optimize the hyperparameter configuration of the U-Net architecture in order to improve performance on various medical image segmentation tasks, including binary and multi-class segmentation. Through systematic experiments and analysis, we seek a deep understanding of how different hyperparameter settings affect segmentation results, thereby providing more efficient and generalizable solutions for medical image segmentation. The outcomes of this research are expected not only to improve the accuracy and precision of image segmentation but also to provide a valuable reference for researchers in related fields.
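A systematic study of the hyperparameters named above needs an explicit list of configurations to train and compare. The sketch below enumerates a full grid over network depth, kernel size, learning rate, and augmentation strategy. The concrete values and names are hypothetical placeholders, and the sketch deliberately does not call MONAI or nnU-Net, whose configuration interfaces are not described here.

```python
from itertools import product

# Hypothetical search space; the values are illustrative, not taken from the study.
SEARCH_SPACE = {
    "depth": [3, 4, 5],
    "kernel_size": [3, 5],
    "learning_rate": [1e-2, 1e-3],
    "augmentation": ["none", "flip_rotate"],
}


def enumerate_configs(space):
    """Yield one dict per combination of hyperparameter values (full grid)."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


configs = list(enumerate_configs(SEARCH_SPACE))
```

This grid contains 3 x 2 x 2 x 2 = 24 configurations. Each one would be trained and scored per dataset, and because the grid grows multiplicatively with every added hyperparameter, a random or guided search may be preferable for larger spaces.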