Index

Master Thesis – Annotation by Speech in Radiology

This thesis explores using speech as a direct annotation modality for medical image analysis, bypassing transcription errors and enabling more lightweight models. By training a foundation model like CLIP, we aim to investigate how well speech-based annotations perform compared to text.

Tasks:

  1. Generate a synthetic speech dataset based on a publicly available image-text dataset
  2. Train a foundation model (CLIP) for annotating medical images using speech annotations
  3. Evaluate the foundation model on multiple downstream tasks, such as:
    • Zero-shot classification
    • Zero-shot segmentation using MedSAM
    • Speech grounding (aligning language with corresponding visual elements, e.g. segmentation masks)
  4. Evaluate the model on a real-world high-quality dataset from radiologists
  5. Compare the results to an image-text model
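
The zero-shot classification task listed above can be illustrated with a minimal sketch. It assumes a trained model has already mapped one image and one spoken prompt per class into a shared embedding space. The function names, the toy embeddings, and the temperature of 0.07 are illustrative assumptions and are not taken from the thesis description.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, prompt_embeddings, temperature=0.07):
    """Return a {label: probability} dict for one image.

    prompt_embeddings maps each class label to the embedding of its
    (hypothetical) spoken prompt. The temperature is an assumed value.
    """
    labels = list(prompt_embeddings)
    logits = [
        cosine_similarity(image_embedding, prompt_embeddings[label]) / temperature
        for label in labels
    ]
    return dict(zip(labels, softmax(logits)))


# Toy example with made-up 3-dimensional embeddings.
probabilities = zero_shot_classify(
    [1.0, 0.0, 0.0],
    {"pneumonia": [0.9, 0.1, 0.0], "no finding": [0.0, 1.0, 0.0]},
)
predicted_label = max(probabilities, key=probabilities.get)
```

No class-specific training is involved: the predicted class is simply the prompt whose embedding lies closest to the image embedding, which is why the same procedure would apply to text prompts in the image-text baseline.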

 

Requirements:

  • Experience with PyTorch
  • Hands-on experience with training deep learning models
  • Experience with Natural Language Processing (optional)
  • Experience with using SLURM for job management in a GPU cluster (optional)
  • Deep Learning lecture
  • Pattern Recognition/Analysis lecture (optional)

 

Application: (Applications that do not follow the application requirements will not be considered)

  • CV
  • Transcript of Records
  • Short motivation letter (not longer than one page)
  • Email Subject: “Application Speech-CLIP” + your full name

Please send an email with your application documents to lukas.buess@fau.de

 

Starting Date: 01.01.2025 or later

Related Works:

[1] Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., … & Sutskever, I. (2021, July). Learning transferable visual models from natural language supervision. In International conference on machine learning (pp. 8748-8763). PMLR.

[2] Hamamci, I. E., Er, S., Almas, F., Simsek, A. G., Esirgun, S. N., Dogan, I., … & Menze, B. (2024). Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography.

[3] Ma, J., He, Y., Li, F., Han, L., You, C., & Wang, B. (2024). Segment anything in medical images. Nature Communications, 15(1), 654.



Diffusion Model-Based Compensation of T2-induced Blurring in Ultrashort TE MRI

How Broken a Coil Must Be?

Investigating Liquidity Forecasting with Point-Based and Probabilistic Models to Enhance Financial Business Operations

Enhancing SBOM Creation with Large Language Models

Producing Synthetic Data for Better Defect Detection

Can the training workflow and performance of CNNs for defect detection be improved by supplementing the training database with synthetic images of defective parts generated using GANs?

In the manufacture of high-end cinematographic lenses, the anodized aluminum lens housings must be carefully inspected for surface scratches before they are delivered to customers. Traditional image processing algorithms are of limited use for automating this inspection because the anodized aluminum surface appears granular on close observation. To obtain a solution that is both more general and robust, a convolutional neural network (CNN) is used to carry out the inspection. However, training CNNs is challenging because only a small number of defect images is available.

To overcome these challenges, the proposed master’s thesis will investigate the use of Generative Adversarial Networks (GANs) to generate synthetic data for the minority classes for CNN training, as suggested in the original GAN paper [1]. The synthetic images will then be used to train a CNN for defect detection, and the performance of the newly trained CNN will be measured by its classification accuracy on real-world samples.

The proposed research will involve three stages:

  1. Finalizing optimizations to the hardware of the laboratory setup

  2. Running basic tests to decide which of the following GAN approaches to pursue:

    1. Transparent defect image

In this approach, scratches will be generated by GANs. The synthetic scratches will be overlaid on different real scratch-free surface images to enlarge the training dataset for the defect detection CNN.

    2. Patch-based/inpainting approach

In this approach, a GAN which can generate patches of surfaces with scratches will be trained. The patches generated need to be able to be seamlessly integrated into a real surface. The approach is similar to the idea of Inpainting, with the masked area being filled by synthetic scratches instead of visually realistic pixels.

    3. Image-to-image translation

In this approach, a GAN will either turn a defect map (a 2D array indicating the desired locations of scratches) into a synthetic surface with scratches, or turn a real surface with scratches into a defect map. The second variant resembles the semantic segmentation task in the paper [3].

  3. Training a CNN on a GAN-generated dataset and evaluating its performance on data unseen during training, acquired from a variety of aluminum housing geometries and with different settings of the laboratory setup.
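
The transparent defect image approach described above amounts to alpha blending a synthetic scratch onto a scratch-free surface image. The following is a minimal sketch under stated assumptions: images are grayscale and stored as nested lists, the GAN output is already split into an intensity patch and an alpha (opacity) mask, and the function name and parameters are hypothetical. A real pipeline would more likely use an array library.

```python
def overlay_defect(surface, defect, alpha, top, left):
    """Blend a defect patch onto a copy of a surface image.

    surface: 2D list of grayscale values (scratch-free image)
    defect:  2D list of grayscale values (synthetic scratch patch)
    alpha:   2D list of opacities in [0, 1], same shape as defect
    top, left: position of the patch's upper-left corner in the surface
    """
    blended = [row[:] for row in surface]  # leave the input untouched
    height, width = len(blended), len(blended[0])
    for i, (defect_row, alpha_row) in enumerate(zip(defect, alpha)):
        for j, (value, opacity) in enumerate(zip(defect_row, alpha_row)):
            y, x = top + i, left + j
            if 0 <= y < height and 0 <= x < width:  # clip at the border
                blended[y][x] = opacity * value + (1.0 - opacity) * blended[y][x]
    return blended


# Toy example: one half-transparent bright pixel on a uniform surface.
clean_surface = [[100.0] * 3 for _ in range(3)]
augmented = overlay_defect(clean_surface, [[200.0]], [[0.5]], top=1, left=1)
```

Because the patch position is a free parameter, one synthetic scratch can be placed at many locations on many clean surfaces, which is how this approach enlarges the training set.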

[1] Goodfellow, I., Pouget-Abadie, J., Mirza, M. et al., “Generative Adversarial Networks,” Communications of the ACM, vol. 63, no. 11, pp. 139-144, 2020.

[2] Pathak et al., “Context encoders: Feature learning by inpainting,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

[3] P. Isola, J.-Y. Zhu, T. Zhou and A. Efros, “Image-to-image translation with conditional adversarial networks,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

Optimizing Remote Scanning Workflow with Automated Data Extraction from Streamed Content

Remote scanning allows medical technologists to operate scanning systems from a remote location. Scanners are connected via KVM (Keyboard, Video, Mouse) switches, as no other interface standard exists. However, KVM only streams video and control signals and does not transmit structured data such as scan progress or warnings from the scanner. This limitation creates a significant challenge for remote operators, who may miss important system events or updates, leading to potential miscommunication with onsite staff. This can result in delays in patient care and in reliance on the onsite team to monitor patient conditions effectively during scans.

Final Goals:
Prototype methods to capture data from scanning console video streams in real time, such as:
a) Error or warning popup messages in different languages and from different interfaces
b) Process progression calculated from the progress bar
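
Goal b) can be sketched for the simplest case. The example assumes that the location of the progress bar in the video frame is already known, that one horizontal row of RGB pixels across the bar has been extracted, and that the bar fills from left to right in a known color. The function name, the color values, and the tolerance are hypothetical; locating the bar and handling different interface styles are the harder parts and are not covered.

```python
def progress_fraction(bar_row, fill_color, tolerance=40):
    """Estimate progress from one pixel row across a progress bar.

    bar_row:    list of (R, G, B) tuples, left to right across the bar
    fill_color: assumed (R, G, B) color of the filled part
    tolerance:  allowed per-channel deviation, to absorb video compression noise
    """
    if not bar_row:
        raise ValueError("bar_row must contain at least one pixel")

    def is_filled(pixel):
        return all(abs(c - f) <= tolerance for c, f in zip(pixel, fill_color))

    filled = 0
    for pixel in bar_row:  # count the contiguous filled run from the left
        if not is_filled(pixel):
            break
        filled += 1
    return filled / len(bar_row)


# Toy example: 3 greenish pixels followed by 7 gray background pixels.
row = [(10, 200, 30)] * 3 + [(128, 128, 128)] * 7
fraction = progress_fraction(row, fill_color=(0, 210, 20))
```

Counting only the contiguous run from the left keeps stray fill-colored pixels further right from inflating the estimate.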

Sign Language Recognition Using Transformer and Comparison with Traditional Techniques

This thesis is about creating a system to recognize sign language using transformer networks and
comparing it with older methods. The aim is to build a system that is both effective and accurate by
using transformer models, which are good at handling sequences of data, to understand and interpret
sign language. The study will include collecting data, preparing it, training models, evaluating them, and
comparing the results with traditional methods like CNNs.

The main idea of this thesis is to use transformer networks for recognizing sign language. Unlike
traditional models that process data step-by-step, transformers can handle entire sequences at once,
which improves understanding and accuracy. The system will use different types of data (e.g., video) to
be more robust and accurate. This research will compare transformers with traditional methods like
CNNs to show the benefits and possible improvements of transformers in sign language recognition.
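
The statement that transformers handle an entire sequence at once refers to self-attention, where every frame is compared with every other frame in a single step. The following is a minimal sketch of scaled dot-product self-attention over per-frame feature vectors. It assumes identity projections (queries, keys, and values are the raw features) and uses made-up toy features; real transformer layers use learned projections and multiple heads.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention with identity projections.

    frames: list of feature vectors, one per video frame.
    Returns one output vector per frame, each a weighted average
    of all frame vectors in the sequence.
    """
    dim = len(frames[0])
    outputs = []
    for query in frames:
        # Similarity of this frame to every frame in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in frames
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * value[j] for w, value in zip(weights, frames)) for j in range(dim)]
        )
    return outputs


# Toy sequence of three 2-dimensional frame features.
attended = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Each output depends on all frames regardless of their distance in time, which is the property that distinguishes this mechanism from step-by-step sequence processing.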

Predictive Maintenance for Electrical Panels: Hotspot Forecasting and Anomaly Detection Using Thermal Data

This thesis proposes the development of a machine learning-based predictive maintenance
system for electrical panels using thermal data. The system will address the limitations of periodic manual
inspections and enable the detection of anomalies in the operation of electrical devices. By leveraging
real-time thermal data and applying machine learning techniques, the solution aims to enhance the
sustainability and efficiency of maintenance processes, especially in environments like airports where
baggage handling systems (BHS) are critical. This project proposes the use of a 32×32 thermopile sensor
array to collect continuous thermal data and apply machine learning models to predict potential failures
before they occur.

The thermal dataset is provided by Siemens Logistics GmbH.

Problem Statement: Manual thermographic inspections in electrical panels have several limitations,
including the inability to coincide with peak operating times and the reliance on operator expertise to
interpret infrared images. Furthermore, the inability to continuously monitor and analyze the thermal
behavior of electrical panels leads to missed opportunities for early intervention and predictive
maintenance.

Expected Outcomes: The primary outcome of this thesis will be a system that predicts when anomalies
are likely to occur within electrical panels. The system will reduce the number of manual inspections,
minimize downtime by predicting failures, and recommend inspections for timely maintenance.
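
As a point of reference for the anomaly detection part, a simple statistical baseline can be sketched: each pixel of a new thermal frame is compared with that pixel's own history, and pixels that are unusually hot are flagged. This is not the machine learning model the thesis proposes. The function name, the z-score threshold, and the minimum standard deviation are assumptions, and the toy example uses 2×2 frames instead of the 32×32 sensor array.

```python
import statistics


def hotspot_mask(history, frame, z_threshold=3.0, min_std=0.5):
    """Flag pixels that are much hotter than their own history.

    history: list of past frames (each a 2D list of temperatures)
    frame:   current frame with the same shape
    z_threshold: assumed number of standard deviations that counts as a hotspot
    min_std: assumed lower bound on the standard deviation, so that a
             perfectly constant history does not cause a division by zero
    """
    rows, cols = len(frame), len(frame[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            samples = [past[r][c] for past in history]
            mean = statistics.fmean(samples)
            std = max(statistics.pstdev(samples), min_std)
            mask[r][c] = (frame[r][c] - mean) / std > z_threshold
    return mask


# Toy example: constant 30 degree history, one pixel jumps to 45 degrees.
past_frames = [[[30.0, 30.0], [30.0, 30.0]] for _ in range(5)]
current = [[30.0, 45.0], [30.0, 30.0]]
flags = hotspot_mask(past_frames, current)
```

A baseline of this kind only reacts once a pixel is already hot, whereas the forecasting models proposed in the thesis aim to predict such events before they occur.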

Normalization of Sensor and Smartphone Gait Signals of Parkinson’s Disease Patients Using Deep Learning