Index

Generation of Region-guided Clinical Text Reports from Chest X-Ray Images Using LLMs

Foundation Models for Glacier Segmentation

This thesis aims to evaluate three state-of-the-art foundation models for the task of semantic segmentation,
specifically targeting the segmentation of glacier calving fronts in SAR imagery. Foundation
models are general-purpose, task-agnostic models that are pre-trained on extensive
datasets, allowing them to be adapted to specific tasks with minimal additional training [1][2][3]. This
research will explore the efficacy of these models when applied to SAR data, which presents unique
challenges due to its complex imaging characteristics. The models were selected for this analysis based
on their performance metrics, methodologies, and the datasets used. To assess the suitability of the learned
features for our CaFFe [4] dataset, the models will be compared with each other quantitatively and
qualitatively and implemented in PyTorch. This involves fine-tuning the decoders for the calving
front delineation task versus fine-tuning only the classifier head on top of frozen backbone features.
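The two fine-tuning regimes can be sketched in PyTorch with toy modules standing in for a foundation-model backbone and a segmentation head (both hypothetical; the real experiments would use the models' own encoders and decoders):

```python
import torch
from torch import nn

# Toy stand-ins for a pre-trained backbone and a segmentation head;
# the actual experiments would use the foundation-model encoders instead.
backbone = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 8, 3, padding=1)
)
head = nn.Conv2d(8, 2, 1)  # 2 classes: calving front vs. background

def trainable_params(*modules):
    # Count parameters that would receive gradient updates.
    return sum(p.numel() for m in modules for p in m.parameters() if p.requires_grad)

# Regime 1: fine-tune everything (decoder/backbone and head).
full = trainable_params(backbone, head)

# Regime 2: freeze the backbone and train only the classifier head
# on top of the frozen features.
for p in backbone.parameters():
    p.requires_grad_(False)
frozen = trainable_params(backbone, head)

print(full, frozen)  # frozen < full: only the head's weights remain trainable
```

In the frozen regime the optimizer would be constructed from `head.parameters()` only, which is what makes the comparison between the two setups meaningful.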

• Foundation model 1, DINOv2 [1]: DINOv2, developed by Meta AI, represents a significant
advancement in self-supervised learning for computer vision. The model employs a
transformer-based architecture and a teacher-student training paradigm to learn
general-purpose visual features without labeled data. A critical aspect of
DINOv2 is its emphasis on scaling both the model and the dataset. Unlike previous foundation
models, DINOv2 maintains strict control over data quality and diversity, which is essential for
producing effective visual representations. For evaluation, we focus on the CaFFe dataset and
assess at least one reported model trained on the ADE20K [5] and Pascal VOC 2012 [6] datasets.
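The teacher-student paradigm keeps the teacher network as an exponential moving average (EMA) of the student's weights. A minimal sketch of that update rule, with tiny linear layers standing in for the ViT encoders and an illustrative momentum value:

```python
import torch
from torch import nn

# Tiny stand-in networks; DINOv2 uses ViT encoders, this only
# illustrates the EMA update rule of the teacher-student scheme.
student = nn.Linear(4, 4)
teacher = nn.Linear(4, 4)
teacher.load_state_dict(student.state_dict())  # teacher starts as a copy
for p in teacher.parameters():
    p.requires_grad_(False)  # the teacher receives no gradients

momentum = 0.996  # EMA coefficient (value assumed for illustration)

@torch.no_grad()
def ema_update(teacher, student, m):
    # teacher <- m * teacher + (1 - m) * student, parameter-wise
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(m).add_(ps, alpha=1.0 - m)

# After a (hypothetical) student gradient step, drift the teacher towards it.
with torch.no_grad():
    for p in student.parameters():
        p.add_(0.1)  # stand-in for an optimizer update
ema_update(teacher, student, momentum)
```

The slowly moving teacher provides the training targets, which is what lets the scheme learn without any labels.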

• Foundation model 2, Prithvi [2]: Prithvi, developed by IBM and NASA, is a pioneering
foundation model specifically tailored to geospatial data and has been tested
across a variety of Earth observation tasks. It uses the masked-autoencoder technique with a Vision
Transformer architecture. Prithvi leverages multispectral satellite imagery from the Harmonized
Landsat Sentinel-2 (HLS) dataset, which offers high-resolution data suitable for diverse ecological
analyses. The training data sampling incorporates statistical factors such as precipitation and
temperature, minimizing bias towards specific landscapes and reducing redundancy across regions
and time periods. For evaluation, this study will utilize the CaFFe dataset and assess at least one
of the three pre-trained models focused on flood mapping [7], wildfire scar mapping [8], and crop
segmentation [9].
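Masked-autoencoder pre-training hides a large fraction of the input patch tokens and trains the model to reconstruct them; the encoder only sees the kept tokens. A minimal NumPy sketch of MAE-style random masking (the 75% mask ratio follows the original MAE recipe, not necessarily Prithvi's exact setting):

```python
import numpy as np

def random_masking(patches, mask_ratio=0.75, rng=None):
    """MAE-style masking: keep a random subset of patch tokens.

    patches: (N, D) array of patch embeddings; mask_ratio of 0.75 is
    the value from the original MAE paper, assumed here for illustration.
    Returns the kept patches, their indices, and a boolean mask
    (True = masked out, to be reconstructed by the decoder).
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    n = patches.shape[0]
    n_keep = int(n * (1.0 - mask_ratio))
    perm = rng.permutation(n)
    keep_idx = np.sort(perm[:n_keep])
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False
    return patches[keep_idx], keep_idx, mask

# 16 patches of dimension 8: the encoder would see only the 4 kept tokens.
patches = np.arange(16 * 8, dtype=float).reshape(16, 8)
kept, keep_idx, mask = random_masking(patches)
```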

• Foundation model 3, SMLFR [3]: The SMLFR model is a generative convolutional neural network
designed for analyzing remote sensing data. Like Prithvi, SMLFR uses the masked-autoencoder
technique, but it is built on a convolutional architecture, ConvNeXt [10], a modernized
version of traditional ConvNets inspired by transformers that competes well with
them in accuracy and scalability. In addition, SMLFR improves feature representation
during training by filtering high-frequency components from the images and reconstructing the
low-frequency content. The model is trained on a geographical dataset collected from various
sensors, including Sentinel-2, Gaofen, Landsat, and QuickBird, and contains images from
different continents and environments. This study will evaluate the model on the CaFFe dataset
using at least one of the two pre-trained models trained on the Potsdam [11] and LoveDA [12] datasets.
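The frequency-domain idea can be illustrated by splitting an image into low- and high-frequency components with a 2D FFT; this is only a generic sketch, and the cutoff value is not taken from the SMLFR paper:

```python
import numpy as np

def frequency_split(img, cutoff=0.25):
    """Split an image into low- and high-frequency parts via the 2D FFT.

    cutoff is the fraction of the (centred) spectrum kept as "low
    frequency"; the value is illustrative, not the SMLFR setting.
    """
    f = np.fft.fftshift(np.fft.fft2(img))           # centre the spectrum
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    low_mask = radius <= cutoff * min(h, w)          # circular low-pass mask
    low = np.fft.ifft2(np.fft.ifftshift(f * low_mask)).real
    high = img - low  # the residual carries the high-frequency detail
    return low, high

img = np.random.default_rng(0).random((32, 32))
low, high = frequency_split(img)
```

A low-frequency reconstruction target would then be built from `low`, while `high` is the detail the filtering suppresses.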

References
[1] Maxime Oquab et al. “DINOv2: Learning Robust Visual Features without Supervision”. In:
(2024). arXiv: 2304.07193 [cs.CV]. URL: https://arxiv.org/abs/2304.
07193.
[2] Johannes Jakubik et al. “Foundation Models for Generalist Geospatial Artificial Intelligence”.
In: (2023). arXiv: 2310.18660 [cs.CV]. URL: https://arxiv.org/abs/2310.
18660.
[3] Zhe Dong, Yanfeng Gu, and Tianzhu Liu. “Generative ConvNet Foundation Model With Sparse
Modeling and Low-Frequency Reconstruction for Remote Sensing Image Interpretation”. In:
IEEE Transactions on Geoscience and Remote Sensing 62 (2024), pp. 1–16. DOI: 10.1109/
TGRS.2023.3348479.
[4] N. Gourmelon et al. “Calving fronts and where to find them: a benchmark dataset and methodology
for automatic glacier calving front extraction from synthetic aperture radar imagery”. In:
Earth System Science Data 14.9 (2022), pp. 4287–4313. DOI: 10.5194/essd-14-4287-
2022. URL: https://essd.copernicus.org/articles/14/4287/2022/.
[5] Bolei Zhou et al. “Scene Parsing through ADE20K Dataset”. In: 2017 IEEE Conference on
Computer Vision and Pattern Recognition (CVPR). 2017, pp. 5122–5130. DOI: 10.1109/CVPR.2017.544.
[6] Mark Everingham et al. “The pascal visual object classes (VOC) challenge”. en. In: Int. J.
Comput. Vis. 88.2 (June 2010), pp. 303–338.
[7] Derrick Bonafilia et al. “Sen1Floods11: a georeferenced dataset to train and test deep learning
flood algorithms for Sentinel-1”. In: 2020 IEEE/CVF Conference on Computer Vision and
Pattern Recognition Workshops (CVPRW). 2020, pp. 835–845. DOI: 10.1109/CVPRW50498.
2020.00113.
[8] IBM and NASA. Wildfire Scar Mapping Dataset. URL: https://huggingface.co/datasets/
ibm-nasa-geospatial/hls_burn_scars.
[9] IBM and NASA. Multi-Temporal Crop Segmentation. URL: https://huggingface.co/
datasets/ibm-nasa-geospatial/multi-temporal-crop-classification.
[10] Zhuang Liu et al. A ConvNet for the 2020s. 2022. arXiv: 2201.03545 [cs.CV]. URL:
https://arxiv.org/abs/2201.03545.
[11] BSF Swissphoto. 2D Semantic Labeling Contest – Potsdam.
[12] Junjue Wang et al. LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive
Semantic Segmentation. 2022. arXiv: 2110.08733 [cs.CV]. URL: https://arxiv.
org/abs/2110.08733.

Stammering Identification using Large Language Models

Annotation by Speech in Radiology

This thesis explores using speech as a direct annotation modality for medical image analysis.



Diffusion Model-Based Compensation of T2-induced Blurring in Ultrashort TE MRI

How Broken a Coil Must Be?

Investigating Liquidity Forecasting with Point-Based and Probabilistic Models to Enhance Financial Business Operations

Enhancing SBOM Creation with Large Language Models

Producing Synthetic Data for Better Defect Detection

Can the training workflow and performance of CNNs for defect detection be improved by augmenting the training database with synthetic images of defective parts generated using GANs?

In the manufacture of high-end cinematographic lenses, the anodized aluminum housings of the lenses must be carefully inspected for surface scratches before they are delivered to customers. Traditional image processing algorithms are of limited use for automating this inspection, because the anodized aluminum has a granular texture when observed closely. To provide a more general yet robust solution, a convolutional neural network (CNN) is used to carry out the inspection. However, training CNNs is challenging due to the small number of available defect images.

To overcome these challenges, the proposed master’s thesis will investigate the use of Generative Adversarial Networks (GANs) to generate synthetic data for the minority classes in CNN training, as suggested in the original GAN paper [1]. The synthetic images from the GANs will then be used to train a CNN for defect detection, and the performance of the newly trained CNN will be measured by its classification accuracy on real-world samples.

The proposed research will involve three stages:

  1. Finalizing optimizations to the hardware of the laboratory setup

  2. Doing basic testing to decide which of the following GAN approaches will be taken:

    1. Transparent defect image

In this approach, scratches will be generated by GANs. The synthetic scratches will then be overlaid onto different real, scratch-free surface images to enlarge the training dataset for the defect detection CNN.

    2. Patch-based/inpainting approach

In this approach, a GAN that can generate patches of surfaces with scratches will be trained. The generated patches must integrate seamlessly into a real surface. The approach is similar to inpainting [2], with the masked area being filled by synthetic scratches instead of visually realistic pixels.

    3. Image-to-image translation

In this approach, a GAN will either turn a defect map (a 2D array indicating the desired locations of scratches) into a synthetic surface with scratches, or turn a real surface with scratches into a defect map. The second direction resembles the semantic segmentation task described in [3].

  3. Training a CNN with a dataset generated from the GAN, and evaluating the performance of the trained CNN on data unknown to it, acquired from a variety of aluminum housing geometries and from the laboratory setup with different settings.
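The transparent-defect option (approach 1) amounts to alpha-blending a generated scratch patch onto a clean surface image. A minimal NumPy sketch, with a hand-made scratch and opacity mask standing in for the GAN output:

```python
import numpy as np

def overlay_scratch(surface, scratch, alpha, top_left):
    """Alpha-blend a generated scratch patch onto a clean surface image.

    surface: (H, W) grayscale image; scratch/alpha: (h, w) patch and its
    per-pixel opacity (a GAN would supply both; here they are hand-made).
    """
    out = surface.copy()
    y, x = top_left
    h, w = scratch.shape
    region = out[y:y + h, x:x + w]
    # Standard alpha compositing: opaque pixels take the scratch value,
    # transparent pixels keep the underlying surface texture.
    out[y:y + h, x:x + w] = alpha * scratch + (1.0 - alpha) * region
    return out

# Clean granular-surface stand-in and a thin dark "scratch".
rng = np.random.default_rng(0)
surface = 0.5 + 0.05 * rng.standard_normal((64, 64))
scratch = np.zeros((4, 20))        # dark line
alpha = np.full((4, 20), 0.8)      # mostly opaque
augmented = overlay_scratch(surface, scratch, alpha, top_left=(30, 20))
```

Repeating this with varied scratch shapes, positions, and backgrounds is what enlarges the defect class of the training set.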

[1] I. Goodfellow, J. Pouget-Abadie, M. Mirza, et al. “Generative Adversarial Networks”. In: Communications of the ACM 63.11 (2020), pp. 139–144.

[2] D. Pathak et al. “Context Encoders: Feature Learning by Inpainting”. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016.

[3] P. Isola, J.-Y. Zhu, T. Zhou, and A. Efros. “Image-to-Image Translation with Conditional Adversarial Networks”. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017.

Optimizing Remote Scanning Workflow with Automated Data Extraction from Streamed Content

Remote scanning allows medical technologists to operate scanning systems from a remote location. Scanners are connected via KVM (keyboard, video, mouse) switches, as no other interface standard exists. However, KVM streams only video and control signals, without transmitting structured data such as scan progress or warnings from the scanner. This limitation poses a significant challenge: remote operators may miss important system events or updates, leading to potential miscommunication with onsite staff. This can delay patient care and leaves the remote operator dependent on the onsite team to monitor patient conditions effectively during scans.

Final goals:
Prototype methods to capture data from scanning-console video streams in real time, such as:
a) detecting error or warning pop-up messages in different languages and from different interfaces, and
b) calculating process progression from the progress bar.
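As a sketch of goal (b), the fill percentage of a located progress bar can be read from a single pixel row by thresholding; the threshold value and the assumption that the filled part is brighter are illustrative, and a real prototype would first have to detect the bar within the streamed KVM frame:

```python
import numpy as np

def progress_from_bar(bar_row, filled_threshold=128):
    """Estimate completion percentage from one pixel row of a progress bar.

    bar_row: 1D array of grayscale pixel values spanning the bar's width.
    Assumes the filled portion is brighter than filled_threshold (an
    illustrative value; real consoles would need per-UI calibration).
    """
    filled = bar_row >= filled_threshold
    return 100.0 * filled.sum() / filled.size

# 200-pixel bar, 60% filled (bright = 230, empty = 40).
bar = np.concatenate([np.full(120, 230), np.full(80, 40)]).astype(np.uint8)
pct = progress_from_bar(bar)  # -> 60.0
```

Running this on every frame would turn the purely visual KVM stream into a structured progress signal that can be forwarded to the remote operator.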