Multimodal fusion of pose and visual information for gesture recognition in historical artworks
Gestures in historical artworks can communicate underlying human experiences and offer a broad view of past sensory worlds. To explore this domain, we use SensoryArt [1], a dataset of multisensory gestures in historical artworks that includes person pose-estimation keypoints and gesture labels. The goal of the thesis is to classify the gestures of the persons depicted in the paintings. We aim to investigate how additional information on body posture, such as annotated skeleton information, affects the performance of the models.
Mandatory Goals:
- Train a multi-label gesture classification model on the cropped person images of the SensoryArt dataset fused with ground-truth keypoint heatmaps, and evaluate it on the validation split.
- Select and train a well-performing keypoint estimation model.
- Evaluate the end-to-end pipeline on the cropped images: first predict the heatmaps, then classify the gestures.
- Train a second model for multi-person gesture classification at the image level, using the uncropped images fused with ground-truth heatmaps, and evaluate it on the validation split.
- Perform an inference test of this model on the original images with machine-generated heatmaps.
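The goals above leave open how the heatmaps are "fused" with the image. One common option, assumed here rather than prescribed by the thesis, is early fusion: render one Gaussian heatmap per keypoint and stack the heatmaps as extra input channels next to the RGB channels. The sketch below assumes COCO-style keypoints given as (x, y, visibility) rows and a hypothetical Gaussian width `sigma`; the classifier's first layer would then need 3 + K input channels.

```python
import numpy as np


def keypoint_heatmaps(keypoints, height, width, sigma=2.0):
    """Render one Gaussian heatmap per keypoint.

    keypoints: array of shape (K, 3) with rows (x, y, visibility), an assumed
    COCO-style layout. Keypoints with visibility 0 yield an all-zero channel.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    heatmaps = np.zeros((len(keypoints), height, width), dtype=np.float32)
    for k, (x, y, v) in enumerate(keypoints):
        if v > 0:
            # Peak value 1.0 exactly at the keypoint location.
            heatmaps[k] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma**2))
    return heatmaps


def fuse_early(image, heatmaps):
    """Concatenate an (H, W, 3) uint8 image with (K, H, W) heatmaps.

    Returns a (3 + K, H, W) float32 array in channel-first layout.
    """
    rgb = image.astype(np.float32).transpose(2, 0, 1) / 255.0
    return np.concatenate([rgb, heatmaps], axis=0)
```

For the end-to-end goal, the same `fuse_early` call would simply receive heatmaps predicted by the keypoint model instead of rendered ground-truth ones.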
Optional Goals:
- Test incorporating additional information on body position not as heatmaps but as skeleton keypoint coordinates or joint angles.
- Conduct additional ablations, such as cropping the persons out of the images as squares.
- Integrate a multi-label approach into the detection pipeline.
- Test human pose estimation on artworks using the additionally provided gesture labels.
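Two of the optional goals, joint-angle features and square person crops, reduce to small geometric computations. The sketch below is one possible formulation under stated assumptions: an angle is measured at a middle joint (for example the elbow, between shoulder and wrist), and a square crop is obtained by expanding a person's bounding box (x0, y0, x1, y1) to its longer side around the box center and shifting it back inside the image where possible. Which joint triplets to use, and how to pad crops larger than the image, are left open.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at joint b between segments b->a and b->c.

    Returns NaN if either segment has zero length (e.g. missing keypoints).
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return float("nan")
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos = max(-1.0, min(1.0, cos))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos))


def square_crop_box(box, img_w, img_h):
    """Expand box (x0, y0, x1, y1) to a square around its center.

    The square is shifted to lie inside the image when it fits; if it is
    larger than the image it extends beyond the border and needs padding.
    """
    x0, y0, x1, y1 = box
    side = max(x1 - x0, y1 - y0)
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    nx0 = min(max(cx - side / 2, 0), max(img_w - side, 0))
    ny0 = min(max(cy - side / 2, 0), max(img_h - side, 0))
    return (nx0, ny0, nx0 + side, ny0 + side)
```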
[1] Zinnen, M., Christlein, V., Maier, A., & Hussian, A. (2024). SensoryArt (1.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10889613
A Comparative Study of Deep Learning Models for Brain Metastases Autosegmentation
Improving manual annotation of 3D medical segmentation datasets using SAM2
In many medical scenarios, physicians need to annotate objects pixel-wise in CT images, whole-slide images (WSIs), or cell images. This annotation process requires considerable time and effort, especially for large datasets. To address this challenge, a web-based tool capable of automatically segmenting 2D and 3D medical images is highly desirable.
EXACT is an existing web-based annotation platform with an established user base. It supports interdisciplinary collaboration and allows both online and offline annotation and analysis of images across various domains. Physicians can annotate images directly through the platform's intuitive and efficient web interface [1].
To extend the functionality of EXACT, this thesis explores and implements an automatic segmentation plugin and integrates it with EXACT. The plugin will enable physicians and researchers to generate high-quality segmentation masks automatically during annotation and to save these masks for future use. This approach can substantially improve the efficiency of medical image annotation, reduce manual effort, and streamline medical imaging workflows.
A critical aspect of this project is selecting a segmentation model that is both efficient and accurate. I plan to adopt Segment Anything Model 2 (SAM2), as it has demonstrated robust performance on diverse medical imaging tasks (including CT, WSI, and cell images) while maintaining segmentation precision and reliability [2].
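SAM2 takes 8-bit RGB images, and its video mode takes sequences of such frames, whereas CT volumes are stored in Hounsfield units (HU). One possible preprocessing step for the plugin, assumed here and not specified by the thesis, is to apply an intensity window, scale to 0 to 255, and treat each axial slice as a video frame so that prompts can be propagated through the volume. The default window (level 40 HU, width 400 HU, a typical soft-tissue setting) and the function names are illustrative assumptions.

```python
import numpy as np


def window_ct(volume_hu, level=40.0, width=400.0):
    """Clip HU values to [level - width/2, level + width/2] and map to uint8."""
    lo, hi = level - width / 2, level + width / 2
    clipped = np.clip(volume_hu, lo, hi)
    return np.round((clipped - lo) / (hi - lo) * 255.0).astype(np.uint8)


def volume_to_rgb_frames(volume_hu, level=40.0, width=400.0):
    """Convert a (D, H, W) HU volume to D grayscale-as-RGB frames (D, H, W, 3)."""
    frames = window_ct(volume_hu, level, width)
    return np.repeat(frames[..., None], 3, axis=-1)
```

The resulting frames could then be written to disk or passed to a segmentation model; the window would likely need to be chosen per anatomy or exposed as a user setting in the annotation interface.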
[1] Christian Marzahl, Marc Aubreville, Christof A. Bertram, Jennifer Maier, Christian Bergler, Christine Kröger, Jörn Voigt, Katharina Breininger, Robert Klopfleisch, and Andreas Maier. EXACT: a collaboration toolset for algorithm-aided annotation of images with annotation version control. Scientific Reports, 11(1):4343, Feb 2021.
[2] Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, and Christoph Feichtenhofer. SAM 2: Segment Anything in images and videos. arXiv preprint arXiv:2408.00714, 2024.
Searching for evidence of world models in reinforcement learning agents
Advanced Machine Learning-Based High Demand Forecasting of Household Energy Consumption for Enhancing Grid Operations
This thesis explores forecasting techniques for household energy consumption to help Distribution System Operators (DSOs) manage high-demand loads and ensure grid stability. It focuses on predicting when demand exceeds critical thresholds and for how long, enabling proactive energy management. The study analyzes how different data aggregation levels affect forecast accuracy and investigates methods to restore altered load signals for better predictions. By comparing forecasting models and evaluating their performance, the research aims to improve energy management, support automation in grid operations, and enhance data-driven decision-making for a more stable and efficient power distribution system.
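The forecasting target described above (when demand exceeds a critical threshold and for how long) can be made concrete as a list of exceedance events, each with a start index and a duration in time steps. The sketch below assumes a regularly sampled load series and a strict "greater than" threshold; it also includes a simple mean aggregation over non-overlapping windows to show why the aggregation level matters: averaging can hide short peaks.

```python
def exceedance_events(load, threshold):
    """Return (start_index, duration_in_steps) for each run with load > threshold."""
    events = []
    start = None
    for i, value in enumerate(load):
        if value > threshold:
            if start is None:
                start = i  # a new exceedance run begins
        elif start is not None:
            events.append((start, i - start))
            start = None
    if start is not None:  # run continues until the end of the series
        events.append((start, len(load) - start))
    return events


def aggregate(load, factor):
    """Average consecutive non-overlapping windows of `factor` samples.

    An incomplete trailing window is dropped.
    """
    n = len(load) // factor
    return [sum(load[i * factor:(i + 1) * factor]) / factor for i in range(n)]


load = [1, 5, 6, 2, 7, 7, 7, 1]
print(exceedance_events(load, 4))                # [(1, 2), (4, 3)]
print(exceedance_events(aggregate(load, 2), 4))  # [(2, 1)]
```

In this toy series, halving the resolution removes the first event entirely, which illustrates the kind of effect the aggregation-level analysis would need to quantify.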
Longitudinal Analysis of Parkinson’s Disease Patients Using Natural Language Processing Methods
Removing age bias in the context of pathological speech
Anomaly Detection of Industrial Products using Large Vision Language Models
Deep Learning-Driven Lorentzian Fitting of 31P Spectra
To isolate the individual peaks in a phosphorus (31P) spectrum, several preprocessing steps are usually performed, and metabolite information is then extracted by fitting multiple Lorentzian lines to the spectrum [1]. This least-squares fitting is sensitive to noise and depends on the preceding preprocessing steps [2]. This study will investigate using a Lorentzian line-shape generator as a known operator within a fully connected network to fit the phosphorus spectrum.
The thesis will address the following points:
- Number of Lorentzian lines required to fit the phosphorus spectrum
- Comparison of the least-squares fit and the deep Lorentzian fit
- Correlation of the deep Lorentzian fit peaks with tumor types
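In a known-operator setup, the fully connected network would predict one parameter triplet per peak, and a fixed, non-trainable generator would turn these parameters into a spectrum that is compared with the measurement. The NumPy sketch below shows only that generator and the squared-error residual, which is also the quantity a classical least-squares fit minimizes. It assumes a real-valued (absorption-mode) spectrum and a peak-normalized Lorentzian parameterized by amplitude, center frequency, and full width at half maximum (FWHM); real 31P data may additionally require phase handling.

```python
import numpy as np


def lorentzian(f, amplitude, center, fwhm):
    """Peak-normalized Lorentzian: equals `amplitude` at `center`,
    and amplitude / 2 at center +/- fwhm / 2."""
    half = fwhm / 2.0
    return amplitude * half**2 / ((f - center) ** 2 + half**2)


def multi_lorentzian(f, params):
    """Sum of K Lorentzians; params has shape (K, 3) = (amplitude, center, fwhm)."""
    params = np.asarray(params, dtype=float)
    amp, center, fwhm = params[:, 0:1], params[:, 1:2], params[:, 2:3]
    # Broadcasting (1, N) frequencies against (K, 1) parameters -> (K, N).
    return lorentzian(np.asarray(f)[None, :], amp, center, fwhm).sum(axis=0)


def fit_residual(spectrum, f, params):
    """Mean squared error between a measured spectrum and the generated model."""
    return float(np.mean((spectrum - multi_lorentzian(f, params)) ** 2))
```

Because the generator has no trainable weights, the network only has to learn the mapping from spectrum to peak parameters, and the number of rows K in `params` corresponds directly to the number of Lorentzian lines studied in the first point above.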
References:
[1] Meyerspeer M, Boesch C, Cameron D, et al. 31P magnetic resonance spectroscopy in skeletal muscle: Experts' consensus recommendations. NMR Biomed. Published online February 10, 2020. doi:10.1002/nbm.4246
[2] Rajput, J.R., et al.: Physics-informed conditional autoencoder approach for robust metabolic CEST MRI at 7T. In: Greenspan, H., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. LNCS, vol. 14227. Springer, Cham (2023)
Pathology detection in medical images
This work will investigate the application of computer vision object detection techniques to medical images.
Required: strong skills in
- Python programming
- deep learning, training methods, pattern recognition, loss functions
- medical imaging background (X-ray and CT images, DICOM processing)
- communication, scientific writing