Index

Multimodal Gesture Classification in Artwork Images

This thesis addresses the challenge of gesture classification in artwork images, specifically on the SniffyArt dataset [1]. Traditional classification methods fall short because of the domain shift, the limited dataset size, class imbalance, and the difficulty of discriminating between different smell gestures. The thesis tackles this challenge by exploring multimodal learning techniques, specifically by leveraging bounding box and keypoint information and their fusion to give the classification network richer context.

 

Objectives:

Literature Review: Conduct an in-depth review of existing multimodal learning techniques, focusing on methods that use both bounding box and keypoint information, such as ED-pose [2], UniPose [3], and PRTR [4], among others.

Model Design: Add a specialized classifier that takes the whole-image context together with the person box and keypoint features obtained from one of the reviewed methods (e.g. ED-pose) and performs gesture classification.

Model Evaluation: Evaluate the performance of the proposed model on every modality, i.e. person detection, pose estimation, and gesture classification, and on their combination.

Baseline Results: Create baseline results for box detection, pose estimation, and gesture classification by 1) using separate standard models for each of these modalities, and 2) training the method selected in the literature review directly on gesture boxes, i.e. without a specialized classifier.

Aside from evaluating the subtasks separately, evaluate the full pipeline, i.e. the classification performance on the whole image when neither bounding box nor keypoint information is provided.

Optional Tasks: Incorporate text prompts as an additional modality, as in UniPose.
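
As an illustration of the Model Design objective, the following is a minimal sketch of one possible fusion step: keypoints are expressed relative to the person box, the box is normalized by the image size, and both are concatenated with an image-context feature vector. The thesis description does not prescribe a fusion scheme, so plain concatenation is an assumption here, as are the function names, the (x, y, width, height) box format, and the pixel-coordinate keypoints.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x, y, width, height) in pixels
Keypoint = Tuple[float, float]           # assumed format: (x, y) in pixels


def normalize_keypoints(box: Box, keypoints: Sequence[Keypoint]) -> List[float]:
    """Express each keypoint relative to the person box (0..1 when inside the box)."""
    x, y, w, h = box
    if w <= 0 or h <= 0:
        raise ValueError("box width and height must be positive")
    flat: List[float] = []
    for kx, ky in keypoints:
        flat.append((kx - x) / w)
        flat.append((ky - y) / h)
    return flat


def fuse_features(
    image_feat: Sequence[float],
    box: Box,
    keypoints: Sequence[Keypoint],
    image_size: Tuple[float, float],
) -> List[float]:
    """Concatenate image context, image-normalized box, and box-relative keypoints.

    Hypothetical helper: the resulting vector would be the input to a
    gesture classifier, which is not shown here.
    """
    img_w, img_h = image_size
    x, y, w, h = box
    box_feat = [x / img_w, y / img_h, w / img_w, h / img_h]
    return list(image_feat) + box_feat + normalize_keypoints(box, keypoints)


if __name__ == "__main__":
    fused = fuse_features([0.5], (10, 20, 100, 200), [(60, 120)], (200, 400))
    print(len(fused), fused)
```

Normalizing keypoints by the person box makes the pose part of the vector independent of where the person stands in the image, while the box part keeps that location information. In a trained model, the concatenation would typically operate on learned feature embeddings rather than raw coordinates.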

 

[1] Zinnen, M., Hussian, A., Tran, H., Madhu, P., Maier, A., & Christlein, V. (2023, November). SniffyArt: The Dataset of Smelling Persons. In Proceedings of the 5th Workshop on analySis, Understanding and proMotion of heritAge Contents (pp. 49-58).

[2] Yang, J., Zeng, A., Liu, S., Li, F., Zhang, R., & Zhang, L. (2023). Explicit box detection unifies end-to-end multi-person pose estimation. arXiv preprint arXiv:2302.01593.

[3] Yang, J., Zeng, A., Zhang, R., & Zhang, L. (2023). Unipose: Detecting any keypoints. arXiv preprint arXiv:2310.08530.

[4] Li, K., Wang, S., Zhang, X., Xu, Y., Xu, W., & Tu, Z. (2021). Pose recognition with cascade transformers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1944-1953).

CFD Simulation for Blood Flow in Embolization Procedures

A disentangled representation strategy to enhance multi-organ segmentation in CT using multiple datasets

Medical image segmentation is important for identifying human organs and is essential in clinical diagnosis and treatment planning. However, the accuracy of segmentation results is often compromised by the limited quality and completeness of medical imaging data. In practical applications, deep learning has become a key method for multi-organ segmentation [1, 3], but it struggles with challenges related to the amount and quality of data. Deep learning segmentation models typically require numerous paired images and annotations for training [2]. However, fully annotated multi-organ CT datasets are rare, while datasets that annotate only a few organs are more common. This variation in annotations restricts the efficient use of the many public segmentation datasets.

Inspired by the ability of disentangled learning to share knowledge across tasks [4, 5, 6], we develop a method that allows models to learn and incorporate features from different datasets. We attempt to combine two types of datasets: a small one that is fully annotated for multiple organs, and a larger one that is annotated only for certain organs. Using disentangled learning, the model extracts and combines crucial features from the various datasets, thus overcoming the challenge of inconsistent annotations. The method is designed to improve the model’s multi-organ segmentation capability, adaptability, and precision.

We assess performance by comparing the model’s predicted segmentations with the actual annotations, which allows a detailed comparison of the disentangled learning approach with models trained on only a single dataset in multi-organ segmentation tasks. To summarize, the thesis will cover the following aspects:

  • Design a multi-organ segmentation model using disentangled learning methods.
  • Investigate the influence of the number of fused datasets on the multi-organ segmentation model.
  • Investigate the influence of the proportion of data drawn from each dataset on the multi-organ segmentation model.
  • Investigate the influence of feature weights from different datasets on the multi-organ segmentation model.
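
The comparison of predicted segmentations with annotations could, for example, be scored per organ with the Dice coefficient. The description above does not name a metric, so Dice is an assumption, as are the function names and the use of flattened integer label maps. The sketch scores only the organs that a given dataset annotates, which matches the partially annotated setting.

```python
from typing import Dict, Iterable, Optional, Sequence


def dice_score(pred: Sequence[int], target: Sequence[int], label: int) -> Optional[float]:
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for one label over flattened label maps."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of voxels")
    pred_count = sum(1 for v in pred if v == label)
    target_count = sum(1 for v in target if v == label)
    if pred_count + target_count == 0:
        return None  # label absent in both maps, so the score is undefined
    overlap = sum(1 for p, t in zip(pred, target) if p == label and t == label)
    return 2.0 * overlap / (pred_count + target_count)


def per_organ_dice(
    pred: Sequence[int],
    target: Sequence[int],
    annotated_labels: Iterable[int],
) -> Dict[int, Optional[float]]:
    """Score only the organs annotated in this dataset (hypothetical helper)."""
    return {label: dice_score(pred, target, label) for label in annotated_labels}


if __name__ == "__main__":
    pred = [0, 1, 1, 2, 2, 2]
    target = [0, 1, 0, 2, 2, 0]
    print(per_organ_dice(pred, target, annotated_labels=[1, 2]))
```

Restricting the score to annotated labels avoids penalizing a correct prediction of an organ that the dataset simply does not label.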

References
[1] Yabo Fu, Yang Lei, Tonghe Wang, Walter J. Curran, Tian Liu, and Xiaofeng Yang. A review of deep learning based methods for medical image multiorgan segmentation. Physica Medica, 85:107–122, 2021.
[2] Tianxing He, Shengcheng Yu, Ziyuan Wang, Jieqiong Li, and Zhenyu Chen. From data quality to model quality: an exploratory study on deep learning, 2019.
[3] Yang Lei, Yabo Fu, Tonghe Wang, Richard L. J. Qiu, Walter J. Curran, Tian Liu, and Xiaofeng Yang. Deep learning in multi-organ segmentation, 2020.
[4] Yuanyuan Lyu, Haofu Liao, Heqin Zhu, and S. Kevin Zhou. A3dsegnet: Anatomy-aware artifact disentanglement and segmentation network for unpaired segmentation, artifact reduction, and modality translation, 2021.
[5] Qiushi Yang, Xiaoqing Guo, Zhen Chen, Peter Y. M. Woo, and Yixuan Yuan. D2-net: Dual disentanglement network for brain tumor segmentation with missing modalities. IEEE Transactions on Medical Imaging, 41(10):2953–2964, 2022.
[6] Tongxue Zhou, Su Ruan, and Stéphane Canu. A review: Deep learning for medical image segmentation using multi-modality fusion. Array, 3-4:100004, 2019.

A Bias Analysis on Audio and Linguistic Embeddings for the Classification of Alzheimer’s Disease

Deep Learning for Glioma Survival Prediction

Estimation of 3D Implant Pose and Position from 2D X-Ray Images using Transformer Networks

Deep Learning for Bias Field Correction in MRI Scans

Spoken Language Identification for Hearing Aids

Deep Learning-Based Breast Density Categorization in Asian Women


Improvements in SSL image-text learnings on CXR images