Index
Multimodal Gesture Classification in Artwork Images
This thesis addresses gesture classification in artwork images, specifically on the SniffyArt dataset [1]. Traditional classification methods fall short due to the domain shift, the limited dataset size, class imbalance, and the difficulty of discriminating between different smell gestures. The thesis tackles this challenge by exploring multimodal learning techniques, specifically leveraging bounding box and keypoint information, and their fusion, to provide the classification network with a richer contextual understanding.
Objectives:
Literature Review: Conduct an in-depth review of existing multimodal learning techniques, focusing on methodologies that utilize both bounding box and keypoint information, such as ED-Pose [2], UniPose [3], and PRTR [4], among others.
Model Design: Add a specialized classifier that takes the whole-image context together with person-box and keypoint features obtained from one of the reviewed methods (ED-Pose) and performs gesture classification.
Model Evaluation: Evaluate the performance of the proposed model on all modalities, i.e., person detection, pose estimation, and gesture classification, as well as their combination.
Baseline Results: Create baseline results for box detection, pose estimation, and gesture classification using 1) separate standard models for each of these modalities, and 2) the method selected in the literature review trained directly on gesture boxes, i.e., without a specialized classifier.
Aside from evaluating the subtasks separately, evaluate the full pipeline, i.e., classification performance on the whole image when neither bounding box nor keypoint information is available.
Optional Tasks: Incorporate text prompts as an additional modality, as in UniPose [3].
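A minimal NumPy sketch of the planned fusion step: the three modality features are concatenated and passed through a linear classifier. All dimensions, the 17-keypoint layout, and the function names are illustrative assumptions, not ED-Pose's actual interface.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_and_classify(img_feat, box_feat, kpt_feat, W, b):
    """Concatenate modality features and apply a linear classifier.

    kpt_feat has shape (num_keypoints, kpt_dim) and is mean-pooled so a
    variable number of keypoints maps to a fixed-size vector.
    """
    kpt_pooled = kpt_feat.mean(axis=0)
    fused = np.concatenate([img_feat, box_feat, kpt_pooled])
    return fused @ W + b  # class logits

num_classes = 5
img_feat = rng.normal(size=64)           # whole-image context feature
box_feat = rng.normal(size=64)           # person-box feature
kpt_feat = rng.normal(size=(17, 64))     # 17 COCO-style keypoint features
W = rng.normal(size=(192, num_classes))  # 64 + 64 + 64 = 192 fused dims
b = np.zeros(num_classes)

logits = fuse_and_classify(img_feat, box_feat, kpt_feat, W, b)
print(logits.shape)  # (5,)
```

In the thesis itself, the linear map would naturally be replaced by a small trained MLP head, and the features would come from the chosen detector rather than random draws.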
[1] Zinnen, M., Hussian, A., Tran, H., Madhu, P., Maier, A., & Christlein, V. (2023, November). SniffyArt: The Dataset of Smelling Persons. In Proceedings of the 5th Workshop on analySis, Understanding and proMotion of heritAge Contents (pp. 49-58).
[2] Yang, J., Zeng, A., Liu, S., Li, F., Zhang, R., & Zhang, L. (2023). Explicit box detection unifies end-to-end multi-person pose estimation. arXiv preprint arXiv:2302.01593.
[3] Yang, J., Zeng, A., Zhang, R., & Zhang, L. (2023). UniPose: Detecting any keypoints. arXiv preprint arXiv:2310.08530.
[4] Li, K., Wang, S., Zhang, X., Xu, Y., Xu, W., & Tu, Z. (2021). Pose recognition with cascade transformers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1944-1953).
CFD Simulation for Blood Flow in Embolization Procedures
A disentangled representation strategy to enhance multi-organ segmentation in CT using multiple datasets
Medical image segmentation is important for identifying human organs and is essential in clinical diagnosis and treatment planning. However, the accuracy of segmentation results is often compromised by the limited quality and completeness of medical imaging data. In practice, deep learning has become a key method for multi-organ segmentation [1, 3], but it struggles with challenges related to the amount and quality of data. Deep learning segmentation models typically require many paired images and annotations for training [2]. However, fully annotated multi-organ CT datasets are rare, while datasets annotating only a few organs are more common. This variation in annotations restricts the efficient use of the many public segmentation datasets. Inspired by disentangled learning's ability to share knowledge across tasks [4, 5, 6], we develop a method that allows models to learn and incorporate features from different datasets. We combine two types of datasets: a small dataset fully annotated for multiple organs, and a larger dataset annotated only for certain organs. This method is designed to improve the model's capability to segment multiple organs. Using disentangled learning, the model extracts and combines crucial features from the various datasets, thus overcoming the challenge of inconsistent annotations and aiming to enhance the model's adaptability and precision. We assess performance by comparing the model's predicted segmentations with ground-truth annotations, allowing a detailed evaluation of the disentangled learning approach against models trained on a single dataset for multi-organ segmentation tasks. To summarize, the thesis will cover the following aspects:
- Design a multi-organ segmentation model using disentangled learning methods.
- Investigate the influence of the number of fused datasets on the multi-organ segmentation model.
- Investigate the influence of the proportion of data quantity from different datasets on the multi-organ segmentation model.
- Investigate the influence of feature weights from different datasets on the multi-organ segmentation model.
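To combine a fully annotated dataset with a partially annotated one, a common ingredient is a loss restricted to the organ channels each dataset actually labels. The NumPy sketch below (hypothetical; not the thesis's final disentangled model) illustrates this masking for a Dice-style loss:

```python
import numpy as np

def masked_dice_loss(pred, target, annotated, eps=1e-6):
    """Dice loss averaged only over annotated organ channels.

    pred, target: (num_organs, H, W) soft predictions / binary masks.
    annotated:    boolean vector marking which organs this dataset labels;
                  unannotated channels contribute no training signal.
    """
    losses = []
    for c in range(pred.shape[0]):
        if not annotated[c]:
            continue  # skip organs missing from this dataset's labels
        inter = (pred[c] * target[c]).sum()
        denom = pred[c].sum() + target[c].sum()
        losses.append(1.0 - (2 * inter + eps) / (denom + eps))
    return float(np.mean(losses))

pred = np.zeros((3, 4, 4))
target = np.zeros((3, 4, 4))
pred[0, :2] = 1.0; target[0, :2] = 1.0    # organ 0 predicted perfectly
target[2, 0] = 1.0                        # organ 2 entirely missed
annotated = np.array([True, False, True])  # organ 1 is unlabeled here

loss = masked_dice_loss(pred, target, annotated)
print(round(loss, 3))  # 0.5 (perfect organ 0, missed organ 2)
```

Batches from the small fully annotated dataset would use an all-`True` mask, while batches from the larger partially annotated dataset mask out the missing organs.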
References
[1] Yabo Fu, Yang Lei, Tonghe Wang, Walter J. Curran, Tian Liu, and Xiaofeng Yang. A review of deep learning based methods for medical image multi-organ segmentation. Physica Medica, 85:107–122, 2021.
[2] Tianxing He, Shengcheng Yu, Ziyuan Wang, Jieqiong Li, and Zhenyu Chen. From data quality to model quality: an exploratory study on deep learning, 2019.
[3] Yang Lei, Yabo Fu, Tonghe Wang, Richard L. J. Qiu, Walter J. Curran, Tian Liu, and Xiaofeng Yang. Deep learning in multi-organ segmentation, 2020.
[4] Yuanyuan Lyu, Haofu Liao, Heqin Zhu, and S. Kevin Zhou. A3dsegnet: Anatomy-aware artifact disentanglement and segmentation network for unpaired segmentation, artifact reduction, and modality translation, 2021.
[5] Qiushi Yang, Xiaoqing Guo, Zhen Chen, Peter Y. M. Woo, and Yixuan Yuan. D2-net: Dual disentanglement network for brain tumor segmentation with missing modalities. IEEE Transactions on Medical Imaging, 41(10):2953–2964, 2022.
[6] Tongxue Zhou, Su Ruan, and Stéphane Canu. A review: Deep learning for medical image segmentation using multi-modality fusion. Array, 3-4:100004, 2019.
Defect Detection Probability as a Metric for CT Image Quality Assessment
This project focuses on using defect detection probability within CT (Computed Tomography) images as a metric for assessing image quality. Key steps include:
- Establishing a data preparation pipeline to insert defects into CT volumes sourced from CAD files.
- Simulating CT scans to replicate imaging processes.
- Developing a defect detection neural network to analyze CT images and determine the probability of defect presence.
- Utilizing the defect detection probability as a quantitative metric for evaluating the quality of CT images, with potential integration of trajectory optimization techniques.
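The defect-insertion step can be illustrated with a toy NumPy example that stamps a spherical low-density void into a synthetic volume; real defects derived from CAD geometry and simulated scans would be considerably more complex:

```python
import numpy as np

def insert_spherical_void(volume, center, radius, void_value=0.0):
    """Insert a spherical low-density defect (void/pore) into a CT volume.

    A simple placeholder for the data-preparation step: the voxel mask of
    a sphere around `center` is set to `void_value`.
    """
    z, y, x = np.ogrid[:volume.shape[0], :volume.shape[1], :volume.shape[2]]
    mask = ((z - center[0]) ** 2 + (y - center[1]) ** 2
            + (x - center[2]) ** 2) <= radius ** 2
    out = volume.copy()
    out[mask] = void_value
    return out, mask

vol = np.full((32, 32, 32), 1.0)  # homogeneous material block
defective, mask = insert_spherical_void(vol, center=(16, 16, 16), radius=4)
print(mask.sum() > 0, defective[16, 16, 16])  # True 0.0
```

The returned `mask` doubles as a ground-truth label for training the defect detection network, and the defective volume would then be fed through the CT simulation stage.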
Automated ONNX2TikZ: Generating LaTeX-TikZ Diagrams of Neural Networks
This project aims to automate the conversion of ONNX models into TikZ code, facilitating the creation of visually appealing diagrams in LaTeX documents. Leveraging Python for ONNX parsing and manipulation, alongside LaTeX and TikZ for rendering, this tool streamlines the process of visualizing neural network architectures for academic papers, presentations, and educational materials.
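As a sketch of the intended conversion, the snippet below emits TikZ for a linear chain of layers. In the real tool the layer list would be obtained by iterating over the graph of a loaded ONNX model (e.g. via `onnx.load(path).graph.node`); the plain list of op names here is a stand-in, and the TikZ styling is illustrative:

```python
def layers_to_tikz(layers, x_step=3.0):
    """Emit TikZ code drawing a left-to-right chain of layer nodes."""
    lines = [r"\begin{tikzpicture}[every node/.style="
             r"{draw, rounded corners, minimum height=8mm}]"]
    # one node per layer, spaced x_step apart on the x-axis
    for i, name in enumerate(layers):
        lines.append(rf"\node (n{i}) at ({i * x_step:.1f}, 0) {{{name}}};")
    # arrows connecting consecutive layers
    for i in range(len(layers) - 1):
        lines.append(rf"\draw[->] (n{i}) -- (n{i + 1});")
    lines.append(r"\end{tikzpicture}")
    return "\n".join(lines)

tikz = layers_to_tikz(["Conv", "ReLU", "MaxPool", "Gemm"])
print(tikz.splitlines()[1])  # \node (n0) at (0.0, 0) {Conv};
```

A full implementation would additionally handle branching graphs (skip connections), node shapes per op type, and tensor-shape annotations on the edges.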
Feature Extraction and Dimensionality Reduction Techniques for Assessing Similarity in Large-Scale 3D CAD Datasets
Work description
The research presented in this thesis explores the application of feature extraction and dimensionality reduction techniques to assess model similarity within large-scale 3D CAD datasets. It investigates how different geometric and topological descriptors can be quantified and utilized to measure the similarity between complex 3D models. To this end, the study employs advanced machine learning algorithms to analyze and cluster 3D data, facilitating a better understanding of model characteristics and relationships.
During the thesis, the following questions should be considered:
- What metrics can effectively quantify the variance in a training dataset?
- How does the variance within a training set impact the neural network’s ability to generalize to new, unseen data?
- What is the optimal balance of diversity and specificity in a training dataset to maximize NN performance?
- How can training datasets be curated to include a beneficial level of variance without compromising the quality of the neural network’s output?
- What methodologies can be implemented to systematically adjust the variance in training data and evaluate its impact on NN generalization?
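As a baseline for the similarity assessment, shape descriptors can be projected with PCA and compared by nearest-neighbor distance. The sketch below uses random stand-in vectors rather than real CAD descriptors; a duplicated descriptor plays the role of two identical models:

```python
import numpy as np

def pca_reduce(descriptors, k=2):
    """Project descriptor vectors onto their top-k principal components,
    a standard dimensionality-reduction baseline."""
    centered = descriptors - descriptors.mean(axis=0)
    # SVD of the centered data yields principal directions in vt
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

def most_similar(embedded, query_idx):
    """Index of the model closest to `query_idx` in the reduced space."""
    d = np.linalg.norm(embedded - embedded[query_idx], axis=1)
    d[query_idx] = np.inf  # exclude the query itself
    return int(np.argmin(d))

rng = np.random.default_rng(1)
# toy stand-in for geometric/topological descriptors of 10 3D models
descriptors = rng.normal(size=(10, 32))
descriptors[7] = descriptors[3].copy()  # model 7 duplicates model 3

emb = pca_reduce(descriptors, k=2)
print(most_similar(emb, 3))  # 7 -- the duplicated model
```

For large-scale datasets, the brute-force distance computation would be replaced by an approximate nearest-neighbor index, and the PCA embedding could feed a clustering algorithm to group related models.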
Prerequisites
Applicants should have a solid background in machine learning and deep learning, with strong technical skills in Python and experience with PyTorch. Candidates should also possess the capability to work independently and have a keen interest in exploring the theoretical aspects of neural network training.
For your application, please send your transcript of record.
Review of Zero-Shot and Few-Shot Classification, Detection, and Segmentation Methods in Medical Imaging
Evaluation of MedKLIP for Zero-shot and Fine-tuned classification of CXRs
Zero-shot scores on the NIH and RSNA Pneumonia datasets; analysis of attention maps and point scores on the VinDr-CXR dataset; analysis of the performance improvement from zero-shot to fine-tuned classification for various findings.
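Comparisons like these are typically reported as per-finding AUROC. A minimal NumPy sketch of the metric in its rank-sum (Mann-Whitney) form, ignoring score ties:

```python
import numpy as np

def auroc(scores, labels):
    """AUROC via the rank-sum (Mann-Whitney U) formulation.

    scores: continuous model outputs; labels: binary ground truth.
    Ties in the scores are not handled in this simplified version.
    """
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)  # 1-based ranks
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

scores = np.array([0.9, 0.8, 0.3, 0.2])  # toy zero-shot scores
labels = np.array([1, 1, 0, 0])          # toy finding labels
print(auroc(scores, labels))  # 1.0
```

The same function, applied per finding to zero-shot and fine-tuned scores, quantifies the improvement between the two regimes.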