Index

Enhancing Inference Efficiency of Deep Learning Models for Camera-Based Road Segmentation

ISLES Challenge 2024: Infarct segmentation from CT images

Final, post-treatment infarct segmentation from pre-treatment acute imaging (CT) and clinical data.

Idea: Investigate the data from this year’s ISLES 2024 challenge, then build and train a model for stroke lesion segmentation. Potentially submit the model to the challenge.

Reference: https://isles-24.grand-challenge.org/

Multimodal Gesture Classification in Artwork Images

This thesis addresses the challenge of gesture classification in artwork images, specifically on the SniffyArt dataset [1]. Traditional classification methods fall short due to the change in domain, limited dataset size, class imbalance, and the difficulty of discriminating between different smell gestures. The thesis tackles this challenge by exploring multimodal learning techniques, specifically leveraging bounding box and keypoint information and their fusion to provide richer context to the classification network.

Objectives:

Literature Review: Conduct an in-depth review of existing multimodal learning techniques, with a focus on methodologies utilizing both bounding box and keypoint information, such as ED-Pose [2], UniPose [3], and PRTR [4], among many others.

Model Design: Add a specialized classifier that takes the whole image context together with the person box and keypoint features obtained from one of the methods in the literature (e.g. ED-Pose) and performs gesture classification.

Model Evaluation: Evaluate the performance of the proposed model on all modalities, i.e. person detection, pose estimation, and gesture classification, and on their combination.

Baseline Results: Create baseline results for box detection, pose estimation, and gesture classification using 1) separate standard models for each of these modalities, and 2) the selected method from the literature review trained directly on gesture boxes, i.e. without a specialized classifier.

Aside from the separate evaluation of the subtasks, evaluate the full pipeline, i.e. the classification performance on the whole image when neither bounding box nor keypoint information is available.

Optional Tasks: Incorporate text prompts as an additional modality, as in UniPose.
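
The Model Design objective can be illustrated with a minimal late-fusion sketch. It is not the thesis model: it assumes that image context, person box, and keypoints are already available as plain feature lists, fuses them by simple concatenation, and uses an untrained linear layer with random weights. The names `fuse_features` and `LinearGestureClassifier` are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(image_feat, box, keypoints):
    """Late fusion by concatenation (an assumption, other fusions are possible).

    image_feat: list of floats describing the whole image context
    box:        [x, y, w, h] of the person box, normalized to [0, 1]
    keypoints:  list of (x, y) tuples, normalized to [0, 1]
    """
    flat_keypoints = [c for (x, y) in keypoints for c in (x, y)]
    return list(image_feat) + list(box) + flat_keypoints


class LinearGestureClassifier:
    """Hypothetical linear classifier head with random, untrained weights."""

    def __init__(self, in_dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
            for _ in range(num_classes)
        ]
        self.bias = [0.0] * num_classes

    def predict_proba(self, features):
        logits = [
            sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(self.weights, self.bias)
        ]
        return softmax(logits)


# Toy inputs: 4 image features, one box, two keypoints.
image_feat = [0.2, -0.1, 0.5, 0.0]
box = [0.1, 0.2, 0.4, 0.6]
keypoints = [(0.3, 0.25), (0.35, 0.4)]

fused = fuse_features(image_feat, box, keypoints)
clf = LinearGestureClassifier(in_dim=len(fused), num_classes=3)
probs = clf.predict_proba(fused)
print(len(fused), probs)
```

In practice the feature vectors would come from the chosen pose estimation backbone, and the classifier would be trained jointly or on frozen features; both choices are left open here.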

[1] Zinnen, M., Hussian, A., Tran, H., Madhu, P., Maier, A., & Christlein, V. (2023, November). SniffyArt: The Dataset of Smelling Persons. In Proceedings of the 5th Workshop on analySis, Understanding and proMotion of heritAge Contents (pp. 49-58).

[2] Yang, J., Zeng, A., Liu, S., Li, F., Zhang, R., & Zhang, L. (2023). Explicit box detection unifies end-to-end multi-person pose estimation. arXiv preprint arXiv:2302.01593.

[3] Yang, J., Zeng, A., Zhang, R., & Zhang, L. (2023). Unipose: Detecting any keypoints. arXiv preprint arXiv:2310.08530.

[4] Li, K., Wang, S., Zhang, X., Xu, Y., Xu, W., & Tu, Z. (2021). Pose recognition with cascade transformers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1944-1953).

CFD Simulation for Blood Flow in Embolization Procedures

A disentangled representation strategy to enhance multi-organ segmentation in CT using multiple datasets

Medical image segmentation is important for identifying human organs, essential in clinical diagnosis and treatment planning. However, the accuracy of segmentation results is often compromised due to the limited quality and completeness of medical imaging data. In practical applications, deep learning has become a key method for multi-organ segmentation [1, 3], but it struggles with challenges related to the amount and quality of data. Deep learning segmentation models typically require numerous paired images and annotations for training [2]. However, fully annotated multi-organ CT datasets are rare, while those annotating only a few organs are more frequent. The variation in annotations restricts the efficient utilization of numerous public segmentation datasets.

Inspired by disentangled learning’s ability to share knowledge across tasks [4, 5, 6], we’ve developed a method that allows models to learn and incorporate features from different datasets. We attempt to combine two types of datasets: one fully annotated for multiple organs but with a small amount of data, and another larger dataset annotated only for certain organs. This method is designed to improve the model’s capability in segmenting multiple organs. Using disentangled learning, the model is able to extract and combine crucial features from various datasets, thus overcoming the challenge of inconsistent annotations. This method aims to enhance the model’s adaptability and precision. We assess its performance by comparing the model’s predicted segmentations with actual annotations, allowing for a detailed evaluation of using the disentangled learning approach versus models trained with only a single dataset in multi-organ segmentation tasks. To summarize, the thesis will cover the following aspects:

Design a multi-organ segmentation model using disentangled learning methods.
Investigate the influence of the quantity of fused datasets on the multi-organ segmentation model.
Investigate the influence of the proportion of data quantity from different datasets on the multi-organ segmentation model.
Investigate the influence of feature weights from different datasets on the multi-organ segmentation model.
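
Comparing predicted segmentations with the actual annotations needs an overlap metric. The text does not name one, so the sketch below assumes the Dice coefficient, a common choice for segmentation, computed per organ label on flattened label maps. The function name `dice_per_organ` and the toy label maps are illustrative only.

```python
def dice_per_organ(pred, target, organ_labels):
    """Dice coefficient per organ label for two flattened label maps.

    pred, target: sequences of integer labels of equal length
    organ_labels: labels to evaluate, e.g. only the organs that are
                  annotated in the dataset at hand
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    scores = {}
    for label in organ_labels:
        pred_count = sum(1 for p in pred if p == label)
        target_count = sum(1 for t in target if t == label)
        overlap = sum(1 for p, t in zip(pred, target) if p == label and t == label)
        denom = pred_count + target_count
        # Convention (an assumption): an organ absent from both maps scores 1.0.
        scores[label] = 1.0 if denom == 0 else 2.0 * overlap / denom
    return scores


# Toy example: label 0 is background, labels 1 and 2 are two organs.
pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 2, 0]
scores = dice_per_organ(pred, target, organ_labels=[1, 2])
print(scores)
```

Restricting `organ_labels` to the organs a dataset actually annotates keeps partially annotated datasets from being penalized for organs that have no ground truth.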

References
[1] Yabo Fu, Yang Lei, Tonghe Wang, Walter J. Curran, Tian Liu, and Xiaofeng Yang. A review of deep learning based methods for medical image multi-organ segmentation. Physica Medica, 85:107–122, 2021.
[2] Tianxing He, Shengcheng Yu, Ziyuan Wang, Jieqiong Li, and Zhenyu Chen. From data quality to model quality: an exploratory study on deep learning, 2019.
[3] Yang Lei, Yabo Fu, Tonghe Wang, Richard L. J. Qiu, Walter J. Curran, Tian Liu, and Xiaofeng Yang. Deep learning in multi-organ segmentation, 2020.
[4] Yuanyuan Lyu, Haofu Liao, Heqin Zhu, and S. Kevin Zhou. A3dsegnet: Anatomy-aware artifact disentanglement and segmentation network for unpaired segmentation, artifact reduction, and modality translation, 2021.
[5] Qiushi Yang, Xiaoqing Guo, Zhen Chen, Peter Y. M. Woo, and Yixuan Yuan. D2-net: Dual disentanglement network for brain tumor segmentation with missing modalities. IEEE Transactions on Medical Imaging, 41(10):2953–2964, 2022.
[6] Tongxue Zhou, Su Ruan, and Stéphane Canu. A review: Deep learning for medical image segmentation using multi-modality fusion. Array, 3-4:100004, 2019.

Defect Detection Probability as a Metric for CT Image Quality Assessment

This project focuses on using defect detection probability within CT (Computed Tomography) images as a metric for assessing image quality. Key steps include:

Establishing a data preparation pipeline to insert defects into CT volumes sourced from CAD files.
Simulating CT scans to replicate imaging processes.
Developing a defect detection neural network to analyze CT images and determine the probability of defect presence.
Utilizing the defect detection probability as a quantitative metric for evaluating the quality of CT images, with potential integration of trajectory optimization techniques.
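
One simple way to turn detector outputs into an image quality score is sketched below. It assumes that the network returns one confidence score per inserted defect and that a defect counts as detected when its score reaches a threshold; the threshold value, the function name `detection_probability`, and the example scores are all illustrative assumptions rather than part of the project description.

```python
def detection_probability(scores, threshold=0.5):
    """Fraction of inserted defects whose detector score reaches the threshold.

    scores:    detector confidences in [0, 1], one per known inserted defect
    threshold: decision threshold (assumed value, to be tuned in practice)
    """
    if not scores:
        raise ValueError("at least one defect score is required")
    detected = sum(1 for s in scores if s >= threshold)
    return detected / len(scores)


# Hypothetical scores for the same four inserted defects under two
# simulated scan settings (e.g. two different acquisition trajectories).
scan_a = [0.91, 0.85, 0.40, 0.77]
scan_b = [0.55, 0.30, 0.20, 0.62]

quality_a = detection_probability(scan_a)
quality_b = detection_probability(scan_b)
print(quality_a, quality_b)
```

Under this reading, the scan setting with the higher detection probability would be rated as the better image quality, which is what a trajectory optimization could then try to maximize.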

Automated ONNX2TikZ: Generating LaTeX-TikZ Diagrams of Neural Networks

This project aims to automate the conversion of ONNX models into TikZ code, facilitating the creation of visually appealing diagrams in LaTeX documents. Leveraging Python for ONNX parsing and manipulation, alongside LaTeX and TikZ for rendering, this tool streamlines the process of visualizing neural network architectures for academic papers, presentations, and educational materials.
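
The core of such a converter can be sketched in a few lines. The example below assumes a purely sequential network and takes a plain list of operator names as input; with the `onnx` package, such a list could be read from the `op_type` field of the nodes in `onnx.load(path).graph.node`. The function name `layers_to_tikz` and the layout choices are hypothetical, and real ONNX graphs with branches would need a proper graph layout.

```python
def layers_to_tikz(op_types, spacing=1.5):
    """Render a sequential list of operator names as a vertical TikZ chain."""
    lines = [
        r"\begin{tikzpicture}[every node/.style={draw, rectangle, minimum width=2.5cm}]"
    ]
    # One boxed node per layer, stacked top to bottom.
    for i, op in enumerate(op_types):
        label = op.replace("_", r"\_")  # escape underscores for LaTeX
        y = -i * spacing
        lines.append(rf"  \node (n{i}) at (0, {y:.2f}) {{{label}}};")
    # One arrow between each pair of consecutive layers.
    for i in range(len(op_types) - 1):
        lines.append(rf"  \draw[->] (n{i}) -- (n{i + 1});")
    lines.append(r"\end{tikzpicture}")
    return "\n".join(lines)


# Stand-in for the operator types of a small ONNX graph.
tikz = layers_to_tikz(["Conv", "Relu", "MaxPool"])
print(tikz)
```

The printed code can be pasted into any LaTeX document that loads the `tikz` package.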