Index
Lightweight Early Forest Fire Detection from Unmanned Aerial Vehicles based on Spatial-Temporal Correlation
Calving Fronts and How to Segment Them Using Diffusion Networks
Global warming is impacting every part of our planet and is responsible for rising sea levels
around the world, posing a threat to the large share of the world’s population that lives in coastal
areas. While multiple factors contribute to sea level rise (SLR), such as thermal expansion due to warmer
oceans, it is in large part caused by the melting of glaciers and ice regions that drain into
the ocean [1]. It is therefore important to understand and monitor glacier ice loss, specifically
for marine- or lake-terminating glaciers. One way to do so is to track calving front movement, where
the calving front is the border between a glacier and the ocean or lake in which it terminates.
Delineating the exact front position is fundamental for analysing the health of our glaciers and how
global warming is impacting them.
Manually delineating calving fronts is very time-intensive, which is why researchers have in recent
years started to automate this process with deep learning. Gourmelon et
al. [2] used a U-Net to segment synthetic aperture radar (SAR) images into different regions and then
extracted the calving front in a post-processing step. Wu et al. [3] combined two U-Nets into a
cross-resolution segmentation method, which improves the network’s ability to classify the calving
front by letting coarse and fine-grained feature maps interact through an attention-based hooking
mechanism.
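The two-stage idea of segmenting zones first and extracting the front afterwards can be illustrated with a minimal sketch. It is a simplified stand-in, not the post-processing used in [2]: the label ids `OCEAN` and `GLACIER`, the helper `extract_front`, and the toy grid are all hypothetical, and the front is simply taken to be every glacier pixel with an ocean pixel in its 4-neighbourhood.

```python
from typing import List, Tuple

# Hypothetical label ids; the class encoding of a real dataset may differ.
OCEAN, GLACIER = 0, 1


def extract_front(zones: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (row, col) of glacier pixels that touch an ocean pixel."""
    rows, cols = len(zones), len(zones[0])
    front = []
    for r in range(rows):
        for c in range(cols):
            if zones[r][c] != GLACIER:
                continue
            # 4-neighbourhood: up, down, left, right.
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            if any(
                0 <= nr < rows and 0 <= nc < cols and zones[nr][nc] == OCEAN
                for nr, nc in neighbours
            ):
                front.append((r, c))
    return front


# Toy zone map: glacier on the left, ocean on the right.
zones = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
]
front = extract_front(zones)
```

A real pipeline would additionally have to handle further zone classes and small segmentation errors, which this sketch ignores.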
Diffusion models have made headlines over the past year for their ability to produce highly
realistic images [4]. Researchers have also started using them for image segmentation, for example
in SegDiff [5], and this line of work has been explored further in the medical field
with EnsemDiff [6] as well as MedSegDiff and MedSegDiff-V2 [7, 8]. For calving front
delineation, however, diffusion models have not yet been tested, which is the focus of this
thesis.
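For readers unfamiliar with diffusion models, the forward (noising) process of a denoising diffusion probabilistic model [4] can be sketched in a few lines. The example below applies the closed-form step x_t = sqrt(ᾱ_t) x_0 + sqrt(1 − ᾱ_t) ε to a toy one-dimensional mask. The linear noise schedule follows the values used in [4]; the function names `alpha_bar` and `noise_mask` and the toy mask are hypothetical, and applying the noise to a segmentation mask rather than to an image is an assumption about how such a model might be set up here.

```python
import math
import random


def alpha_bar(t, T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) over the first t steps (linear schedule)."""
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (T - 1)
        prod *= 1.0 - beta
    return prod


def noise_mask(mask, t, rng):
    """Sample x_t from q(x_t | x_0) in closed form for every mask entry."""
    a = alpha_bar(t)
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in mask]


rng = random.Random(0)
mask = [-1.0, -1.0, 1.0, 1.0]       # toy 1-D mask scaled to [-1, 1]
early = noise_mask(mask, 10, rng)   # still close to the clean mask
late = noise_mask(mask, 1000, rng)  # almost pure Gaussian noise
```

A segmentation network would then be trained to reverse this process, typically conditioned on the input image.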
In detail, the thesis consists of the following parts:
• a literature review of diffusion models used for image segmentation tasks,
• a review of diffusion models for segmenting SAR calving front images into different zones,
• the use of a diffusion model to directly segment calving front positions,
• a comparison of the resulting diffusion model with other methods that were evaluated on the CaFFe
dataset [9].
References
[1] Hans-Otto Pörtner, Debra C Roberts, Valérie Masson-Delmotte, Panmao Zhai, Melinda Tignor, Elvira
Poloczanska, and NM Weyer. The ocean and cryosphere in a changing climate. IPCC special report on the
ocean and cryosphere in a changing climate, 1155, 2019.
[2] Nora Gourmelon, Thorsten Seehaus, Matthias Braun, Andreas Maier, and Vincent Christlein. Calving
fronts and where to find them: a benchmark dataset and methodology for automatic glacier calving front
extraction from synthetic aperture radar imagery. Earth System Science Data, 14(9):4287–4313, 2022.
[3] Fei Wu, Nora Gourmelon, Thorsten Seehaus, Jianlin Zhang, Matthias Braun, Andreas Maier, and Vincent
Christlein. AMD-HookNet for glacier front segmentation. IEEE Transactions on Geoscience and Remote
Sensing, 61:1–12, 2023.
[4] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in neural
information processing systems, 33:6840–6851, 2020.
[5] Tomer Amit, Tal Shaharbany, Eliya Nachmani, and Lior Wolf. SegDiff: Image segmentation with diffusion
probabilistic models. arXiv preprint arXiv:2112.00390, 2021.
[6] Julia Wolleb, Robin Sandkühler, Florentin Bieder, Philippe Valmaggia, and Philippe C Cattin. Diffusion
models for implicit image segmentation ensembles. In International Conference on Medical Imaging with
Deep Learning, pages 1336–1348. PMLR, 2022.
[7] Junde Wu, Huihui Fang, Yu Zhang, Yehui Yang, and Yanwu Xu. MedSegDiff: Medical image segmentation
with diffusion probabilistic model. arXiv preprint arXiv:2211.00611, 2022.
[8] Junde Wu, Rao Fu, Huihui Fang, Yu Zhang, and Yanwu Xu. MedSegDiff-V2: Diffusion based medical
image segmentation with transformer. arXiv preprint arXiv:2301.11798, 2023.
[9] Nora Gourmelon, Thorsten Seehaus, Julian Klink, Matthias Braun, Andreas Maier, and Vincent Christlein.
CaFFe - a benchmark dataset for glacier calving front extraction from synthetic aperture radar imagery. In
IGARSS 2023 - 2023 IEEE International Geoscience and Remote Sensing Symposium, pages 896–898. IEEE,
2023.
[10] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming
Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NIPS
Autodiff Workshop, 2017.
Multimodal Gesture Classification in Artwork Images
This thesis addresses the challenge of gesture classification in artwork images, specifically on the SniffyArt dataset [1]. Traditional classification methods fall short due to the change in domain, the limited dataset size, class imbalance, and the difficulty of discriminating between different smell gestures. The thesis tackles this challenge by exploring multimodal learning techniques, specifically leveraging bounding box and keypoint information and their fusion to provide a richer contextual understanding to the classification network.
Objectives:
Literature Review: Conduct an in-depth review of existing multimodal learning techniques, with a focus on methodologies that use both bounding box and keypoint information, such as ED-pose [2], UniPose [3], and PRTR [4], among others.
Model Design: Add a specialized classifier that takes the whole-image context together with person box and keypoint features obtained from one of the methods from the literature, such as ED-pose, and performs gesture classification.
Model Evaluation: Evaluate the performance of the proposed model on all modalities, i.e. person detection, pose estimation, and gesture classification, and on their combination.
Baseline Results: Create baseline results for box detection, pose estimation, and gesture classification using 1) separate standard models for each of these modalities, and 2) the selected method from the literature review trained directly on gesture boxes, i.e. without a specialized classifier.
Aside from the separate evaluation of the subtasks, evaluate the full pipeline, i.e. the classification performance on the whole image when bounding box and keypoint information are both unavailable.
Optional Tasks: Incorporate text prompts as an additional modality, as in UniPose.
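The fusion of image context, person box, and keypoint information described in the objectives can be sketched as a simple late fusion by concatenation. This is only an illustration under stated assumptions, not the classifier design of the thesis: the feature sizes, `NUM_CLASSES`, the helper names, the random weights, and the normalisation of keypoints relative to the person box are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

NUM_CLASSES = 3  # hypothetical number of gesture classes


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def box_features(box: Tuple[float, float, float, float], img_w: float, img_h: float) -> List[float]:
    """Normalise an (x, y, w, h) box by the image size."""
    x, y, w, h = box
    return [x / img_w, y / img_h, w / img_w, h / img_h]


def keypoint_features(keypoints, box) -> List[float]:
    """Express each keypoint relative to the person box."""
    x, y, w, h = box
    feats: List[float] = []
    for kx, ky in keypoints:
        feats += [(kx - x) / w, (ky - y) / h]
    return feats


def classify(image_feat, box_feat, kp_feat, weights, bias) -> List[float]:
    fused = image_feat + box_feat + kp_feat  # late fusion by concatenation
    logits = [sum(w * f for w, f in zip(row, fused)) + b for row, b in zip(weights, bias)]
    return softmax(logits)


rng = random.Random(0)
image_feat = [rng.uniform(-1, 1) for _ in range(8)]  # stand-in for a backbone embedding
box = (40.0, 20.0, 100.0, 200.0)
keypoints = [(90.0, 40.0), (60.0, 120.0), (120.0, 120.0)]
box_feat = box_features(box, 640.0, 480.0)
kp_feat = keypoint_features(keypoints, box)
dim = len(image_feat) + len(box_feat) + len(kp_feat)
weights = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES
probs = classify(image_feat, box_feat, kp_feat, weights, bias)
```

In a trained model the weights would be learned, and the hand-crafted box and keypoint features could be replaced by the embeddings produced by the chosen pose estimation method.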
[1] Zinnen, M., Hussian, A., Tran, H., Madhu, P., Maier, A., & Christlein, V. (2023, November). SniffyArt: The Dataset of Smelling Persons. In Proceedings of the 5th Workshop on analySis, Understanding and proMotion of heritAge Contents (pp. 49-58).
[2] Yang, J., Zeng, A., Liu, S., Li, F., Zhang, R., & Zhang, L. (2023). Explicit box detection unifies end-to-end multi-person pose estimation. arXiv preprint arXiv:2302.01593.
[3] Yang, J., Zeng, A., Zhang, R., & Zhang, L. (2023). Unipose: Detecting any keypoints. arXiv preprint arXiv:2310.08530.
[4] Li, K., Wang, S., Zhang, X., Xu, Y., Xu, W., & Tu, Z. (2021). Pose recognition with cascade transformers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1944-1953).
CFD Simulation for Blood Flow in Embolization Procedures
A disentangled representation strategy to enhance multi-organ segmentation in CT using multiple datasets
Medical image segmentation is important for identifying human organs and is essential in clinical diagnosis and treatment planning. However, the accuracy of segmentation results is often compromised by the limited quality and completeness of medical imaging data. In practical applications, deep learning has become a key method for multi-organ segmentation [1, 3], but it struggles with challenges related to the amount and quality of data. Deep learning segmentation models typically require numerous paired images and annotations for training [2]. However, fully annotated multi-organ CT datasets are rare, while those annotating only a few organs are more common. This variation in annotations restricts the efficient use of the many public segmentation datasets.
Inspired by the ability of disentangled learning to share knowledge across tasks [4, 5, 6], we develop a method that allows models to learn and incorporate features from different datasets. We combine two types of datasets: a small one that is fully annotated for multiple organs, and a larger one that is annotated only for certain organs. Using disentangled learning, the model extracts and combines crucial features from the different datasets, thus overcoming the challenge of inconsistent annotations. The method aims to improve the model’s adaptability and precision in segmenting multiple organs. We assess its performance by comparing the model’s predicted segmentations with the actual annotations, which allows a detailed comparison of the disentangled learning approach with models trained on only a single dataset in multi-organ segmentation tasks.
To summarize, the thesis will cover the following aspects:
- Design a multi-organ segmentation model using disentangled learning methods.
- Investigate the influence of the number of fused datasets on the multi-organ segmentation model.
- Investigate the influence of the proportion of data quantity from different datasets on the multi-organ segmentation model.
- Investigate the influence of feature weights from different datasets on the multi-organ segmentation model.
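Comparing predicted segmentations with annotations requires an overlap metric. The text does not name one, so the sketch below uses the Dice coefficient purely as an assumed, commonly used choice. The function name `dice`, the label ids, and the toy label sequences are hypothetical, and returning 1.0 when a label is absent from both prediction and annotation is one of several possible conventions.

```python
from typing import Sequence


def dice(pred: Sequence[int], target: Sequence[int], label: int) -> float:
    """Dice coefficient for one organ label over flattened label maps."""
    p = [x == label for x in pred]
    t = [x == label for x in target]
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    denom = sum(p) + sum(t)
    if denom == 0:
        return 1.0  # label absent in both maps; convention chosen for this sketch
    return 2.0 * intersection / denom


# Toy flattened label maps: 0 = background, 1 and 2 = two hypothetical organs.
pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 2, 0]

dice_organ_1 = dice(pred, target, 1)
dice_organ_2 = dice(pred, target, 2)
```

When a dataset annotates only some organs, the score would be computed only for the labels that are actually annotated in that dataset.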
References
[1] Yabo Fu, Yang Lei, Tonghe Wang, Walter J. Curran, Tian Liu, and Xiaofeng Yang. A review of deep learning based methods for medical image multiorgan segmentation. Physica Medica, 85:107–122, 2021.
[2] Tianxing He, Shengcheng Yu, Ziyuan Wang, Jieqiong Li, and Zhenyu Chen. From data quality to model quality: an exploratory study on deep learning, 2019.
[3] Yang Lei, Yabo Fu, Tonghe Wang, Richard L. J. Qiu, Walter J. Curran, Tian Liu, and Xiaofeng Yang. Deep learning in multi-organ segmentation, 2020.
[4] Yuanyuan Lyu, Haofu Liao, Heqin Zhu, and S. Kevin Zhou. A3dsegnet: Anatomy-aware artifact disentanglement and segmentation network for unpaired segmentation, artifact reduction, and modality translation, 2021.
[5] Qiushi Yang, Xiaoqing Guo, Zhen Chen, Peter Y. M. Woo, and Yixuan Yuan. D2-net: Dual disentanglement network for brain tumor segmentation with missing modalities. IEEE Transactions on Medical Imaging, 41(10):2953–2964, 2022.
[6] Tongxue Zhou, Su Ruan, and Stéphane Canu. A review: Deep learning for medical image segmentation using multi-modality fusion. Array, 3-4:100004, 2019.