Unsupervised Learning for Glacier Front Delineation

The use of synthetic aperture radar (SAR) imagery allows for year-round monitoring of glacier
movements, regardless of weather conditions. This results in huge amounts of data, making human
evaluation of every image infeasible. Advances in deep learning have created new ways to segment
images automatically using a variety of models. The CaFFe benchmark dataset [3] allows for a
proper comparison of different model architectures. The scarcity of labeled data remains a
problem, however: the CaFFe training dataset covers only five different glaciers, some of
which are underrepresented.
In order to take advantage of unlabeled data, this thesis will apply semi-supervised learning techniques.
Semi-supervised learning is, as the name suggests, a hybrid of supervised and unsupervised
learning that uses both labeled and unlabeled SAR images. After training, the model calculates
for each pixel the probability of belonging to one of four classes (glacier, ocean, rock
outcrop, and areas with no information available), resulting in a zone prediction. The glacier
front can then be extracted from this prediction during post-processing. Two of the best-known
self-supervised learning schemes are iBOT [6] and DINOv2 [5], both of which will be analyzed
and evaluated. Both frameworks will have to be modified according to the guidelines laid out by
Gourmelon et al. [2] so that they can be properly compared with each other and with other models
trained on the CaFFe dataset.
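The step from per-pixel class probabilities to a zone prediction, and from there to a front, can be illustrated with a minimal sketch. It is not the post-processing defined by the CaFFe benchmark: the class ordering, the function names, and the simple rule "a front pixel is a glacier pixel with a 4-connected ocean neighbour" are assumptions made only for illustration.

```python
import numpy as np

# Hypothetical class ordering; the benchmark's actual label encoding may differ.
CLASSES = ("no_information", "rock_outcrop", "glacier", "ocean")


def softmax(logits, axis=0):
    """Numerically stable softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def zone_prediction(logits):
    """Turn raw scores of shape (4, H, W) into a zone map of shape (H, W)."""
    probabilities = softmax(logits, axis=0)
    return probabilities.argmax(axis=0)


def front_mask(zones, glacier=2, ocean=3):
    """Mark glacier pixels that touch an ocean pixel (4-connectivity).

    Illustrative rule only, not the benchmark's post-processing.
    """
    is_glacier = zones == glacier
    is_ocean = zones == ocean
    # Pad with False so that image borders never count as ocean.
    padded = np.pad(is_ocean, 1, constant_values=False)
    ocean_neighbour = (
        padded[:-2, 1:-1]    # pixel above
        | padded[2:, 1:-1]   # pixel below
        | padded[1:-1, :-2]  # pixel to the left
        | padded[1:-1, 2:]   # pixel to the right
    )
    return is_glacier & ocean_neighbour


# Toy example: glacier in the left half, ocean in the right half.
toy_logits = np.zeros((4, 4, 4))
toy_logits[2, :, :2] = 5.0
toy_logits[3, :, 2:] = 5.0
toy_zones = zone_prediction(toy_logits)
toy_front = front_mask(toy_zones)
```

In the toy example, the front mask selects the glacier column that borders the ocean. A real pipeline would additionally have to clean up small misclassified regions before extracting a front line.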
The backbone used in both cases is the HookFormer, a model based on the Swin Transformer [4]
that has shown better performance when used for glacier front delineation [1]. It employs two
Transformer models with a cross-resolution interaction between them, which take images of the
same area at different resolutions.
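To make the idea of two inputs at different resolutions concrete, the following sketch cuts a full-resolution target patch and a coarser context patch around the same location. This is an assumption about how such an input pair could be built, not the HookFormer's actual preprocessing; the function name, patch size, scale factor, reflect padding, and average pooling are all hypothetical choices.

```python
import numpy as np


def target_and_context(image, center, size=4, scale=2):
    """Return two (size, size) patches centred on the same pixel.

    target:  full-resolution crop of size x size pixels.
    context: crop covering `scale` times the extent per axis,
             average-pooled down to size x size pixels.
    `size` must be even. All parameters are illustrative assumptions.
    """
    half_target = size // 2
    half_context = half_target * scale
    # Reflect-pad so that crops near the border stay inside the array.
    padded = np.pad(image, half_context, mode="reflect")
    cy, cx = center[0] + half_context, center[1] + half_context

    target = padded[cy - half_target:cy + half_target,
                    cx - half_target:cx + half_target]
    context_full = padded[cy - half_context:cy + half_context,
                          cx - half_context:cx + half_context]
    # Average pooling: group pixels into scale x scale blocks and take the mean.
    context = context_full.reshape(size, scale, size, scale).mean(axis=(1, 3))
    return target, context


# Toy example on a synthetic 16 x 16 "image".
toy_image = np.arange(256, dtype=float).reshape(16, 16)
toy_target, toy_context = target_and_context(toy_image, center=(8, 8))
```

Both patches have the same pixel dimensions, but the context patch covers a larger area at a lower resolution, which is the kind of input pair a cross-resolution architecture can relate to each other.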

References
[1] Nora Gourmelon et al. Deep learning has yet to match human performance in delineating
glacier calving fronts. In Preparation.
[2] Nora Gourmelon, Thorsten Seehaus, Matthias Braun, Andreas Maier, and Vincent Christlein.
Calving fronts and where to find them: a benchmark dataset and methodology for automatic
glacier calving front extraction from synthetic aperture radar imagery. Earth System Science
Data, pages 4287–4313, 2022.
[3] Nora Gourmelon, Thorsten Seehaus, Julian Klink, Matthias Braun, Andreas Maier, and Vincent
Christlein. CaFFe – a benchmark dataset for glacier calving front extraction from synthetic
aperture radar imagery. In IGARSS 2023 – 2023 IEEE International Geoscience and Remote
Sensing Symposium, pages 896–898, 2023.
[4] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining
Guo. Swin Transformer: Hierarchical vision transformer using shifted windows. In 2021
IEEE/CVF International Conference on Computer Vision (ICCV), pages 9992–10002. IEEE
Computer Society, 2021.
[5] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov,
Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran,
Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra,
Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick
Labatut, Armand Joulin, and Piotr Bojanowski. DINOv2: Learning robust visual features
without supervision. arXiv preprint arXiv:2304.07193v2, 2024.
[6] Jinghao Zhou, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, and Tao Kong.
iBOT: Image BERT pre-training with online tokenizer. In International Conference on Learning
Representations (ICLR), 2022.