Evaluation of a Modified U-Net with Dropout and a Multi-Task Model for Glacier Calving Front Segmentation
With global temperatures rising, the tracking and prediction of glacier changes become more and more relevant. Part of these efforts is the development of neural network algorithms to automatically detect calving fronts of marine-terminating glaciers. Gourmelon et al. [1] introduce the first publicly available benchmark dataset for calving front delineation on synthetic aperture radar (SAR) imagery, dubbed CaFFe. The dataset consists of the SAR imagery and two corresponding labels: one showing the calving front vs. the background, the other showing different landscape regions. Moreover, the paper provides two deep learning models as baselines, one for each label. As there are many different approaches to calving front delineation, the question arises which method provides the best performance. The aim of this thesis is therefore to evaluate the code of the following two papers [2], [3] on the CaFFe benchmark dataset and compare their performance with the baselines provided by Gourmelon et al. [1].
- paper 1:
Mohajerani et al. [2] employ a convolutional neural network (CNN) with a modified U-Net architecture that incorporates additional dropout layers. In contrast to Gourmelon et al. [1], the CNN uses optical imagery as its input.
- paper 2:
Heidler et al. [3] introduce a deep learning model for coastline detection that combines the two tasks of water/land segmentation and binary coastline delineation into one cohesive multi-task deep learning model.
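The multi-task idea can be sketched as a weighted sum of two per-pixel losses, one for the region segmentation and one for the coastline (edge) map. This is a simplified stand-in for HED-UNet's actual deep-supervision objective; the function names and the weight are illustrative:

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Pixel-wise binary cross-entropy between probability maps."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def multitask_loss(seg_pred, seg_gt, edge_pred, edge_gt, w_edge=1.0):
    """Joint objective: water/land segmentation plus coastline delineation."""
    return bce(seg_pred, seg_gt) + w_edge * bce(edge_pred, edge_gt)
```

The weight `w_edge` trades off the two tasks; in practice it would be one of the hyperparameters tuned on the benchmark.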
To make a fair and reasonable comparison, the hyperparameters of each model will be optimized on the CaFFe benchmark dataset and the model weights will be re-trained on CaFFe's train set. The evaluation will be conducted on the provided test set, and the metrics introduced in Gourmelon et al. [1] will be used for the comparison.
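Independent of the concrete models, the comparison hinges on a distance between the predicted and the reference calving front. As a simplified stand-in for the benchmark's actual metrics, a symmetric mean distance between two sets of front pixels could be sketched as:

```python
import numpy as np

def mean_front_distance(pred_pts, gt_pts):
    """Symmetric mean distance (in pixels) between two sets of
    front-pixel coordinates, each of shape (N, 2)."""
    # Pairwise Euclidean distances between all prediction/reference pixels
    d = np.linalg.norm(pred_pts[:, None, :] - gt_pts[None, :, :], axis=-1)
    # Average of prediction-to-reference and reference-to-prediction terms
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())
```

For example, a front shifted by exactly one pixel yields a distance of 1.0, while identical fronts score 0.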
References
[1] Gourmelon, N.; Seehaus, T.; Braun, M.; Maier, A.; and Christlein, V.: Calving Fronts and Where to Find Them: A Benchmark Dataset and Methodology for Automatic Glacier Calving Front Extraction from SAR Imagery, Earth Syst. Sci. Data Discuss. [preprint]. 2022, https://doi.org/10.5194/essd-2022-139, in review.
[2] Mohajerani, Y.; Wood, M.; Velicogna, I.; and Rignot, E.: Detection of Glacier Calving Margins with Convolutional Neural Networks: A Case Study, Remote Sens. 2019, 11, 74. https://doi.org/10.3390/rs11010074
[3] Heidler, K.; Mou, L.; Baumhoer, C.; Dietz A.; and Zhu, X.: HED-UNet: Combined Segmentation and Edge Detection for Monitoring the Antarctic Coastline, IEEE Transactions on Geoscience and Remote Sensing. 2022, vol. 60, 1-14, Art no. 4300514, doi: 10.1109/TGRS.2021.3064606.
Animal-Independent Signal Enhancement Using Deep Learning
Examining and segmenting bioacoustic signals is an essential part of biology. For example, by analysing orca calls it is possible to draw several conclusions regarding the animals' communication and social behavior [1]. However, to avoid having to manually go through hours of audio material to detect those calls, the so-called ORCA-SPOT toolkit was developed, which uses deep learning to separate relevant signals from pure ambient sounds [2]. These signals may nevertheless still contain background noise, which makes their examination rather difficult. To remove this background noise, ORCA-CLEAN was developed. Again using a deep learning approach, by adapting the Noise2Noise concept and using machine-generated binary masks as an additional attention mechanism, it denoises the orca calls as well as possible without requiring clean data as a foundation [3].
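The two core ingredients, spectrogram inputs and a machine-generated binary attention mask, can be sketched in a few lines of numpy. The FFT parameters and the quantile threshold here are illustrative choices, not ORCA-CLEAN's actual configuration:

```python
import numpy as np

def spectrogram(signal, n_fft=256, hop=128):
    """Magnitude spectrogram via a Hann-windowed short-time FFT."""
    starts = range(0, len(signal) - n_fft + 1, hop)
    frames = np.array([signal[i:i + n_fft] for i in starts]) * np.hanning(n_fft)
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (freq_bins, time_frames)

def binary_mask(spec, quantile=0.9):
    """Mark the strongest time-frequency bins as 'call', the rest as noise."""
    return (spec > np.quantile(spec, quantile)).astype(np.float32)

# A synthetic 'call': a pure tone whose energy lands in frequency bin 32
n = np.arange(4096)
tone = np.sin(2 * np.pi * 32 * n / 256)
spec = spectrogram(tone)
mask = binary_mask(spec)
```

The mask highlights the tone's frequency bin across all frames, which is exactly the kind of coarse attention signal the denoiser can exploit.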
But, as mentioned, this toolkit is optimized for denoising orca calls. Marine biologists are of course not the only ones who require clean audio signals for their research. Ornithologists alone deal with a great variety of different noise: one who studies urban bird species would like city sounds to be filtered from the audio samples, whereas one who works with tropical birds rather wants recordings free of forest noise. One could argue that almost every biologist who analyses recordings of animal calls would have use for a denoising toolkit.
Audio denoising is also of great relevance when interpreting and processing human speech. It can be used to improve the sound quality of a phone call or video conference, to preprocess voice commands to a virtual assistant on a smartphone, to improve speech recognition software, and in many other applications. Even in medicine it can help when analysing pulmonary auscultation signals, the key method to detect and evaluate respiratory dysfunctions [4].
It therefore makes sense to generalize ORCA-CLEAN and make it trainable for other animal sounds, perhaps even human speech or body sounds. One would then have a generalized version of ORCA-CLEAN that can be trained according to the desired purpose. The goal of this thesis will be to describe and explain the respective changes in the code, as well as to evaluate how differently trained models perform on audio recordings of different animals. The transfer from a model specialized on orcas to one specialized on another species will be demonstrated using recordings of hyraxes. The data used contains tapes of 34 hyrax individuals. For each individual, multiple tapes are available, and for each tape there is a corresponding table with information such as the exact location, length, peak frequency, and call type of each call on the tape.
The hyrax is a small hoofed mammal of the family Procaviidae [5, 6]. Hyraxes usually weigh 4 to 5 kg, are about 30 to 50 cm long, and are mostly herbivorous [5]. Their calls, especially the advertisement calls, are helpful for distinguishing different hyrax species and for analysing the animals' behaviour [6].
Here are a few rough approaches to how I would realize this thesis. I would begin by modifying the ORCA-CLEAN code. Since orca calls differ greatly from hyrax calls in frequency range as well as in length, the preprocessing of the audio tapes would have to be modified. I would also like to add some more input/output spectrogram variants to the training process.
One could use pairs of noisy and denoised human speech samples for example, or a pure noise spectrogram versus a completely empty one. The probability with which each of these variants is chosen could additionally be made variable.
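Sampling among such training variants with adjustable probabilities is straightforward; a small sketch, where the variant names and weights are purely illustrative placeholders:

```python
import random

# Candidate (input, target) spectrogram pairs for training, each drawn
# with a tunable probability:
variants = [
    ("noisy_orca", "denoised_orca"),
    ("noisy_speech", "clean_speech"),
    ("pure_noise", "empty_spectrogram"),
]
weights = [0.6, 0.3, 0.1]  # could themselves be tuned as hyperparameters

def sample_variant(rng=random):
    """Pick one training variant according to the configured weights."""
    return rng.choices(variants, weights=weights, k=1)[0]
```

Exposing the weights as configuration is what makes the sampling "variable": each experiment can emphasize different variant mixes without code changes.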
After that, I would train different models with hyrax audio tapes, including the original ORCA-CLEAN as well as the newly created adaptations, and evaluate their performance. Since the provided hyrax tapes are not all equally noisy, they can be sorted by their signal-to-noise ratio (SNR). One can then compare these values before and after denoising, e.g., by correlating them, and check whether the files were denoised correctly or whether relevant parts were removed.
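The SNR sorting and the before/after comparison can be sketched as follows. The SNR estimate assumes separate signal and noise portions (in practice taken from the annotated tapes), and the per-tape values below are synthetic placeholders:

```python
import numpy as np

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB from separate signal and noise estimates."""
    return 10 * np.log10(np.mean(signal**2) / np.mean(noise**2))

# Compare per-tape SNR before and after denoising, e.g. via correlation
# (numbers below are synthetic placeholders, not measured values):
snr_before = np.array([2.0, 5.0, 8.0, 12.0, 15.0])
snr_after = np.array([9.0, 11.0, 14.0, 17.0, 19.0])
r = np.corrcoef(snr_before, snr_after)[0, 1]
```

A high correlation with a consistent SNR gain would indicate that the denoiser improves all tapes uniformly, while outliers would point to tapes where relevant call energy may have been removed.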
With the help of these results, further alterations can be made, for example by changing the probabilities of the training methods or by adapting the hyperparameters of the deep network, until hopefully, in the end, a suitable network that doesn't require huge amounts of data is the result.
I hope I was able to give some insight into what I imagine the subject to be, and how I would roughly execute it.
Sources
[1] https://lme.tf.fau.de/person/bergler/#collapse_0
[2] C. Bergler, H. Schröter, R. X. Cheng, V. Barth, M. Weber, E. Nöth, H. Hofer, and A. Maier, “ORCA-SPOT: An Automatic Killer Whale Sound Detection Toolkit Using Deep Learning” Scientific Reports, vol. 9, 12 2019.
[3] C. Bergler, M. Schmitt, A. Maier, S. Smeele, V. Barth, and E. Nöth, “ORCA-CLEAN: A Deep Denoising Toolkit for Killer Whale Communication“ Interspeech 2020 (pp. 1136-1140). International Speech Communication Association.
[4] F. Jin and F. Sattar “Enhancement of Recorded Respiratory Sound Using Signal Processing Techniques“ In A. Cartelli, M. Palma (Eds.) “Encyclopedia of Information Communication Technology” (pp. 291-300), 2009.
[5] https://www.britannica.com/animal/hyrax
[6] https://www.wildsolutions.nl/vocal-profiles/hyrax-vocalizations/
Automated Scoring of Rey-Osterrieth Complex Figure Test Using Deep Learning
New speech, motor and cognitive exercises for mobile Parkinson’s Disease monitoring with Apkinson
Writer Verification/Identification using SuperPoint and SuperGlue
Self-supervised learning for pathology classification
Motivation
Self-supervised learning is a promising approach in the field of speech processing. The capacity to learn
representations from unlabelled data with minimal feature-engineering efforts results in increased
independence from labelled data. This is particularly relevant in the pathological speech domain, where the
amount of labelled data is limited. However, as most research focuses on healthy speech, the effect of self-supervised learning on pathological speech data remains under-researched. This motivates the current research, as pathological speech will potentially benefit from the self-supervised learning approach.
Proposed Method
Self-supervised machine learning helps make the most out of unlabelled data for training a model. Here, wav2vec 2.0 will be used, an algorithm that almost exclusively uses raw, unlabelled audio to train speech representations [1][2]. These can serve as input features, as an alternative to traditional approaches using Mel-frequency cepstral coefficients or log-mel filterbanks, for numerous downstream tasks. To evaluate the performance of the trained representations, it will be examined how well they perform on a binary classification task in which the model predicts whether or not the input speech is pathological.
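The downstream step can be sketched as a simple linear probe trained on top of the learned representations. In this sketch the wav2vec 2.0 embeddings are replaced by simulated vectors, so the dimensionality and data are purely illustrative:

```python
import numpy as np

def train_linear_probe(X, y, lr=0.1, steps=500):
    """Logistic-regression probe on embedding vectors X (n_samples, dim);
    y holds binary labels: pathological (1) vs. healthy (0) speech."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w + b)))      # predicted probabilities
        w -= lr * (X.T @ (p - y)) / len(y)       # gradient step on weights
        b -= lr * np.mean(p - y)                 # gradient step on bias
    return w, b

# Simulated 32-dim 'embeddings': the two classes differ in their mean
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 32)), rng.normal(1, 1, (50, 32))])
y = np.concatenate([np.zeros(50), np.ones(50)])
w, b = train_linear_probe(X, y)
acc = np.mean(((X @ w + b) > 0) == y)
```

A probe like this keeps the pre-trained representations frozen, so any classification gain over MFCC baselines can be attributed to the self-supervised features themselves.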
A novel database containing audio files in German collected using the PEAKS software [3] will be used.
Here, patients with speech disorders, such as dementia, cleft lip, and Alzheimer’s Disease, were recorded
performing two different speech tasks: picture reading in “Psycho-Linguistische Analyse Kindlicher Sprech-
Störungen” (PLAKSS) and “The North Wind and the Sun” (Northwind) [3]. As the database is still being
revised, some pre-processing of the data must be performed, for example, removing the voice of a (healthy)
therapist from the otherwise pathological recordings. After preprocessing, the data will be input to the
wav2vec 2.0 framework for self-supervised learning, which will be used as a pre-trained model in the
pathology classification task.
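Assuming per-recording annotations of where the therapist speaks, the therapist-removal step could be sketched as cutting the labelled spans out of the sample array; the segment times and sampling rate below are hypothetical:

```python
import numpy as np

def remove_segments(samples, segments, sr=16000):
    """Cut labelled (start_s, end_s) spans (e.g. therapist speech) out of a
    recording, keeping only the remaining (patient) samples."""
    keep = np.ones(len(samples), dtype=bool)
    for start_s, end_s in segments:
        keep[int(start_s * sr):int(end_s * sr)] = False
    return samples[keep]

audio = np.arange(3 * 16000)                       # 3 s of dummy samples at 16 kHz
patient_only = remove_segments(audio, [(1.0, 2.0)])  # therapist speaks from 1 s to 2 s
```

In the real pipeline, the segment boundaries would come from manual or automatic diarization of the PEAKS recordings rather than from fixed timestamps.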
Hypothesis
Given the benefits of acquiring learned representations without labelled data, the hypothesis is that the self-supervised model's classification experiment will outperform the approach without self-supervision. The
results of the pathological speech detection downstream task are expected to show the positive effects of
pre-trained representations obtained by self-supervised learning.
Furthermore, the model is expected to enable automatic self-assessment for patients using minimally invasive methods and to assist therapists by providing objective measures for their diagnosis.
Supervision
Professor Dr. Andreas Maier, Professor Dr. Seung Hee Yang, M. Sc. Tobias Weise
References
[1] Schneider, S., Baevski, A., Collobert, R., Auli, M. (2019) wav2vec: Unsupervised Pre-Training for
Speech Recognition. Proc. Interspeech 2019, 3465-3469
[2] A. Baevski, Y. Zhou, A. Mohamed, and M. Auli, “wav2vec 2.0: A Framework for Self-Supervised
Learning of Speech Representations,” in Advances in Neural Information Processing Systems. 2020, vol. 33,
pp. 12449–12460, Curran Associates, Inc.
[3] Maier, A., Haderlein, T., Eysholdt, U., Rosanowski, F., Batliner, A., Schuster, M., Nöth, E.: PEAKS – A system for the automatic evaluation of voice and speech disorders, Speech Communication (2009)
Classical Acoustic Markers for Depression in Parkinson’s Disease
Parkinson’s disease (PD) patients are commonly recognized by their tremors, although there is a wide range of other PD symptoms. It is a progressive neurological condition in which patients do not have enough dopamine in the substantia nigra, a region that plays a role in motor control, mood, and cognitive functions. An often underestimated class of symptoms in PD are the mental and behavioral issues, which can manifest as depression, fatigue, or dementia. Clinical depression is a psychiatric mood disorder, often linked to an individual’s difficulty in coping with stressful life events, and presents as persistent feelings of sadness, negativity, and difficulty managing everyday responsibilities. It can be triggered by the lack of dopamine in PD, by the upsetting and stressful situation of the Parkinson’s diagnosis, as well as by the loneliness and isolation that the Parkinson’s symptoms can cause.
The goal of this work is to find the acoustic features that are most suitable for discriminating depression in Parkinson’s patients. Those features will be based on classical and interpretable acoustic descriptors.
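Two of the simplest such interpretable descriptors, zero-crossing rate and frame energy, can be computed directly; the thesis's actual feature set (e.g. jitter, shimmer, or F0 statistics) would be chosen later, so this is only a sketch:

```python
import numpy as np

def zero_crossing_rate(x):
    """Fraction of consecutive sample pairs whose sign differs."""
    return np.mean(np.signbit(x[:-1]) != np.signbit(x[1:]))

def rms_energy(x):
    """Root-mean-square energy of the frame."""
    return np.sqrt(np.mean(x**2))

t = np.linspace(0, 1, 16000, endpoint=False)
tone = np.sin(2 * np.pi * 200 * t)  # 1 s of a 200 Hz tone at 16 kHz
zcr = zero_crossing_rate(tone)      # ~2*200 sign changes over ~16000 pairs
rms = rms_energy(tone)              # ~1/sqrt(2) for a unit-amplitude sine
```

Descriptors like these are attractive precisely because each value has a direct physical interpretation, which supports clinically explainable conclusions.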
Cone-Beam CT X-Ray Image Simulation for the Generation of Training Data
Description
Deep learning methods can be used to reduce the severity of metal artefacts in cone-beam CT images. This thesis aims to design and validate a simulation pipeline that creates realistic X-ray projection images from available CT volumes and metal object meshes. Additionally, 2D and 3D binary masks should provide a segmentation of the metal, to be used as ground truth during training. The explicit focus of the data generation will be placed on the accuracy of the metal artefacts.
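The geometric core of such a pipeline, mapping world points to detector pixels with a projection matrix in homogeneous coordinates, can be sketched as follows; the pinhole intrinsics are toy values, not a calibrated CBCT geometry:

```python
import numpy as np

def project(points_3d, P):
    """Map 3D world points (N, 3) to 2D detector pixels with a 3x4
    projection matrix P, using homogeneous coordinates."""
    hom = np.hstack([points_3d, np.ones((len(points_3d), 1))])  # lift to (N, 4)
    proj = hom @ P.T                                            # (N, 3) homogeneous pixels
    return proj[:, :2] / proj[:, 2:3]                           # perspective divide

# Toy geometry: focal length 1000 px, principal point (256, 256),
# source looking down +z (illustrative values only)
K = np.array([[1000.0,    0.0, 256.0],
              [   0.0, 1000.0, 256.0],
              [   0.0,    0.0,   1.0]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
pts = np.array([[0.0, 0.0, 10.0], [0.1, 0.0, 10.0]])
uv = project(pts, P)
```

Rasterizing metal meshes through the same matrices would yield the 2D masks, while the 3D masks follow directly from voxelizing the meshes in the CT volume's coordinate frame.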
Your qualifications
- Fluent in Python and/or C++
- Knowledge of Homogeneous Coordinates and Projective Mapping
- Interest in Quality Software Development / Project Organisation
- Experience with CUDA and interface to C++ / Python (optional, big plus)
You will learn
- to organize a short-term project (report status and structured sub-goals)
- to scientifically evaluate the developed methods
- to report scientific findings in a thesis / a publication
The thesis is funded by Siemens Healthineers and can be combined with a working student position prior to or after the thesis (up to 12 h/week). If interested, please write a short motivational email to Maxi.Rohleder@fau.de, highlighting your qualifications and describing one related code project you are proud of. Please also attach your CV and transcript of records from your current and previous studies.