Getting the Most out of U-Net Architecture for Glacier (Front) Segmentation
Glaciers and ice sheets currently contribute about two thirds of the observed global sea-level rise. Many glaciers in glaciated regions, e.g., Antarctica, have already shown considerable ice mass loss over the last decade. Most of this mass loss is caused by the dynamic adjustment of glaciers, with considerable glacier retreat and elevation change being the major observables. The continuous and precise extraction of glacier calving fronts is hence of paramount importance for monitoring these rapid glacier changes.
This project intends to bridge the gap towards fully automatic, end-to-end deep learning-based glacier (front) segmentation using synthetic aperture radar (SAR) imagery. U-Net, in its simple form, has recently been used for this task and showed promising results [1]. In this thesis, we would like to thoroughly study the fundamentals and incorporate more advanced ideas to improve the segmentation performance of the simple U-Net. In other words, this thesis investigates approaches that enhance the image segmentation performance without deviating from the U-Net’s root architecture. The outcome of this thesis is expected to be a comparative study, similar to [11], on glacier (front) segmentation. To this end, the following ideas are going to be investigated:
1. Pre-processing: So far in the literature, simple denoising/multi-looking algorithms have been used as pre-processing. It is therefore interesting to conduct a more thorough study on the effect of additional pre-processing algorithms:
1.1. Attribute Profiles (APs) [2, 3] have resulted in performance enhancement for very high-resolution remote sensing image classification. They have been used for SAR image segmentation too [4]. Their extension, Feature Attribute Profiles [5], has been shown to outperform APs in most scenarios and has also been used for pixel-wise classification of SAR images [6]. We would like to study the performance of APs and their extension for SAR image segmentation. This task is optional and will be addressed if time allows.
1.2. There are multiple classical denoising algorithms, e.g., the median filter, Gaussian filter, bilateral filter, Lee filter, and Kuan filter. The denoised images may then be processed by contrast enhancement algorithms, e.g., contrast limited adaptive histogram equalization (CLAHE). Different combinations will be studied quantitatively and qualitatively.
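As one example of the classical speckle filters mentioned in 1.2, a minimal NumPy/SciPy sketch of the Lee filter is given below. It is illustrative only; the window size and the crude global noise-variance estimate are simplifying assumptions, not choices made in the thesis.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lee_filter(img, size=7, noise_var=None):
    """Classic Lee speckle filter: adaptively blend the local mean and the
    observed pixel, filtering strongly in homogeneous regions."""
    mean = uniform_filter(img, size)
    sq_mean = uniform_filter(img * img, size)
    var = np.maximum(sq_mean - mean * mean, 0.0)
    if noise_var is None:
        noise_var = var.mean()  # crude global noise estimate (assumption)
    weight = var / (var + noise_var + 1e-12)
    return mean + weight * (img - mean)
```

In homogeneous regions the local variance is dominated by noise, so the weight is small and the output approaches the local mean; near edges the weight approaches one and structure is preserved.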
2. Different network architectures in the U-Net’s bottleneck:
2.1. Dilated convolution (atrous convolution): dilated convolution [7] has been shown to introduce multi-scale context into the network without increasing the number of parameters,
2.2. Dilated ResNet [8],
2.3. Pre-trained networks (VGG, ResNet, etc.).
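To illustrate the multi-scale property mentioned in 2.1, the NumPy/SciPy sketch below dilates a 3x3 kernel by inserting zeros between its taps: the receptive field grows while the number of non-zero weights stays at nine. This is a didactic sketch, not the thesis implementation.

```python
import numpy as np
from scipy.signal import convolve2d

def dilate_kernel(kernel, rate):
    """Insert (rate - 1) zeros between kernel taps ('convolution with holes'):
    same parameter count, larger receptive field."""
    kh, kw = kernel.shape
    out = np.zeros(((kh - 1) * rate + 1, (kw - 1) * rate + 1))
    out[::rate, ::rate] = kernel
    return out

img = np.random.default_rng(0).random((32, 32))
k = np.ones((3, 3)) / 9.0
for rate in (1, 2, 4):
    dk = dilate_kernel(k, rate)
    y = convolve2d(img, dk, mode="same")
    # kernel grows to 3x3, 5x5, 9x9 while keeping exactly 9 non-zero weights
    print(rate, dk.shape, np.count_nonzero(dk))
```

In a deep learning framework the same effect is obtained via the convolution layer's dilation argument rather than by materializing the dilated kernel.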
3. Different normalization algorithms: One common issue in training deep CNNs is the internal covariate shift, which is caused by the changing distribution of input features. It slows down training and degrades performance. As a remedy, multiple normalization techniques have been proposed, such as batch normalization, instance normalization, layer normalization, and group normalization [9]. In this thesis, we will study the effect of these algorithms on the segmentation results of the U-Net, both qualitatively and quantitatively.
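To make the differences concrete, a plain NumPy sketch of group normalization follows: with the group count equal to the channel count it reduces to instance normalization, and with one group to layer normalization, while batch normalization additionally aggregates over the batch axis. This is an illustrative reference implementation only.

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    """Normalize over (channels-in-group, H, W) per sample; unlike batch
    normalization, the statistics do not depend on the batch size."""
    n, c, h, w = x.shape
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mu = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    return ((g - mu) / np.sqrt(var + eps)).reshape(n, c, h, w)
```

The independence from batch size is why group/instance normalization is often preferred for segmentation, where GPU memory limits force small batches.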
4. The optimal loss function for this application:
• (Binary) Cross Entropy
• Dice coefficient
• Focal loss
• Weighted combination of the loss functions above
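For reference, minimal NumPy versions of the candidate losses above are sketched below; the weight w in the combination is an illustrative hyperparameter, not a value recommended in the literature.

```python
import numpy as np

def bce_loss(pred, target, eps=1e-7):
    """(Binary) cross entropy on probabilities in [0, 1]."""
    p = np.clip(pred, eps, 1 - eps)
    return float(-(target * np.log(p) + (1 - target) * np.log(1 - p)).mean())

def dice_loss(pred, target, eps=1e-7):
    """1 - Dice coefficient; overlap-based, robust to class imbalance."""
    inter = (pred * target).sum()
    return float(1.0 - (2 * inter + eps) / (pred.sum() + target.sum() + eps))

def focal_loss(pred, target, gamma=2.0, eps=1e-7):
    """Focal loss: down-weights well-classified (easy) pixels."""
    p = np.clip(pred, eps, 1 - eps)
    pt = np.where(target == 1, p, 1 - p)  # probability of the true class
    return float(-((1 - pt) ** gamma * np.log(pt)).mean())

def combined_loss(pred, target, w=0.5):
    # weighted combination of BCE and Dice; w is a tunable trade-off
    return w * bce_loss(pred, target) + (1 - w) * dice_loss(pred, target)
```

In the thesis these would of course be implemented as differentiable losses in the chosen deep learning framework.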
5. Effect of dropout and DropConnect: In which layer is dropout most effective? Is using it in all layers the best approach? Is dropout in combination with normalization techniques (e.g., batch normalization) even advantageous?
6. Effect of different data augmentation techniques, e.g., flipping, rotation, random cropping, random transformations, etc., on the segmentation performance.
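The NumPy sketch below illustrates the key requirement for segmentation: identical spatial transforms must be applied to the image and its mask so that labels stay aligned. The 64x64 crop size is an arbitrary illustrative choice.

```python
import numpy as np

def augment(img, mask, rng):
    """Apply the same random flip, rotation, and crop to image and mask."""
    if rng.random() < 0.5:
        img, mask = img[:, ::-1], mask[:, ::-1]        # horizontal flip
    k = int(rng.integers(4))
    img, mask = np.rot90(img, k), np.rot90(mask, k)    # random 90-degree rotation
    ch, cw = 64, 64                                    # crop size (illustrative)
    y = int(rng.integers(img.shape[0] - ch + 1))
    x = int(rng.integers(img.shape[1] - cw + 1))
    return img[y:y + ch, x:x + cw].copy(), mask[y:y + ch, x:x + cw].copy()
```

Elastic or affine transformations would follow the same pattern, with the addition of nearest-neighbor interpolation for the mask.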
7. Effect of transfer learning:
7.1. Is pre-training the decoder, encoder, and bottleneck of the U-Net, separately or all at once, on other datasets beneficial? Is it effective against the limited training data and the class-imbalance problem in the dataset?
7.2. The effect of transfer learning from high-quality images to low-quality ones.
8. Improved architectures of U-Net: For a thorough review of some of these architectures in one place, please refer to Taghanaki et al. [11].
8.1. Feedforward auto-encoder
8.2. FCN
8.3. Seg-Net
8.4. U-Net
8.5. U-Net++ [10]
8.6. Tiramisu Network [12]
References
[1] Zhang et al. “Automatically delineating the calving front of Jakobshavn Isbræ from multitemporal TerraSAR-X images: a deep learning approach.” The Cryosphere 13, no. 6 (2019): 1729-1741.
[2] Dalla Mura, Mauro, et al. “Morphological attribute profiles for the analysis of very high resolution images.” IEEE Transactions on Geoscience and Remote Sensing 48.10 (2010): 3747-3762.
[3] Ghamisi, Pedram, Mauro Dalla Mura, and Jon Atli Benediktsson. “A survey on spectral–spatial classification techniques based on attribute profiles.” IEEE Transactions on Geoscience and Remote Sensing 53.5 (2014): 2335-2353.
[4] Boldt, Markus, et al. “SAR image segmentation using morphological attribute profiles.” The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences 40.3 (2014): 39.
[5] Pham, Minh-Tan, Erchan Aptoula, and Sébastien Lefèvre. “Feature profiles from attribute filtering for classification of remote sensing images.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 11.1 (2017): 249-256.
[6] Tombak, Ayşe, et al. “Pixel-Based Classification of SAR Images Using Feature Attribute Profiles.” IEEE Geoscience and Remote Sensing Letters 16.4 (2018): 564-567.
[7] Chen, Liang-Chieh, et al. “Rethinking atrous convolution for semantic image segmentation.” arXiv preprint arXiv:1706.05587 (2017).
[8] Zhang, Qiao, et al. “Image segmentation with pyramid dilated convolution based on ResNet and U-Net.” International Conference on Neural Information Processing. Springer, Cham, 2017.
[9] Zhou, Xiao-Yun, and Guang-Zhong Yang. “Normalization in training U-Net for 2-D biomedical semantic segmentation.” IEEE Robotics and Automation Letters 4.2 (2019): 1792-1799.
[10] Zhou, Zongwei, et al. “Unet++: A nested u-net architecture for medical image segmentation.” Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Springer, Cham, 2018. 3-11.
[11] Taghanaki, Saed Asgari, et al. “Deep Semantic Segmentation of Natural and Medical Images: A Review.” arXiv preprint arXiv:1910.07655 (2019).
[12] Jégou, Simon, et al. “The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation.” Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 2017.
Detecting Defects on Transparent Objects using Polarization Cameras
The classification of images is a well-known task in computer vision. However, transparent or semi-transparent objects have several properties that can make computer vision tasks harder. These objects usually exhibit little texture and sometimes strong reflections. Occasionally, varying backgrounds make it hard to recognize the edges or the shape of an object [1, 2].
To overcome these difficulties, we use polarization cameras in this work. In contrast to ordinary cameras, polarization cameras additionally record information about the polarization of the light rays. Most natural light sources emit unpolarized light. By using a light source that emits polarized light, it is possible to remove reflections or increase the contrast. Furthermore, it is known that the Angle of Linear Polarization (AoLP) provides information about the surface normal [3].
In this work, we will follow a deep learning approach and use Convolutional Neural Networks (CNNs) to explore the following topics:
1. Comparison of different sorts of preprocessing:
• Using only raw data / reshaped raw data
• Using extra features Degree of Linear Polarization (DoLP) and AoLP
2. Influence of different light sources.
3. Comparison of different defect classes.
To evaluate the results, we use different metrics such as accuracy and F1 score, as well as gradient-weighted class activation maps (Grad-CAM) [4].
The implementation should be done in Python.
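The two extra features mentioned in topic 1 can be derived from the four analyzer orientations (0°, 45°, 90°, 135°) of a division-of-focal-plane sensor via the linear Stokes parameters. A minimal NumPy sketch, for illustration:

```python
import numpy as np

def polarization_features(i0, i45, i90, i135):
    """Compute DoLP and AoLP from the four analyzer-angle intensity images."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity
    s1 = i0 - i90
    s2 = i45 - i135
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / np.maximum(s0, 1e-9)
    aolp = 0.5 * np.arctan2(s2, s1)      # in radians, range (-pi/2, pi/2]
    return dolp, aolp
```

These two maps could then be stacked with the raw intensities as additional input channels for the CNN.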
References
[1] Agastya Kalra, Vage Taamazyan, Supreeth Krishna Rao, Kartik Venkataraman, Ramesh Raskar, and Achuta
Kadambi. Deep Polarization Cues for Transparent Object Segmentation. In 2020 IEEE/CVF Conference on
Computer Vision and Pattern Recognition (CVPR), pages 8599–8608, Seattle, WA, USA, June 2020. IEEE.
[2] Ilya Lysenkov, Victor Eruhimov, and Gary Bradski. Recognition and Pose Estimation of Rigid Transparent
Objects with a Kinect Sensor. page 8, 2013.
[3] Francelino Freitas Carvalho, Carlos Augusto de Moraes Cruz, Greicy Costa Marques, and Kayque Martins Cruz Damasceno. Angular Light, Polarization and Stokes Parameters Information in a Hybrid Image Sensor with Division of Focal Plane. Sensors, 20(12):3391, June 2020.
[4] Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. International Journal of Computer Vision, 128(2):336–359, February 2020.
Deep Learning-Based Limited Data Glacier Segmentation using Bayesian U-Nets and GANs-based Data Augmentation
The main application this thesis focuses on is the segmentation of glaciers and their calving fronts in synthetic aperture radar (SAR) images. Accurate pixel-wise ground truth for remote sensing images, including SAR images, is scarce and very expensive to generate. On the other hand, depending on the application, the regions of interest that we want to segment may cover only a small part of the image and thus introduce a severe class-imbalance problem into the segmentation pipeline. Supervised learning-based algorithms are well known to suffer from limited training data and class imbalance, the main drawbacks being overfitting and high model uncertainty. In this work, we want to address this issue in a two-fold approach:
1. Data augmentation: Data augmentation is a natural approach to tackle the limited and imbalanced data problem in supervised learning-based systems [1]. Generative adversarial networks (GANs) have come a long way and have shown great potential in generating natural-looking images. Recently, they have been used to augment and populate training sets [2, 3, 4, 5, 6]. In this thesis, we are interested in conducting a thorough study on the effect of different GAN variants for data augmentation and for synthetically populating the limited training data, similar to [3, 7, 8].
2. Bayesian U-Net: As already mentioned, limited training data is a bottleneck for supervised learning-based algorithms. Moreover, the tedious task of manually labeling the images for generating the ground truth may introduce inaccuracies. Both of the aforementioned problems introduce uncertainty into the model. If we can measure this uncertainty, we can use it, e.g., in an active learning loop to improve training. Bayesian algorithms provide a quantitative value for this uncertainty. In the second part of this thesis, we adapt the Bayesian U-Net [10] and/or Bayesian SegNet [11] to our SAR glacier segmentation dataset and measure uncertainty maps for the images.
Finally, we compare our results from parts 1 and 2 above with the state of the art, both quantitatively and qualitatively.
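A common way to obtain such uncertainty maps is Monte-Carlo dropout: keep dropout active at test time and treat the spread over repeated stochastic forward passes as uncertainty. The NumPy sketch below illustrates only the idea, with a hypothetical stand-in `toy_predict` function in place of the trained Bayesian U-Net/SegNet.

```python
import numpy as np

def mc_dropout_uncertainty(predict, x, T=30, p=0.5, rng=None):
    """Run T stochastic forward passes with dropout enabled and return the
    mean prediction and the pixelwise standard deviation (uncertainty map)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    samples = []
    for _ in range(T):
        mask = (rng.random(x.shape) > p) / (1.0 - p)  # inverted dropout
        samples.append(predict(x * mask))
    samples = np.stack(samples)
    return samples.mean(axis=0), samples.std(axis=0)

# hypothetical stand-in for a segmentation network's sigmoid output
toy_predict = lambda z: 1.0 / (1.0 + np.exp(-z))
```

High-uncertainty pixels flagged this way are natural candidates for re-annotation in an active learning loop.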
References:
[1] Davari, AmirAbbas, et al. “Fast and Efficient Limited Data Hyperspectral Remote Sensing Image Classification via GMM-Based Synthetic Samples.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 12, no. 7, pp. 2107-2120, July 2019, doi: 10.1109/JSTARS.2019.2916495.
[2] Nejati Hatamian, Faezeh, et al. “The Effect of Data Augmentation on Classification of Atrial Fibrillation in Short Single-Lead ECG Signals Using Deep Neural Networks.” ICASSP 2020 – 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 1264-1268, doi: 10.1109/ICASSP40776.2020.9053800.
[3] Neff, Thomas, et al. “Generative adversarial network based synthesis for supervised medical image segmentation.” Proc. OAGM and ARW Joint Workshop. 2017.
[4] Neff, Thomas, et al. “Generative Adversarial Networks to Synthetically Augment Data for Deep Learning based Image Segmentation.” Proceedings of the OAGM Workshop 2018: Medical Image Analysis. Verlag der Technischen Universität Graz, 2018.
[5] Caballo, Marco, et al. “Deep learning-based segmentation of breast masses in dedicated breast CT imaging: Radiomic feature stability between radiologists and artificial intelligence.” Computers in Biology and Medicine 118 (2020): 103629.
[6] Qasim, Ahmad B., et al. “Red-GAN: Attacking class imbalance via conditioned generation. Yet another medical imaging perspective.” arXiv preprint arXiv:2004.10734 (2020).
[7] Bailo, Oleksandr, DongShik Ham, and Young Min Shin. “Red blood cell image generation for data augmentation using conditional generative adversarial networks.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2019.
[8] Pollastri, Federico, et al. “Augmenting data with GANs to segment melanoma skin lesions.” Multimedia Tools and Applications (2019): 1-18.
[9] Qin, T., et al. “Automatic Data Augmentation Via Deep Reinforcement Learning for Effective Kidney Tumor Segmentation.” ICASSP 2020 – 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 1419-1423, doi: 10.1109/ICASSP40776.2020.9053403.
[10] Hiasa, Yuta, et al. “Automated muscle segmentation from clinical CT using Bayesian U-net for personalized musculoskeletal modeling.” IEEE Transactions on Medical Imaging (2019).
[11] Kendall, Alex, Vijay Badrinarayanan, and Roberto Cipolla. “Bayesian SegNet: Model uncertainty in deep convolutional encoder-decoder architectures for scene understanding.” arXiv preprint arXiv:1511.02680 (2015).
Diffeomorphic MRI Image Registration using Deep Learning
State-of-the-art deformable image registration approaches achieve impressive results and are commonly used in diverse image processing applications. However, these approaches are computationally expensive even on GPUs [1], since they solve an optimization problem for each image pair during registration [2]. Most learning-based methods either require labeled data or do not guarantee a diffeomorphic registration, i.e., the reversibility of the deformation field [1]. Dalca et al. presented an unsupervised deep learning framework for diffeomorphic image registration named VoxelMorph [1].
In this thesis, the network described in [1] will be implemented and trained on cardiac magnetic resonance images to build an application for fast diffeomorphic image registration. The results will be compared to state-of-the-art diffeomorphic image registration methods. Additionally, the method will be evaluated by comparing segmented areas as well as landmark locations of co-registered images. Furthermore, the method in [1] will be extended to a one-to-many registration method using the approach in [3], meeting the demand for motion estimation of the anatomy of interest in increasingly available dynamic imaging data [3]. The data used in this thesis will be provided by Siemens Healthineers. The implementation will be done using an open-source framework such as PyTorch [4].
The thesis will include the following points:
• Literature research on state-of-the-art methods for diffeomorphic image registration and one-to-many registration
• Implementing a neural network for diffeomorphic image registration and extending it to one-to-many registration
• Comparison of the results with state-of-the-art image registration methods
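The diffeomorphic property in [1] comes from treating the network output as a stationary velocity field and integrating it by scaling and squaring. A NumPy/SciPy sketch of that integration step (2-D displacement fields, linear interpolation) is given below for illustration; it is not the thesis implementation.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def compose(phi, psi):
    """(phi o psi)(x) = phi(x + psi(x)) + psi(x) for fields of shape (2, H, W)."""
    h, w = phi.shape[1:]
    coords = np.mgrid[0:h, 0:w].astype(float) + psi
    warped = np.stack([map_coordinates(phi[i], coords, order=1, mode="nearest")
                       for i in range(2)])
    return warped + psi

def integrate_velocity(v, steps=6):
    """Scaling and squaring: approximate phi = exp(v) by scaling v down to
    v / 2**steps and repeatedly composing the small deformation with itself."""
    phi = v / (2 ** steps)
    for _ in range(steps):
        phi = compose(phi, phi)
    return phi
```

For sufficiently smooth velocity fields, the resulting map is invertible, and the inverse is obtained by integrating -v.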
[1] Balakrishnan, G., Zhao, A., Sabuncu, M. R., Guttag, J. V. & Dalca, A. V. VoxelMorph: A Learning Framework for
Deformable Medical Image Registration. CoRR abs/1809.05231. arXiv: 1809.05231. http://arxiv.org/abs/1809.05231 (2018).
[2] Ashburner, J. A fast diffeomorphic image registration algorithm. NeuroImage 38, 95 –113. ISSN: 1053-8119. http://www.sciencedirect.com/science/article/pii/S1053811907005848 (2007).
[3] Metz, C., Klein, S., Schaap, M., van Walsum, T. & Niessen, W. Nonrigid registration of dynamic medical imaging data using nD+t B-splines and a groupwise optimization approach. Medical Image Analysis 15, 238 –249. ISSN: 1361-8415. http://www.sciencedirect.com/science/article/pii/S1361841510001155 (2011).
[4] Paszke, A. et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. CoRR abs/1912.01703. arXiv: 1912.01703. http://arxiv.org/abs/1912.01703 (2019).
Absorption Image Correction in X-ray Talbot-Lau Interferometry for Reconstruction
X-ray Phase-Contrast Imaging (PCI) is an imaging technique that measures the refraction of X-rays caused by an object. There are several ways to realize PCI, such as interferometric and analyzer-based methods [3]. In contrast to X-ray absorption imaging, the phase image provides high soft-tissue contrast.
The implementation by a grating-based interferometer enables measuring an X-ray absorption image, a differential phase image, and a dark-field image [2, p. 192-205]. Felsner et al. proposed the integration of a Talbot-Lau Interferometer (TLI) into an existing clinical CT system [1]. Three different gratings are mounted between the X-ray tube and the detector: two in front of the object, one behind it (see Fig. 1). Currently, it is not possible to install gratings with a diameter of more than a few centimeters for various reasons [1]. As a consequence, a phase-contrast image can only be created for a small area.
Nevertheless, the entire size of the detector can be used for capturing the absorption image. However, the absorption image is influenced by the gratings, as they induce an inhomogeneous exposure of the X-ray detector.
Besides that, the intensity values change with each projection. The X-ray tube, detector, and gratings rotate around the object during the scanning process. Depending on their position, parts of the object are covered by grating G1 during part of each rotation, but not throughout.
It is expected that the part of the absorption image covered by the gratings differs from the rest of the image in its intensity values. Also, a sudden change in the intensity values can be detected at the edge of the gratings. This may lead to artifacts in the 3-D reconstruction.
In this work, we will investigate the anticipated artifacts in the reconstruction and implement (at least) one correction algorithm. Furthermore, the reconstruction results with and without a correction algorithm will be evaluated using simulated and/or real data.
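If a reference (air) scan acquired with the gratings in place is available, a first-order correction is a flat-field-style gain division. The NumPy sketch below is only a baseline under that assumption, not the correction algorithm this work will ultimately develop and evaluate.

```python
import numpy as np

def gain_correct(projection, air_scan, eps=1e-6):
    """Divide out the grating-induced shading, assuming a reference (air)
    scan acquired with the gratings in place models the same shading."""
    return projection / np.maximum(air_scan, eps)
```

This simple model breaks down where the shading changes with the rotation angle, which is exactly the regime the anticipated artifacts stem from.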
References:
[1] L. Felsner, M. Berger, S. Kaeppler, J. Bopp, V. Ludwig, T. Weber, G. Pelzer, T. Michel, A. Maier, G. Anton, and C. Riess. Phase-sensitive region-of-interest computed tomography. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 137–144, Cham, 2018. Springer.
[2] A. Maier, S. Steidl, V. Christlein, and J. Hornegger. Medical Imaging Systems: An Introductory Guide, volume 11111. Springer, Cham, 2018.
[3] F. Pfeiffer, T. Weitkamp, O. Bunk, and C. David. Phase retrieval and differential phase-contrast imaging with low-brilliance X-ray sources. Nature Physics, 2(4):258–261, 2006.
Truncation-correction Method for X-ray Dark-field Computed Tomography
Grating-based imaging provides three types of images: an absorption, a differential phase, and a dark-field image. The dark-field image provides structural information about the specimen at the micrometer and sub-micrometer scale. A dark-field image can be measured by an X-ray grating interferometer, for example the Talbot-Lau interferometer, which consists of three gratings. Due to the small size of the gratings, truncation arises in the projection images. This becomes an issue, since it leads to artifacts in the reconstruction.
This Bachelor thesis aims to reduce truncation artifacts in dark-field reconstructions. Inspired by the method proposed by Felsner et al. [1], the truncated dark-field image will be corrected using the information of a complete absorption image. To describe the correlation between the absorption and the dark-field signal, the decomposition by Kaeppler et al. [2] will be used. The dark-field correction algorithm will be implemented in an iterative scheme, and a parameter search and an evaluation of the method will be conducted.
References:
[1] Lina Felsner, Martin Berger, Sebastian Kaeppler, Johannes Bopp, Veronika Ludwig, Thomas Weber, Georg Pelzer, Thilo Michel, Andreas Maier, Gisela Anton, and Christian Riess. Phase-sensitive region-of-interest computed tomography. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, pages 137–144, Cham, 2018. Springer International Publishing.
[2] Sebastian Kaeppler, Florian Bayer, Thomas Weber, Andreas Maier, Gisela Anton, Joachim Hornegger, Matthias Beckmann, Peter A. Fasching, Arndt Hartmann, Felix Heindl, Thilo Michel, Gueluemser Oezguel, Georg Pelzer, Claudia Rauh, Jens Rieger, Ruediger Schulz-Wendtland, Michael Uder, David Wachter, Evelyn Wenkel, and Christian Riess. Signal decomposition for x-ray dark-field imaging. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2014, pages 170–177, Cham, 2014. Springer International Publishing.
Helical CT Reconstruction with Bilateral Sinogram/Volume Domain Denoisers
Helical CT is the most commonly used CT scan protocol in clinical practice today. It applies a cone-beam scan along a spiral trajectory over the object to be scanned. The collected sinograms, and subsequently the reconstructed volumes, contain a certain amount of noise due to fluctuations in the line integrals. Removing this noise is necessary for diagnostic image quality.
In previous research, we developed a reinforcement learning-based method to denoise cone-beam CT. It uses denoisers in both the sinogram and the reconstructed image domain: bilateral filters whose sigma parameters are tuned by a convolutional agent. The reconstruction is carried out by the FDK algorithm in the ASTRA toolbox.
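For reference, a brute-force NumPy sketch of such a bilateral filter is shown below; `sigma_s` and `sigma_r` are the spatial and range widths the agent tunes. For simplicity the sketch uses scalar sigmas and np.roll's wrap-around border handling, both simplifying assumptions.

```python
import numpy as np

def bilateral_filter(img, sigma_s=2.0, sigma_r=0.1, radius=3):
    """Edge-preserving smoothing: each neighbor is weighted by both its
    spatial distance (sigma_s) and its intensity difference (sigma_r)."""
    acc = np.zeros_like(img)
    norm = np.zeros_like(img)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(img, (dy, dx), axis=(0, 1))
            w = (np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))
                 * np.exp(-((shifted - img) ** 2) / (2 * sigma_r ** 2)))
            acc += w * shifted
            norm += w
    return acc / norm
```

A small sigma_r preserves edges (large intensity jumps get near-zero weight), while sigma_s controls the overall smoothing radius; this is why per-region tuning of the two parameters is worthwhile.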
Due to time constraints, our previous research was limited to the simpler problem of circular cone-beam CT. In this research internship, we hope to extend our method to denoise helical CT as well. Since helical CT uses cone-beam projections, we hope that our method will work out of the box without any retraining.
The following tasks are to be conducted as part of this research internship:
- Develop methods to reconstruct helical CT from the given sinograms, e.g., ADMM or WFBP
- Formulate and train a reinforcement learning task for denoising helical CT in the sinogram and volume domains
- Investigate ways to train without ground-truth volumes in order to obtain image quality better than currently existing methods
- Train current volume-based neural network solutions (GAN-3D, WGAN-VGG, CPCE3D, QAE, etc.) and compare them against our method.
Requirements:
- Knowledge of CT reconstruction techniques
- Understanding of reinforcement learning
- Experience with PyTorch for developing neural networks
- Experience with image processing. Knowledge of the ASTRA toolbox is a plus.
Investigating augmented filtering approaches towards noise removal in low dose CT
Noise removal in clinical CT is necessary to make images clearer and to enhance their diagnostic quality. There are several deep learning techniques designed to remove noise in CT; however, they have a very large number of parameters, making their behavior difficult to comprehend. We attempt to alleviate this problem by using well-understood classical denoising models to remove the noise.
Due to the non-stationary nature of CT noise, the image naturally requires different noise filtering strengths at different locations. One way to achieve this is to tune the filter parameters at each point in the image. Since no ground truth can be established for pixelwise ideal parameter values, this task can be formulated as a reinforcement learning problem that maximizes image quality. Our previous research established such an approach for the joint bilateral filter.
In this thesis, we aim to complete the following tasks:
- Develop a general reinforcement learning framework for parameter tuning problems in medical imaging.
- Experiment with different denoising models such as non-local means, and block matching 3D.
- Experiment with a parameter selection strategy to choose which parameters to include in the learning process
- Study the impact of parameter tuning on denoising, and of the denoising model on the parameter tuning and the overall image quality.
In this thesis, the AAPM Grand Challenge dataset and the Mayo Clinic TCIA dataset will be used. Quality will be measured using PSNR and SSIM, and possibly IRQM.
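For reproducibility, the two standard metrics can be written in a few lines of NumPy. Note that the SSIM below uses a single global window for brevity; a windowed implementation such as skimage.metrics.structural_similarity should be preferred in practice.

```python
import numpy as np

def psnr(x, y, data_range=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((x - y) ** 2)
    return float(10 * np.log10(data_range ** 2 / mse))

def ssim_global(x, y, data_range=1.0):
    """SSIM computed over one global window (simplification of the
    windowed formulation, with the standard K1=0.01, K2=0.03 constants)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float((2 * mx * my + c1) * (2 * cov + c2)
                 / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```

These metrics would also serve directly as reward components in the reinforcement learning formulation described above.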
Requirements:
- Some knowledge of image processing. Experience with image processing libraries is a plus
- Good knowledge of PyTorch and C++
- Understanding of CT reconstruction and CT noise
- Experience with deep Q learning