Index
Geometric Deep Learning for Multifocal Diseases
Diseases are classified as multifocal if they relate to or arise from multiple foci. They are present in various
medical disciplines, e.g. multifocal atrial tachycardia [1], breast cancer [2] or multifocal motor neuropathy [3].
However, analyzing diseases with multiple centers brings several challenges for conventional deep learning ar-
chitectures. On a technical side, it is complex to handle a varying number of centers which have no unique
sequence. From a medical view, it is important to model structures and relationships between the foci. The grid
structure used in convolutional neural networks cannot handle non-regular neighborhoods. A suitable approach
for this task would be to convert the data into graph structures, where the nodes describe the properties of the
foci and the edges model their mutual relationships. With geometric deep learning, it is possible to learn from
graph structures. It is an emerging field of research with many possible applications, e.g. classifying documents
in citation graphs or analyzing molecular structures [4]. There also exist several medical applications, e.g. for
analysis of Parkinson’s disease [5] or artery segmentation [6]. This thesis aims to investigate the applicability of
this method for relatively small graphs coming from multifocal diseases. The networks are trained to predict
time-to-event outcomes of failure as a metric for the severity of the disease. Different geometric layer
architectures, such as Graph Attention Networks [7] and Differentiable Pooling [8], are investigated and
compared to the performance of a conventional neural network. As we aim to create interpretable models, it is intended to provide
visualizations of salient sub-graphs and features of the results. In addition to that, methods to incorporate prior
knowledge from the medical domain into the training process are tested to improve the speed of convergence
and strengthen the medical validity of the predictions. In the end, the networks are tested on liver data.
Summary:
1. Transfer multifocal diseases to meaningful graph structures
2. Provide conventional neural network for time to event regression as baseline
3. Investigate and tune different geometric deep learning architectures
4. Visualize salient graph structures
References
[1] Jane F. Desforges and John A. Kastor. Multifocal Atrial Tachycardia. New England Journal of Medicine,
322(24):1713–1717, June 1990.
[2] John Boyages and Nathan J Coombs. Multifocal and Multicentric Breast Cancer: Does Each Focus Matter?
Journal of Clinical Oncology, 23:7497–7502, 2005.
[3] Eduardo Nobile-Orazio. Multifocal motor neuropathy. Journal of Neuroimmunology, 115(1–2):4–18, April
2001.
[4] Michael Bronstein, Joan Bruna, Yann Lecun, Arthur Szlam, and Pierre Vandergheynst. Geometric Deep
Learning: Going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017.
[5] Xi Zhang, Lifang He, Kun Chen, Yuan Luo, Jiayu Zhou, and Fei Wang. Multi-View Graph Convolutional
Network and Its Applications on Neuroimage Analysis for Parkinson’s Disease. AMIA Annual Symposium
Proceedings, 2018:1147–1156, 2018.
[6] Jelmer M. Wolterink, Tim Leiner, and Ivana Isgum. Graph Convolutional Networks for Coronary Artery
Segmentation in Cardiac CT Angiography. In Lecture Notes in Computer Science (including subseries
Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), volume 11849, pages
62–69. Springer, October 2019.
[7] Petar Velickovic, Arantxa Casanova, Pietro Lio, Guillem Cucurull, Adriana Romero, and Yoshua Bengio.
Graph Attention Networks. In 6th International Conference on Learning Representations, ICLR 2018 –
Conference Track Proceedings, pages 1–12, 2018.
[8] Rex Ying, Jiaxuan You, Christopher Morris, Xiang Ren, William L. Hamilton, and Jure Leskovec.
Hierarchical Graph Representation Learning with Differentiable Pooling. Advances in Neural Information
Processing Systems, pages 4800–4810, December 2018.
Semi-Supervised Tooth Segmentation in Dental Panoramic Radiographs Using Deep Learning
In dentistry, dental panoramic radiographs are used by specialists to complement the clinical examination in the diagnosis of dental diseases, as well as in planning the treatment. They allow the visualization of dental irregularities, such as missing teeth, bone abnormalities, tumors, fractures and others. Dental panoramic radiographs are a form of extra-oral radiographic examination, meaning the patient is positioned between the radiographic film and the X-ray source. The scan describes a half-circle from ear to ear, showing a two-dimensional view of the upper and lower jaw. In contrast to intra-oral radiographs, like bitewing and periapical radiographs, dental panoramic radiographs are not restricted to an isolated part of the teeth and also show the skull, chin, spine and other details originating from the bones of the nasal and facial areas, making these images much more difficult to analyze.
An automatic segmentation method to isolate parts of dental panoramic radiographs could be a first step towards supporting dentists in their diagnoses, with tooth segmentation in particular serving as the basis for an automated analysis of dental radiographs. In this thesis the labeled data by Jader et al. will be used, supplemented by a dataset of 120,000 unlabeled images provided by the University Hospital Erlangen. It will be investigated how reasonable segmentation results can be achieved on a large unlabeled dataset by utilizing a smaller annotated dataset from a different source. For this purpose, different bootstrapping methods will be analyzed to improve the segmentation results using semi-supervised learning.
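One common bootstrapping scheme that could serve as a starting point is self-training with pseudo-labels: train on the labeled data, predict on the unlabeled pool, and keep only confident predictions as labels for the next round. The sketch below is illustrative; the `model_predict` interface and the confidence threshold are assumptions, not the method that will be used in the thesis:

```python
import numpy as np

def pseudo_label_round(model_predict, unlabeled, threshold=0.9):
    """One bootstrapping round: keep only confident predictions on the
    unlabeled pool as pseudo-labels for the next training round.
    `model_predict` is any function returning per-sample class
    probabilities (hypothetical interface).
    """
    probs = model_predict(unlabeled)
    conf = probs.max(axis=1)          # confidence = max class probability
    keep = conf >= threshold          # discard uncertain samples
    return unlabeled[keep], probs[keep].argmax(axis=1)

# toy stand-in for a trained model producing two-class probabilities
fake_model = lambda x: np.stack([x[:, 0], 1 - x[:, 0]], axis=1)
pool = np.array([[0.95], [0.5], [0.02]])
imgs, labels = pseudo_label_round(fake_model, pool)
```

In a segmentation setting the same idea applies per pixel rather than per image, and the threshold becomes an important hyperparameter.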
Weakly Supervised Learning for Multi-modal Breast Lesion Classification in Ultrasound and Mammogram Images
Breast cancer has become one of the most common and leading types of cancer in women, accounting for 11.6 percent of all cancer deaths worldwide. The mortality rate has been increasing in recent years. It must be emphasized that early detection of a breast tumor can increase early treatment options, which helps control the mortality rate among women. There are different diagnostic imaging modalities which help doctors diagnose whether a patient is at risk of having a cancerous tumor.
Imaging modalities like ultrasound and mammography are both used for screening of breast lesions. Mammography, on the one hand, uses a low radiation dose and captures the breast as a 2-D image. Ultrasound, on the other hand, uses high-frequency sound waves and here captures the breast as a 3-D volume. Both modalities capture different useful information with their acquisition methods. Patients usually undergo diagnosis with mammography for initial lesion detection, but due to its low sensitivity, small tumors in heavy and dense breasts may be missed. Patients with highly suspicious findings are further examined with ultrasound. Ultrasound images give more detailed information about the surrounding area of concern and hence also help radiologists investigate the vulnerability of the lesion.
The main aim of this thesis is to investigate the performance of deep learning models for the classification of breast lesions using datasets of ultrasound and mammogram images individually. Further, based on the evaluation of the performance of these models, we will build a single deep learning model which combines the information from both the ultrasound and mammogram imaging modalities. An analysis of the performance of the fused and the individual models will also be performed.
The dataset which will be used to train the models consists of volumetric ultrasound images and 2-D mammogram images and is provided by University Clinics Erlangen. Weakly supervised approaches will be used with the classification labels defined at image level without further localisation. There are 468 patient files consisting of ultrasound and mammogram images of healthy and non-healthy patients. The latter can have either benign or malignant lesions.
Unsupervised Domain Adaptation using Adversarial Learning for Multi-modal Cardiac MR Segmentation
Recently, numerous adversarial learning based domain adaptation methods for semantic segmentation have been proposed. For example, Vu et al. minimized the entropy of the prediction and also introduced an entropy discriminator to discriminate the source entropy maps from the target entropy maps. In 2018, Tsai et al. found that the output space contains rich information and thus proposed an output-space discriminator. Both methods have achieved promising results in street scene segmentation, while for medical image segmentation, we can take advantage of the information in the shape of the organs. For instance, point clouds can be used to create 3D models to incorporate shape representation as prior information. Cai et al. introduced the organ point network. It takes deep learning features as input and generates the shape representation as a set of points located on the organ surface. They optimized the segmentation task with the point network as an auxiliary task so that the shared parameters could benefit from both tasks. They also proposed a point cloud discriminator to guide the model to capture the shape information better.
We aim to combine the ideas from the previous works and investigate the impact of output space and entropy discriminators for multi-modality cardiac image segmentation. We want to employ point cloud classification as an auxiliary task, and introduce a point cloud discriminator to discriminate the source point cloud from the target point cloud.
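To make the entropy-based idea concrete, a per-pixel entropy map in the spirit of Vu et al. can be sketched as follows (a simplified NumPy illustration; the original work operates on weighted self-information maps fed to a convolutional discriminator, which is not reproduced here):

```python
import numpy as np

def entropy_map(probs, eps=1e-12):
    """Per-pixel entropy of softmax predictions, normalized to [0, 1].

    probs: (C, H, W) class probabilities. In an adversarial setup, a
    discriminator would receive such maps and try to tell source from
    target, pushing the segmenter towards confident target predictions.
    """
    c = probs.shape[0]
    ent = -np.sum(probs * np.log(probs + eps), axis=0)
    return ent / np.log(c)  # normalize by the maximum entropy log(C)

# a confident prediction yields low entropy, a uniform one high entropy
confident = np.zeros((2, 1, 1)); confident[0] = 1.0
uniform = np.full((2, 1, 1), 0.5)
```

The normalization to [0, 1] makes maps comparable across tasks with different numbers of classes.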
Marker Detection Using Deep Learning for Universal Navigation Interface
In the contemporary practice of medicine, minimally invasive spine surgery (MISS) is widely performed to avoid
the damage to the muscles surrounding the spine. Compared with traditional open surgeries, patients with MISS
suffer from less pain and can recover faster. For MISS, computer assisted navigation systems play a very important
role. Image guided navigation can deliver more accurate pedicle screw placement compared to conventional surgical
techniques. It also reduces the amount of X-ray exposure to surgeons and patients. In computer assisted navigation
for MISS, registration between preoperative images (typically 3D CT volumes) and intraoperative images (typically
2D fluoroscopic X-ray images) is usually a step of critical importance. To perform such registration, various markers
[1] are used. Such markers need to be identified in the preoperative CT volumes. In practice, due to the limited
detector size, the markers might be located outside the field-of-view of the imaging systems (typically C-arm or O-arm
systems) for large patients. Therefore, the markers are only acquired in projections from certain view angles. As a
consequence, the reconstructed markers in the 3D CT volumes suffer from artifacts and have distorted shapes, which
cause difficulty for marker detection. In the scope of this master’s thesis, we aim to improve the image quality of CT
reconstructions from such truncated projections using deep learning [2, 3] so that a universal navigation interface is
able to detect markers without any vendor specific information. Alternatively, general marker detection directly in
X-ray projection images before 3D reconstruction using deep learning will also be investigated.
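Such truncated data can be simulated from complete projections by masking detector channels outside a reduced field of view; a minimal sketch (the field-of-view fraction and sinogram size are arbitrary illustrative values, not properties of any specific C-arm or O-arm system):

```python
import numpy as np

def truncate_projections(sinogram, fov_fraction=0.6):
    """Simulate a limited detector size by zeroing detector channels
    outside a central field of view.

    sinogram: (n_views, n_detector) array of projection values.
    """
    n_views, n_det = sinogram.shape
    keep = int(n_det * fov_fraction)
    start = (n_det - keep) // 2
    truncated = np.zeros_like(sinogram)
    truncated[:, start:start + keep] = sinogram[:, start:start + keep]
    return truncated

sino = np.ones((180, 512))  # toy parallel-beam sinogram, 180 views
trunc = truncate_projections(sino, fov_fraction=0.6)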
The thesis will include the following points:
Literature review on deep learning CT truncation correction and deep learning marker detection;
Simulation of CT data with various marker sizes and shapes;
Implementation of our U-Net based deep learning method [3] with extension to high resolution reconstruction;
Performance evaluation of our U-Net based deep learning method on the application of marker reconstruction;
Investigation of deep learning methods on marker segmentation directly in 2D projections;
Reconstruction of 3D markers based on segmented marker projections.
References
[1] S. Virk and S. Qureshi, “Navigation in minimally invasive spine surgery,” Journal of Spine Surgery, vol. 5,
no. Suppl 1, p. S25, 2019.
[2] É. Fournié, M. Baer-Beck, and K. Stierstorfer, “CT field of view extension using combined channels extension
and deep learning methods,” in Proceedings of Medical Imaging with Deep Learning, 2019.
[3] Y. Huang, L. Gao, A. Preuhs, and A. Maier, “Field of view extension in computed tomography using deep learning
prior,” in Bildverarbeitung für die Medizin 2020, pp. 186–191, Springer, 2020.
Representation learning strategies to model pathological speech: effect of multiple spectral resolutions
Description:
Speech signals contain paralinguistic information with specific cues about a given speaker including the
presence of diseases that may alter their communication capabilities. The automatic classification of
paralinguistic aspects has many potential applications, and has received a good deal of attention by the
research community [1-3]. In particular, researchers look at clinical observations in the speech of patients
and try to objectively and automatically measure two main aspects of a given disease: (1) the presence of
a disease via classification of healthy control (HC) subjects and patients, and (2) the level of degradation
of the speech of patients according to a specific clinical scale [4]. These aspects are evaluated using
computer-aided methods based on signal processing and pattern recognition.
At the center of these computer-aided methods, and something that has been refined over the years to
continually improve the diagnosis and the assessment of severity of different pathologies, is the
particular feature set and extraction method used [5-7]. Many recent studies focused on extracting
features for assessment of pathological speech rely on deep learning strategies [3].
In this project we consider one such approach that uses a parallel representation learning strategy to
model speech signals from patients with different speech disorders [8]. The model uses two types of
autoencoders, a convolutional autoencoder (CAE) and a recurrent autoencoder (RAE). Both take as input
a spectrogram and output features derived from a hidden representation in the bottleneck space (i.e. a
compressed representation of the input). In addition, the reconstruction error of the autoencoder in
different spectral components of the speech signal is considered as a feature set.
The aim of this project is to evaluate the performance of the parallel representation learning strategy
using different parameterized representations of the spectrogram (e.g. comparing broadband and
narrowband spectral representations) as well as a wavelet representation to quantify the information
loss for each representation, and the benefit of using all of them together as a multiple input channel.
Methods for quantification include the overall ability of the proposed model to classify different
pathologies and the associated level of degradation of a given patient’s speech, and also comparing the
input and reconstructed speech signals using contours of phonological posteriors [9]. The aim is to
evaluate which groups of phonemes are most affected by the compression of the autoencoders using
the different spectral resolutions and their combinations.
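The broadband/narrowband trade-off can be illustrated with a minimal spectrogram sketch; the window lengths below are common illustrative choices (roughly 5 ms vs. 30 ms), not the project's final parameterization:

```python
import numpy as np

def spectrogram(x, win_len, hop):
    """Magnitude spectrogram via a sliding Hann window.

    Short windows give a broadband view with good time resolution;
    long windows give a narrowband view with good frequency
    resolution -- the trade-off this project compares.
    """
    win = np.hanning(win_len)
    frames = [x[s:s + win_len] * win
              for s in range(0, len(x) - win_len + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_bins)

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)  # 1 s synthetic tone as a stand-in
broad = spectrogram(x, win_len=int(0.005 * fs), hop=80)   # 5 ms window
narrow = spectrogram(x, win_len=int(0.030 * fs), hop=80)  # 30 ms window
```

Feeding several such parameterizations as parallel input channels is one way to realize the multi-resolution comparison described above.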
References:
[1] Schuller, B., Batliner, A., 2013. Computational Paralinguistics: Emotion, Affect and Personality in
Speech and Language Processing. John Wiley & Sons.
[2] Schuller, B., et al., 2019. Affective and Behavioural Computing: Lessons Learnt from the First
Computational Paralinguistics Challenge. Computer Speech & Language 53, 156–180.
[3] Cummins, N., Baird, A., Schuller, B., 2018. Speech Analysis for Health: Current State-of-the-Art
and the Increasing Impact of Deep Learning Methods.
[4] Orozco-Arroyave, J.R., et al., 2015. Characterization Methods for the Detection of Multiple Voice
Disorders: Neurological, Functional, and Laryngeal Diseases. IEEE Journal of Biomedical and Health
Informatics 19, 1820–1828.
[5] Dimauro, G., Di-Nicola, V., et al., 2017. Assessment of Speech Intelligibility in Parkinson’s
Disease Using a Speech-to-Text System. IEEE Access 5, 22199–22208.
[6] Orozco-Arroyave, J.R., Vasquez-Correa, J.C., et al., 2016. Towards an Automatic Monitoring of
the Neurological State of the Parkinson’s Patients from Speech, in: IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), pp. 6490–6494.
[7] Schuller, B., Steidl, S., Batliner, A., Hantke, S., Hönig, F., Orozco-Arroyave, J.R., Nöth, E., Zhang,
Y., Weninger, F., 2015. The INTERSPEECH 2015 Computational Paralinguistics Challenge:
Nativeness, Parkinson’s & Eating Condition, in: Proceedings of INTERSPEECH, pp. 478–482.
[8] Vásquez-Correa, Juan Camilo et al. “Parallel Representation Learning for the Classification of
Pathological Speech: Studies on Parkinson’s Disease and Cleft Lip and Palate” Under Review (2020).
[9] Vásquez-Correa, Juan Camilo et al. “Phonet: A Tool Based on Gated Recurrent Neural Networks to
Extract Phonological Posteriors from Speech.”, in: INTERSPEECH (2019).
Implementation and Evaluation of Cluster-based Self-supervised Learning Methods
Prototypical Contrastive Learning (PCL) [1] is a new unsupervised representation learning method
which unifies the two directions of unsupervised learning: clustering and contrastive learning. This
method can train deep neural networks from millions of unlabeled images. Conventional contrastive
learning was instance-based instead of prototype-based. The authors introduced prototypes as the
cluster centers of similar images. The training setup works in an EM-like scheme: Find the distribution
of prototypes by clustering in step E; optimize network by performing contrastive learning in the M step.
Additionally, they proposed the ProtoNCE loss, which generalizes the commonly used InfoNCE loss.
With this method, the authors report over 10% performance improvement across multiple benchmarks.
The clustering of the PCL is computed by k-means. However, this clustering may deteriorate over time,
causing problems, such as classifying all samples into the same category. The solution proposed by
Asano et al. [2] is to add the constraint that the labels must be equally distributed over all samples,
that is, to maximize the information between the sample indices and their labels. The problem
of label assignment is equivalent to optimal transport. In order to expand to millions of samples and
thousands of categories, a fast version of the Sinkhorn-Knopp algorithm is used to find an approxi-
mate solution. In summary, they replace the k-means in DeepCluster [3] with the Sinkhorn-Knopp
algorithm to approximate the label assignment Q, and then use cross-entropy to learn the representation.
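The equipartition constraint can be illustrated with a minimal Sinkhorn-Knopp sketch (illustrative only; the thesis implementation will follow [1] and [2], and the numbers of samples, prototypes, iterations and the temperature below are arbitrary):

```python
import numpy as np

def sinkhorn(scores, n_iters=100, eps=0.05):
    """Sinkhorn-Knopp iteration for approximately equipartitioned
    label assignment.

    scores: (N, K) similarities of N samples to K prototypes. Returns
    a soft assignment Q whose columns each sum to N/K, enforcing the
    equal-distribution constraint while staying close to the scores.
    """
    Q = np.exp(scores / eps)
    Q /= Q.sum()
    n, k = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(axis=1, keepdims=True); Q /= n  # rows -> 1/N
        Q /= Q.sum(axis=0, keepdims=True); Q /= k  # cols -> 1/K
    return Q * n  # each row is now roughly a distribution over K labels

rng = np.random.default_rng(0)
Q = sinkhorn(rng.normal(size=(8, 4)))
```

Replacing the k-means step of PCL with such an assignment is exactly the combination this work sets out to evaluate.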
In this work, the k-means clustering of the PCL shall be replaced by the Sinkhorn-Knopp algo-
rithm and be thoroughly evaluated on multiple datasets.
The thesis consists of the following milestones:
• Literature study on self-supervised learning [4][5]
• Implementation of [1] and [2]
• Implementation of the combination of PCL and Self-labelling with Optimal Transport
• Thorough evaluation on different datasets and comparison with PCL and Self-labelling
• Comparison with other self-supervised learning papers
• Further experiments regarding learning procedure and network architecture
The implementation should be done in Python, PyTorch.
References
[1] Junnan Li, Pan Zhou, Caiming Xiong, Richard Socher, and Steven C. H. Hoi. Prototypical Contrastive
Learning of Unsupervised Representations. arXiv:2005.04966 [cs], July 2020. arXiv: 2005.04966.
[2] Yuki Markus Asano, Christian Rupprecht, and Andrea Vedaldi. Self-labelling via simultaneous clustering
and representation learning. arXiv:1911.05371 [cs], February 2020. arXiv: 1911.05371.
[3] Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. Deep Clustering for Unsupervised
Learning of Visual Features. In Computer Vision – ECCV 2018, volume 11218. Cham: Springer International
Publishing, 2018.
[4] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A Simple Framework for
Contrastive Learning of Visual Representations. arXiv:2002.05709 [cs, stat], June 2020. arXiv: 2002.05709.
[5] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum Contrast for Unsupervised
Visual Representation Learning. arXiv e-prints, arXiv:1911.05722, November 2019.
Optimization of the Input Resolution for Dermoscopy Image Classification Tasks
Towards Efficient Incremental Extreme Value Theory Algorithms for Open World Recognition
Classification problems that evolve over time require classifiers that can adapt to previously unseen
classes and also to a change in the class over time. This is called “open world classification“ and is an
already well established research topic [1, 2, 3, 4, 5]. One good example for this is face recognition. A
classifier for this task should recognize previously unseen persons and assign them to a new class.
A person’s face might also change over time, which is where incremental updates are better suited
than retraining the complete model. While there are many classifiers performing
this static task, it is not straightforward to transfer those into an efficient incremental update algorithm
[6]. The Extreme Value Machine (EVM) [6] is an open set classifier, which can perform incremental
learning and is based on a strong theoretical foundation. It uses extreme value theory based calibrations
to fit Weibull-distributions around each sample. Compared to a classification via a similarity function
and thresholds, the EVM leads to naturally softer and more robust bounds. It is also less sensitive to
noise and performs well, especially in open space [7].
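The effect of the Weibull-based calibration can be illustrated with the EVM's radial inclusion function Ψ(d) = exp(−(d/λ)^κ), which decays smoothly with distance instead of cutting off at a hard threshold; the shape and scale values below are illustrative stand-ins for parameters that would actually be fitted to the smallest margin distances:

```python
import math

def psi(dist, shape, scale):
    """EVM-style radial inclusion probability for a sample at distance
    `dist` from an extreme vector. In the EVM, (shape, scale) are
    obtained by fitting a Weibull distribution to the smallest margin
    distances to other-class samples; here they are illustrative
    values, not fitted ones.
    """
    return math.exp(-((dist / scale) ** shape))

# inclusion probability decays smoothly with distance,
# unlike a hard similarity threshold
near = psi(0.5, shape=2.0, scale=2.0)
far = psi(4.0, shape=2.0, scale=2.0)
```

An incremental update mechanism must keep such per-sample fits consistent as new extreme vectors arrive, which is precisely where the efficiency problems discussed below arise.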
Although the Extreme Value Machine can be used incrementally, no efficient update mechanism is
provided. Transferring an algorithm to incremental updates raises several direct problems, such as
finding an efficient update function or the limitation in space: one cannot save all previously seen
samples. Therefore a model reduction algorithm, which approximates the set cover algorithm, is
proposed in the EVM. Yet there are also several indirect problems, like concept drift, that is, when
the samples for a class change, either gradually or abruptly. The model is able to adapt to it, but with
it comes another challenge, the so-called “stability-plasticity dilemma” [8]. This refers to the trade-off
between a rather fast or slow adaptation to change in one class. A fast adaptation can result in unwanted
adaptation to noise, yet a slow adaptation can miss cyclical or constantly gradual change in the
data. Also, the model reduction used in the EVM can lead to unbalanced classes and, in the extreme
case, to a nearly complete removal of some classes. This is called “catastrophic forgetting” [8]. These
problems are not directly solved by the EVM and should also be assessed in this work.
In this thesis, the EVM will be extended by incremental update mechanisms. Starting with an exact
naive approach, later approximative algorithms will be tested for both efficiency and accuracy. One of
the main problems in terms of efficiency is that when adding new samples to the EVM, all previously
trained points need to be updated according to the naive EVM algorithm. Also when updating an
existing sample it needs to be compared to all the samples of all other classes. The first optimizations
are based on these two properties. First, not all previously trained samples have to be updated; finding
the relevant ones, on a sample and class basis, is the challenge. Second, not all feature points have to
be compared against while (re-)training another feature point. These variants will then be evaluated on
different datasets, one important one being a face dataset for the challenges stated above.
The thesis consists of the following milestones:
• Implementation of the exact incremental update algorithm.
• Evaluating performance on MNIST, ImageNet, Face Dataset.
• Extension to approximative update algorithms.
• Further experiments regarding learning and forgetting procedure.
The implementation should be done in C++.
References
[1] Walter Scheirer, Anderson Rocha, Archana Sapkota, and Terrance Boult. Toward Open Set Recognition.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 35:1757–72, 07 2013.
[2] Abhijit Bendale and Terrance E. Boult. Towards Open Set Deep Networks. CoRR, abs/1511.06233, 2015.
[3] Pedro Ribeiro Mendes Júnior, Jacques Wainer, and Anderson Rocha. Specialized Support Vector Machines
for Open-Set Recognition. CoRR, abs/1606.03802, 2016.
[4] Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayun Wang, Boqing Gong, and Stella X. Yu. Large-Scale Long-
Tailed Recognition in an Open World. In IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), 2019.
[5] Zongyuan Ge and Sergey Demyanov and Zetao Chen and Rahil Garnavi. Generative OpenMax for Multi-
Class Open Set Classification. In British Machine Vision Conference Proceedings 2017.
[6] Ethan M. Rudd, Lalit P. Jain, Walter J. Scheirer, and Terrance E. Boult. The Extreme Value Machine. CoRR,
abs/1506.06112, 2015.
[7] Manuel Günther, Steve Cruz, Ethan M. Rudd, and Terrance E. Boult. Toward Open-Set Face Recognition.
CoRR, abs/1705.01567, 2017.
[8] Alexander Gepperth and Barbara Hammer. Incremental Learning Algorithms and Applications. In European
Symposium on Artificial Neural Networks (ESANN), 2016.
Early stage inflammatory musculoskeletal diseases classification with deep learning
The Pattern Recognition Lab together with the medical clinic 3 (rheumatology and immunology) is offering the following master thesis:
„Early stage inflammatory musculoskeletal diseases classification with deep learning“
Overview
- Close collaboration with the clinic
- Development of deep learning-based classification networks
- Development of a neural network approach that combines clinical data and MRI images of the patients
Requirements
- Strong background in implementing DL methods (Python)
- Knowledge of MRI physics
For more information reach out to lukas.folle@fau.de