
Weakly Supervised Learning for Multi-modal Breast Lesion Classification in Ultrasound and Mammogram Images

Breast cancer is one of the most common cancers in women, accounting for 11.6 percent of all cancer deaths worldwide, and its mortality rate has been increasing in recent years. Early detection of a breast tumor widens the treatment options and thereby helps to control mortality among women. Several diagnostic imaging modalities help doctors to assess whether a patient is at risk of having a cancerous tumor.

Imaging modalities like ultrasound and mammography are both used for screening of breast lesions. Mammography uses a low radiation dose and captures the breast as a 2-D image, whereas ultrasound uses high-frequency sound waves and can capture the breast as a 3-D volume. With their different acquisition methods, the two modalities capture complementary information. Patients usually undergo mammography for initial lesion detection, but due to its low sensitivity there is a chance of missing small tumors in dense breasts. Patients with highly suspicious findings are therefore further examined with ultrasound, which provides more detailed information about the area of concern and thus also helps radiologists investigate the malignancy of a lesion.

The main aim of this thesis is to investigate the performance of deep learning models for the classification of breast lesions, trained on ultrasound and mammogram images individually. Based on the evaluation of these models, we will then build a single deep learning model that combines the information from both imaging modalities. The performance of the fused model will also be compared with that of the individual models.
The dataset used to train the models consists of volumetric ultrasound images and 2-D mammogram images and is provided by the University Clinics Erlangen. Weakly supervised approaches will be used, with classification labels defined at the image level without further localisation. There are 468 patient files containing ultrasound and mammogram images of healthy and non-healthy patients; the latter can have either benign or malignant lesions.
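As a starting point for the fused model, a late-fusion architecture along the following lines could combine both modalities under image-level supervision. This is a minimal sketch in PyTorch; the encoder depths, feature dimension, input sizes, and the three-class setup (healthy / benign / malignant) are illustrative assumptions, not the final design:

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Two modality-specific encoders whose features are fused for one image-level label."""
    def __init__(self, feat_dim=128, n_classes=3):  # healthy / benign / malignant (assumed)
        super().__init__()
        # 3-D encoder for volumetric ultrasound
        self.us_enc = nn.Sequential(
            nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(32, feat_dim))
        # 2-D encoder for mammograms
        self.mg_enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim))
        self.head = nn.Linear(2 * feat_dim, n_classes)

    def forward(self, us_vol, mammo):
        fused = torch.cat([self.us_enc(us_vol), self.mg_enc(mammo)], dim=1)
        return self.head(fused)

model = LateFusionClassifier()
logits = model(torch.randn(2, 1, 32, 64, 64), torch.randn(2, 1, 128, 128))
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 2]))  # image-level labels only
```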

Unsupervised Domain Adaptation using Adversarial Learning for Multi-modal Cardiac MR Segmentation

Recently, numerous adversarial-learning-based domain adaptation methods for semantic segmentation have been proposed. For example, Vu et al. minimized the entropy of the predictions and introduced an entropy discriminator to discriminate source entropy maps from target entropy maps. In 2018, Tsai et al. observed that the output space contains rich information and therefore proposed an output-space discriminator. Both methods achieved promising results in street-scene segmentation; for medical image segmentation, however, we can additionally take advantage of the information in the shape of the organs. For instance, point clouds can be used to create 3-D models that incorporate the shape representation as prior information. Cai et al. introduced the organ point network: it takes deep learning features as input and generates the shape representation as a set of points located on the organ surface. They optimized the segmentation task with the point network as an auxiliary task, so that the shared parameters could benefit from both tasks. They also proposed a point cloud discriminator to guide the model to capture the shape information better.

We aim to combine the ideas from these previous works and investigate the impact of output-space and entropy discriminators for multi-modal cardiac image segmentation. We further want to employ point cloud classification as an auxiliary task and introduce a point cloud discriminator to discriminate the source point clouds from the target point clouds. A minimal sketch of the entropy-based adversarial component is given below.
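The sketch follows the spirit of Vu et al.; the segmentation network is omitted, and the discriminator architecture and loss weighting are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def entropy_map(logits, eps=1e-8):
    """Pixel-wise Shannon entropy of the softmax prediction."""
    p = F.softmax(logits, dim=1)
    return -(p * torch.log(p + eps)).sum(dim=1, keepdim=True)

# Small fully convolutional discriminator on 1-channel entropy maps (illustrative).
disc = nn.Sequential(
    nn.Conv2d(1, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 4, stride=2, padding=1))

bce = nn.BCEWithLogitsLoss()
src_logits = torch.randn(2, 4, 64, 64)  # segmenter output on the source modality
tgt_logits = torch.randn(2, 4, 64, 64)  # segmenter output on the target modality

# Adversarial term for the segmenter: make target entropy maps look like source ones.
d_tgt = disc(entropy_map(tgt_logits))
adv_loss = bce(d_tgt, torch.zeros_like(d_tgt))  # 0 = "source" label

# Discriminator term: tell source and target entropy maps apart.
d_src = disc(entropy_map(src_logits).detach())
d_tgt = disc(entropy_map(tgt_logits).detach())
disc_loss = bce(d_src, torch.zeros_like(d_src)) + bce(d_tgt, torch.ones_like(d_tgt))
```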

Marker Detection Using Deep Learning for Universal Navigation Interface

In the contemporary practice of medicine, minimally invasive spine surgery (MISS) is widely performed to avoid damage to the muscles surrounding the spine. Compared with traditional open surgeries, patients undergoing MISS suffer less pain and recover faster. For MISS, computer-assisted navigation systems play a very important role: image-guided navigation delivers more accurate pedicle screw placement than conventional surgical techniques and also reduces the amount of X-ray exposure for surgeons and patients. In computer-assisted navigation for MISS, registration between preoperative images (typically 3D CT volumes) and intraoperative images (typically 2D fluoroscopic X-ray images) is usually a step of critical importance. To perform such registration, various markers [1] are used, and these markers need to be identified in the preoperative CT volumes. In practice, due to the limited detector size, the markers might be located outside the field-of-view of the imaging system (typically a C-arm or O-arm system) for large patients. Therefore, the markers are only acquired in projections from certain view angles. As a consequence, the reconstructed markers in the 3D CT volumes suffer from artifacts and have distorted shapes, which causes difficulty for marker detection. In the scope of this master's thesis, we aim to improve the image quality of CT reconstructions from such truncated projections using deep learning [2, 3], so that a universal navigation interface is able to detect markers without any vendor-specific information (a simplified training sketch is given below). Alternatively, general marker detection directly in X-ray projection images before 3D reconstruction using deep learning will also be investigated.
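The following minimal sketch shows such a supervised truncation-correction setup on simulated data pairs; the tiny network and the L1 loss are placeholders for the deeper U-Net of [3]:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Heavily simplified stand-in; the actual method [3] uses a deeper U-Net."""
    def __init__(self):
        super().__init__()
        self.down = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                                  nn.MaxPool2d(2))
        self.up = nn.Sequential(nn.Upsample(scale_factor=2),
                                nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
        self.out = nn.Conv2d(17, 1, 1)  # skip connection: input + decoded features

    def forward(self, x):
        y = self.up(self.down(x))
        return self.out(torch.cat([x, y], dim=1))

net = TinyUNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
truncated = torch.randn(4, 1, 256, 256)  # slices reconstructed from truncated projections
full_fov = torch.randn(4, 1, 256, 256)   # ground truth from simulated full-FOV data

loss = nn.functional.l1_loss(net(truncated), full_fov)
opt.zero_grad(); loss.backward(); opt.step()
```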

The thesis will include the following points:

• Literature review on deep learning CT truncation correction and deep learning marker detection;

• Simulation of CT data with various marker sizes and shapes;

• Implementation of our U-Net based deep learning method [3] with extension to high-resolution reconstruction;

• Performance evaluation of our U-Net based deep learning method on the application of marker reconstruction;

• Investigation of deep learning methods for marker segmentation directly in 2D projections;

• Reconstruction of 3D markers based on segmented marker projections.

References
[1] S. Virk and S. Qureshi, “Navigation in minimally invasive spine surgery,” Journal of Spine Surgery, vol. 5,
no. Suppl 1, p. S25, 2019.
[2] É. Fournié, M. Baer-Beck, and K. Stierstorfer, “CT field of view extension using combined channels extension and deep learning methods,” in Proceedings of Medical Imaging with Deep Learning, 2019.
[3] Y. Huang, L. Gao, A. Preuhs, and A. Maier, “Field of view extension in computed tomography using deep learning prior,” in Bildverarbeitung für die Medizin 2020, pp. 186–191, Springer, 2020.

Synthetic Image Rendering for Deep Learning License Plate Recognition

The recognition of license plates is usually considered a rather simple task that a human is perfectly capable of. However, many factors (e.g. fog or rain) can significantly worsen the image quality and therefore increase the difficulty of recognizing a license plate. Further factors, e.g. a low resolution or a small size of the license plate section, may increase the difficulty up to a point where even humans are unable to identify the plate.
A possible approach to this problem is to build and train a neural network using collected image data. In theory, this should yield a high success rate and outperform a human. However, a huge number of images that also fulfill certain criteria is needed in order to reliably recognize plates in different situations.
For this reason, this thesis aims at building and training a neural network, based on an existing CNN [1], for recognizing license plates using artificially created training data. This ensures that enough images are available, while facilitating the addition of image effects to simulate many possible situations. The required images can be created using Blender: it offers the option to create a 3-D model of a license plate, as well as options to simulate weather conditions like fog or rain, while also providing a Python API to automate the creation process. This way, a wide range of conditions can be covered, which is expected to increase the success rate of the license plate recognition.
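As an illustration, a hypothetical Blender script of the following kind could automate the rendering; the scene file 'plate.blend', the text object name 'PlateText', and the output paths are placeholder assumptions:

```python
import bpy
import random

# Hypothetical render loop: 'plate.blend' is assumed to contain a camera, a world,
# and a text object named "PlateText" modelling the plate's lettering.
bpy.ops.wm.open_mainfile(filepath="plate.blend")
chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

for i in range(1000):
    text = "".join(random.choices(chars, k=7))
    bpy.data.objects["PlateText"].data.body = text  # write the plate string
    # Vary fog via the world's mist settings, one density per rendered sample.
    bpy.context.scene.world.mist_settings.use_mist = True
    bpy.context.scene.world.mist_settings.depth = random.uniform(5.0, 50.0)
    bpy.context.scene.render.filepath = f"//renders/{i:05d}_{text}.png"
    bpy.ops.render.render(write_still=True)
```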

The thesis consists of the following steps:

• Creating a training data set consisting of generated license plate images (Blender Python API)

• Fitting the parameters of the deep learning model

• Evaluation of the model fit on datasets with real license plate images

References
[1] Benedikt Lorch, Shruti Agarwal, and Hany Farid. Forensic Reconstruction of Severely
Degraded License Plates. In Society for Imaging Science & Technology, editor,
Electronic Imaging, Jan 2019.

Representation learning strategies to model pathological speech: effect of multiple spectral resolutions

Description:
Speech signals contain paralinguistic information with specific cues about a given speaker, including the presence of diseases that may alter their communication capabilities. The automatic classification of paralinguistic aspects has many potential applications and has received a good deal of attention from the research community [1-3]. In particular, researchers look at clinical observations in the speech of patients and try to objectively and automatically measure two main aspects of a given disease: (1) the presence of the disease, via classification of healthy control (HC) subjects and patients, and (2) the level of degradation of the patients' speech according to a specific clinical scale [4]. These aspects are evaluated using computer-aided methods supported by signal processing and pattern recognition techniques.
At the center of these computer-aided methods, and refined over the years to continually improve the diagnosis and the assessment of severity of different diseases, is the particular feature set and extraction method used [5-7]. Many recent studies on feature extraction for the assessment of pathological speech rely on deep learning strategies [3].
In this project we consider one such approach, which uses a parallel representation learning strategy to model speech signals from patients with different speech disorders [8]. The model uses two types of autoencoders, a convolutional autoencoder (CAE) and a recurrent autoencoder (RAE). Both take a spectrogram as input and output features derived from a hidden representation in the bottleneck space (i.e. a compressed representation of the input). In addition, the reconstruction error of the autoencoder in different spectral components of the speech signal is considered as a feature set.
The aim of this project is to evaluate the performance of the parallel representation learning strategy using differently parameterized representations of the spectrogram (e.g. comparing broadband and narrowband spectral representations) as well as a wavelet representation, in order to quantify the information loss for each representation and the benefit of using all of them together as a multi-channel input. Quantification methods include the overall ability of the proposed model to classify different pathologies and the associated level of degradation of a given patient's speech, as well as comparing the input and reconstructed speech signals using contours of phonological posteriors [9]. The aim is to evaluate which groups of phonemes are most affected by the compression of the autoencoders for the different spectral resolutions and their combinations.
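As an illustration of the representations to be compared (not the exact front end of [8]), the following sketch computes a broadband and a narrowband log-spectrogram of the same signal simply by changing the analysis window length:

```python
import numpy as np
from scipy import signal

fs = 16000
x = np.random.randn(fs * 2)  # stand-in for a 2 s speech recording

# Broadband: short window (~4 ms) -> good time resolution, coarse frequency.
f_b, t_b, S_broad = signal.spectrogram(x, fs, nperseg=64, noverlap=32)

# Narrowband: long window (~32 ms) -> fine frequency resolution, coarse time.
f_n, t_n, S_narrow = signal.spectrogram(x, fs, nperseg=512, noverlap=256)

# Log-compressed inputs for the CAE/RAE; shapes differ with the window choice.
log_broad = np.log(S_broad + 1e-10)
log_narrow = np.log(S_narrow + 1e-10)
print(log_broad.shape, log_narrow.shape)
```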

References:

[1] Schuller, B., Batliner, A., 2013. Computational Paralinguistics: Emotion, Affect and Personality in
Speech and Language Processing. John Wiley & Sons.
[2] Schuller, B., et al., 2019. Affective and Behavioural Computing: Lessons Learnt from the First
Computational Paralinguistics Challenge. Computer Speech & Language 53, 156–180.
[3] Cummins, N., Baird, A., Schuller, B., 2018. Speech Analysis for Health: Current State-of-the-Art
and the Increasing Impact of Deep Learning Methods.
[4] Orozco-Arroyave, J.R., et al., 2015. Characterization Methods for the Detection of Multiple Voice
Disorders: Neurological, Functional, and Laryngeal Diseases. IEEE Journal of Biomedical and Health
Informatics 19, 1820–1828.
[5] Dimauro, G., Di-Nicola, V., et al., 2017. Assessment of Speech Intelligibility in Parkinson’s
Disease Using a Speech-to-Text System. IEEE Access 5, 22199–22208.
[6] Orozco-Arroyave, J.R., Vasquez-Correa, J.C., et al., 2016. Towards an Automatic Monitoring of
the Neurological State of the Parkinson’s Patients from Speech, in: IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), pp. 6490–6494.
[7] Schuller, B., Steidl, S., Batliner, A., Hantke, S., Hönig, F., Orozco-Arroyave, J.R., Nöth, E., Zhang,
Y., Weninger, F., 2015. The INTERSPEECH 2015 Computational Paralinguistics Challenge:
Nativeness, Parkinson’s & Eating Condition, in: Proceedings of INTERSPEECH, pp. 478–482.
[8] Vásquez-Correa, Juan Camilo et al. “Parallel Representation Learning for the Classification of
Pathological Speech: Studies on Parkinson’s Disease and Cleft Lip and Palate” Under Review (2020).
[9] Vásquez-Correa, Juan Camilo et al. “Phonet: A Tool Based on Gated Recurrent Neural Networks to
Extract Phonological Posteriors from Speech.”, in: INTERSPEECH (2019).

Learning projection matrices for marker-free motion compensation in weight-bearing CT scans

The integration of known operators into neural networks has recently received more and more attention. The theoretical proof of its benefits has been described by Maier et al. in [1, 2]. Reducing the number of trainable weights by replacing trainable layers with known operators reduces the overall approximation error and makes it easier to interpret a layer's function. This is of special interest in the context of medical imaging, where it is crucial to understand the effects of layers or operators on the resulting image. Several use cases of known operators in medical imaging have been explored in the past few years [3][4][5]. An API that makes such experiments easier is the PYRO-NN API by Syben et al., which comes with several forward and backward projectors for different geometries as well as with helpers such as filters [6].

Cone-beam CT (CBCT) is a widely used X-ray imaging technology that uses a point source of X-rays and a 2-D flat-panel detector. Using a reconstruction algorithm such as the FDK algorithm, a complete 3-D reconstruction can be estimated from just one rotation around the patient [7]. This modality is of great use in orthopedics, where so-called weight-bearing CT scans primarily image knee joints under weight-bearing conditions to picture the cartilage tissue under stress. The main drawbacks of this modality are motion artifacts caused by involuntary movement of the patient's knee and by inaccuracies in the trajectory of the scanner. In order to correct these artifacts, the extrinsic camera parameters, which describe the position and orientation of the object relative to the detector, have to be adjusted [8].

To get one step closer to reducing motion artifacts without additional cameras or markers, it is of special interest to study the feasibility of training extrinsic camera parameters as part of a reconstruction pipeline. Before an algorithm to estimate those parameters can be assessed, the general feasibility of training the extrinsic camera parameters of a projection matrix will be studied. The patient's motion will be estimated iteratively using adapted gradient descent algorithms known from the training of neural networks (a minimal sketch is given below).
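The following toy sketch illustrates this feasibility question with a pinhole projection of point markers instead of the actual PYRO-NN projector: a translation and a small-angle rotation are declared trainable and recovered by gradient descent; all names and values are illustrative:

```python
import torch

# Toy stand-in for a differentiable projector (not the PYRO-NN one): project
# 3-D points through a pinhole model and recover the rigid parameters.
pts = torch.randn(20, 3) + torch.tensor([0.0, 0.0, 5.0])  # points in front of the source

def project(points, t, w):
    # Small-angle rotation R p ~ p + w x p keeps the sketch short and differentiable.
    rotated = points + torch.cross(w.expand_as(points), points, dim=1)
    moved = rotated + t
    return moved[:, :2] / moved[:, 2:3]  # perspective divide

t_true = torch.tensor([0.10, -0.05, 0.20])
w_true = torch.tensor([0.02, -0.01, 0.03])
target = project(pts, t_true, w_true)  # "measured" detector coordinates

t = torch.zeros(3, requires_grad=True)  # trainable extrinsic parameters
w = torch.zeros(3, requires_grad=True)
opt = torch.optim.Adam([t, w], lr=1e-2)
for _ in range(500):
    loss = ((project(pts, t, w) - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(t.detach(), w.detach())  # should approach t_true and w_true
```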

The Bachelor’s thesis covers the following aspects:

1. Discussion of the general idea of motion compensation in CBCT as well as a quick overview of the PYRO-NN API and thus of known operators in general.

2. Study of the feasibility of learning the projection matrix of a single forward projection:
• Assessing the ability to train single parameters
• Training of translations and rotations
• Attempting to estimate the complete rigid motion parameters

3. Training of a simple trajectory:
• Assessing the motion estimation of the back projection using the volume as ground truth
• Assessing the motion estimation using an undistorted sinogram
• Estimating the trajectory based only on the distorted sinogram

4. Evaluation of the training results of the experiments and description of potential applications.

All implementations will be integrated into the PYRO-NN API [6].

References
[1] A. Maier, F. Schebesch, C. Syben, T. Würfl, S. Steidl, J. Choi, and R. Fahrig, “Precision learning: Towards use of known operators in neural networks,” in 2018 24th International Conference on Pattern Recognition (ICPR), pp. 183–188, 2018.

[2] A. K. Maier, C. Syben, B. Stimpel, T. Würfl, M. Hoffmann, F. Schebesch, W. Fu, L. Mill, L. Kling, and S. Christiansen, “Learning with known operators reduces maximum error bounds,” Nature Machine Intelligence, vol. 1, no. 8, pp. 373–380, 2019.

[3] W. Fu, K. Breininger, R. Schaffert, N. Ravikumar, T. Würfl, J. G. Fujimoto, E. M. Moult, and A. Maier, “Frangi-Net: A Neural Network Approach to Vessel Segmentation,” in Bildverarbeitung für die Medizin (BVM) 2018 (A. Maier, T. M. Deserno, H. Handels, K. H. Maier-Hein, C. Palm, and T. Tolxdorff, eds.), (Berlin, Heidelberg), pp. 341–346, Springer Vieweg, 2018.

[4] C. Syben, B. Stimpel, K. Breininger, T. Würfl, R. Fahrig, A. Dörfler, and A. Maier, “Precision Learning: Reconstruction Filter Kernel Discretization,” in Proceedings of the 5th International Conference on Image Formation in X-ray Computed Tomography, pp. 386–390, 2018.

[5] T. Würfl, F. C. Ghesu, V. Christlein, and A. Maier, “Deep learning computed tomography,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016 (S. Ourselin, L. Joskowicz, M. R. Sabuncu, G. Unal, and W. Wells, eds.), (Cham), pp. 432–440, Springer International Publishing, 2016.

[6] C. Syben, M. Michen, B. Stimpel, S. Seitz, S. Ploner, and A. K. Maier, “Technical note: PYRO-NN: Python reconstruction operators in neural networks,” Medical Physics, 2019.

[7] L. Feldkamp, L. C. Davis, and J. Kress, “Practical cone-beam algorithm,” J. Opt. Soc. Am. A, vol. 1, pp. 612–619, 1984.

[8] J. Maier, M. Nitschke, J.-H. Choi, G. Gold, R. Fahrig, B. M. Eskofier, and A. Maier, “Inertial measurements for motion compensation in weight-bearing cone-beam CT of the knee,” 2020.

Implementation and Evaluation of Cluster-based Self-supervised Learning Methods

Prototypical Contrastive Learning (PCL) [1] is a new unsupervised representation learning method which unifies two directions of unsupervised learning: clustering and contrastive learning. The method can train deep neural networks from millions of unlabeled images. Conventional contrastive learning was instance-based rather than prototype-based; the authors introduce prototypes as the cluster centers of similar images. The training works in an EM-like scheme: find the distribution of prototypes by clustering in the E step; optimize the network by performing contrastive learning in the M step. Additionally, they propose the ProtoNCE loss, which generalizes the commonly used InfoNCE loss. With this method, the authors report over 10% performance improvement across multiple benchmarks.
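As a sketch of the prototype term of such a loss (the instance-wise InfoNCE term and the EM loop of [1] are omitted; shapes and the concentration estimates are illustrative):

```python
import torch
import torch.nn.functional as F

def proto_nce(v, prototypes, assign, phi):
    """Prototype term of ProtoNCE for one clustering granularity.

    v: (N, D) L2-normalized embeddings; prototypes: (K, D) cluster centers;
    assign: (N,) cluster index per sample; phi: (K,) per-cluster concentrations.
    """
    logits = v @ prototypes.T / phi  # per-prototype temperature, broadcast over rows
    return F.cross_entropy(logits, assign)

v = F.normalize(torch.randn(16, 32), dim=1)
prototypes = F.normalize(torch.randn(4, 32), dim=1)
assign = torch.randint(0, 4, (16,))
phi = torch.full((4,), 0.1)  # would be estimated from cluster statistics in [1]
loss = proto_nce(v, prototypes, assign, phi)
```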
The clustering in PCL is computed by k-means. However, this clustering may deteriorate over time, causing problems such as classifying all samples into the same category. The solution proposed by Asano and Rupprecht [2] is to add the constraint that the labels must be equally distributed over all samples, that is, to maximize the information between the sample indices and the labels. The resulting label-assignment problem is equivalent to optimal transport. In order to scale to millions of samples and thousands of categories, a fast version of the Sinkhorn-Knopp algorithm is used to find an approximate solution. In summary, they replace the k-means step of DeepCluster [3] with the Sinkhorn-Knopp algorithm to approximate the label assignment Q, and then use cross-entropy to learn the representation.
In this work, the k-means clustering of PCL shall be replaced by the Sinkhorn-Knopp algorithm and thoroughly evaluated on multiple datasets; a sketch of the balanced assignment step is given below.
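A minimal sketch of the balanced assignment step, in the spirit of [2] (the paper's exact scaling and implementation tricks differ):

```python
import torch

def sinkhorn_assign(scores, n_iters=3, eps=0.05):
    """Balanced soft label assignment via Sinkhorn-Knopp iterations.

    scores: (N, K) similarities between N samples and K prototypes.
    Returns Q: (N, K) assignments whose prototypes are (roughly) equally used.
    """
    Q = torch.exp(scores / eps).T  # (K, N)
    Q /= Q.sum()
    K, N = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(dim=1, keepdim=True); Q /= K  # rows: equal mass per prototype
        Q /= Q.sum(dim=0, keepdim=True); Q /= N  # columns: one unit per sample
    return (Q * N).T  # back to (N, K); each row sums to 1

scores = torch.randn(8, 4)  # toy example: 8 samples, 4 prototypes
Q = sinkhorn_assign(scores)
print(Q.sum(dim=1))  # each row sums to 1
```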
The thesis consists of the following milestones:
• Literature study on self-supervised learning [4][5]
• Implementation of [1] and [2]
• Implementation of the combination of PCL and Self-labelling with Optimal Transport
• Thorough evaluation on different datasets, comparing the combination with PCL and Self-labelling
• Comparison with other self-supervised learning papers
• Further experiments regarding the learning procedure and network architecture
The implementation should be done in Python, using PyTorch.

References
[1] Junnan Li, Pan Zhou, Caiming Xiong, Richard Socher, and Steven C. H. Hoi. Prototypical Contrastive
Learning of Unsupervised Representations. arXiv:2005.04966 [cs], July 2020. arXiv: 2005.04966.
[2] Yuki Markus Asano, Christian Rupprecht, and Andrea Vedaldi. Self-labelling via simultaneous clustering
and representation learning. arXiv:1911.05371 [cs], February 2020. arXiv: 1911.05371.
[3] Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. Deep Clustering for Unsupervised
Learning of Visual Features. In Computer Vision – ECCV 2018, volume 11218. Cham: Springer International
Publishing, 2018.
[4] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A Simple Framework for
Contrastive Learning of Visual Representations. arXiv:2002.05709 [cs, stat], June 2020. arXiv: 2002.05709.
[5] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum Contrast for Unsupervised Visual Representation Learning. arXiv:1911.05722 [cs.CV], November 2019.

 

Clustering of HPC jobs using Unsupervised Machine Learning on job performance metric time series data

Optimization of the Input Resolution for Dermoscopy Image Classification Tasks

Towards Efficient Incremental Extreme Value Theory Algorithms for Open World Recognition

Classification problems that evolve over time require classifiers that can adapt to previously unseen classes and also to changes within a class over time. This is called “open world classification“ and is an already well-established research topic [1, 2, 3, 4, 5]. A good example is face recognition: a classifier for this task should recognize previously unseen persons and assign them to new classes. Moreover, the face of a person might change over time, which is where incremental updates are better suited than retraining the complete model. While there are many classifiers performing the static task, it is not straightforward to transfer them into an efficient incremental update algorithm [6]. The Extreme Value Machine (EVM) [6] is an open set classifier which can perform incremental learning and is based on a strong theoretical foundation. It uses calibrations based on extreme value theory to fit Weibull distributions around each sample (see the sketch below). Compared to a classification via a similarity function and thresholds, the EVM leads to naturally softer and more robust bounds. It is also less sensitive to noise and performs well, especially in open space [7].
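The following simplified sketch illustrates this per-sample Weibull calibration in Python (the thesis implementation will be in C++); the toy data, the choice of tau, and the use of SciPy's generic Weibull fit are assumptions that differ from the exact estimation procedure of [6]:

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import weibull_min

# Fit a Weibull to the tau smallest half-distances from one anchor sample to all
# other-class samples, then turn distances into inclusion probabilities Psi.
rng = np.random.default_rng(0)
own = rng.normal(0.0, 1.0, (50, 2))     # samples of the anchor's class
other = rng.normal(4.0, 1.0, (200, 2))  # samples of all other classes
anchor, tau = own[0], 40

margins = np.sort(cdist(anchor[None], other)[0] / 2.0)[:tau]
shape, loc, scale = weibull_min.fit(margins, floc=0)

def psi(x):
    """Probability that x belongs to the anchor's class: 1 - Weibull CDF of its distance."""
    d = np.linalg.norm(x - anchor)
    return 1.0 - weibull_min.cdf(d, shape, loc=loc, scale=scale)

print(psi(own[1]), psi(other[0]))  # a near-class point scores much higher
```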
Although the Extreme Value Machine can be used incrementally, no efficient update mechanism is provided. Transferring an algorithm to incremental updates raises several direct problems, such as finding an efficient update function or the limitation in space: one cannot store all previously seen samples. Therefore, a model reduction algorithm which approximates the set cover algorithm is proposed in the EVM. Yet there are also several indirect problems, like concept drift, i.e. when the samples of a class change either gradually or abruptly. The model is able to adapt to this, but with it comes another challenge, the so-called “stability-plasticity dilemma“ [8]: the trade-off between a rather fast or slow adaptation to change in a class. A fast adaptation can result in an unwanted adaptation to noise, yet a slow adaptation can miss cyclical or constantly gradual change in the data. Also, the model reduction used in the EVM can lead to unbalanced classes and, in the extreme case, to a nearly complete removal of some classes. This is called “catastrophic forgetting“ [8]. These problems are not directly solved by the EVM and will also be assessed in this work.
In this thesis, the EVM will be extended by incremental update mechanisms. Starting with an exact naive approach, approximative algorithms will later be tested for both efficiency and accuracy. One of the main efficiency problems is that, when adding new samples to the EVM, all previously trained points need to be updated according to the naive EVM algorithm. Also, when updating an existing sample, it needs to be compared to all samples of all other classes. The first optimizations are based on these two properties: first, not all previously trained samples have to be updated, and finding the relevant ones, on the sample and class level, is the challenge; second, not all feature points have to be compared against while (re-)training another feature point. The resulting variants will be evaluated on different datasets, one important one being a face dataset for the challenges stated above.
The thesis consists of the following milestones:
• Implementation of the exact incremental update algorithm.

• Evaluating performance on MNIST, ImageNet, and a face dataset.

• Extension to approximative update algorithms.

• Further experiments regarding learning and forgetting procedure.

The implementation should be done in C++.

References
[1] Walter Scheirer, Anderson Rocha, Archana Sapkota, and Terrance Boult. Toward Open Set Recognition.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 35:1757–72, 07 2013.
[2] Abhijit Bendale and Terrance E. Boult. Towards Open Set Deep Networks. CoRR, abs/1511.06233, 2015.
[3] Pedro Ribeiro Mendes Júnior, Jacques Wainer, and Anderson Rocha. Specialized Support Vector Machines
for Open-Set Recognition. CoRR, abs/1606.03802, 2016.
[4] Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayun Wang, Boqing Gong, and Stella X. Yu. Large-Scale Long-
Tailed Recognition in an Open World. In IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), 2019.
[5] Zongyuan Ge, Sergey Demyanov, Zetao Chen, and Rahil Garnavi. Generative OpenMax for Multi-Class Open Set Classification. In British Machine Vision Conference Proceedings, 2017.
[6] Ethan M. Rudd, Lalit P. Jain, Walter J. Scheirer, and Terrance E. Boult. The Extreme Value Machine. CoRR,
abs/1506.06112, 2015.
[7] Manuel Günther, Steve Cruz, Ethan M. Rudd, and Terrance E. Boult. Toward Open-Set Face Recognition.
CoRR, abs/1705.01567, 2017.
[8] Alexander Gepperth and Barbara Hammer. Incremental Learning Algorithms and Applications. In European
Symposium on Artificial Neural Networks (ESANN), 2016.