Index

Representation learning strategies to model pathological speech: effect of multiple spectral resolutions

Description:
Speech signals contain paralinguistic information with specific cues about a given speaker including the
presence of diseases that may alter their communication capabilities. The automatic classification of
paralinguistic aspects has many potential applications, and has received a good deal of attention by the
research community [1-3]. In particular, researchers look at clinical observations in the speech of patients
and try to objectively and automatically measure two main aspects of a given disease: (1) the presence of
a disease via classification of healthy control (HC) subjects and patients, and (2) the level of degradation
of the speech of patients according to a specific clinical scale [4]. These aspects are evaluated using
computer-aided methods based on signal processing and pattern recognition.
At the center of these computer-aided methods is the particular feature set and extraction method
used, which has been developed over the years to continually improve the diagnosis and the severity
assessment of different diseases [5-7]. Many recent studies on feature extraction for the assessment of
pathological speech rely on deep learning strategies [3].
In this project we consider one such approach that uses a parallel representation learning strategy to
model speech signals from patients with different speech disorders [8]. The model uses two types of
autoencoders: a convolutional autoencoder (CAE) and a recurrent autoencoder (RAE). Both take as input
a spectrogram and output features derived from a hidden representation in the bottleneck space (i.e. a
compressed representation of the input). In addition, the reconstruction error of the autoencoder in
different spectral components of the speech signal is considered as a feature set.
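The second feature set described above, the reconstruction error in different spectral components, can be sketched as follows. This is only a minimal illustration, assuming that the error is averaged over equally sized groups of frequency bins; the function name, the number of bands and the toy data are hypothetical and not taken from [8].

```python
import numpy as np

def band_reconstruction_error(spec, recon, n_bands):
    """Mean squared reconstruction error per frequency band.

    spec, recon: arrays of shape (n_freq_bins, n_frames).
    Returns an array of length n_bands, usable as a feature vector.
    """
    sq_err = (spec - recon) ** 2
    # Split the frequency axis into n_bands nearly equal groups of bins.
    bands = np.array_split(sq_err, n_bands, axis=0)
    return np.array([band.mean() for band in bands])

# Toy input (assumption): a random "spectrogram" and a reconstruction that is
# perfect in the lower half of the bins and noisy in the upper half.
rng = np.random.default_rng(0)
spec = rng.random((64, 100))
recon = spec.copy()
recon[32:] += 0.1 * rng.standard_normal((32, 100))

features = band_reconstruction_error(spec, recon, n_bands=4)
```

In this toy case the first two entries of `features` are zero and the last two are positive, which mirrors the idea that the error profile over frequency carries information about what the autoencoder fails to reconstruct.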
The aim of this project is to evaluate the performance of the parallel representation learning strategy
using different parameterized representations of the spectrogram (e.g. comparing broadband and
narrowband spectral representations) as well as a wavelet representation to quantify the information
loss for each representation, and the benefit of using all of them together as a multiple input channel.
Methods for quantification include the overall ability of the proposed model to classify different
pathologies and the associated level of degradation of a given patient’s speech, and also comparing the
input and reconstructed speech signals using contours of phonological posteriors [9]. The aim is to
evaluate which groups of phonemes are most affected by the compression of the autoencoders for
the different spectral resolutions and their combinations.
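The difference between broadband and narrowband representations comes down to the analysis window length. The sketch below, which assumes a 16 kHz sampling rate, a Hann window and window lengths of 5 ms and 40 ms (all illustrative choices, not values from the project), computes both from the same signal with plain NumPy.

```python
import numpy as np

def spectrogram(signal, win_len, hop):
    """Magnitude spectrogram via a Hann-windowed short-time Fourier transform."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack([signal[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    # rfft over each frame; transpose to (n_freq_bins, n_frames)
    return np.abs(np.fft.rfft(frames, axis=1)).T

fs = 16000                       # assumed sampling rate in Hz
t = np.arange(fs) / fs           # one second of signal
signal = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 400 * t)

# Broadband: short window (5 ms), bin spacing fs / 80 = 200 Hz.
broad = spectrogram(signal, win_len=80, hop=40)
# Narrowband: long window (40 ms), bin spacing fs / 640 = 25 Hz.
narrow = spectrogram(signal, win_len=640, hop=40)
```

The two arrays could be stacked (after resampling to a common shape) to form a multi-channel input; how that is best done is one of the open questions of the project.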

References:

[1] Schuller, B., Batliner, A., 2013. Computational Paralinguistics: Emotion, Affect and Personality in
Speech and Language Processing. John Wiley & Sons.
[2] Schuller, B., et al., 2019. Affective and Behavioural Computing: Lessons Learnt from the First
Computational Paralinguistics Challenge. Computer Speech & Language 53, 156–180.
[3] Cummins, N., Baird, A., Schuller, B., 2018. Speech Analysis for Health: Current State-of-the-Art
and the Increasing Impact of Deep Learning Methods.
[4] Orozco-Arroyave, J.R., et al., 2015. Characterization Methods for the Detection of Multiple Voice
Disorders: Neurological, Functional, and Laryngeal Diseases. IEEE Journal of Biomedical and Health
Informatics 19, 1820–1828.
[5] Dimauro, G., Di-Nicola, V., et al., 2017. Assessment of Speech Intelligibility in Parkinson’s
Disease Using a Speech-to-Text System. IEEE Access 5, 22199–22208.
[6] Orozco-Arroyave, J.R., Vasquez-Correa, J.C., et al., 2016. Towards an Automatic Monitoring of
the Neurological State of the Parkinson’s Patients from Speech, in: IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), pp. 6490–6494.
[7] Schuller, B., Steidl, S., Batliner, A., Hantke, S., Hönig, F., Orozco-Arroyave, J.R., Nöth, E., Zhang,
Y., Weninger, F., 2015. The INTERSPEECH 2015 Computational Paralinguistics Challenge:
Nativeness, Parkinson’s & Eating Condition, in: Proceedings of INTERSPEECH, pp. 478–482.
[8] Vásquez-Correa, Juan Camilo et al. “Parallel Representation Learning for the Classification of
Pathological Speech: Studies on Parkinson’s Disease and Cleft Lip and Palate” Under Review (2020).
[9] Vásquez-Correa, Juan Camilo et al. “Phonet: A Tool Based on Gated Recurrent Neural Networks to
Extract Phonological Posteriors from Speech.”, in: INTERSPEECH (2019).

Learning projection matrices for marker free motion compensation in weight-bearing CT scans

The integration of known operators into neural networks has recently received more and
more attention. The theoretical proof of its benefits has been described by Maier and Syben
et al. in [1, 2]. Reducing the number of trainable weights by replacing trainable layers with
known operators reduces the overall approximation error and makes it easier to interpret
the layers' function. This is of special interest in the context of medical imaging, where it is
crucial to understand the effects of layers or operators on the resulting image. Several use
cases of known operators in medical imaging have been explored in the past few years [3][4][5].
An API that makes such experiments easier is the PYRO-NN API by Syben et al., which comes
with several forward and backward projectors for different geometries as well as helpers
such as filters [6].

Cone-beam CT (CBCT) imaging is a widely used X-ray imaging technology which uses
a point source of X-rays and a 2D flat-panel detector. Using a reconstruction algorithm
such as the FDK algorithm, a complete 3D reconstruction can be estimated using just one
rotation around the patient [7]. This modality is of great use in orthopedics, where so-called
weight-bearing CT scans image primarily knee joints under weight-bearing conditions to
picture the cartilage tissue under stress. The main drawbacks of this modality are motion
artifacts caused by involuntary movement of the patient's knee and inaccuracies in the trajectory
of the scanner. In order to correct those artifacts, the extrinsic camera parameters,
which describe the position and orientation of the object relative to the detector, have to be
adjusted [8].

To get one step closer to reducing motion artifacts without additional cameras or markers, it is
of special interest to study the feasibility of training extrinsic camera parameters as part of
a reconstruction pipeline. Before an algorithm for estimating those parameters can be assessed,
the general feasibility of training the extrinsic camera parameters of a projection matrix
will be studied. The patient's motion will be estimated iteratively using an adapted gradient
descent algorithm, as known from the training of neural networks.
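The idea of recovering extrinsic parameters by gradient descent can be illustrated with a toy problem. The sketch below does not use PYRO-NN; it assumes a simple pinhole model P = K [I | t] without rotation, a known depth, and known 3D points, and recovers only the in-plane translation from the projected points. All names and values are hypothetical.

```python
import numpy as np

def projection_matrix(f, t):
    """3x4 matrix P = K [I | t] for focal length f and translation t (no rotation)."""
    K = np.diag([f, f, 1.0])
    Rt = np.hstack([np.eye(3), t.reshape(3, 1)])
    return K @ Rt

def project(P, points):
    """Project Nx3 points to Nx2 detector coordinates."""
    homog = np.hstack([points, np.ones((len(points), 1))])
    uvw = homog @ P.T
    return uvw[:, :2] / uvw[:, 2:3]

rng = np.random.default_rng(0)
points = rng.uniform(-50.0, 50.0, size=(20, 3))   # hypothetical object points
f = 1000.0                                        # assumed focal length
t_true = np.array([3.0, -2.0, 1000.0])            # "moved" pose to be recovered
target = project(projection_matrix(f, t_true), points)

t = np.array([0.0, 0.0, 1000.0])   # initial guess; depth is assumed known
depth = points[:, 2] + t[2]
lr = 0.1
for _ in range(200):
    residual = project(projection_matrix(f, t), points) - target   # N x 2
    # Analytic gradient of the mean squared error w.r.t. (tx, ty):
    # du/dtx = f / depth, and likewise for v and ty.
    grad = 2.0 * np.mean(residual * (f / depth)[:, None], axis=0)
    t[:2] -= lr * grad
```

With depth fixed the loss is quadratic in (tx, ty), so plain gradient descent converges reliably here. Rotations and unknown depth make the problem non-convex, which is exactly where the feasibility study in this thesis begins.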

The Bachelor’s thesis covers the following aspects:

1. Discussion of the general idea of motion compensation in CBCT as well as a quick
overview of the PYRO-NN API and thus of known operators in general.

2. Studying the feasibility of learning a projection matrix of a single forward projection:
• Assessing the ability to train single parameters
• Training of translations and rotations
• Attempting to estimate the complete rigid motion parameters

3. Training of a simple trajectory:
• Assessing the motion estimation of the back projection using the volume as
ground truth
• Assessing the motion estimation using an undistorted sinogram
• Estimating the trajectory based only on the distorted sinogram

4. Evaluation of the training results of the experiments and description of potential applications
of the results.

All implementations will be integrated into the PYRO-NN API [6].

References
[1] A. Maier, F. Schebesch, C. Syben, T. Würfl, S. Steidl, J. Choi, and R. Fahrig, “Precision
learning: Towards use of known operators in neural networks,” in 2018 24th International
Conference on Pattern Recognition (ICPR), pp. 183–188, 2018.

[2] A. K. Maier, C. Syben, B. Stimpel, T. Würfl, M. Hoffmann, F. Schebesch, W. Fu, L. Mill,
L. Kling, and S. Christiansen, “Learning with known operators reduces maximum error
bounds,” Nature Machine Intelligence, vol. 1, no. 8, pp. 373–380, 2019.

[3] W. Fu, K. Breininger, R. Schaffert, N. Ravikumar, T. Würfl, J. G. Fujimoto, E. M.
Moult, and A. Maier, “Frangi-Net: A Neural Network Approach to Vessel Segmentation,”
in Bildverarbeitung für die Medizin (BVM) 2018 (A. Maier, T. M. Deserno, H. Handels,
K. H. Maier-Hein, C. Palm, and T. Tolxdorff, eds.), (Berlin, Heidelberg), pp. 341–346, Springer
Vieweg, 2018.

[4] C. Syben, B. Stimpel, K. Breininger, T. Würfl, R. Fahrig, A. Dörfler, and A. Maier,
“Precision Learning: Reconstruction Filter Kernel Discretization,” in Proceedings of
the 5th International Conference on Image Formation in X-ray Computed Tomography,
pp. 386–390, 2018.

[5] T. Würfl, F. C. Ghesu, V. Christlein, and A. Maier, “Deep learning computed tomography,”
in Medical Image Computing and Computer-Assisted Intervention – MICCAI
2016 (S. Ourselin, L. Joskowicz, M. R. Sabuncu, G. Unal, and W. Wells, eds.), (Cham),
pp. 432–440, Springer International Publishing, 2016.

[6] C. Syben, M. Michen, B. Stimpel, S. Seitz, S. Ploner, and A. K. Maier, “Technical note:
PYRO-NN: Python reconstruction operators in neural networks,” Medical Physics, 2019.

[7] L. Feldkamp, L. C. Davis, and J. Kress, “Practical cone-beam algorithm,” J. Opt. Soc.
Am. A, vol. 1, pp. 612–619, 1984.

[8] J. Maier, M. Nitschke, J.-H. Choi, G. Gold, R. Fahrig, B. M. Eskofier, and A. Maier,
“Inertial measurements for motion compensation in weight-bearing cone-beam CT of the
knee,” 2020.

Implementation and Evaluation of Cluster-based Self-supervised Learning Methods

Prototypical Contrastive Learning (PCL) [1] is a new unsupervised representation learning method
which unifies the two directions of unsupervised learning: clustering and contrastive learning. This
method can train deep neural networks from millions of unlabeled images. Conventional contrastive
learning was instance-based instead of prototype-based. The authors introduced prototypes as the
cluster centers of similar images. Training follows an EM-like scheme: the E step finds the distribution
of prototypes by clustering, and the M step optimizes the network by performing contrastive learning.
Additionally, they proposed the ProtoNCE loss, which generalizes the commonly used InfoNCE loss.
With this method, the authors report over 10% performance improvement across multiple benchmarks.
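For orientation, the InfoNCE loss that ProtoNCE generalizes can be written in a few lines. The sketch below is a NumPy illustration for a single query, assuming L2-normalised embeddings and a temperature of 0.1 (an illustrative value); it does not implement ProtoNCE itself, which additionally contrasts against cluster prototypes.

```python
import numpy as np

def info_nce(query, positive, negatives, temperature=0.1):
    """InfoNCE loss for one query embedding.

    query, positive: vectors of shape (d,); negatives: array of shape (n, d).
    All embeddings are L2-normalised before taking dot products.
    """
    def normalise(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    q, pos, neg = normalise(query), normalise(positive), normalise(negatives)
    logits = np.concatenate([[q @ pos], neg @ q]) / temperature
    # Numerically stable log-softmax; the positive sits at index 0.
    logits = logits - logits.max()
    return -(logits[0] - np.log(np.exp(logits).sum()))

# Toy data (assumption): random embeddings standing in for network outputs.
rng = np.random.default_rng(0)
query = rng.standard_normal(16)
negatives = rng.standard_normal((8, 16))
loss_close = info_nce(query, query + 0.01 * rng.standard_normal(16), negatives)
loss_far = info_nce(query, -query, negatives)
```

The loss is small when the positive lies close to the query relative to the negatives (`loss_close`) and large otherwise (`loss_far`).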
The clustering in PCL is computed by k-means. However, this clustering may deteriorate over time,
causing problems such as assigning all samples to the same category. The solution proposed by
Asano et al. [2] is to add the constraint that the labels must be distributed equally across all samples,
which maximizes the information between the sample indices and the labels. The resulting label
assignment problem is equivalent to optimal transport. In order to scale to millions of samples and
thousands of categories, a fast version of the Sinkhorn-Knopp algorithm is used to find an approximate
solution. In summary, they replace the k-means in DeepCluster [3] with the Sinkhorn-Knopp
algorithm to approximate the label assignment Q, and then use cross-entropy to learn the representation.
In this work, the k-means clustering of PCL shall be replaced by the Sinkhorn-Knopp algorithm
and thoroughly evaluated on multiple datasets.
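The balancing effect of Sinkhorn-Knopp can be seen in a small example. The sketch below is a plain NumPy version of the alternating row and column normalisation, assuming a score matrix of sample-to-cluster affinities; the sharpness parameter `lam`, the iteration count and the toy scores are illustrative assumptions, not the settings of [2].

```python
import numpy as np

def sinkhorn_knopp(scores, n_iters=100, lam=3.0):
    """Approximate balanced assignment Q from an (N samples, K clusters) score matrix."""
    # Subtract the maximum before exponentiating for numerical stability.
    Q = np.exp(lam * (scores - scores.max()))
    N, K = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(axis=0, keepdims=True) * K   # each cluster (column) gets mass 1/K
        Q /= Q.sum(axis=1, keepdims=True) * N   # each sample (row) gets mass 1/N
    return Q * N   # rescale so that each row is a distribution over clusters

rng = np.random.default_rng(0)
scores = rng.random((12, 3))
scores[:, 0] += 0.5          # every sample is biased towards cluster 0
Q = sinkhorn_knopp(scores)
```

Although every sample is biased towards cluster 0, each column of `Q` carries the same total mass (N / K = 4 here), which is the equipartition constraint that prevents the degenerate single-cluster solution.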
The thesis consists of the following milestones:
• Literature study on self-supervised learning [4][5]
• Implementation of [1] and [2]
• Implementation of the combination of PCL and Self-labelling with Optimal Transport
• Thorough evaluation on different datasets and comparison with PCL and Self-labelling
• Comparison with other self-supervised learning papers
• Further experiments regarding learning procedure and network architecture
The implementation should be done in Python using PyTorch.

References
[1] Junnan Li, Pan Zhou, Caiming Xiong, Richard Socher, and Steven C. H. Hoi. Prototypical Contrastive
Learning of Unsupervised Representations. arXiv:2005.04966 [cs], July 2020. arXiv: 2005.04966.
[2] Yuki Markus Asano, Christian Rupprecht, and Andrea Vedaldi. Self-labelling via simultaneous clustering
and representation learning. arXiv:1911.05371 [cs], February 2020. arXiv: 1911.05371.
[3] Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. Deep Clustering for Unsupervised
Learning of Visual Features. In Computer Vision – ECCV 2018, volume 11218. Cham: Springer International
Publishing, 2018.
[4] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A Simple Framework for
Contrastive Learning of Visual Representations. arXiv:2002.05709 [cs, stat], June 2020. arXiv: 2002.05709.
[5] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum Contrast for Unsupervised
Visual Representation Learning. arXiv:1911.05722 [cs.CV], November 2019. arXiv: 1911.05722.

Clustering of HPC jobs using Unsupervised Machine Learning on job performance metric time series data

Optimization of the Input Resolution for Dermoscopy Image Classification Tasks

Towards Efficient Incremental Extreme Value Theory Algorithms for Open World Recognition

Classification problems that evolve over time require classifiers that can adapt to previously unseen
classes and also to changes within a class over time. This is called “open world classification” and is an
already well-established research topic [1, 2, 3, 4, 5]. One good example is face recognition. A
classifier for this task should recognize previously unseen persons and assign them to a new class.
The face of a person might also change over time, in which case incremental updates are
better suited than retraining the complete model. While there are many classifiers that perform
the static task, it is not straightforward to turn them into an efficient incremental update algorithm
[6]. The Extreme Value Machine (EVM) [6] is an open set classifier that can perform incremental
learning and is based on a strong theoretical foundation. It uses calibrations based on extreme value
theory to fit Weibull distributions around each sample. Compared to classification via a similarity function
and thresholds, the EVM leads to naturally softer and more robust bounds. It is also less sensitive to
noise and performs well, especially in open space [7].
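The "softer bounds" can be made concrete with a Weibull-shaped inclusion function of the kind the EVM [6] attaches to each sample. The sketch below assumes the scale and shape parameters are already given; in the EVM they would be fitted from distances to the nearest samples of other classes, which is not shown here. The threshold and all values are illustrative.

```python
import numpy as np

def inclusion_probability(x, anchor, scale, shape):
    """Probability that x is included in the class of `anchor`.

    Follows the Weibull-shaped form exp(-(d / scale) ** shape),
    where d is the Euclidean distance between x and the anchor sample.
    """
    d = np.linalg.norm(np.asarray(x) - np.asarray(anchor))
    return float(np.exp(-((d / scale) ** shape)))

def classify(x, anchors, labels, scales, shapes, threshold=0.5):
    """Return the label with the highest inclusion probability, or None if unknown."""
    probs = [inclusion_probability(x, a, s, k)
             for a, s, k in zip(anchors, scales, shapes)]
    best = int(np.argmax(probs))
    return labels[best] if probs[best] >= threshold else None

anchors = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
labels = ["A", "B"]
scales, shapes = [1.5, 1.5], [2.0, 2.0]   # assumed values, not fitted

known = classify([0.3, -0.2], anchors, labels, scales, shapes)
unknown = classify([20.0, -20.0], anchors, labels, scales, shapes)
```

A query near an anchor is assigned its label, while a query far from all anchors falls below the threshold everywhere and is rejected as unknown, which is the open set behaviour described above.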
Although the Extreme Value Machine can be used incrementally, no efficient update mechanism is
provided. Transferring an algorithm to incremental updates raises several direct problems, such as
finding an efficient update function or coping with limited space, since one cannot store all previously
seen samples. Therefore, a model reduction algorithm that approximately solves the set cover problem is
proposed in the EVM. There are also several indirect problems, such as concept drift, which occurs when
the samples of a class change, either gradually or abruptly. The model is able to adapt to it, but with
this comes another challenge, the so-called “stability-plasticity dilemma” [8]. This is the trade-off
between fast and slow adaptation to change in a class. Fast adaptation can result in unwanted
adaptation to noise, while slow adaptation can miss cyclical or constantly gradual change in the
data. In addition, the model reduction used in the EVM can lead to unbalanced classes and, in the extreme
case, to a nearly complete removal of some classes. This is called “catastrophic forgetting” [8]. These
problems are not directly solved by the EVM and should also be assessed in this work.
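The model reduction mentioned above rests on the classic greedy approximation to set cover. The sketch below shows that greedy rule in isolation, assuming each sample comes with the set of samples it "covers"; how coverage is derived from the fitted distributions is left out, and the example sets are hypothetical.

```python
def greedy_set_cover(universe, subsets):
    """Pick subset indices that together cover `universe`, greedily by largest gain."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(range(len(subsets)), key=lambda i: len(subsets[i] & uncovered))
        if not subsets[best] & uncovered:
            break   # remaining elements cannot be covered by any subset
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen

# Hypothetical example: subsets[i] holds the samples that sample i "covers",
# e.g. those it includes with probability above some threshold.
subsets = [{0, 1, 2}, {1}, {2, 3}, {3, 4, 5}, {5}]
kept = greedy_set_cover(range(6), subsets)
```

Here only samples 0 and 3 need to be kept to cover all six samples. The same rule also shows how a class can shrink drastically when a few samples cover most of the others.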
In this thesis, the EVM will be extended by incremental update mechanisms. Starting with an exact
naive approach, approximate algorithms will later be tested for both efficiency and accuracy. One of
the main problems in terms of efficiency is that when adding new samples to the EVM, all previously
trained points need to be updated according to the naive EVM algorithm. Also, when updating an
existing sample, it needs to be compared to all the samples of all other classes. The first optimizations
are based on these two properties. First, not all previously trained samples have to be updated; the
challenge is to find the ones that do, at both the sample and the class level. Second, not all feature points
have to be compared against while (re-)training another feature point. Later, those variants will be
evaluated on different datasets. An important one will be a face dataset, given the challenges stated above.
The thesis consists of the following milestones:
• Implementation of the exact incremental update algorithm.

• Evaluating performance on MNIST, ImageNet, Face Dataset.

• Extension to approximate update algorithms.

• Further experiments regarding learning and forgetting procedure.

The implementation should be done in C++.

References
[1] Walter Scheirer, Anderson Rocha, Archana Sapkota, and Terrance Boult. Toward Open Set Recognition.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 35:1757–72, 07 2013.
[2] Abhijit Bendale and Terrance E. Boult. Towards Open Set Deep Networks. CoRR, abs/1511.06233, 2015.
[3] Pedro Ribeiro Mendes Júnior, Jacques Wainer, and Anderson Rocha. Specialized Support Vector Machines
for Open-Set Recognition. CoRR, abs/1606.03802, 2016.
[4] Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayun Wang, Boqing Gong, and Stella X. Yu. Large-Scale Long-
Tailed Recognition in an Open World. In IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), 2019.
[5] Zongyuan Ge, Sergey Demyanov, Zetao Chen, and Rahil Garnavi. Generative OpenMax for Multi-
Class Open Set Classification. In British Machine Vision Conference Proceedings 2017.
[6] Ethan M. Rudd, Lalit P. Jain, Walter J. Scheirer, and Terrance E. Boult. The Extreme Value Machine. CoRR,
abs/1506.06112, 2015.
[7] Manuel Günther, Steve Cruz, Ethan M. Rudd, and Terrance E. Boult. Toward Open-Set Face Recognition.
CoRR, abs/1705.01567, 2017.
[8] Alexander Gepperth and Barbara Hammer. Incremental Learning Algorithms and Applications. In European
Symposium on Artificial Neural Networks (ESANN), 2016.

Deep Learning-based Matching of Chest X-Ray Scans

Human identification has become increasingly important over the past years, with
facial recognition being potentially the most common form used in daily life. But the face is not the
only biometric identifier that can be used as a feature for identification. In this work, we will investigate
chest X-rays as biometric identifiers. If they prove to be viable, this would, for example, allow
post-mortem identification, where common techniques currently have shortcomings [1]. Success
with this form of identification may also have far-reaching consequences and implications concerning data
protection and anonymity in the medical field.
In pattern recognition, the use of deep learning has proven to be successful in improving or even
replacing classical methods entirely. To test the limits of what is currently possible, a neural network
will be created that takes two different X-ray scans as inputs and outputs a score measuring their
similarity.
To increase the chances of success, a registration step will be incorporated into the preprocessing. It
will be implemented as a neural network layer, as this has proven to be effective in the past [2].
The thesis consists of the following milestones:
• Testing out the capabilities of different network architectures concerning the task of finding
matches in chest X-Ray scans
• Further enhancing the functionality by incorporating a layer into the network that is capable of
affine registrations, e.g. by means of a spatial transformer network [3]
The implementation should be done in Python.
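
The affine registration layer can be pictured as a resampling step driven by a 2x3 matrix. The sketch below shows that step in plain NumPy as a simplification: it works in pixel coordinates and uses nearest-neighbour sampling, whereas a spatial transformer [3] uses a differentiable sampler and predicts the matrix from the input. The function name and example matrix are hypothetical.

```python
import numpy as np

def affine_warp(image, theta):
    """Warp a 2D image with a 2x3 affine matrix using nearest-neighbour sampling.

    theta maps output pixel coordinates (x, y, 1) to input coordinates,
    as in the sampling grid of a spatial transformer.
    Samples that fall outside the input are set to zero.
    """
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])   # 3 x (h*w)
    src_x, src_y = np.rint(theta @ coords).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros(h * w, dtype=image.dtype)
    out[valid] = image[src_y[valid], src_x[valid]]
    return out.reshape(h, w)

image = np.arange(16.0).reshape(4, 4)
shift_right = np.array([[1.0, 0.0, -1.0],    # input x = output x - 1
                        [0.0, 1.0,  0.0]])
warped = affine_warp(image, shift_right)
```

The example matrix shifts the image one pixel to the right. In the network, the six matrix entries would be produced by a small localisation branch so that the two scans are aligned before their similarity is scored.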

References
[1] Ryudo Ishigami, Thi Thi Zin, Norihiro Shinkawa, and Ryuichi Nishii. Human identification using x-ray
image matching. In Proceedings of The International MultiConference of Engineers and Computer Scientists
2017, volume 1, pages 415–418, 2017.
[2] Grant Haskins, Uwe Kruger, and Pingkun Yan. Deep learning in medical image registration: a survey.
Machine Vision and Applications, 31(1–2), Jan 2020.
[3] Max Jaderberg, Karen Simonyan, Andrew Zisserman, and Koray Kavukcuoglu. Spatial transformer networks.
In Advances in Neural Information Processing Systems 28, pages 2017–2025. Curran Associates, Inc., 2015.

Analysis of NVIDIA Optix Engine for Ray Tracing in SPECT

Looking for a student for the project: Analysis of NVIDIA Optix as a Ray Tracing platform for SPECT forward projection

Topic motivation

Ray tracing is widely used in video games to determine which objects within the scene should be shown from the viewpoint of the observer.
Furthermore, ray tracing is also used to determine the shadows, lights and reflections to be rendered on the screen.
OptiX is a powerful ray tracing API designed by NVIDIA, notable for its modularity and flexibility. In 2015, OptiX was used to model a SPECT system, achieving a significant speed-up over other simulation frameworks for the same task [1].

Project description

The project would consist of five parts:

Part I: Set up Optix as a ray tracing framework for nuclear imaging, without physics
Part II: Run a simulation with a simple SPECT parallel hole collimator
Part III: Set up Optix as a ray tracing framework for nuclear imaging, with physics
Part IV: Validation of the tool with simulated data from SIMIND (data provided)
Part V: Validation of the tool with data acquired from a system (data provided)
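
To make Parts I and II more tangible: without physics, forward projection through an ideal parallel-hole collimator reduces to summing activity along rays perpendicular to the detector. The sketch below is a NumPy stand-in, not OptiX code; it assumes a voxelised activity volume and only supports views in 90-degree steps. Names and sizes are hypothetical.

```python
import numpy as np

def parallel_projection(activity, n_views=4):
    """Idealised parallel-hole projections of a 3D activity volume.

    activity has shape (z, y, x). For each view the volume is rotated by
    90 degrees about the z axis and summed along the ray direction (y).
    No attenuation, scatter or collimator response is modelled.
    """
    views = []
    for k in range(n_views):
        rotated = np.rot90(activity, k=k, axes=(1, 2))
        views.append(rotated.sum(axis=1))   # line integrals along y
    return np.stack(views)                  # (n_views, z, x)

volume = np.zeros((8, 8, 8))
volume[4, 2, 5] = 1.0            # a single point source
projections = parallel_projection(volume)
```

A ray tracing framework replaces the axis-aligned sum with arbitrary rays through the collimator geometry, and Part III would add interactions along each ray.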

Success measurements:

The project would be considered successful after Part II.
At Part IV, it could become a conference paper.

Other information:

Topic can be 5 or 10 ECTS Research/Master Project. Can also be extended to a thesis.
Contact: maximilian.reymann@fau.de
Applicants ideally have experience with C++ or GPU programming, or are looking to gain expertise in these areas.

GAN Generated Model Observer for one Class Detection in SPECT Imaging

Early stage inflammatory musculoskeletal diseases classification with deep learning

The Pattern Recognition Lab, together with the Medical Clinic 3 (rheumatology and immunology), is offering the following master's thesis:

“Early stage inflammatory musculoskeletal diseases classification with deep learning”

Overview

Close collaboration with the clinic
Development of deep learning-based classification networks
Development of a neural network approach that combines clinical data and MRI images of the patients

Requirements

Strong background in implementing DL methods (Python)
Knowledge of MRI physics

For more information reach out to lukas.folle@fau.de