Index
Web-Based Server-Client Software Framework for Killer Whale Individual Recognition
1. Motivation
In the last decades, more and more documentation and data storage for research purposes has been done with computers, enabling the use of algorithms to work on the data. In the 1970s, it was discovered that individual killer whales (orcas) can be identified by their natural markings, such as their fins. Since then, researchers have taken pictures of killer whales to identify and document them. Each discovered orca gets a unique ID, which will be referred to as “label” in the following. This identification process is currently done manually by researchers for each picture. For his master’s thesis “Orca Individual Identification based on Image Classification Using Deep Learning” [2], Alexander Gebhard developed a machine-learning pipeline that can identify individual orcas in images. If the pipeline is given a picture without labels, it returns the most likely labels for this picture. If it is given a picture with a label, the label and the image are used to train the pipeline so that it can identify individual orcas better. The goal of my bachelor’s thesis is to develop and document a user-friendly web-based server-client framework (here referred to as FIN-PRINT) that allows researchers to synchronize images and labels with the pipeline and to create as well as to store labels and relevant meta-data.
2. Concept
2.1 Description
FIN-PRINT is a platform-independent framework that can run on different operating systems, such as Windows, macOS, and Linux. Users of this framework interact with it through several webpages that are hosted locally on their computer. The interface allows the user to browse through images of orcas on their local hard drive and to add labels to these images. They can also check automatically generated labels from the machine-learning pipeline and correct them if necessary. Labeling new images or checking the labels from the pipeline can be done offline, without an internet connection, if the necessary files are present. With an active internet connection, the user can also explore statistics from a database which stores relevant data about the images and their labels, such as GPS data, the dates when an individual orca was spotted, or whether certain orcas were spotted together. Manually labeled pictures can be uploaded to the pipeline, and automatic labels from the pipeline can be downloaded.
2.2 The Framework
The framework has four major parts that interact with each other: a browser interface, a local server, an external server, and the machine-learning pipeline. The browser interface consists of several webpages which are hosted on the local server that runs on the user’s computer. When FIN-PRINT is opened, the local server starts, and the user can access the user interface in their web browser. The local server can access the images and labels on the client’s computer and sends them to the browser. This allows the user to view the images and to label them in their browser. If the user has labeled some pictures, the browser sends the newly created labels to the local server, which stores them on the hard drive. It is also possible for the user to view and check labels from the pipeline and to correct its predictions. If an active internet connection is available, the local server can upload new images and labels to the external server or download automatically generated labels from there at any time. As an interface between the local server and the pipeline, the external server can pass files from the pipeline to the local server and vice versa. If the pipeline receives an unlabeled image, it identifies all orcas in the image and returns cropped images, each showing the fin of one orca. Every cropped image also gets its own label file. These files can be sent to the user so that they can validate the predictions of the pipeline. If the pipeline receives an image with labels, it uses it to train itself. The external server saves all data related to images and their labels in a database, such as the name of the uploader and the meta-data of the images (like GPS coordinates and time stamps). Data from this database can be used to generate statistics about the orcas. With an active internet connection, the user can use the web interface to request certain information that is stored in the database.
2.3 Programming tools
The web interface uses CSS, HTML, and the scripting language JavaScript, which are supported by all common web browsers such as Safari, Mozilla Firefox, or Google Chrome. To ensure that the local server can run on any platform, it is programmed with Node.js, a platform-independent server framework that also uses JavaScript. The external server also uses Node.js and will be hosted at the Friedrich-Alexander University. As JavaScript and Node.js are widespread programming tools, this makes it easy for other people to maintain the framework and to add new features to it if necessary.
3. Literature
[1] C. Bergler et al. “FIN-PRINT: A Fully-Automated Multi-Stage Deep-Learning-Based Framework for Killer Whale Individual Recognition”, Germany. FAU Erlangen-Nuernberg. Date: not published yet.
[2] Alexander Gebhard. “Orca Individual Identification based on Image Classification Using Deep Learning”, Germany. FAU Erlangen-Nuernberg. Date: October 19, 2020.
[3] J. Towers et al. “Photo-identification catalogue, population status, and distribution of Bigg’s killer whales known from coastal waters of British Columbia”, Canada. Fisheries and Oceans Canada, Pacific Biological Station, Nanaimo, BC. Date: 2019.
CoachLea: An Android Application to evaluate the progress of speaking and hearing abilities of children with Cochlear Implant
In 2018, the WHO estimated that there are 466 million persons (34 million of them children) with disabling hearing loss, which equals about 6.1% of the world’s population. For individuals with severe hearing loss who do not benefit from standard hearing aids, the cochlear implant (CI) represents a very important advance in treatment. However, CI users often present altered speech production and limited understanding even after hearing rehabilitation. Thus, if the speech deficits were known, the rehabilitation could be targeted at them. This is particularly important for children who were born deaf or lost their hearing ability before speech acquisition: on the one hand, they were not able to hear other people speak in the time before the implantation; on the other hand, they have never been able to monitor their own speech. Therefore, this project proposes an Android application to evaluate the speech production and hearing abilities of children with CI.
Enhanced Generative Learning Methods for Real-World Super-Resolution Problems in Smartphone Images
The goal of this bachelor’s thesis is to extend the work of Lugmayr et al. [1] by improving the generative network with a learned image downsampler motivated by the CAR network [2] instead of bicubic downsampling. The aim is to achieve better image quality or a more robust super-resolution (SR) network for images from a real-world data distribution.
[1] Lugmayr, Andreas, Martin Danelljan, and Radu Timofte. “Unsupervised learning for real-world super-resolution.” 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). IEEE, 2019.
[2] Sun, Wanjie, and Zhenzhong Chen. “Learned image downscaling for upscaling using content adaptive resampler.” IEEE Transactions on Image Processing 29 (2020): 4027-4040.
Development of Automated Hardware Requirement Checks for Medical Android Applications
Network Deconvolution as Sparse Representations for Medical Image Analysis
DeepTechnome – Mitigating Bias Related to Image Formation in Deep Learning Based Assessment of CT Images
Multi-task Learning for Historical Document Classification with Transformers
Description
Recently, transformer models [1] have started to outperform classic deep convolutional neural networks in many classic computer vision tasks. These transformer models consist of multi-headed self-attention layers followed by linear layers. The former layer soft-routes value information based on three matrix embeddings: query, key, and value. The inner product of query and key is fed into a softmax function for normalization, and the resulting similarity matrix is multiplied with the value embedding. Multi-headed self-attention creates multiple sets of query, key, and value matrices that are computed independently, then concatenated and projected back into the original embedding dimension. Visual transformers excel in their ability to incorporate non-local information into their latent representation, allowing for better results when classification-relevant information is scattered across the entire image.
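To make the computation above concrete, the following NumPy sketch implements multi-headed self-attention as just described; the dimension names are our own, and the 1/sqrt(d) scaling inside the softmax is standard practice rather than something prescribed by the text:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """X: (seq_len, d_model); Wq, Wk, Wv, Wo: (d_model, d_model)."""
    seq_len, d_model = X.shape
    d_head = d_model // n_heads
    # Query, key, and value embeddings, split into independent heads.
    Q = (X @ Wq).reshape(seq_len, n_heads, d_head)
    K = (X @ Wk).reshape(seq_len, n_heads, d_head)
    V = (X @ Wv).reshape(seq_len, n_heads, d_head)
    heads = []
    for h in range(n_heads):
        # Inner product of queries and keys, normalized via softmax
        # (scaled by 1/sqrt(d_head), as is standard practice).
        sim = softmax(Q[:, h] @ K[:, h].T / np.sqrt(d_head))
        # The similarity matrix soft-routes the value information.
        heads.append(sim @ V[:, h])
    # Concatenate all heads and project back to the embedding dimension.
    return np.concatenate(heads, axis=-1) @ Wo
```

Note that `sim` is a (seq_len × seq_len) matrix per head, which is exactly the quadratic cost discussed below.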
The downside of pure attention models like ViT [2], which treat image patches as sequence tokens, is that they require large amounts of samples to make up for their lack of inductive priors. This makes them unsuitable for low-data regimes like historical document analysis. Furthermore, the similarity matrix grows quadratically with the input length, complicating high-resolution computations.
One solution that promises to alleviate the data hunger of transformers while still profiting from their global representation ability is the use of hybrid methods that combine CNN and self-attention layers. These models jointly train a network comprised of a number of convolutional layers, which preprocess and downsample the input, followed by a form of multi-headed self-attention. [3] differentiates hybrid self-attention models into “transformer blocks” and “non-local blocks”, the latter of which is equivalent to single-headed self-attention except for the lack of value embeddings and positional encodings.
The objective of this thesis is the classification of script type, date and location of historical documents, using a single multi-headed hybrid self-attention model.
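A minimal PyTorch sketch of such a multi-task hybrid model is given below; the layer sizes, number of heads, and task head names are illustrative assumptions, not the architecture to be developed in this thesis:

```python
import torch.nn as nn

class HybridDocumentClassifier(nn.Module):
    """Hypothetical hybrid model: a small convolutional stem downsamples
    the page image, multi-headed self-attention aggregates global context,
    and one linear head per task (script, date, location) classifies."""
    def __init__(self, n_scripts, n_dates, n_locations, d_model=256):
        super().__init__()
        self.stem = nn.Sequential(  # convolutional preprocessing/downsampling
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.heads = nn.ModuleDict({
            "script": nn.Linear(d_model, n_scripts),
            "date": nn.Linear(d_model, n_dates),
            "location": nn.Linear(d_model, n_locations),
        })

    def forward(self, x):                      # x: (batch, 1, H, W)
        f = self.stem(x)                       # (batch, d_model, H/4, W/4)
        tokens = f.flatten(2).transpose(1, 2)  # feature map -> token sequence
        attended, _ = self.attn(tokens, tokens, tokens)
        pooled = attended.mean(dim=1)          # global average pooling
        return {task: head(pooled) for task, head in self.heads.items()}
```

In the usual multi-task setup, the three cross-entropy losses would simply be summed (optionally with task weights) during training.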
The thesis consists of the following milestones:
- Construction of hybrid models for classification
- Benchmarking on the ICDAR 2021 competition dataset
- Further architectural analyses of hybrid self-attention models
References
[1] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, 2017.
[2] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16×16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021.
[3] Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, and Ashish Vaswani. Bottleneck transformers for visual recognition, 2021.
Interpolation of deformation field for brain-shift compensation using Gaussian Process
Brain shift is the change of the position and shape of the brain during a neurosurgical procedure, caused by the additional space that becomes available after opening the skull. This intraoperative soft-tissue deformation limits the use of neuroanatomical overlays that were produced prior to surgery. Consequently, intraoperative image updates are necessary to compensate for brain shift.
Comprehensive reviews covering different aspects of intraoperative brain shift compensation can be found in [1][2]. Recently, feature-based registration frameworks using SIFT features [3] or vessel centerlines [4] have been proposed to update the preoperative image in a deformable fashion, where point matching algorithms such as coherent point drift [5] or a hybrid mixture model [4] are used to establish point correspondences between the source and target feature point sets. To estimate a dense deformation field from these point correspondences, B-spline [6] and thin-plate spline [7] interpolation techniques are commonly used.
The Gaussian process (GP) [8] is a powerful machine learning tool, which has been applied to image denoising, interpolation, and segmentation. In this work, we aim to apply different GP kernels to brain shift compensation. Furthermore, GP-based interpolation of the deformation field is compared with state-of-the-art methods.
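As a rough sketch of how GP-based interpolation of a deformation field could look, the snippet below uses scikit-learn’s GaussianProcessRegressor to predict dense displacements from sparse point correspondences; the kernel and its length scale are placeholder assumptions, and the kernels actually compared are part of the thesis work:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, RationalQuadratic

def interpolate_deformation(points, displacements, query_points, kernel=None):
    """points: (N, 3) sparse feature locations; displacements: (N, 3)
    deformation vectors at these locations; query_points: (M, 3) dense
    grid positions. Each displacement component is modeled as a GP."""
    kernel = kernel or RBF(length_scale=10.0)  # placeholder; must be tuned
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(points, displacements)
    dense, std = gp.predict(query_points, return_std=True)
    return dense, std  # predicted deformation field plus its uncertainty
```

Kernels such as Matern or RationalQuadratic could be swapped in via the `kernel` argument, which would already cover two of the at least three kernels to be implemented.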
In detail, this thesis includes the following aspects:
- Literature review of state-of-the-art methods for brain shift compensation using feature-based algorithms
- Literature review of state-of-the-art methods for the interpolation of deformation/vector fields
- Introduction of the Gaussian process (GP)
- Integration of a GP-based interpolation technique into a feature-based brain shift compensation framework
- Estimation of a dense deformation field from a sparse deformation field using a GP
- Implementation of at least three different GP kernels
- Comparison of the performance of GP-based and state-of-the-art interpolation techniques on various datasets, including synthetic, phantom, and clinical data, with respect to accuracy, usability, and run time
[1] Bayer, S., Maier, A., Ostermeier, M., & Fahrig, R. (2017). Intraoperative Imaging Modalities and Compensation for Brain Shift in Tumor Resection Surgery. International Journal of Biomedical Imaging, 2017 .
[2] I. J. Gerard, M. Kersten-Oertel, K. Petrecca, D. Sirhan, J. A. Hall, and D. L. Collins, “Brain shift in neuronavigation of brain tumors: a review,” Medical Image Analysis, vol. 35, pp. 403–420, 2017.
[3] Luo J. et al. (2018) A Feature-Driven Active Framework for Ultrasound-Based Brain Shift Compensation. In: Frangi A., Schnabel J., Davatzikos C., Alberola-López C., Fichtinger G. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2018. MICCAI 2018. Lecture Notes in Computer Science, vol 11073. Springer, Cham
[4] Bayer S, Zhai Z, Strumia M, Tong XG, Gao Y, Staring M, Stoel B, Fahrig R, Arya N, Maier A, Ravikumar N. Registration of vascular structures using a hybrid mixture model. In: International Journal of Computer Assisted Radiology and Surgery, June 2019
[5] Myronenko, A., Song, X.: Point set registration: Coherent point drift. IEEE Trans. Pattern Anal. Mach. Intell. 32(12), 2262–2275 (2010)
[6] D. Rueckert, L. I. Sonoda, C. Hayes, D. L. G. Hill, M. O. Leach and D. J. Hawkes, “Nonrigid registration using free-form deformations: application to breast MR images,” in IEEE Transactions on Medical Imaging, vol. 18, no. 8, pp. 712-721, Aug. 1999.
[7] F. L. Bookstein, “Principal warps: thin-plate splines and the decomposition of deformations,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 6, pp. 567-585, June 1989.
[8] C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006
Automatic Bird Individual Recognition in Multi-Channel Recording Scenarios
Problem background:
At the Max Planck Institute for Ornithology in Radolfzell, several birds are equipped with backpacks to record their calls. However, not only the sound of the equipped bird is recorded, but also that of the birds in its surroundings; as a result, the scientists receive several non-synchronized audio tracks with bird calls. The biologists have to match the calls to the individual birds manually, which is time-consuming and error-prone.
Goal of the thesis:
The goal of this thesis is to implement a Python framework that can assign the calls to the corresponding birds. Since the intensity of a call decreases rapidly with distance, the loudest call on a recording can be matched to the bird carrying that recorder. Moreover, the call of that bird appears earlier on its own recording device than on the other devices.
To assign the remaining calls to the other birds, the soundtracks must be compared by overlaying the audio signals. For this purpose, the audio signals have to be adjusted first: since different devices are used for capturing the data and the recordings cannot be started at the same time, a constant time offset between the recordings occurs. In addition, a linear time distortion appears because the devices record at slightly different sampling frequencies.
To remove these inconsistencies, similar characteristics must be found in the audio signals, and the audio tracks then have to be shifted and processed until these characteristics lie on top of each other. There are several methods to extract such characteristics, although the most precise ones require human assistance [1]. However, there are also automated approaches in which the audio track is scanned for periodic signal parameters such as pitch or spectral flatness. Effective features are essential for removing the distortion, as is the algorithm’s ability to discriminate between characteristics that are only slightly similar [2].
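As an illustration of this alignment step, the Python sketch below first resamples one track to the reference sampling rate (removing the linear distortion) and then estimates the constant time offset from the peak of the cross-correlation; it assumes integer sampling rates and that a shared acoustic event dominates the correlation:

```python
import numpy as np
from scipy.signal import correlate, resample_poly

def sample_lag(ref, other):
    """Lag (in samples) of 'other' relative to 'ref', taken from the
    peak of the full cross-correlation of the two signals."""
    xcorr = correlate(other, ref, mode="full")
    return int(np.argmax(xcorr)) - (len(ref) - 1)

def align(ref, other, fs_ref, fs_other):
    """Resample 'other' to the reference rate (removing the linear
    distortion), then shift it so that shared characteristics lie on
    top of those in the reference track."""
    other = resample_poly(other, fs_ref, fs_other)  # integer rates assumed
    return np.roll(other, -sample_lag(ref, other))  # wraps around; real
    # code should pad or crop instead of rolling
```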
The framework will be implemented in Python. It should process the given audio tracks and
recognize and reject disturbed channels.
References:
[1] Brett G. Crockett, Michael J. Smithers. Method for time aligning audio signals using
characterizations based on auditory events, 2002
[2] Jürgen Herre, Eric Allamanche, Oliver Hellmuth. Robust matching of audio signals using
spectral flatness features, 2002
Detection of Label Noise in Solar Cell Datasets
On-site inspection of solar panels is a time-consuming and difficult process, as the panels are often hard to reach. Furthermore, identifying defects can be hard, especially for small cracks. Electroluminescence (EL) imaging enables the detection of small cracks, for example using a convolutional neural network (CNN) [1,2]. Hence, it can be used to identify such cracks before they propagate and result in a measurable impact on the efficiency of a solar panel [3]. This way, costly inspection and replacement of solar panels can be avoided.
To train a CNN for the detection of cracks, a comprehensive dataset of labeled solar cells is required. Unfortunately, assessing whether a certain structure on a polycrystalline solar cell corresponds to a crack or not is a hard task, even for human experts. As a result, setting up a consistently labeled dataset is nearly impossible. That is why EL datasets of solar cells tend to contain a significant amount of label noise.
It has been shown that CNNs are robust against small amounts of label noise, but there can be a drastic influence on performance starting at 5%-10% of label noise [4]. This thesis will
(1) analyze the given dataset with respect to label noise and
(2) attempt to minimize the negative impact of label noise on the performance of the trained network.
Recently, Ding et al. proposed to identify label noise by clustering the features learned by the CNN [4]. As part of this thesis, the proposed method will be applied to a dataset consisting of more than 40k labeled samples of solar cells, which is known to contain a significant amount of label noise. It will be investigated whether the method can be used to identify noisy samples. Furthermore, it will be evaluated whether abstaining from noisy samples improves the performance of the resulting model. To this end, a subset of the dataset will be labeled by at least three experts to obtain a cleaned subset. Finally, an extension of the method will be developed. Here, it shall be evaluated whether the clustering can be omitted, since it proved unstable in prior experiments using the same data.
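As an illustration of the clustering idea (a strongly simplified variant for exposition, not the exact method of [4]), one could flag samples whose given label disagrees with the majority label of their feature cluster; the function name and the choice of k-means are our assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def flag_noisy_samples(features, labels, n_clusters=2):
    """features: (N, D) penultimate-layer CNN activations; labels: (N,)
    integer class labels. Cluster the features and flag every sample
    whose label differs from the majority label of its cluster."""
    assignments = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    noisy = np.zeros(len(labels), dtype=bool)
    for c in range(n_clusters):
        members = assignments == c
        if members.any():
            majority = np.bincount(labels[members]).argmax()  # dominant label
            noisy |= members & (labels != majority)
    return noisy  # candidate samples to relabel or exclude from training
```

Checking whether such flags coincide with the expert-cleaned subset, and whether training without the flagged samples improves the model, corresponds exactly to the evaluation questions posed above.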
[1] Deitsch, Sergiu, et al. “Automatic classification of defective photovoltaic module cells in electroluminescence images.” Solar Energy 185 (2019): 455-468.
[2] Mayr, Martin, et al. “Weakly Supervised Segmentation of Cracks on Solar Cells Using Normalized Lp Norm.” 2019 IEEE International Conference on Image Processing (ICIP). IEEE, 2019.
[3] Köntges, Marc, et al. “Impact of transportation on silicon wafer‐based photovoltaic modules.” Progress in Photovoltaics: research and applications 24.8 (2016): 1085-1095.
[4] Ding, Guiguang, et al. “DECODE: Deep confidence network for robust image classification.” IEEE Transactions on Image Processing 28.8 (2019): 3752-3765.