Index
Improvement of Patient Specific SPECT and PET Brain Perfusion Phantoms for Assessment of Partial Volume Effect
Deep Orca Image Denoising Using Machine-Generated Binary Killer Whale Masks
Introduction
The following proposes the use of binary masks as a ground truth for the
denoising of deep learning based killer whale classification. It is part of a
project of the Pattern Recognition Lab of the FAU in cooperation with the
Vancouver Maritime Institute. Based on thousands of images of killer whale
populations taken over recent years, a deep learning approach was used
to ease the classification of individual animals for local researchers, both
visually and via call recognition [2]. Previous work focused on the extraction
of regions of interest from the original images and the classification of single
animals. To limit the influence of noise on the classification, this thesis aims
to create binary masks of the animals via image segmentation. Binary
masks often provide an accurate ground truth for deep learning approaches.
The following work is therefore closely related to [4]. It is part of the visual
counterpart of the existing “Denoising Kit” for audio signals of killer whales.
Motivation
Noise plays a crucial role in the detection and reconstruction of images.
In this case, close color spaces and partially blurry images throughout the
extracted data limit the success of deep learning based classification. With
a binary mask of the orca body as a ground truth, a network can be trained
without the influence of noise. This can further increase the accuracy of orca
detection, helping researchers track animal populations much more easily.
Approach
Two approaches have proven to be most efficient and are going
to be utilized. First, common methods are used to detect edges in the orca
images. This will be done with the popular Canny edge detection algorithm. The
images are also processed by a superpixel segmentation algorithm [5]. By
overlaying both results, an accurate outline of the animal's shape can be
segmented. After binarization, the resulting mask will be used as a ground
truth for a deep learning network. With it, the original images are denoised
to allow for better classification later.
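The combination of an edge map and a region segmentation can be sketched in a few lines. The toy example below is only an illustration of the overlay-and-binarize idea: it uses a plain gradient-magnitude edge map and a simple intensity threshold as stand-ins for Canny and for the superpixel algorithm of [5], and the test image and thresholds are invented for the sketch.

```python
import numpy as np

def edge_magnitude(img):
    """Central-difference gradient magnitude (a crude stand-in for Canny)."""
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    return np.hypot(gx, gy)

def binary_mask(img, edge_thresh=0.5, fg_thresh=0.5):
    """Overlay an edge map with a simple region segmentation and
    binarize into a foreground mask (here: bright body on dark water)."""
    edges = edge_magnitude(img) > edge_thresh
    region = img > fg_thresh          # stand-in for superpixel grouping
    return region | edges

# toy example: a bright square "body" on a dark background
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
mask = binary_mask(img)
```

In the actual pipeline, `edge_magnitude` would be replaced by the Canny detector and `region` by the superpixel labels, but the overlay and binarization steps remain the same.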
Finally, this thesis will look into intelligent data augmentation in the form
of image morphing techniques, utilizing the created binary masks. With
feature-based image morphing [1], the variety of the training data, and therefore
also the accuracy of the underlying classifier, could be further improved.
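The warping step at the heart of feature-based morphing [1] maps each destination pixel back to a source position relative to corresponding feature lines. The sketch below is a minimal single-line version with nearest-neighbour sampling; the full method in [1] blends the influence of many line pairs and cross-dissolves two warped images.

```python
import numpy as np

def warp_single_line(src, P, Q, P2, Q2):
    """Backward-map every destination pixel through one control line:
    the destination line (P, Q) corresponds to the source line (P2, Q2)."""
    h, w = src.shape
    ys, xs = np.mgrid[0:h, 0:w]
    X = np.stack([xs, ys], axis=-1).astype(float)

    d = Q - P                              # destination line direction
    d2 = Q2 - P2                           # source line direction
    perp = np.array([-d[1], d[0]])
    perp2 = np.array([-d2[1], d2[0]])

    # (u, v): position along / signed distance from the destination line
    u = ((X - P) @ d) / (d @ d)
    v = ((X - P) @ perp) / np.linalg.norm(d)

    # corresponding source position for each destination pixel
    X2 = P2 + u[..., None] * d2 + v[..., None] * perp2 / np.linalg.norm(d2)

    # nearest-neighbour sampling with clamping at the image border
    sx = np.clip(np.round(X2[..., 0]).astype(int), 0, w - 1)
    sy = np.clip(np.round(X2[..., 1]).astype(int), 0, h - 1)
    return src[sy, sx]
```

When source and destination lines coincide, the warp reduces to the identity; moving the lines (e.g. along the segmented fin contour) deforms the image accordingly.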
Medical application
Ground truth binary masks can have, and in some areas already have,
applications in computer vision tasks in the medical field. Deep learning
classification of tumors in CT and MRI images is often based on binary masks
traced by radiologists [3]. Similar issues regarding noise are often faced there.
References
[1] Thaddeus Beier and Shawn Neely. Feature-based image metamorphosis. ACM SIGGRAPH Computer Graphics, 26(2):35–42, 1992.
[2] Christian Bergler, Manuel Schmitt, Rachael Xi Cheng, Andreas K Maier, Volker Barth, and Elmar Nöth. Deep learning for orca call type identification: a fully unsupervised approach. In INTERSPEECH, pages 3357–3361, 2019.
[3] Francisco Javier Díaz-Pernas, Mario Martínez-Zarzuela, Míriam Antón-Rodríguez, and David González-Ortega. A deep learning approach for brain tumor classification and segmentation using a multiscale convolutional neural network. In Healthcare, volume 9, page 153. Multidisciplinary Digital Publishing Institute, 2021.
[4] Christian Bergler et al. ORCA-CLEAN: A deep denoising toolkit for killer whale communication. In INTERSPEECH, 2020.
[5] Pedro F Felzenszwalb and Daniel P Huttenlocher. Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2):167–181, 2004.
Web-Based Server-Client Software Framework for Killer Whale Individual Recognition
1. Motivation
In the last decades, more and more documentation and data storage for research purposes was done with computers, enabling the use of algorithms to work on the data. In the 1970s, it was discovered that individual killer whales (orcas) can be identified by their natural markings, such as their fin. Researchers have since taken pictures of killer whales to identify and document them. Each discovered orca gets a unique ID, which will be referred to as “label” in the following. This identification process is currently done manually by researchers for each picture. For his master's thesis “Orca Individual Identification based on Image Classification Using Deep Learning”, Alexander Gebhard developed a machine-learning pipeline that can identify individual orcas in images. If the pipeline is given a picture without labels, it will return the most likely labels for this picture. If it is given a picture with a label, then the label and the image will be used to train the pipeline so that it can identify individual orcas better. The goal of my bachelor's thesis is to develop and document a user-friendly web-based server-client framework (here referred to as FIN-PRINT) that allows researchers to synchronize images and labels with the pipeline and to create as well as store labels and relevant metadata.
2. Concept
2.1 Description
FIN-PRINT is a platform-independent framework that can run on different operating systems, like Windows, macOS and Linux. Users of this framework interact with it over several webpages that are hosted locally on their computer. The interface allows the user to browse through images of orcas on their local hard drive and to add labels to these images. They can also check automatically generated labels from the machine-learning pipeline and make corrections to the labels if necessary. Labeling new images or checking the labels from the pipeline can be done offline without an internet connection if the necessary files are present. If the user has an active internet connection, they can also explore statistics from a database which stores relevant data about the images and their labels, like GPS data, the dates when an individual orca was spotted, or whether certain orcas were spotted together. Manually labeled pictures can be uploaded to the pipeline, and automatic labels from the pipeline can be downloaded.
2.2 The Framework
The framework has four major parts that interact with each other: a browser interface, a local server, an external server, and the machine-learning pipeline. The browser interface has several webpages which are hosted on the local server that runs on the computer of the user. When FIN-PRINT is opened, the local server starts, and the user can access the user interface in their web browser. The local server can access the images and labels on the computer of the client and sends them to the browser. This allows the user to view the images and to label them in their browser. If the user has labeled some pictures, the browser sends the newly created labels to the local server, which stores them on the hard drive. It is also possible for the user to view and check labels from the pipeline and to correct the predictions of the pipeline. If an active internet connection is available, the local server can upload new images and labels to the external server or download automatically generated labels from there at any time. As an interface between the local server and the pipeline, the external server can pass files from the pipeline to the local server and vice versa. If the pipeline gets an unlabeled image, it identifies all orcas in the image and returns cropped images, each of which shows the fin of one orca. Every cropped image also gets its own label file. These files can be sent to the user so that they can validate the predictions of the pipeline. If the pipeline gets an image with labels, it uses it to train itself. The external server saves all data related to images and their labels in a database, like the name of the uploader and the metadata of the images (like GPS coordinates and timestamps). Data from this database can be used to generate statistics about the orcas. With an active internet connection, the user can use the web interface to request certain information that is stored in the database.
2.3 Programming tools
The web interface uses CSS, HTML and the scripting language JavaScript, which are supported by all common web browsers like Safari, Mozilla Firefox or Google Chrome. To ensure that the local server can run on any platform, it is programmed with Node.js, a platform-independent server framework that also uses JavaScript. The external server also uses Node.js and will be hosted at Friedrich-Alexander-Universität Erlangen-Nürnberg. As JavaScript and Node.js are widespread programming tools, this makes it easy for other people to maintain the framework and to add new features to it if necessary.
3. Literature
[1] C. Bergler et al. “FIN-PRINT: A Fully-Automated Multi-Stage Deep-Learning-Based Framework for Killer Whale Individual Recognition”, Germany. FAU Erlangen-Nürnberg. Date: not published yet.
[2] Alexander Gebhard. “Orca Individual Identification based on Image Classification Using Deep Learning”, Germany. FAU Erlangen-Nürnberg. Date: October 19, 2020.
[3] J. Towers et al. “Photo-identification catalogue, population status, and distribution of Bigg's killer whales known from coastal waters of British Columbia”, Canada. Fisheries and Oceans Canada, Pacific Biological Station, Nanaimo, BC. Date: 2019.
CoachLea: An Android Application to evaluate the progress of speaking and hearing abilities of children with Cochlear Implant
In 2018 the WHO estimated that there are 466 million people (34 million of them children) with disabling hearing loss, which equals about 6.1% of the world's population. For individuals with severe hearing loss who do not benefit from standard hearing aids, the cochlear implant (CI) represents a very important advance in treatment. However, CI users often present altered speech production and limited understanding even after hearing rehabilitation. Thus, if the specific speech deficits were known, the rehabilitation could be targeted accordingly. This is particularly important for children who were born deaf or lost their hearing ability before speech acquisition. On the one hand, they were not able to hear other people speak in the time before the implantation. On the other hand, they have never been able to monitor their own speech. Therefore, this project proposes an Android application to evaluate the speech production and hearing abilities of children with CI.
Enhanced Generative Learning Methods for Real-World Super-Resolution Problems in Smartphone Images
The goal of this bachelor's thesis is to extend the work of Lugmayr et al. [1] in order to improve the generative network by using a learned image downsampler motivated by the CAR network [2] instead of bicubic downsampling. The aim is to achieve better image quality or a more robust super-resolution (SR) network for images from a real-world data distribution.
[1] Lugmayr, Andreas, Martin Danelljan, and Radu Timofte. “Unsupervised learning for real-world super-resolution.” 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). IEEE, 2019.
[2] Sun, Wanjie, and Zhenzhong Chen. “Learned image downscaling for upscaling using content adaptive resampler.” IEEE Transactions on Image Processing 29 (2020): 4027-4040.
Development of Automated Hardware Requirement Checks for Medical Android Applications
Network Deconvolution as Sparse Representations for Medical Image Analysis
DeepTechnome – Mitigating Bias Related to Image Formation in Deep Learning Based Assessment of CT Images
Multi-task Learning for Historical Document Classification with Transformers
Description
Recently, transformer models [1] have started to outperform classic deep convolutional neural networks in many classic computer vision tasks. These transformer models consist of multi-headed self-attention layers followed by linear layers. The former soft-route value information based on three matrix embeddings: query, key and value. The inner product of query and key is input into a softmax function for normalization, and the resulting similarity matrix is multiplied with the value embedding. Multi-headed self-attention creates multiple sets of query, key and value matrices that are computed independently, then concatenated and projected back into the original embedding dimension. Visual transformers excel in their ability to incorporate non-local information into their latent representation, allowing for better results when classification-relevant information is scattered across the entire image.
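The query/key/value mechanism described above can be written out in a few lines of numpy. The sketch below is a minimal multi-headed self-attention layer with randomly initialized projection matrices; the dimensions and weight initialization are chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """X: (seq_len, d_model); Wq, Wk, Wv, Wo: (d_model, d_model)."""
    seq_len, d_model = X.shape
    d_head = d_model // n_heads

    # project input and split into heads: (n_heads, seq_len, d_head)
    def split(W):
        return (X @ W).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    Q, K, V = split(Wq), split(Wk), split(Wv)

    # scaled dot-product of query and key, normalized row-wise by softmax
    A = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d_head))

    # soft-route value information, concatenate heads, project back
    out = (A @ V).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo, A

rng = np.random.default_rng(0)
d_model, seq_len = 8, 5
X = rng.normal(size=(seq_len, d_model))
Ws = [rng.normal(size=(d_model, d_model)) for _ in range(4)]
Y, A = multi_head_self_attention(X, *Ws, n_heads=2)
```

Each row of the similarity matrix `A` sums to one, so every output token is a convex combination of the value vectors, which is what allows non-local information to flow into the representation.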
The downside of pure attention models like ViT [2], which treat image patches as sequence tokens, is that they require large amounts of training data to make up for their lack of inductive priors. This makes them unsuitable for low-data regimes like historical document analysis. Furthermore, computing the similarity matrix produces a matrix quadratic in the input length, complicating high-resolution computations.
One solution promising to alleviate the data hunger of transformers while still profiting from their global representation ability is the use of hybrid methods that combine CNN and self-attention layers. Those models jointly train a network comprising a number of convolutional layers to preprocess and downsample inputs, followed by a form of multi-headed self-attention. [3] differentiates hybrid self-attention models into “transformer blocks” and “non-local blocks”, the latter of which is equivalent to single-headed self-attention except for the lack of value embeddings and positional encodings.
The objective of this thesis is the classification of script type, date and location of historical documents, using a single multi-headed hybrid self-attention model.
The thesis consists of the following milestones:
- Construction of hybrid models for classification
- Benchmarking on the ICDAR 2021 competition dataset
- Further architectural analyses of hybrid self-attention models
References
[2] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16×16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021.
[3] Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, and Ashish Vaswani. Bottleneck transformers for visual recognition, 2021.
Interpolation of deformation field for brain-shift compensation using Gaussian Process
Brain shift is the change in the position and shape of the brain during a neurosurgical procedure, caused by the additional space created by opening the skull. This intraoperative soft-tissue deformation limits the use of neuroanatomical overlays that were produced prior to the surgery. Consequently, intraoperative image updates are necessary to compensate for brain shift.
Comprehensive reviews of different aspects of intraoperative brain shift compensation can be found in [1][2]. Recently, feature-based registration frameworks using SIFT features [3] or vessel centerlines [4] have been proposed to update the preoperative image in a deformable fashion, where point matching algorithms such as coherent point drift [5] or a hybrid mixture model [4] are used to establish point correspondences between the source and target feature point sets. To estimate a dense deformation field from these point correspondences, B-spline [6] and thin-plate spline [7] interpolation techniques are commonly used.
A Gaussian process (GP) [8] is a powerful machine learning tool that has been applied to image denoising, interpolation and segmentation. In this work, we aim to apply different GP kernels to brain shift compensation. Furthermore, GP-based interpolation of the deformation field is compared with state-of-the-art methods.
In detail, this thesis includes the following aspects:
- Literature review of state-of-the-art methods for brain shift compensation using feature-based algorithms
- Literature review of state-of-the-art methods for the interpolation of deformation/vector fields
- Introduction to Gaussian processes (GP)
- Integration of a GP-based interpolation technique into a feature-based brain shift compensation framework
- Estimation of a dense deformation field from a sparse deformation field using GP
- Implementation of at least three different GP kernels
- Comparison of the performance of GP and state-of-the-art image interpolation techniques on various datasets, including synthetic, phantom and clinical data, with respect to accuracy, usability and run time
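The core step, estimating a dense deformation field from sparse displacements, amounts to GP regression [8]. The sketch below interpolates each displacement component independently with a zero-mean GP and an RBF kernel; the kernel, length scale, and the four example feature points are illustrative choices, not part of the proposed framework.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=5.0):
    """Squared-exponential (RBF) covariance between point sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_interpolate(X_sparse, U_sparse, X_query, noise=1e-6):
    """GP posterior mean of a deformation field.
    X_sparse: (n, 2) feature locations, U_sparse: (n, 2) displacements,
    X_query: (m, 2) query positions. Each displacement component is
    interpolated independently with a zero-mean GP."""
    K = rbf_kernel(X_sparse, X_sparse) + noise * np.eye(len(X_sparse))
    Ks = rbf_kernel(X_query, X_sparse)
    return Ks @ np.linalg.solve(K, U_sparse)

# sparse displacements at four matched feature points (toy data)
X_sparse = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
U_sparse = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
U_at_points = gp_interpolate(X_sparse, U_sparse, X_sparse)
```

With a small noise term the posterior mean reproduces the sparse displacements at the feature points and smoothly interpolates between them; swapping `rbf_kernel` for other covariance functions is exactly the kernel comparison planned above.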
[1] Bayer, S., Maier, A., Ostermeier, M., & Fahrig, R. (2017). Intraoperative Imaging Modalities and Compensation for Brain Shift in Tumor Resection Surgery. International Journal of Biomedical Imaging, 2017 .
[2] I. J. Gerard, M. Kersten-Oertel, K. Petrecca, D. Sirhan, J. A. Hall, and D. L. Collins, “Brain shift in neuronavigation of brain tumors: a review,” Medical Image Analysis, vol. 35, pp. 403–420, 2017.
[3] Luo J. et al. (2018) A Feature-Driven Active Framework for Ultrasound-Based Brain Shift Compensation. In: Frangi A., Schnabel J., Davatzikos C., Alberola-López C., Fichtinger G. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2018. MICCAI 2018. Lecture Notes in Computer Science, vol 11073. Springer, Cham
[4] Bayer S, Zhai Z, Strumia M, Tong XG, Gao Y, Staring M, Stoe B, Fahrig R, Arya N, Meier A, Ravikumar N. Registration of vascular structures using a hybrid mixture model. In: International Journal of Computer Assisted Radiology and Surgery, June 2019.
[5] Myronenko, A., Song, X.: Point set registration: Coherent point drift. IEEE Trans. Pattern Anal. Mach. Intell. 32(12), 2262–2275 (2010)
[6] D. Rueckert, L. I. Sonoda, C. Hayes, D. L. G. Hill, M. O. Leach and D. J. Hawkes, “Nonrigid registration using free-form deformations: application to breast MR images,” in IEEE Transactions on Medical Imaging, vol. 18, no. 8, pp. 712-721, Aug. 1999.
[7] F. L. Bookstein, “Principal warps: thin-plate splines and the decomposition of deformations,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 6, pp. 567-585, June 1989.
[8] C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006