Index
Web-Based Server-Client Software Framework for Killer Whale Individual Recognition
1. Motivation
In the last decades, more and more documentation and data storage for research purposes has been done with computers, enabling the use of algorithms to work on the data. In the 1970s, it was discovered that individual killer whales (orcas) can be identified by their natural markings, such as their fins. Since then, researchers have taken pictures of killer whales to identify and document them. Each discovered orca gets a unique ID, which will be referred to as “label” in the following. This identification process is currently done manually by researchers for each picture. For his master’s thesis “Orca Individual Identification based on Image Classification Using Deep Learning” [2], Alexander Gebhard developed a machine-learning pipeline that can identify individual orcas in images. If the pipeline is given a picture without labels, it returns the most likely labels for this picture. If it is given a picture with a label, the label and the image are used to train the pipeline so that it can identify individual orcas better. The goal of my bachelor’s thesis is to develop and document a user-friendly web-based server-client framework (here referred to as FIN-PRINT) that allows researchers to synchronize images and labels with the pipeline and to create as well as to store labels and relevant meta-data.
2. Concept
2.1 Description
FIN-PRINT is a platform-independent framework that can run on different operating systems, such as Windows, macOS, and Linux. Users of this framework interact with it through several webpages that are hosted locally on their computer. The interface allows the user to browse through images of orcas on their local hard drive and to add labels to these images. They can also check automatically generated labels from the machine-learning pipeline and correct them if necessary. Labeling new images or checking the labels from the pipeline can be done offline, without an internet connection, if the necessary files are present. With an active internet connection, the user can also explore statistics from a database which stores relevant data about the images and their labels, such as GPS data, the dates when an individual orca was spotted, or whether certain orcas were spotted together. Manually labeled pictures can be uploaded to the pipeline, and automatic labels from the pipeline can be downloaded.
2.2 The Framework
The framework has four major parts that interact with each other: a browser interface, a local server, an external server, and the machine-learning pipeline. The browser interface consists of several webpages which are hosted on the local server that runs on the user’s computer. When FIN-PRINT is opened, the local server starts, and the user can access the user interface in their web browser. The local server can access the images and labels on the client’s computer and sends them to the browser. This allows the user to view the images and to label them in their browser. If the user has labeled some pictures, the browser sends the newly created labels to the local server, which stores them on the hard drive. It is also possible for the user to view and check labels from the pipeline and to correct its predictions. If an active internet connection is available, the local server can upload new images and labels to the external server or download automatically generated labels from there at any time. As an interface between the local server and the pipeline, the external server can pass files from the pipeline to the local server and vice versa. If the pipeline receives an unlabeled image, it identifies all orcas in the image and returns cropped images, each showing the fin of one orca. Every cropped image also gets its own label file. These files can be sent to the user so that they can validate the predictions of the pipeline. If the pipeline receives an image with labels, it uses it to train itself. The external server saves all data related to images and their labels in a database, such as the name of the uploader and the meta-data of the images (like GPS coordinates and time stamps). Data from this database can be used to generate statistics about the orcas. With an active internet connection, the user can use the web interface to request certain information that is stored in the database.
2.3 Programming tools
The web interface uses CSS, HTML, and the scripting language JavaScript, which are supported by all common web browsers such as Safari, Mozilla Firefox, or Google Chrome. To ensure that the local server can run on any platform, it is programmed with Node.js, a platform-independent server framework that also uses JavaScript. The external server also uses Node.js and will be hosted at the Friedrich-Alexander University. As JavaScript and Node.js are widespread programming tools, this makes it easy for other people to maintain the framework and to add new features to it if necessary.
3. Literature
[1] C. Bergler et al. “FIN-PRINT: A Fully-Automated Multi-Stage Deep-Learning-Based Framework for Killer Whale Individual Recognition”, Germany. FAU Erlangen-Nuernberg. Date: not published yet.
[2] Alexander Gebhard. “Orca Individual Identification based on Image Classification Using Deep Learning”, Germany. FAU Erlangen-Nuernberg. Date: October 19, 2020.
[3] J. Towers et al. “Photo-identification catalogue, population status, and distribution of Bigg’s killer whales known from coastal waters of British Columbia”, Canada. Fisheries and Oceans Canada, Pacific Biological Station, Nanaimo, BC. Date: 2019.
CoachLea: An Android Application to evaluate the progress of speaking and hearing abilities of children with Cochlear Implant
In 2018, the WHO estimated that there are 466 million persons (34 million of them children) with disabling hearing loss, which equals about 6.1% of the world’s population. For individuals with severe hearing loss who do not benefit from standard hearing aids, the cochlear implant (CI) represents a very important advance in treatment. However, CI users often present altered speech production and limited understanding even after hearing rehabilitation. Thus, if the speech deficits were known, the rehabilitation could be targeted at them. This is particularly important for children who were born deaf or lost their hearing ability before speech acquisition: on the one hand, they were not able to hear other people speak in the time before the implantation; on the other hand, they have never been able to monitor their own speech. Therefore, this project proposes an Android application to evaluate the speech production and hearing abilities of children with CI.
Enhanced Generative Learning Methods for Real-World Super-Resolution Problems in Smartphone Images
The goal of this bachelor’s thesis is to extend the work of Lugmayr et al. [1] by improving the generative network with a learned image downsampler motivated by the CAR network [2] instead of bicubic downsampling. The aim is to achieve better image quality or a more robust super-resolution (SR) network for images from a real-world data distribution.
[1] Lugmayr, Andreas, Martin Danelljan, and Radu Timofte. “Unsupervised learning for real-world super-resolution.” 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). IEEE, 2019.
[2] Sun, Wanjie, and Zhenzhong Chen. “Learned image downscaling for upscaling using content adaptive resampler.” IEEE Transactions on Image Processing 29 (2020): 4027-4040.
Development of Automated Hardware Requirement Checks for Medical Android Applications
Network Deconvolution as Sparse Representations for Medical Image Analysis
DeepTechnome – Mitigating Bias Related to Image Formation in Deep Learning Based Assessment of CT Images
Multi-task Learning for Historical Document Classification with Transformers
Description
Recently, transformer models [1] have started to outperform classic deep convolutional neural networks in many classic computer vision tasks. These transformer models consist of multi-headed self-attention layers followed by linear layers. The former layer soft-routes value information based on three matrix embeddings: query, key, and value. The inner product of query and key is fed into a softmax function for normalization, and the resulting similarity matrix is multiplied with the value embedding. Multi-headed self-attention creates multiple sets of query, key, and value matrices that are computed independently, then concatenated and projected back into the original embedding dimension. Visual transformers excel in their ability to incorporate non-local information into their latent representation, allowing for better results when classification-relevant information is scattered across the entire image.
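To make the computation above concrete, the following NumPy sketch implements multi-headed self-attention as just described; the dimension names are our own, and the 1/sqrt(d) scaling inside the softmax is standard practice rather than something prescribed by the text:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """X: (seq_len, d_model); Wq, Wk, Wv, Wo: (d_model, d_model)."""
    seq_len, d_model = X.shape
    d_head = d_model // n_heads
    # Query, key, and value embeddings, split into independent heads.
    Q = (X @ Wq).reshape(seq_len, n_heads, d_head)
    K = (X @ Wk).reshape(seq_len, n_heads, d_head)
    V = (X @ Wv).reshape(seq_len, n_heads, d_head)
    heads = []
    for h in range(n_heads):
        # Inner product of queries and keys, normalized via softmax
        # (scaled by 1/sqrt(d_head), as is standard practice).
        sim = softmax(Q[:, h] @ K[:, h].T / np.sqrt(d_head))
        # The similarity matrix soft-routes the value information.
        heads.append(sim @ V[:, h])
    # Concatenate all heads and project back to the embedding dimension.
    return np.concatenate(heads, axis=-1) @ Wo
```

Note that `sim` is a (seq_len × seq_len) matrix per head, which is exactly the quadratic cost discussed below.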
The downside of pure attention models like ViT [2], which treat image patches as sequence tokens, is that they require large amounts of samples to make up for their lack of inductive priors. This makes them unsuitable for low-data regimes like historical document analysis. Furthermore, the similarity matrix grows quadratically with the input length, complicating high-resolution computations.
One solution that promises to alleviate the data hunger of transformers while still profiting from their global representation ability is the use of hybrid methods that combine CNN and self-attention layers. These models jointly train a network comprised of a number of convolutional layers, which preprocess and downsample the input, followed by a form of multi-headed self-attention. [3] differentiates hybrid self-attention models into “transformer blocks” and “non-local blocks”, the latter of which is equivalent to single-headed self-attention except for the lack of value embeddings and positional encodings.
The objective of this thesis is the classification of script type, date and location of historical documents, using a single multi-headed hybrid self-attention model.
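A minimal PyTorch sketch of such a multi-task hybrid model is given below; the layer sizes, number of heads, and task head names are illustrative assumptions, not the architecture to be developed in this thesis:

```python
import torch.nn as nn

class HybridDocumentClassifier(nn.Module):
    """Hypothetical hybrid model: a small convolutional stem downsamples
    the page image, multi-headed self-attention aggregates global context,
    and one linear head per task (script, date, location) classifies."""
    def __init__(self, n_scripts, n_dates, n_locations, d_model=256):
        super().__init__()
        self.stem = nn.Sequential(  # convolutional preprocessing/downsampling
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.heads = nn.ModuleDict({
            "script": nn.Linear(d_model, n_scripts),
            "date": nn.Linear(d_model, n_dates),
            "location": nn.Linear(d_model, n_locations),
        })

    def forward(self, x):                      # x: (batch, 1, H, W)
        f = self.stem(x)                       # (batch, d_model, H/4, W/4)
        tokens = f.flatten(2).transpose(1, 2)  # feature map -> token sequence
        attended, _ = self.attn(tokens, tokens, tokens)
        pooled = attended.mean(dim=1)          # global average pooling
        return {task: head(pooled) for task, head in self.heads.items()}
```

In the usual multi-task setup, the three cross-entropy losses would simply be summed (optionally with task weights) during training.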
The thesis consists of the following milestones:
- Construction of hybrid models for classification
- Benchmarking on the ICDAR 2021 competition dataset
- Further architectural analyses of hybrid self-attention models
References
[1] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, 2017.
[2] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16×16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021.
[3] Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, and Ashish Vaswani. Bottleneck transformers for visual recognition, 2021.
Interpolation of deformation field for brain-shift compensation using Gaussian Process
Brain shift is the change of the position and shape of the brain during a neurosurgical procedure, caused by the additional space that becomes available after opening the skull. This intraoperative soft-tissue deformation limits the use of neuroanatomical overlays that were produced prior to surgery. Consequently, intraoperative image updates are necessary to compensate for brain shift.
Comprehensive reviews covering different aspects of intraoperative brain shift compensation can be found in [1][2]. Recently, feature-based registration frameworks using SIFT features [3] or vessel centerlines [4] have been proposed to update the preoperative image in a deformable fashion, where point matching algorithms such as coherent point drift [5] or a hybrid mixture model [4] are used to establish point correspondences between the source and target feature point sets. To estimate a dense deformation field from these point correspondences, B-spline [6] and thin-plate spline [7] interpolation techniques are commonly used.
The Gaussian process (GP) [8] is a powerful machine learning tool, which has been applied to image denoising, interpolation, and segmentation. In this work, we aim to apply different GP kernels to brain shift compensation. Furthermore, GP-based interpolation of the deformation field is compared with state-of-the-art methods.
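As a rough sketch of how GP-based interpolation of a deformation field could look, the snippet below uses scikit-learn’s GaussianProcessRegressor to predict dense displacements from sparse point correspondences; the kernel and its length scale are placeholder assumptions, and the kernels actually compared are part of the thesis work:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, RationalQuadratic

def interpolate_deformation(points, displacements, query_points, kernel=None):
    """points: (N, 3) sparse feature locations; displacements: (N, 3)
    deformation vectors at these locations; query_points: (M, 3) dense
    grid positions. Each displacement component is modeled as a GP."""
    kernel = kernel or RBF(length_scale=10.0)  # placeholder; must be tuned
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(points, displacements)
    dense, std = gp.predict(query_points, return_std=True)
    return dense, std  # predicted deformation field plus its uncertainty
```

Kernels such as Matern or RationalQuadratic could be swapped in via the `kernel` argument, which would already cover two of the at least three kernels to be implemented.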
In detail, this thesis includes the following aspects:
- Literature review of state-of-the-art methods for brain shift compensation using feature-based algorithms
- Literature review of state-of-the-art methods for the interpolation of deformation/vector fields
- Introduction of the Gaussian process (GP)
- Integration of a GP-based interpolation technique into a feature-based brain shift compensation framework
- Estimation of a dense deformation field from a sparse deformation field using a GP
- Implementation of at least three different GP kernels
- Comparison of the performance of GP-based and state-of-the-art interpolation techniques on various datasets, including synthetic, phantom, and clinical data, with respect to accuracy, usability, and run time
[1] Bayer, S., Maier, A., Ostermeier, M., & Fahrig, R. (2017). Intraoperative Imaging Modalities and Compensation for Brain Shift in Tumor Resection Surgery. International Journal of Biomedical Imaging, 2017 .
[2] I. J. Gerard, M. Kersten-Oertel, K. Petrecca, D. Sirhan, J. A. Hall, and D. L. Collins, “Brain shift in neuronavigation of brain tumors: a review,” Medical Image Analysis, vol. 35, pp. 403–420, 2017.
[3] Luo J. et al. (2018) A Feature-Driven Active Framework for Ultrasound-Based Brain Shift Compensation. In: Frangi A., Schnabel J., Davatzikos C., Alberola-López C., Fichtinger G. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2018. MICCAI 2018. Lecture Notes in Computer Science, vol 11073. Springer, Cham
[4] Bayer S, Zhai Z, Strumia M, Tong XG, Gao Y, Staring M, Stoel B, Fahrig R, Arya N, Maier A, Ravikumar N. Registration of vascular structures using a hybrid mixture model. In: International Journal of Computer Assisted Radiology and Surgery, June 2019
[5] Myronenko, A., Song, X.: Point set registration: Coherent point drift. IEEE Trans. Pattern Anal. Mach. Intell. 32(12), 2262–2275 (2010)
[6] D. Rueckert, L. I. Sonoda, C. Hayes, D. L. G. Hill, M. O. Leach and D. J. Hawkes, “Nonrigid registration using free-form deformations: application to breast MR images,” in IEEE Transactions on Medical Imaging, vol. 18, no. 8, pp. 712-721, Aug. 1999.
[7] F. L. Bookstein, “Principal warps: thin-plate splines and the decomposition of deformations,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 6, pp. 567-585, June 1989.
[8] C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006
Automatic Bird Individual Recognition in Multi-Channel Recording Scenarios
Problem background:
At the Max Planck Institute for Ornithology in Radolfzell, several birds are equipped with backpacks to record their calls. However, not only the sound of the equipped bird is recorded, but also that of the birds in its surroundings; as a result, the scientists receive several non-synchronized audio tracks with bird calls. The biologists have to match the calls to the individual birds manually, which is time-consuming and error-prone.
Goal of the thesis:
The goal of this thesis is to implement a Python framework that can assign the calls to the corresponding birds. Since the intensity of a call decreases rapidly with distance, the loudest call on a recording can be matched to the bird carrying that recorder. Moreover, the call of that bird appears earlier on its own recording device than on the other devices.
To assign the remaining calls to the other birds, the soundtracks must be compared by overlaying the audio signals. For this purpose, the audio signals have to be adjusted first: since different devices are used for capturing the data and the recordings cannot be started at the same time, a constant time offset between the recordings occurs. In addition, a linear time distortion appears because the devices record at slightly different sampling frequencies.
To remove these inconsistencies, similar characteristics must be found in the audio signals, and the audio tracks then have to be shifted and processed until these characteristics lie on top of each other. There are several methods to extract such characteristics, although the most precise ones require human assistance [1]. However, there are also automated approaches in which the audio track is scanned for periodic signal parameters such as pitch or spectral flatness. Effective features are essential for removing the distortion, as is the algorithm’s ability to discriminate between characteristics that are only slightly similar [2].
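As an illustration of this alignment step, the Python sketch below first resamples one track to the reference sampling rate (removing the linear distortion) and then estimates the constant time offset from the peak of the cross-correlation; it assumes integer sampling rates and that a shared acoustic event dominates the correlation:

```python
import numpy as np
from scipy.signal import correlate, resample_poly

def sample_lag(ref, other):
    """Lag (in samples) of 'other' relative to 'ref', taken from the
    peak of the full cross-correlation of the two signals."""
    xcorr = correlate(other, ref, mode="full")
    return int(np.argmax(xcorr)) - (len(ref) - 1)

def align(ref, other, fs_ref, fs_other):
    """Resample 'other' to the reference rate (removing the linear
    distortion), then shift it so that shared characteristics lie on
    top of those in the reference track."""
    other = resample_poly(other, fs_ref, fs_other)  # integer rates assumed
    return np.roll(other, -sample_lag(ref, other))  # wraps around; real
    # code should pad or crop instead of rolling
```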
The framework will be implemented in Python. It should process the given audio tracks and
recognize and reject disturbed channels.
References:
[1] Brett G. Crockett, Michael J. Smithers. Method for time aligning audio signals using
characterizations based on auditory events, 2002
[2] Jürgen Herre, Eric Allamanche, Oliver Hellmuth. Robust matching of audio signals using
spectral flatness features, 2002
Detection of Label Noise in Solar Cell Datasets
On-site inspection of solar panels is a time-consuming and difficult process, as the panels are often hard to reach. Furthermore, identifying defects can be hard, especially for small cracks. Electroluminescence (EL) imaging enables the detection of small cracks, for example using a convolutional neural network (CNN) [1,2]. Hence, it can be used to identify such cracks before they propagate and result in a measurable impact on the efficiency of a solar panel [3]. This way, costly inspection and replacement of solar panels can be avoided.
To train a CNN for the detection of cracks, a comprehensive dataset of labeled solar cells is required. Unfortunately, assessing whether a certain structure on a polycrystalline solar cell corresponds to a crack or not is a hard task, even for human experts. As a result, setting up a consistently labeled dataset is nearly impossible. That is why EL datasets of solar cells tend to contain a significant amount of label noise.
It has been shown that CNNs are robust against small amounts of label noise, but there can be a drastic influence on performance starting at 5%-10% of label noise [4]. This thesis will
(1) analyze the given dataset with respect to label noise and
(2) attempt to minimize the negative impact of label noise on the performance of the trained network.
Recently, Ding et al. proposed to identify label noise by clustering the features learned by the CNN [4]. As part of this thesis, the proposed method will be applied to a dataset consisting of more than 40k labeled samples of solar cells, which is known to contain a significant amount of label noise. It will be investigated whether the method can be used to identify noisy samples. Furthermore, it will be evaluated whether abstaining from noisy samples improves the performance of the resulting model. To this end, a subset of the dataset will be labeled by at least three experts to obtain a cleaned subset. Finally, an extension of the method will be developed. Here, it shall be evaluated whether the clustering can be omitted, since it proved unstable in prior experiments using the same data.
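As an illustration of the clustering idea (a strongly simplified variant for exposition, not the exact method of [4]), one could flag samples whose given label disagrees with the majority label of their feature cluster; the function name and the choice of k-means are our assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def flag_noisy_samples(features, labels, n_clusters=2):
    """features: (N, D) penultimate-layer CNN activations; labels: (N,)
    integer class labels. Cluster the features and flag every sample
    whose label differs from the majority label of its cluster."""
    assignments = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    noisy = np.zeros(len(labels), dtype=bool)
    for c in range(n_clusters):
        members = assignments == c
        if members.any():
            majority = np.bincount(labels[members]).argmax()  # dominant label
            noisy |= members & (labels != majority)
    return noisy  # candidate samples to relabel or exclude from training
```

Checking whether such flags coincide with the expert-cleaned subset, and whether training without the flagged samples improves the model, corresponds exactly to the evaluation questions posed above.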
[1] Deitsch, Sergiu, et al. “Automatic classification of defective photovoltaic module cells in electroluminescence images.” Solar Energy 185 (2019): 455-468.
[2] Mayr, Martin, et al. “Weakly Supervised Segmentation of Cracks on Solar Cells Using Normalized Lp Norm.” 2019 IEEE International Conference on Image Processing (ICIP). IEEE, 2019.
[3] Köntges, Marc, et al. “Impact of transportation on silicon wafer‐based photovoltaic modules.” Progress in Photovoltaics: research and applications 24.8 (2016): 1085-1095.
[4] Ding, Guiguang, et al. “DECODE: Deep confidence network for robust image classification.” IEEE Transactions on Image Processing 28.8 (2019): 3752-3765.