CoachLea: An Android Application to evaluate the progress of speaking and hearing abilities of children with Cochlear Implant
In 2018, the WHO estimated that 466 million people (among them 34 million children) live with disabling hearing loss, which equals about 6.1% of the world’s population. For individuals with severe hearing loss who do not benefit from standard hearing aids, the cochlear implant (CI) represents a major advance in treatment. However, CI users often show altered speech production and limited speech understanding even after hearing rehabilitation. If their specific speech deficits were known, the rehabilitation could be targeted accordingly. This is particularly important for children who were born deaf or lost their hearing before speech acquisition: on the one hand, they were not able to hear other people speak before the implantation; on the other hand, they have never been able to monitor their own speech. Therefore, this project proposes an Android application to evaluate the speech production and hearing abilities of children with CI.
Enhanced Generative Learning Methods for Real-World Super-Resolution Problems in Smartphone Images
The goal of this bachelor thesis is to extend the work of Lugmayr et al. [1] by improving the generative network with a learned image downsampler, motivated by the CAR network [2], instead of bicubic downsampling. The aim is to achieve better image quality or a more robust super-resolution (SR) network for images from a real-world data distribution.
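As an illustration of the planned change, the sketch below shows a minimal learnable downsampler in PyTorch that could stand in for bicubic downsampling; the layer configuration is invented for demonstration and is much simpler than the content-adaptive resampler of CAR [2].

```python
import torch
import torch.nn as nn

class LearnedDownsampler(nn.Module):
    """Minimal learnable 4x downsampler (illustrative placeholder,
    not the CAR architecture [2])."""

    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1),          # 2x down
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1),   # 4x down
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1),                    # back to RGB
        )

    def forward(self, hr_image):
        return self.net(hr_image)

# A 1x3x128x128 HR image becomes a 1x3x32x32 LR image.
lr = LearnedDownsampler()(torch.randn(1, 3, 128, 128))
```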
[1] Lugmayr, Andreas, Martin Danelljan, and Radu Timofte. “Unsupervised learning for real-world super-resolution.” 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). IEEE, 2019.
[2] Sun, Wanjie, and Zhenzhong Chen. “Learned image downscaling for upscaling using content adaptive resampler.” IEEE Transactions on Image Processing 29 (2020): 4027-4040.
Development of Automated Hardware Requirement Checks for Medical Android Applications
Network Deconvolution as Sparse Representations for Medical Image Analysis
DeepTechnome – Mitigating Bias Related to Image Formation in Deep Learning Based Assessment of CT Images
Multi-task Learning for Historical Document Classification with Transformers
Description
Recently, transformer models [1] have started to outperform classic deep convolutional neural networks in many computer vision tasks. These transformer models consist of multi-headed self-attention layers followed by linear layers. The self-attention layer soft-routes value information based on three matrix embeddings: query, key, and value. The inner product of query and key is fed into a softmax function for normalization, and the resulting similarity matrix is multiplied with the value embedding. Multi-headed self-attention creates multiple sets of query, key, and value matrices that are computed independently, then concatenated and projected back into the original embedding dimension. Visual transformers excel at incorporating non-local information into their latent representation, allowing for better results when classification-relevant information is scattered across the entire image.
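For concreteness, the following is a minimal PyTorch sketch of this computation; shapes and names are illustrative, and real implementations add dropout, masking, and other details.

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Minimal multi-headed self-attention as described above."""

    def __init__(self, dim, num_heads):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)  # query, key and value embeddings
        self.proj = nn.Linear(dim, dim)     # projection back to embedding dim

    def forward(self, x):  # x: (batch, tokens, dim)
        b, n, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split into independently computed heads: (batch, heads, tokens, head_dim).
        q, k, v = (t.reshape(b, n, self.num_heads, self.head_dim).transpose(1, 2)
                   for t in (q, k, v))
        # Inner product of query and key, normalized by a softmax; note that
        # this similarity matrix is quadratic in the number of tokens.
        sim = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        out = (sim @ v).transpose(1, 2).reshape(b, n, d)  # concatenate heads
        return self.proj(out)
```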
The downside of pure attention models like ViT [2], which treat image patches as sequence tokens, is that they require large amounts of training data to make up for their lack of inductive priors. This makes them unsuitable for low-data regimes such as historical document analysis. Furthermore, the similarity matrix grows quadratically with the input length, which makes high-resolution computations expensive.
One promising way to alleviate the data hunger of transformers while still profiting from their global representation ability is the use of hybrid methods that combine CNN and self-attention layers. These models jointly train a number of convolutional layers, which preprocess and downsample the input, followed by a form of multi-headed self-attention. [3] differentiates hybrid self-attention models into “transformer blocks” and “non-local blocks”, the latter being equivalent to single-headed self-attention without value embeddings and positional encodings.
The objective of this thesis is the classification of script type, date and location of historical documents, using a single multi-headed hybrid self-attention model.
The thesis consists of the following milestones:
- Construction of hybrid models for classification (see the sketch after this list)
- Benchmarking on the ICDAR 2021 competition dataset
- Further architectural analyses of hybrid self-attention models
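A minimal sketch of such a hybrid model is shown below, assuming grayscale inputs and placeholder class counts (these are not the ICDAR 2021 label sets); it illustrates the convolution-then-attention pattern rather than the final architecture.

```python
import torch
import torch.nn as nn

class HybridDocumentClassifier(nn.Module):
    """Hypothetical hybrid model: convolutional preprocessing and
    downsampling, followed by multi-headed self-attention, with one
    classification head per task."""

    def __init__(self, dim=256, num_heads=8,
                 n_scripts=12, n_dates=15, n_locations=13):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 64, 7, stride=4, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.heads = nn.ModuleDict({
            "script": nn.Linear(dim, n_scripts),
            "date": nn.Linear(dim, n_dates),
            "location": nn.Linear(dim, n_locations),
        })

    def forward(self, x):  # x: (batch, 1, H, W)
        tokens = self.backbone(x).flatten(2).transpose(1, 2)  # to sequence
        attended, _ = self.attn(tokens, tokens, tokens)
        pooled = attended.mean(dim=1)  # global average pooling over tokens
        return {task: head(pooled) for task, head in self.heads.items()}

# Example: three task-specific logit tensors for a batch of two pages.
logits = HybridDocumentClassifier()(torch.randn(2, 1, 128, 128))
```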
References
[1] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, 2017.
[2] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021.
[3] Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, and Ashish Vaswani. Bottleneck transformers for visual recognition, 2021.
Interpolation of deformation field for brain-shift compensation using Gaussian Process
Brain shift is the change of the position and shape of the brain during a neurosurgical procedure, caused by the additional space created after opening the skull. This intraoperative soft-tissue deformation limits the use of neuroanatomical overlays that were produced prior to surgery. Consequently, intraoperative image updates are necessary to compensate for brain shift.
Comprehensive reviews of different aspects of intraoperative brain shift compensation can be found in [1][2]. Recently, feature-based registration frameworks using SIFT features [3] or vessel centerlines [4] have been proposed to update the preoperative image in a deformable fashion, where point matching algorithms such as coherent point drift [5] or a hybrid mixture model [4] are used to establish point correspondences between the source and target feature point sets. To estimate a dense deformation field from these point correspondences, B-spline [6] and thin-plate-spline [7] interpolation techniques are commonly used.
The Gaussian process (GP) [8] is a powerful machine learning tool that has been applied to image denoising, interpolation, and segmentation. In this work, we aim to apply different GP kernels to brain shift compensation. Furthermore, GP-based interpolation of the deformation field is compared with state-of-the-art methods.
In detail, this thesis includes the following aspects:
- Literature review of state-of-the-art methods for brain shift compensation using feature-based algorithms
- Literature review of state-of-the-art methods for the interpolation of deformation/vector fields
- Introduction of the Gaussian process (GP)
- Integration of a GP-based interpolation technique into a feature-based brain shift compensation framework
- Estimation of a dense deformation field from a sparse deformation field using a GP (see the sketch after this list)
- Implementation of at least three different GP kernels
- Comparison of the performance of GP and state-of-the-art interpolation techniques on various datasets, including synthetic, phantom, and clinical data, with respect to accuracy, usability, and run time
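The sketch below illustrates the GP-based estimation of a dense field from a sparse one using scikit-learn, with three candidate kernels; the correspondences, displacements, and length scales are synthetic stand-ins chosen only for demonstration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, RationalQuadratic

# Sparse correspondences: synthetic 3D feature positions and displacements.
rng = np.random.default_rng(0)
points = rng.uniform(0, 100, size=(50, 3))       # source feature positions
displacements = np.sin(points / 20.0) * 3.0      # sparse deformation vectors

# Dense grid on which the deformation field is interpolated.
grid = np.stack(np.meshgrid(*[np.linspace(0, 100, 20)] * 3, indexing="ij"),
                axis=-1).reshape(-1, 3)

# Three candidate GP kernels; each GP regresses all 3 displacement components.
kernels = {"RBF": RBF(length_scale=20.0),
           "Matern": Matern(length_scale=20.0, nu=1.5),
           "RationalQuadratic": RationalQuadratic(length_scale=20.0)}

for name, kernel in kernels.items():
    gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-6)
    gp.fit(points, displacements)
    dense_field = gp.predict(grid)               # shape (8000, 3)
    print(name, dense_field.shape)
```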
[1] Bayer, S., Maier, A., Ostermeier, M., & Fahrig, R. (2017). Intraoperative Imaging Modalities and Compensation for Brain Shift in Tumor Resection Surgery. International Journal of Biomedical Imaging, 2017 .
[2] I. J. Gerard, M. Kersten-Oertel, K. Petrecca, D. Sirhan, J. A. Hall, and D. L. Collins, “Brain shift in neuronavigation of brain tumors: a review,” Medical Image Analysis, vol. 35, pp. 403–420, 2017.
[3] Luo J. et al. (2018) A Feature-Driven Active Framework for Ultrasound-Based Brain Shift Compensation. In: Frangi A., Schnabel J., Davatzikos C., Alberola-López C., Fichtinger G. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2018. MICCAI 2018. Lecture Notes in Computer Science, vol 11073. Springer, Cham
[4] Bayer S., Zhai Z., Strumia M., Tong X., Gao Y., Staring M., Stoel B., Fahrig R., Nabavi A., Maier A., Ravikumar N., “Registration of vascular structures using a hybrid mixture model,” International Journal of Computer Assisted Radiology and Surgery, June 2019.
[5] Myronenko, A., Song, X.: Point set registration: Coherent point drift. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(12), 2262-2275 (2010)
[6] D. Rueckert, L. I. Sonoda, C. Hayes, D. L. G. Hill, M. O. Leach and D. J. Hawkes, “Nonrigid registration using free-form deformations: application to breast MR images,” in IEEE Transactions on Medical Imaging, vol. 18, no. 8, pp. 712-721, Aug. 1999.
[7] F. L. Bookstein, “Principal warps: thin-plate splines and the decomposition of deformations,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 6, pp. 567-585, June 1989.
[8] C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, The MIT Press, 2006.
Automatic Bird Individual Recognition in Multi-Channel Recording Scenarios
Problem background:
At the Max Planck Institute for Ornithology in Radolfzell, several birds are equipped with backpacks that record their calls. However, not only the sound of the equipped bird is recorded, but also that of the birds in its surroundings, so the scientists receive several non-synchronous audio tracks with bird calls. The biologists have to match the calls to the individual birds manually, which is time-consuming and error-prone.
Goal of the thesis:
The goal of this thesis is to implement a Python framework that can assign the calls to the corresponding birds.
Since the intensity of a call falls off sharply with distance, the loudest recording of a call can be matched to the bird carrying that recorder. Moreover, a bird’s call appears earlier on its own recording device than on the other devices.
To assign the remaining calls to the other birds, the soundtracks must be compared by overlaying the audio signals. For this purpose, the audio signals have to be modified first: since different devices are used for capturing the data and the recordings cannot be started at the same time, a constant time offset between the recordings occurs. In addition, a linear time distortion appears because the devices record at slightly different sampling frequencies.
To remove these inconsistencies, similar characteristics must be found in the audio signals, and the audio tracks then have to be shifted and processed until these characteristics align. There are several methods to extract such characteristics, of which the most precise ones require human assistance [1]. However, there are also automated approaches in which the audio track is scanned for periodic signal parameters such as pitch or spectral flatness. Effective features are essential both for the removal of distortion and for the algorithm’s ability to distinguish between minor similarities of the characteristics [2].
The framework will be implemented in Python. It should process the given audio tracks and
recognize and reject disturbed channels.
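As a starting point for the alignment step, the sketch below estimates the time offset between two recordings via cross-correlation after resampling to a common rate; the signals are synthetic, and a real pipeline would first extract robust features (e.g. spectral flatness [2]) rather than correlating raw waveforms.

```python
import numpy as np
from scipy.signal import correlate, resample

def estimate_offset(track_a, track_b, fs_a, fs_b):
    """Estimate the time offset (seconds) of track_b relative to track_a.
    Resampling track_b to fs_a compensates the linear distortion caused
    by differing device sampling rates."""
    if fs_b != fs_a:
        track_b = resample(track_b, int(len(track_b) * fs_a / fs_b))
    corr = correlate(track_a, track_b, mode="full")
    lag = np.argmax(corr) - (len(track_b) - 1)  # peak lag in samples
    return lag / fs_a

# Synthetic example: the same call, 0.5 s later on device B at 44.1 kHz.
fs_a, fs_b = 48000, 44100
t = np.linspace(0, 2, 2 * fs_a)
call = np.sin(2 * np.pi * 2000 * t) * np.exp(-((t - 0.8) ** 2) / 0.01)
track_a = call
track_b = resample(np.roll(call, int(0.5 * fs_a)), 2 * fs_b)
# Negative offset: the call occurs ~0.5 s earlier on track_a.
print(estimate_offset(track_a, track_b, fs_a, fs_b))
```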
References:
[1] Brett G. Crockett, Michael J. Smithers. Method for time aligning audio signals using
characterizations based on auditory events, 2002
[2] Jürgen Herre, Eric Allamanche, Oliver Hellmuth. Robust matching of audio signals using
spectral flatness features, 2002
Detection of Label Noise in Solar Cell Datasets
On-site inspection of solar panels is a time-consuming and difficult process, as the panels are often hard to reach. Furthermore, identifying defects can be difficult, especially for small cracks. Electroluminescence (EL) imaging enables the detection of small cracks, for example using a convolutional neural network (CNN) [1,2]. Hence, it can be used to identify such cracks before they propagate and have a measurable impact on the efficiency of a solar panel [3]. This way, costly inspection and replacement of solar panels can be avoided.
To train a CNN for the detection of cracks, a comprehensive dataset of labeled solar cells is required. Unfortunately, assessing whether a certain structure on a polycrystalline solar cell corresponds to a crack is a hard task, even for human experts. As a result, setting up a consistently labeled dataset is nearly impossible. That is why EL datasets of solar cells tend to contain a significant amount of label noise.
It has been shown that CNNs are robust against small amounts of label noise, but there can be a drastic impact on performance starting at 5%-10% of label noise [4]. This thesis will
(1) analyze the given dataset with respect to label noise and
(2) attempt to minimize the negative impact of label noise on the performance of the trained network.
Recently, Ding et al. proposed to identify label noise by clustering the features learned by the CNN [4]. As part of this thesis, the proposed method will be applied to a dataset consisting of more than 40k labeled samples of solar cells, which is known to contain a significant amount of label noise. It will be investigated whether the method can be used to identify noisy samples. Furthermore, it will be evaluated whether abstaining from noisy samples improves the performance of the resulting model. To this end, a subset of the dataset will be labeled by at least three experts to obtain a cleaned subset. Finally, an extension of the method will be developed. Here, it shall be evaluated whether the clustering step can be omitted, since it proved unstable in prior experiments using the same data.
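As a simplified illustration of the clustering idea (not the exact DECODE pipeline [4]), the sketch below clusters feature embeddings and flags samples whose label disagrees with their cluster’s majority label; flag_noisy_labels is a hypothetical helper, and the random embeddings stand in for penultimate-layer CNN activations.

```python
import numpy as np
from sklearn.cluster import KMeans

def flag_noisy_labels(features, labels, n_clusters=2):
    """Flag samples whose label disagrees with the majority label
    of their feature cluster (simplified variant of the idea in [4])."""
    clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    noisy = np.zeros(len(labels), dtype=bool)
    for c in range(n_clusters):
        members = clusters == c
        majority = np.bincount(labels[members]).argmax()
        noisy[members] = labels[members] != majority
    return noisy

# Toy data: two well-separated clusters standing in for CNN features.
rng = np.random.default_rng(0)
feats = np.concatenate([rng.normal(0, 1, (100, 64)),
                        rng.normal(4, 1, (100, 64))])
labs = np.concatenate([np.zeros(100, int), np.ones(100, int)])
labs[:5] = 1                                 # inject 5 noisy labels
print(flag_noisy_labels(feats, labs).sum())  # ~5 flagged samples
```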
[1] Deitsch, Sergiu, et al. “Automatic classification of defective photovoltaic module cells in electroluminescence images.” Solar Energy 185 (2019): 455-468.
[2] Mayr, Martin, et al. “Weakly Supervised Segmentation of Cracks on Solar Cells Using Normalized Lp Norm.” 2019 IEEE International Conference on Image Processing (ICIP). IEEE, 2019.
[3] Köntges, Marc, et al. “Impact of transportation on silicon wafer-based photovoltaic modules.” Progress in Photovoltaics: Research and Applications 24.8 (2016): 1085-1095.
[4] Ding, Guiguang, et al. “DECODE: Deep confidence network for robust image classification.” IEEE Transactions on Image Processing 28.8 (2019): 3752-3765.
Semi-Supervised Segmentation of Cell Images using Differentiable Rendering
With the recent advancements in machine learning, and mainly deep learning [1], deep convolutional neural networks (CNNs) [2–7] have been developed that are able to learn from data sets containing millions of images [8] to solve object detection tasks. When trained on such big data sets, CNNs achieve task-relevant object detection performance that is comparable or even superior to the capabilities of humans [9, 10]. A key problem of using deep learning for cell detection is the large amount of data needed to train such networks. The main difficulty lies in the acquisition of a representative data set of cell images, which should ideally cover various sizes, shapes, and distributions for a variety of cell types. Additionally, manual annotation of the acquired data is required to obtain the so-called ‘ground truth’ or ‘labels’, which is in general error-prone, time-consuming, and costly.
Differentiable rendering [11–13], on the other hand, is an emerging technique that allows generating synthetic, photo-realistic images based on photographs of real-world objects by estimating their 3D shape and material properties. While this approach can be used to generate photo-realistic images, it can also produce the corresponding ground truth labels for segmentation and object detection masks. Combining differentiable rendering with deep learning could potentially solve the data bottleneck for machine learning algorithms in various fields, including materials science and biomedical engineering.
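To illustrate the principle, the self-contained toy example below optimizes scene parameters by gradient descent through a differentiable “renderer”; the soft disk rasterizer is merely a stand-in for a full framework such as Redner [11], and all parameters are invented for demonstration.

```python
import torch

def render_disk(center, radius, size=64, sharpness=4.0):
    """Toy differentiable renderer: a soft disk silhouette obtained by
    applying a sigmoid to the signed distance from the circle boundary."""
    ys, xs = torch.meshgrid(torch.arange(size, dtype=torch.float32),
                            torch.arange(size, dtype=torch.float32),
                            indexing="ij")
    dist = torch.sqrt((xs - center[0]) ** 2 + (ys - center[1]) ** 2 + 1e-8)
    return torch.sigmoid(sharpness * (radius - dist))

# Target silhouette rendered with known (hidden) parameters.
target = render_disk(torch.tensor([40.0, 24.0]), torch.tensor(10.0)).detach()

# Scene parameters to recover, optimized through the renderer's gradients.
center = torch.tensor([32.0, 32.0], requires_grad=True)
radius = torch.tensor(5.0, requires_grad=True)
optimizer = torch.optim.Adam([center, radius], lr=0.5)

for step in range(200):
    optimizer.zero_grad()
    loss = ((render_disk(center, radius) - target) ** 2).mean()
    loss.backward()
    optimizer.step()

print(center.detach(), radius.item())  # approaches (40, 24) and 10
```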
The work of this thesis is based on the differentiable rendering framework ‘Redner’ [11], using data from the Cell Tracking Challenge [14, 15]. In a first step, a literature review on the topic of differentiable rendering will be conducted. In a second step, an existing implementation for the light, shader, and geometry estimation of nanoparticles will be adapted for the semi-supervised segmentation of GFP-GOWT1 mouse stem cells. Afterwards, the results of this approach will be evaluated in terms of segmentation accuracy.
The thesis will include the following points:
• Getting familiar with the concepts of Differentiable Rendering and Gradient-based learning methods
• Implementation of a proof-of-concept for the semi-supervised segmentation of cells based on the ‘Redner’
framework using existing data from the Cell Tracking Challenge
• Evaluation of the method in terms of segmentation accuracy
• Elaboration of potential improvements for the method
References
[1] Y. Lecun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[2] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988, 2017.
[3] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv
preprint arXiv:1409.1556, 2014.
[4] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE
conference on computer vision and pattern recognition, pp. 770–778, 2016.
[5] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788, 2016.
[6] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,”
in International Conference on Medical image computing and computer-assisted intervention, pp. 234–241,
Springer, 2015.
[7] T. Falk, D. Mai, R. Bensch, Ö. Çiçek, A. Abdulkadir, Y. Marrakchi, A. Böhm, J. Deubner, Z. Jäckel, K. Seiwald, A. Dovzhenko, O. Tietz, C. Dal Bosco, S. Walsh, D. Saltukoglu, T. L. Tay, M. Prinz, K. Palme, M. Simons, I. Diester, T. Brox, and O. Ronneberger, “U-Net: deep learning for cell counting, detection, and morphometry,” Nature Methods, vol. 16, no. 1, pp. 67–70, 2019.
[8] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, 2009.
[9] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou,
V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap,
M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep neural networks
and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016.
[10] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,”
in Advances in neural information processing systems, pp. 1097–1105, 2012.
[11] T.-M. Li, M. Aittala, F. Durand, and J. Lehtinen, “Differentiable Monte Carlo ray tracing through edge sampling,” ACM Transactions on Graphics, vol. 37, Dec. 2018.
[12] M. Nimier-David, D. Vicini, T. Zeltner, and W. Jakob, “Mitsuba 2: a retargetable forward and inverse renderer,”
ACM Transactions on Graphics (TOG), vol. 38, no. 6, p. 203, 2019.
[13] G. Loubet, N. Holzschuch, and W. Jakob, “Reparameterizing discontinuous integrands for differentiable rendering,” ACM Transactions on Graphics (TOG), vol. 38, no. 6, pp. 1–14, 2019.
[14] M. Maška, V. Ulman, D. Svoboda, P. Matula, P. Matula, C. Ederra, A. Urbiola, T. España, S. Venkatesan, D. M. Balak, et al., “A benchmark for comparison of cell tracking algorithms,” Bioinformatics, vol. 30, no. 11, pp. 1609–1617, 2014.
[15] V. Ulman, M. Maška, K. E. Magnusson, O. Ronneberger, C. Haubold, N. Harder, P. Matula, P. Matula, D. Svoboda, M. Radojevic, et al., “An objective comparison of cell-tracking algorithms,” Nature Methods, vol. 14, no. 12, p. 1141, 2017.