Interpolation of deformation field for brain-shift compensation using Gaussian Process

Brain shift is the change in position and shape of the brain during a neurosurgical procedure, caused by the additional space that becomes available after opening the skull. This intraoperative soft-tissue deformation limits the use of neuroanatomical overlays produced prior to surgery. Consequently, intraoperative image updates are necessary to compensate for brain shift.

Comprehensive reviews concerning different aspects of intraoperative brain shift compensation can be found in [1][2]. Recently, feature-based registration frameworks using SIFT features [3] or vessel centerlines [4] have been proposed to update the preoperative image in a deformable fashion, whereas point matching algorithms such as coherent point drift [5] or a hybrid mixture model [4] are used to establish point correspondences between the source and target feature point sets. To estimate a dense deformation field from these point correspondences, B-spline [6] and thin-plate spline [7] interpolation techniques are commonly used.

Gaussian processes (GPs) [8] are a powerful machine learning tool that has been applied to image denoising, interpolation and segmentation. In this work, we aim to apply different GP kernels to brain shift compensation. Furthermore, GP-based interpolation of the deformation field is compared with state-of-the-art methods.

In detail, this thesis includes the following aspects:

  • Literature review of state-of-the-art methods for brain shift compensation using feature-based algorithms
  • Literature review of state-of-the-art methods for the interpolation of deformation/vector fields
  • Introduction of Gaussian processes (GPs)
  • Integration of a GP-based interpolation technique into a feature-based brain shift compensation framework
    • Estimation of a dense deformation field from a sparse deformation field using a GP (see the sketch below)
    • Implementation of at least three different GP kernels
    • Comparison of the performance of GP-based and state-of-the-art image interpolation techniques on various datasets, including synthetic, phantom and clinical data, with respect to accuracy, usability and run time.
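
A minimal sketch of the GP interpolation idea from the first sub-item, using scikit-learn; the synthetic landmark arrays, grid extent and the RBF kernel choice are illustrative assumptions, not part of the thesis framework:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    # Sparse correspondences (synthetic here): landmark positions and their
    # displacement vectors, as produced by a point matching algorithm [5].
    rng = np.random.default_rng(0)
    landmarks = rng.uniform(0, 100, size=(30, 2))      # (N, 2) positions in mm
    displacements = rng.normal(size=(30, 2))           # (N, 2) displacements in mm

    # One independent GP per displacement component; the RBF kernel is just
    # one of the (at least three) kernels to be compared in this thesis.
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=10.0), alpha=1e-6)
    gp.fit(landmarks, displacements)

    # Evaluate the dense deformation field on a regular grid.
    xx, yy = np.meshgrid(np.arange(0, 100, 2.0), np.arange(0, 100, 2.0))
    grid = np.column_stack([xx.ravel(), yy.ravel()])
    dense = gp.predict(grid).reshape(*xx.shape, 2)     # dense displacement field

Unlike B-spline or thin-plate spline interpolation, the GP additionally provides a predictive variance, which can indicate where the interpolated field is unreliable.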

[1] S. Bayer, A. Maier, M. Ostermeier, and R. Fahrig, “Intraoperative Imaging Modalities and Compensation for Brain Shift in Tumor Resection Surgery,” International Journal of Biomedical Imaging, vol. 2017, 2017.

[2] I. J. Gerard, M. Kersten-Oertel, K. Petrecca, D. Sirhan, J. A. Hall, and D. L. Collins, “Brain shift in neuronavigation of brain tumors: a review,” Medical Image Analysis, vol. 35, pp. 403–420, 2017.

[3] J. Luo et al., “A Feature-Driven Active Framework for Ultrasound-Based Brain Shift Compensation,” in Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, Lecture Notes in Computer Science, vol. 11073, Springer, Cham, 2018.

[4] S. Bayer, Z. Zhai, M. Strumia, X. Tong, Y. Gao, M. Staring, B. Stoel, R. Fahrig, A. Nabavi, A. Maier, and N. Ravikumar, “Registration of vascular structures using a hybrid mixture model,” International Journal of Computer Assisted Radiology and Surgery, June 2019.

[5] A. Myronenko and X. Song, “Point set registration: Coherent point drift,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 12, pp. 2262–2275, 2010.

[6] D. Rueckert, L. I. Sonoda, C. Hayes, D. L. G. Hill, M. O. Leach and D. J. Hawkes, “Nonrigid registration using free-form deformations: application to breast MR images,” in IEEE Transactions on Medical Imaging, vol. 18, no. 8, pp. 712-721, Aug. 1999.

[7] F. L. Bookstein, “Principal warps: thin-plate splines and the decomposition of deformations,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 6, pp. 567-585, June 1989.

[8] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.

Automatic Bird Individual Recognition in Multi-Channel Recording Scenarios

Problem background:
At the Max Planck Institute for Ornithology in Radolfzell, several birds are equipped with
backpacks to record their calls. However, not only the calls of the equipped bird are recorded,
but also those of the birds in its surroundings; as a result, the scientists receive several
non-synchronous audio tracks containing bird calls. The biologists have to match the calls to the
individual birds manually, which is time-consuming and can easily lead to mistakes.
Goal of the thesis:
The goal of this thesis is to implement a Python framework that can assign the calls to the
corresponding birds.
Since the intensity of a call decreases rapidly with distance, the loudest recording of a call can
be matched to the bird carrying that recorder. In addition, the call of this bird appears earlier
on its own recording device than on the other devices.
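
A minimal sketch of this loudest-recording heuristic, assuming the same call has already been cut out of every (synchronized) track; the segment arrays and the RMS criterion are illustrative assumptions:

    import numpy as np

    def assign_call(call_segments):
        """call_segments: list of 1-D numpy arrays, the same call as captured
        by each bird's recorder (assumed already roughly aligned in time)."""
        # The recorder closest to the caller receives the highest energy.
        rms = [np.sqrt(np.mean(seg ** 2)) for seg in call_segments]
        return int(np.argmax(rms))  # index of the bird carrying that recorder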
To assign the remaining calls to the other birds, the soundtracks must be compared by
overlaying the audio signals. For this purpose, the audio signals have to be modified first:
since different devices are used for capturing data and the recordings cannot be started
at the same time, a constant time offset between the recordings occurs. In addition, a linear
time distortion appears because the devices record at slightly different sampling rates.
To remove these inconsistencies, similar characteristics must be found in the audio signals,
and the audio tracks then have to be shifted and processed until these characteristics coincide.
There are several methods to extract such characteristics, whereby the most precise
methods require human assistance [1]. However, there are also automated approaches in which
the audio track is scanned for periodic signal parameters such as pitch or spectral flatness.
Effective features are essential both for the removal of distortion and for the algorithm's
ability to distinguish between minor similarities of the characteristics [2].
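
As a sketch of this alignment step, the constant offset between two tracks can be estimated by cross-correlating a simple characteristic such as the short-time energy envelope; the envelope choice and window length are assumptions, and [1, 2] describe more robust auditory-event and spectral-flatness features:

    import numpy as np
    from scipy.signal import correlate, resample

    def estimate_offset(track_a, track_b, fs):
        """Estimate the constant offset (in seconds) of track_b relative to
        track_a by cross-correlating their short-time energy envelopes."""
        win = int(0.01 * fs)  # 10 ms energy windows
        env_a = np.sqrt(np.convolve(track_a ** 2, np.ones(win) / win, mode="same"))
        env_b = np.sqrt(np.convolve(track_b ** 2, np.ones(win) / win, mode="same"))
        corr = correlate(env_a - env_a.mean(), env_b - env_b.mean(), mode="full")
        lag = np.argmax(corr) - (len(env_b) - 1)
        return lag / fs

    # The sampling-rate mismatch (linear drift) can then be compensated by
    # resampling one track to a common rate, e.g. with scipy.signal.resample.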
The framework will be implemented in Python. It should process the given audio tracks and
recognize and reject disturbed channels.
References:
[1] Brett G. Crockett, Michael J. Smithers. Method for time aligning audio signals using
characterizations based on auditory events, 2002
[2] Jürgen Herre, Eric Allamanche, Oliver Hellmuth. Robust matching of audio signals using
spectral flatness features, 2002

Detection of Label Noise in Solar Cell Datasets

On-site inspection of solar panels is a time-consuming and difficult process, as the solar panels are often difficult to reach. Furthermore, identifying defects can be hard, especially for small cracks. Electroluminescence (EL) imaging enables the detection of small cracks, for example using a convolutional neural network (CNN) [1,2]. Hence, it can be used to identify such cracks before they propagate and have a measurable impact on the efficiency of a solar panel [3]. In this way, costly inspection and replacement of solar panels can be avoided.

To train a CNN for the detection of cracks, a comprehensive dataset of labeled solar cells is required. Unfortunately, assessing whether a certain structure on a polycrystalline solar cell corresponds to a crack is a hard task, even for human experts. As a result, setting up a consistently labeled dataset is nearly impossible. This is why EL datasets of solar cells typically contain a significant amount of label noise.

It has been shown that CNNs are robust against small amounts of label noise, but performance may degrade drastically once the label noise reaches 5%-10% [4]. This thesis will

(1) analyze the given dataset with respect to label noise and
(2) attempt to minimize the negative impact of label noise on the performance of the trained network.

Recently, Ding et al. proposed to identify label noise by clustering the features learned by the CNN [4]. As part of this thesis, the proposed method will be applied to a dataset consisting of more than 40k labeled samples of solar cells, which is known to contain a significant amount of label noise. It will be investigated whether the method can be used to identify noisy samples. Furthermore, it will be evaluated whether abstaining from noisy samples improves the performance of the resulting model. To this end, a subset of the dataset will be labeled by at least three experts to obtain a cleaned subset. Finally, an extension of the method will be developed; here, it shall be evaluated whether the clustering can be omitted, since it proved unstable in prior experiments using the same data.
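
The cluster-based identification step could look like the following minimal sketch. This illustrates the general idea only, not the exact DECODE method of [4]; the feature source, the use of k-means, and the majority-vote rule are assumptions:

    import numpy as np
    from sklearn.cluster import KMeans

    def flag_noisy_labels(features, labels, n_clusters=20):
        """features: (N, D) penultimate-layer CNN activations; labels: (N,) in {0, 1}.
        Flags samples whose label disagrees with the majority label of their cluster."""
        clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
        flagged = np.zeros(len(labels), dtype=bool)
        for c in range(n_clusters):
            idx = clusters == c
            majority = np.round(labels[idx].mean())   # majority label in cluster c
            flagged[idx] = labels[idx] != majority
        return flagged  # candidates for label noise / abstention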

[1] Deitsch, Sergiu, et al. “Automatic classification of defective photovoltaic module cells in electroluminescence images.” Solar Energy 185 (2019): 455-468.
[2] Mayr, Martin, et al. “Weakly Supervised Segmentation of Cracks on Solar Cells Using Normalized Lp Norm.” 2019 IEEE International Conference on Image Processing (ICIP). IEEE, 2019.
[3] Köntges, Marc, et al. “Impact of transportation on silicon wafer‐based photovoltaic modules.” Progress in Photovoltaics: research and applications 24.8 (2016): 1085-1095.
[4] Ding, Guiguang, et al. “DECODE: Deep confidence network for robust image classification.” IEEE Transactions on Image Processing 28.8 (2019): 3752-3765.

Semi-Supervised Segmentation of Cell Images using Differentiable Rendering

With the recent advancements in machine learning and mainly deep learning [1], deep convolutional neural networks
(CNNs) [2–7] have been developed that are able to learn from datasets containing millions of images [8] to solve
object detection tasks. When trained on such big datasets, CNNs achieve task-relevant object detection
performances that are comparable or even superior to the capabilities of humans [9, 10]. A key problem of using deep
learning for cell detection is the large amount of data needed to train such networks. The main difficulty
lies in the acquisition of a representative dataset of cell images, which should ideally cover various sizes, shapes and
distributions for a variety of cell types. Additionally, manual annotation of the acquired data is required to obtain
the so-called ‘ground truth’ or ‘labels’, which is in general error-prone, time-consuming and costly.
Differentiable rendering [11–13], on the other hand, is an emerging technique that makes it possible to generate synthetic,
photo-realistic images based on photographs of real-world objects by estimating their 3D shape and material properties.
While this approach can be used to generate photo-realistic images, it can also be applied to generate the
respective ground truth labels for segmentation and object detection masks. Combining differentiable rendering with
deep learning could potentially solve the data bottleneck for machine learning algorithms in various fields, including
materials science and biomedical engineering.
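The underlying inverse-rendering loop can be sketched as follows; render stands in for a differentiable renderer such as ‘Redner’ [11], and the scene parameterization, loss and optimizer settings are illustrative assumptions:

    import torch

    def fit_scene(target_image, render, init_params, steps=200):
        """Gradient-based estimation of scene parameters (shape, material, light)
        from a photograph. `render` must be differentiable, e.g. built on 'Redner'."""
        params = torch.nn.Parameter(init_params.clone())
        optimizer = torch.optim.Adam([params], lr=1e-2)
        for _ in range(steps):
            optimizer.zero_grad()
            rendered = render(params)                 # differentiable forward pass
            loss = torch.mean((rendered - target_image) ** 2)
            loss.backward()                           # gradients flow through the renderer
            optimizer.step()
        return params.detach()

Once the scene parameters are fitted, segmentation masks can be rendered from the estimated geometry, which is what makes the approach attractive for label generation.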
The work of this thesis is based on the differentiable rendering framework ‘Redner’ [11], using data from the Cell
Tracking Challenge [14, 15]. In a first step, a literature review will be conducted on the topic of differentiable rendering.
In a second step, an existing implementation for the light, shader and geometry estimation of nanoparticles
will be adapted for the semi-supervised segmentation of GFP-GOWT1 mouse stem cells. Afterwards, the results of
this approach will be evaluated in terms of segmentation accuracy.
The thesis will include the following points:
• Getting familiar with the concepts of differentiable rendering and gradient-based learning methods
• Implementation of a proof-of-concept for the semi-supervised segmentation of cells based on the ‘Redner’
framework using existing data from the Cell Tracking Challenge
• Evaluation of the method in terms of segmentation accuracy
• Elaboration of potential improvements for the method
References
[1] Y. Lecun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[2] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” Proceedings of the IEEE International Conference
on Computer Vision, pp. 2980–2988, 2017.
[3] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv
preprint arXiv:1409.1556, 2014.
[4] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE
conference on computer vision and pattern recognition, pp. 770–778, 2016.
[5] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,”
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 779–788,
2016.
[6] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,”
in International Conference on Medical image computing and computer-assisted intervention, pp. 234–241,
Springer, 2015.
[7] T. Falk, D. Mai, R. Bensch, Ö. Çiçek, A. Abdulkadir, Y. Marrakchi, A. Böhm, J. Deubner, Z. Jäckel, K. Seiwald,
A. Dovzhenko, O. Tietz, C. Dal Bosco, S. Walsh, D. Saltukoglu, T. L. Tay, M. Prinz, K. Palme, M. Simons,
I. Diester, T. Brox, and O. Ronneberger, “U-Net: deep learning for cell counting, detection, and morphometry,”
Nature Methods, vol. 16, no. 1, pp. 67–70, 2019.
[8] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image
database,” 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, 2009.
[9] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou,
V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap,
M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep neural networks
and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016.
[10] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,”
in Advances in neural information processing systems, pp. 1097–1105, 2012.
[11] T.-M. Li, M. Aittala, F. Durand, and J. Lehtinen, “Differentiable monte carlo ray tracing through edge sampling,”
ACM Trans. Graph., vol. 37, Dec. 2018.
[12] M. Nimier-David, D. Vicini, T. Zeltner, and W. Jakob, “Mitsuba 2: a retargetable forward and inverse renderer,”
ACM Transactions on Graphics (TOG), vol. 38, no. 6, p. 203, 2019.
[13] G. Loubet, N. Holzschuch, and W. Jakob, “Reparameterizing discontinuous integrands for differentiable rendering,”
ACM Transactions on Graphics (TOG), vol. 38, no. 6, pp. 1–14, 2019.
[14] M. Maška, V. Ulman, D. Svoboda, P. Matula, P. Matula, C. Ederra, A. Urbiola, T. España, S. Venkatesan,
D. M. Balak, et al., “A benchmark for comparison of cell tracking algorithms,” Bioinformatics, vol. 30, no. 11,
pp. 1609–1617, 2014.
[15] V. Ulman, M. Maška, K. E. Magnusson, O. Ronneberger, C. Haubold, N. Harder, P. Matula, P. Matula, D. Svoboda,
M. Radojevic, et al., “An objective comparison of cell-tracking algorithms,” Nature Methods, vol. 14,
no. 12, p. 1141, 2017.

Detecting Defects on Transparent Objects using Polarization Cameras

The classification of images is a well-known task in computer vision. However, transparent or semi-
transparent objects have several properties that can make computer vision tasks harder. Such objects
usually have little texture and sometimes strong reflections. Occasionally, varying backgrounds make
it hard to recognize the edges or the shape of an object. [1, 2]
To overcome these difficulties, we use polarization cameras in this work. In contrast to ordinary cameras,
polarization cameras additionally record information about the polarization of the incoming light. Most
natural light sources emit unpolarized light. By using a light source that emits polarized light, it is
possible to remove reflections or increase the contrast. Furthermore, it is known that the Angle of Linear
Polarization (AoLP) provides information about the normal of a surface [3].
In this work, we will use a deep learning approach based on Convolutional Neural Networks (CNNs)
to explore the following topics:
1. Comparison of different sorts of preprocessing:
• Using only raw data / reshaped raw data
• Using the extra features Degree of Linear Polarization (DoLP) and AoLP (computed as sketched below)
2. Influence of different light sources.
3. Comparison of different defect classes.
To evaluate the results, we use different metrics such as accuracy and F1 score, as well as gradient-weighted
class activation maps (Grad-CAM) [4].
The implementation should be done in Python.
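The DoLP and AoLP features of item 1 can be computed from the four polarizer orientations (0°, 45°, 90°, 135°) of a division-of-focal-plane sensor via the linear Stokes parameters [3]. A minimal sketch, assuming the four channel images have already been extracted from the raw sensor mosaic:

    import numpy as np

    def dolp_aolp(i0, i45, i90, i135):
        """Compute DoLP and AoLP from the four polarizer-angle intensity images
        of a division-of-focal-plane polarization camera."""
        s0 = 0.5 * (i0 + i45 + i90 + i135)    # total intensity
        s1 = i0 - i90                         # linear Stokes parameters
        s2 = i45 - i135
        dolp = np.sqrt(s1 ** 2 + s2 ** 2) / np.maximum(s0, 1e-8)
        aolp = 0.5 * np.arctan2(s2, s1)       # in [-pi/2, pi/2]
        return dolp, aolp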

References
[1] Agastya Kalra, Vage Taamazyan, Supreeth Krishna Rao, Kartik Venkataraman, Ramesh Raskar, and Achuta
Kadambi. Deep Polarization Cues for Transparent Object Segmentation. In 2020 IEEE/CVF Conference on
Computer Vision and Pattern Recognition (CVPR), pages 8599–8608, Seattle, WA, USA, June 2020. IEEE.
[2] Ilya Lysenkov, Victor Eruhimov, and Gary Bradski. Recognition and Pose Estimation of Rigid Transparent
Objects with a Kinect Sensor. In Robotics: Science and Systems, 2013.
[3] Francelino Freitas Carvalho, Carlos Augusto de Moraes Cruz, Greicy Costa Marques, and Kayque Martins
Cruz Damasceno. Angular Light, Polarization and Stokes Parameters Information in a Hybrid Image Sensor
with Division of Focal Plane. Sensors, 20(12):3391, June 2020.[4] Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and
Dhruv Batra. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization.
International Journal of Computer Vision, 128(2):336–359, February 2020.

Diffeomorphic MRI Image Registration using Deep Learning

State-of-the-art deformable image registration approaches achieve impressive results and are commonly used in diverse image processing applications. However, these approaches are computationally expensive even on GPUs [1], since they have to solve an optimization problem for each image pair during registration [2]. Most learning-based methods either require labeled data or do not guarantee a diffeomorphic registration, i.e. the reversibility of the deformation field [1]. Dalca et al. presented an unsupervised deep learning framework for diffeomorphic image registration named VoxelMorph [1].
In this thesis, the network described in [1] will be implemented and trained on cardiac magnetic resonance images to build an application for fast diffeomorphic image registration. The results will be compared to state-of-the-art diffeomorphic image registration methods. Additionally, the method will be evaluated by comparing segmented areas as well as landmark locations of co-registered images. Furthermore, the method in [1] will be extended to a one-to-many registration method using the approach in [3], to address the need for motion estimation of the anatomy of interest in increasingly available dynamic imaging data [3]. The data used in this thesis will be provided by Siemens Healthineers. The implementation will be done using an open-source framework such as PyTorch [4].
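
At the core of the diffeomorphic formulation in [1] is the integration of a stationary velocity field by scaling and squaring. A minimal PyTorch sketch; the tensor layout, flow channel order and step count are assumptions:

    import torch
    import torch.nn.functional as F

    def warp(field, flow):
        # Warp `field` (B, C, H, W) by the displacement `flow` (B, 2, H, W),
        # with flow channels ordered (x, y) and given in pixels (assumed layout).
        B, _, H, W = flow.shape
        ys, xs = torch.meshgrid(torch.arange(H, device=flow.device),
                                torch.arange(W, device=flow.device), indexing="ij")
        base = torch.stack((xs, ys)).float()          # identity grid, (2, H, W)
        coords = base.unsqueeze(0) + flow             # absolute sampling positions
        gx = 2.0 * coords[:, 0] / (W - 1) - 1.0       # normalize to [-1, 1]
        gy = 2.0 * coords[:, 1] / (H - 1) - 1.0
        grid = torch.stack((gx, gy), dim=-1)          # (B, H, W, 2)
        return F.grid_sample(field, grid, align_corners=True)

    def exp_map(velocity, steps=7):
        # Scaling and squaring: integrate a stationary velocity field so that
        # the resulting displacement field is (approximately) diffeomorphic.
        phi = velocity / (2 ** steps)
        for _ in range(steps):
            phi = phi + warp(phi, phi)                # compose the field with itself
        return phi

Negating the velocity field before integration yields the inverse deformation, which is what makes the registration reversible.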

The thesis will include the following points:

• Literature review of state-of-the-art methods for diffeomorphic image registration and one-to-many registration

• Implementing a neural network for diffeomorphic image registration and extending it to a one-to-many registration

• Comparison of the results with state-of-the-art image registration methods

 

[1] Balakrishnan, G., Zhao, A., Sabuncu, M. R., Guttag, J. V. & Dalca, A. V. VoxelMorph: A Learning Framework for
Deformable Medical Image Registration. CoRR abs/1809.05231. arXiv: 1809.05231. http://arxiv.org/abs/1809.05231 (2018).
[2] Ashburner, J. A fast diffeomorphic image registration algorithm. NeuroImage 38, 95 –113. ISSN: 1053-8119. http://www.sciencedirect.com/science/article/pii/S1053811907005848 (2007).
[3] Metz, C., Klein, S., Schaap, M., van Walsum, T. & Niessen, W. Nonrigid registration of dynamic medical imaging data using nD+t B-splines and a groupwise optimization approach. Medical Image Analysis 15, 238 –249. ISSN: 1361-8415. http://www.sciencedirect.com/science/article/pii/S1361841510001155 (2011).
[4] Paszke, A. et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. CoRR abs/1912.01703. arXiv: 1912.01703. http://arxiv.org/abs/1912.01703 (2019).

Content-based Image Retrieval based on compositional elements for art historical images

Absorption Image Correction in X-ray Talbot-Lau Interferometry for Reconstruction

X-ray Phase-Contrast Imaging (PCI) is an imaging technique that measures the refraction of X-rays created by an object. There are several ways to realize PCI, such as interferometric and analyzer-based methods [3]. In contrast to X-ray absorption imaging, the phase image provides high soft-tissue contrast.
The implementation by a grating-based interferometer enables measuring an X-ray absorption image, a differential phase image and a dark-field image [2, p. 192-205]. Felsner et al. proposed the integration of a Talbot-Lau interferometer (TLI) into an existing clinical CT system [1]. Three different gratings are mounted between the X-ray tube and the detector: two in front of the object, one behind it. Currently, it is not possible to install gratings with a diameter of more than a few centimeters for various reasons [1]. The consequence is that a phase-contrast image can only be created for a small area.
Nevertheless, for capturing the absorption image, the entire detector can be used. However, the absorption image is influenced by the gratings, as they induce an inhomogeneous exposure of the X-ray detector.
Besides that, the intensity values change with each projection: the X-ray tube, detector and gratings rotate around the object during the scanning process, so depending on their position, parts of the object are covered by grating G1 during some parts of the rotation, but not throughout.
It is expected that the part of the absorption image covered by the gratings differs from the rest of the image in its intensity values. Also, a sudden change in the intensity values can be observed at the edge of the grating. This may lead to artifacts in the 3-D reconstruction.
In this work, we will investigate the anticipated artifacts in the reconstruction and implement (at least) one correction algorithm. Furthermore, the reconstruction results with and without a correction algorithm will be evaluated using simulated and/or real data.
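
One straightforward correction candidate, sketched here under the assumption that a flat-field image with the gratings in place is available for each rotation angle, is a flat-field style normalization before computing the absorbance used for reconstruction:

    import numpy as np

    def correct_projection(projection, flat_with_gratings, eps=1e-8):
        """Divide each projection by a flat-field image acquired with the gratings
        in place (same rotation angle), so that the grating-induced intensity drop
        and the sharp step at the grating edge are normalized out before computing
        the absorbance -log(I / I0) used for reconstruction."""
        i_over_i0 = projection / np.maximum(flat_with_gratings, eps)
        return -np.log(np.clip(i_over_i0, eps, None))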

References:
[1] L. Felsner, M. Berger, S. Kaeppler, J. Bopp, V. Ludwig, T. Weber, G. Pelzer, T. Michel, A. Maier, G. Anton, and C. Riess. Phase-sensitive region-of-interest computed tomography. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 137–144, Cham, 2018. Springer.
[2] A. Maier, S. Steidl, V. Christlein, and J. Hornegger. Medical Imaging Systems: An Introductory Guide, volume 11111. Springer, Cham, 2018.
[3] F. Pfeiffer, T. Weitkamp, O. Bunk, and C. David. Phase retrieval and differential phase-contrast imaging with low-brilliance X-ray sources. Nature Physics, 2(4):258–261, 2006.

Truncation-correction Method for X-ray Dark-field Computed Tomography

Grating-based imaging provides three types of images: an absorption, a differential phase and a dark-field image. The dark-field image provides structural information about the specimen at the micrometer and sub-micrometer scale. A dark-field image can be measured with an X-ray grating interferometer, for example the Talbot-Lau interferometer, which consists of three gratings. Due to the small size of the gratings, truncation arises in the projection images. This becomes an issue, since it leads to artifacts in the reconstruction.

This Bachelor thesis aims to reduce truncation artifacts in dark-field reconstructions. Inspired by the method proposed by Felsner et al. [1], the truncated dark-field image will be corrected using the information of a complete absorption image. To describe the correlation between the absorption and the dark-field signal, the decomposition by Kaeppler et al. [2] will be used. The dark-field correction algorithm will be implemented in an iterative scheme, and a parameter search and evaluation of the method will be conducted.

References:
[1] Lina Felsner, Martin Berger, Sebastian Kaeppler, Johannes Bopp, Veronika Ludwig, Thomas Weber, Georg Pelzer, Thilo Michel, Andreas Maier, Gisela Anton, and Christian Riess. Phase-sensitive region-of-interest computed tomography. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, pages 137–144, Cham, 2018. Springer International Publishing.
[2] Sebastian Kaeppler, Florian Bayer, Thomas Weber, Andreas Maier, Gisela Anton, Joachim Hornegger, Matthias Beckmann, Peter A. Fasching, Arndt Hartmann, Felix Heindl, Thilo Michel, Gueluemser Oezguel, Georg Pelzer, Claudia Rauh, Jens Rieger, Ruediger Schulz-Wendtland, Michael Uder, David Wachter, Evelyn Wenkel, and Christian Riess. Signal decomposition for x-ray dark-field imaging. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2014, pages 170–177, Cham, 2014. Springer International Publishing.

Automated analysis of Parkinson’s Disease on the basis of evaluation of handwriting

In this thesis, current state-of-the-art methods for the automatic analysis of Parkinson's disease (PD) are tested along with new ideas in signal processing. Since there is currently no cure for PD, it is important to introduce methods for automatic monitoring and analysis. Therefore, handwriting samples of 49 healthy subjects and 75 PD patients, acquired with a graphics tablet, are used. The subjects performed different drawing tasks. With a kinematic analysis,
accuracies of up to 77% are achieved when using one task alone, and accuracies of up to 86% are achieved when combining different tasks. A newly developed spectral analysis resulted in scores of up to 96% for an individual task. Combining the spectral features of a standalone task with features from different tasks or a different analysis did not lead to better results. Making predictions about the severity of the disease based on the features acquired for the two-class problem failed. An attempt was made to model the velocity profile of strokes with lognormal distributions and to use the parameters obtained thereby for classification; because of difficulties with modeling strokes of different lengths, this classification failed.
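
Both the kinematic and the spectral analysis operate on signals like the following sketch, in which the pen positions recorded by the tablet are differentiated to a velocity profile whose magnitude spectrum yields spectral features; the sampling rate, binning and feature choice are illustrative assumptions:

    import numpy as np

    def spectral_features(x, y, fs=200.0, n_bins=20):
        """x, y: pen coordinates sampled at fs Hz for one drawing task."""
        vx, vy = np.gradient(x) * fs, np.gradient(y) * fs
        speed = np.hypot(vx, vy)                   # velocity profile of the task
        spectrum = np.abs(np.fft.rfft(speed - speed.mean()))
        bins = np.array_split(spectrum, n_bins)    # coarse spectral descriptors
        return np.array([b.mean() for b in bins])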