Index
Synthetic Image Rendering for Deep Learning License Plate Recognition
License plate recognition is usually considered a rather simple task that a human
is perfectly capable of. However, many factors (e.g. fog or rain) can
significantly worsen the image quality and therefore increase the difficulty of recognizing
a license plate. Further factors, e.g. low resolution or a small license
plate region, may increase the difficulty to the point where even humans are unable to
identify it.
A possible approach to solve this problem is to build and train a neural network using
collected image data. In theory, this should yield a high success rate and outperform a
human. However, a huge number of images that also fulfill certain criteria is needed in
order to reliably recognize plates in different situations.
For this reason, this thesis aims at building and training a neural network, based
on an existing CNN [1], for recognizing license plates using artificially created
training data. This ensures that enough images are available and makes it possible
to add image effects that simulate many possible situations. The needed images
can be created using Blender: it offers the option to create a 3D model of a license plate,
as well as options to simulate certain weather conditions like fog or rain, while also providing
an API to automate the creation process. This way, a wide range of cases can be covered,
which is intended to maximize the success rate of the license plate recognition.
The thesis consists of the following steps:
• Creating a training data set consisting of generated license plate images (Blender
Python API)
• Fitting the parameters of the Deep Learning model
• Evaluation of the model fit on datasets with real license plate images
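The data generation step can be organized around a list of randomly sampled render configurations that a Blender script would then consume. The following is only a minimal sketch under stated assumptions: the plate format (three letters followed by four digits), the parameter names and their value ranges are hypothetical and not taken from the thesis, and the actual rendering through the Blender Python API is not shown.

```python
import random
import string


def sample_plate_text(rng, n_letters=3, n_digits=4):
    """Draw a random plate string (hypothetical format: letters then digits)."""
    letters = "".join(rng.choice(string.ascii_uppercase) for _ in range(n_letters))
    digits = "".join(rng.choice(string.digits) for _ in range(n_digits))
    return letters + digits


def sample_render_config(rng):
    """Sample one set of scene parameters; names and ranges are assumptions."""
    return {
        "text": sample_plate_text(rng),
        "fog_density": rng.uniform(0.0, 1.0),      # 0 = clear, 1 = dense fog
        "rain": rng.random() < 0.5,                # toggle a rain effect
        "plate_width_px": rng.randint(20, 200),    # simulates low resolution
    }


def build_dataset_spec(n_images, seed=0):
    """Return a reproducible list of render configurations."""
    rng = random.Random(seed)
    return [sample_render_config(rng) for _ in range(n_images)]


if __name__ == "__main__":
    for config in build_dataset_spec(3, seed=42):
        print(config)
```

Keeping the sampled configuration next to each rendered image provides the ground-truth label (the plate text) and makes it possible to analyze the recognition rate per degradation type later on.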
References
[1] Benedikt Lorch, Shruti Agarwal, and Hany Farid. Forensic Reconstruction of Severely
Degraded License Plates. In Society for Imaging Science & Technology, editor,
Electronic Imaging, Jan 2019.
Learning projection matrices for marker free motion compensation in weight-bearing CT scans
The integration of known operators into neural networks has recently received more and
more attention. The theoretical proof of its benefits has been described by Maier and Syben
et al. in [1, 2]. Reducing the number of trainable weights by replacing trainable layers with
known operators reduces the overall approximation error and makes it easier to interpret
the layers' function. This is of special interest in the context of medical imaging, where it is
crucial to understand the effects of layers or operators on the resulting image. Several use
cases of known operators in medical imaging have been explored in the past few years [3][4][5].
An API to make such experiments easier is the PYRO-NN API by Syben et al., which comes
with several forward and backward projectors for different geometries as well as with helpers
such as filters [6].
Cone Beam CT (CBCT) imaging is a widely used X-ray imaging technology which uses
a point source of X-rays and a 2D flat-panel detector. Using a reconstruction algorithm
such as the FDK algorithm, a complete 3D reconstruction can be estimated using just one
rotation around the patient [7]. This modality is of great use in orthopedics, where so-called
weight-bearing CT scans primarily image knee joints under weight-bearing conditions to
picture the cartilage tissue under stress. The main drawbacks of this modality are motion
artifacts caused by involuntary movement of the patient's knee and inaccuracies in the trajectory
of the scanner. In order to correct those artifacts, the extrinsic camera parameters,
which describe the position and orientation of the object relative to the detector, have to be
adjusted [8].
To get one step closer to reducing motion artifacts without additional cameras or markers, it is
of special interest to study the feasibility of training extrinsic camera parameters as part of
a reconstruction pipeline. Before an algorithm to estimate those parameters can be assessed,
the general feasibility of training the extrinsic camera parameters of a projection matrix
will be studied. The patient's motion will be estimated iteratively using an adapted gradient
descent algorithm, as known from the training of neural networks.
The Bachelor’s thesis covers the following aspects:
1. Discussion of the general idea of motion compensation in CBCT as well as a quick
overview of the PYRO-NN API and thus of known operators in general.
2. Studying the feasibility of learning a projection matrix of a single forward projection:
• Assessing the ability to train single parameters
• Training of translations and rotations
• Attempting to estimate the complete rigid motion parameters
3. Training of a simple trajectory:
• Assessing the motion estimation of the back projection using the volume as
ground truth
• Assessing the motion estimation using an undistorted sinogram
• Estimating the trajectory based only on the distorted sinogram
4. Evaluation of the training results of the experiments and description of potential applications
of the results.
All implementations will be integrated into the PYRO-NN API [6].
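The idea of training extrinsic parameters by gradient descent can be illustrated on a toy problem. The sketch below is an assumption-laden simplification and does not use PYRO-NN or any CT data: it takes a pinhole model with unit focal length and no rotation (P = [I | t]), a handful of hypothetical 3D points, and recovers only the detector-parallel translation (t_x, t_y) from the reprojection error, with the depth offset t_z held fixed. All function names are illustrative.

```python
def project(point, t):
    """Perspective projection of a 3D point after translating it by t.

    Assumes unit focal length and no rotation, i.e. P = [I | t].
    """
    x, y, z = point
    depth = z + t[2]
    return ((x + t[0]) / depth, (y + t[1]) / depth)


def loss_and_grad(t, points, targets):
    """Mean squared reprojection error and its gradient w.r.t. (t_x, t_y)."""
    n = len(points)
    loss, gx, gy = 0.0, 0.0, 0.0
    for p, (u_ref, v_ref) in zip(points, targets):
        u, v = project(p, t)
        depth = p[2] + t[2]
        du, dv = u - u_ref, v - v_ref
        loss += (du * du + dv * dv) / n
        gx += 2.0 * du / depth / n   # d(loss)/d(t_x)
        gy += 2.0 * dv / depth / n   # d(loss)/d(t_y)
    return loss, (gx, gy)


def estimate_in_plane_translation(points, targets, t_z=0.0, lr=5.0, steps=500):
    """Plain gradient descent on (t_x, t_y); t_z is kept fixed."""
    t = [0.0, 0.0, t_z]
    for _ in range(steps):
        _, (gx, gy) = loss_and_grad(t, points, targets)
        t[0] -= lr * gx
        t[1] -= lr * gy
    return t


# Hypothetical 3D points in front of the source and a "true" motion to recover.
points = [(-1.0, 0.5, 4.0), (0.5, -1.0, 5.0), (1.0, 1.0, 6.0), (0.0, 0.0, 4.5)]
t_true = (0.3, -0.2, 0.0)
targets = [project(p, t_true) for p in points]
t_est = estimate_in_plane_translation(points, targets)
print(t_est)
```

For this restricted case the loss is quadratic in (t_x, t_y), so gradient descent with a suitable step size converges. Rotations and the depth component make the problem non-convex, which is one reason the feasibility of training these parameters has to be assessed separately.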
References
[1] A. Maier, F. Schebesch, C. Syben, T. Würfl, S. Steidl, J. Choi, and R. Fahrig, “Precision
learning: Towards use of known operators in neural networks,” in 2018 24th International
Conference on Pattern Recognition (ICPR), pp. 183–188, 2018.
[2] A. K. Maier, C. Syben, B. Stimpel, T. Würfl, M. Hoffmann, F. Schebesch, W. Fu, L. Mill,
L. Kling, and S. Christiansen, “Learning with known operators reduces maximum error
bounds,” Nature Machine Intelligence, vol. 1, no. 8, pp. 373–380, 2019.
[3] W. Fu, K. Breininger, R. Schaffert, N. Ravikumar, T. Würfl, J. G. Fujimoto, E. M.
Moult, and A. Maier, “Frangi-Net: A Neural Network Approach to Vessel Segmentation,”
in Bildverarbeitung für die Medizin (BVM) 2018 (A. Maier, T. M. Deserno, H. Handels,
K. H. Maier-Hein, C. Palm, and T. Tolxdorff, eds.), (Berlin, Heidelberg), pp. 341–346, Springer
Vieweg, 2018.
[4] C. Syben, B. Stimpel, K. Breininger, T. Würfl, R. Fahrig, A. Dörfler, and A. Maier,
“Precision Learning: Reconstruction Filter Kernel Discretization,” in Proceedings of
the 5th International Conference on Image Formation in X-ray Computed Tomography,
pp. 386–390, 2018.
[5] T. Würfl, F. C. Ghesu, V. Christlein, and A. Maier, “Deep learning computed tomography,”
in Medical Image Computing and Computer-Assisted Intervention – MICCAI
2016 (S. Ourselin, L. Joskowicz, M. R. Sabuncu, G. Unal, and W. Wells, eds.), (Cham),
pp. 432–440, Springer International Publishing, 2016.
[6] C. Syben, M. Michen, B. Stimpel, S. Seitz, S. Ploner, and A. K. Maier, “Technical note:
PYRO-NN: Python reconstruction operators in neural networks,” Medical Physics, 2019.
[7] L. Feldkamp, L. C. Davis, and J. Kress, “Practical cone-beam algorithm,” J. Opt. Soc.
Am. A, vol. 1, pp. 612–619, 1984.
[8] J. Maier, M. Nitschke, J.-H. Choi, G. Gold, R. Fahrig, B. M. Eskofier, and A. Maier,
“Inertial measurements for motion compensation in weight-bearing cone-beam CT of the
knee,” 2020.
Clustering of HPC jobs using Unsupervised Machine Learning on job performance metric time series data
Deep Learning-based Matching of Chest X-Ray Scans
Human identification has become increasingly important over the past years, with
facial recognition being potentially the most common form used in daily life. But the face is not the
only biometric identifier that can be used as a feature for identification. In this work, we will investigate
chest X-rays as biometric identifiers. If they prove to be viable, this would, for example, allow
post-mortem identification, where common techniques currently have shortcomings [1]. Moreover, successful
identification of this kind may have far-reaching consequences and implications concerning data
protection and anonymity in the medical field.
In pattern recognition, the use of deep learning has proven to be successful in improving or even
replacing classical methods entirely. To test the limits of what is currently possible, a neural network
will be created that takes in two different x-ray scans as inputs and outputs a score measuring their
similarity.
To increase the chances of success, a registration step will be incorporated into the preprocessing stage. It
will be implemented as a neural network layer, as this has proven to be effective in the past [2].
The thesis consists of the following milestones:
• Testing out the capabilities of different network architectures concerning the task of finding
matches in chest X-Ray scans
• Further enhancing the functionality by incorporating a layer into the network that is capable of
affine registrations, e. g. by means of a spatial transformer network [3]
The implementation should be done in Python.
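The affine registration layer mentioned in the milestones can be pictured through the two fixed operations of a spatial transformer [3]: generating a sampling grid from a 2x3 affine matrix and sampling the input bilinearly. The sketch below is framework-free and purely illustrative: in the actual network the matrix `theta` would be predicted by a localization sub-network and the sampling would be a differentiable layer. The function names are hypothetical, and coordinates are normalized to [-1, 1] as in [3].

```python
import math


def bilinear_sample(image, x, y):
    """Sample image at continuous pixel position (x, y); zero outside the image."""
    h, w = len(image), len(image[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi, wy in ((y0, 1.0 - (y - y0)), (y0 + 1, y - y0)):
        for xi, wx in ((x0, 1.0 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= xi < w and 0 <= yi < h:
                value += wx * wy * image[yi][xi]
    return value


def affine_warp(image, theta):
    """Warp a 2D image (list of rows, at least 2x2) with a 2x3 affine matrix.

    theta maps normalized output coordinates in [-1, 1] to normalized
    input coordinates, i.e. the output is sampled from the input.
    """
    h, w = len(image), len(image[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            xn = 2.0 * j / (w - 1) - 1.0          # normalized output x
            yn = 2.0 * i / (h - 1) - 1.0          # normalized output y
            xs = theta[0][0] * xn + theta[0][1] * yn + theta[0][2]
            ys = theta[1][0] * xn + theta[1][1] * yn + theta[1][2]
            px = (xs + 1.0) * (w - 1) / 2.0       # back to pixel coordinates
            py = (ys + 1.0) * (h - 1) / 2.0
            row.append(bilinear_sample(image, px, py))
        out.append(row)
    return out
```

With the identity matrix `[[1, 0, 0], [0, 1, 0]]` the image is returned unchanged; a non-zero third column shifts the sampling grid, which is how the layer could align one scan to the other before the similarity score is computed.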
References
[1] Ryudo Ishigami, Thi Thi Zin, Norihiro Shinkawa, and Ryuichi Nishii. Human identification using x-ray
image matching. In Proceedings of The International MultiConference of Engineers and Computer Scientists
2017, volume 1, pages 415–418, 2017.
[2] Grant Haskins, Uwe Kruger, and Pingkun Yan. Deep learning in medical image registration: a survey.
Machine Vision and Applications, 31(1–2), Jan 2020.
[3] Max Jaderberg, Karen Simonyan, Andrew Zisserman, and Koray Kavukcuoglu. Spatial transformer networks.
In Advances in Neural Information Processing Systems 28, pages 2017–2025. Curran Associates, Inc., 2015.
Start, follow, read, stop: Incorporating new steps into end-to-end full-page handwriting recognition method
In this work, new steps are incorporated into a known offline recognition method [1] as an attempt to
improve the transcription of degraded and poor-quality historical documents. The previously proposed
model consists of three components:
1. Start-of-line (SOL)
This network predicts the starting points of lines, together with an indication of the size and
direction of the handwriting.
2. Line-follower (LF)
Given a starting point, the LF network follows the handwriting line in incremental steps and
outputs a dewarped line image that is suitable for text recognition purposes.
3. Handwriting recognition (HWR)
After having the LF network produce several normalized line images, these can then be fed to a
CNN-LSTM HWR network [2] to produce transcriptions of the detected lines.
The method performed well on warped lines and has the advantage of outputting polygonal regions
instead of bounding boxes [3], but it still has several shortcomings, especially when considering
documents where unrelated pieces of information are frequently horizontally adjacent to one another.
It cannot detect and adapt to changes in handwriting size either, relying solely on the initial prediction
made by the SOL network to extract lines.
Modifications are to be made to the network architecture of the model in order to address these
shortcomings, and the thesis would then consist of the following milestones:
• Extending the SOL network architecture in order to include End-of-Line (EOL) detection.
• Modifying the LF network architecture to capture variations in handwriting size.
• Applying the LF network backwards from EOL predictions and finding an effective way of
merging the forward and backward line information.
• Evaluating performance on historical full page datasets.
• Further experiments regarding procedure and network architecture.
The implementation should be done in Python.
References
[1] C. Wigington, C. Tensmeyer, B. Davis, W. Barrett, B. Price, and S. Cohen. Start, follow, read: End-to-end full-page
handwriting recognition. In Computer Vision – European Conference on Computer Vision 2018 (ECCV), pages
372–388, 2018.
[2] C. Wigington, S. Stewart, B. Davis, W. Barrett, B. Price, and S. Cohen. Data augmentation for recognition of
handwritten words and lines using a CNN-LSTM network. In 14th International Conference on Document Analysis
and Recognition (ICDAR), pages 639–645, 2017.
[3] B. Moysset, C. Kermorvant, and C. Wolf. Full-page text recognition: Learning where to start and when to stop.
In 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 2017.
Development of a pre-processing/simulation Framework for Multi-Channel Audio Signals
The goal of this thesis is to develop a framework that simulates multi-channel audio signals in a 2D/3D environment for hearing aids. For this purpose, existing head-related transfer functions (HRTFs) will be used to simulate direction and hearing aid microphone characteristics. Furthermore, source movement as well as microphone movement and rotation will be implemented. The latter is mandatory for hearing aids, since head rotation in particular might change the relative direction of the different sources significantly. The framework will be able to simulate multiple speakers as well as multiple noise sources. To calculate a clean speech target, a provided reference beamformer will be applied to the target speech only, neglecting noise and non-target speakers. Optionally, an opening angle that defines the target directions can be used to extract the clean speech targets. As a second optional aspect, the room environment, including absorption and reverberation, will be simulated. For this purpose, a reference implementation can be used.
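At its core, such a simulation convolves every source signal with one impulse response per microphone channel and sums the results. The sketch below assumes static sources and short time-domain impulse responses (the time-domain counterparts of the HRTFs mentioned above); it uses direct convolution for clarity, whereas a practical framework would likely use FFT-based convolution and time-varying filters for moving sources. All names are hypothetical.

```python
def convolve(signal, kernel):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def mix_sources(sources, impulse_responses):
    """Render static sources to multiple microphone channels.

    sources:            list of mono signals, one per source
    impulse_responses:  impulse_responses[s][c] is the impulse response
                        from source s to microphone channel c
    Returns one signal per channel, padded to a common length.
    """
    n_channels = len(impulse_responses[0])
    length = max(
        len(sig) + len(ir) - 1
        for sig, irs in zip(sources, impulse_responses)
        for ir in irs
    )
    channels = [[0.0] * length for _ in range(n_channels)]
    for sig, irs in zip(sources, impulse_responses):
        for c, ir in enumerate(irs):
            for n, value in enumerate(convolve(sig, ir)):
                channels[c][n] += value
    return channels
```

A clean speech target in the sense described above could be obtained by calling `mix_sources` with the target speaker only and passing the result through the reference beamformer, while the noisy input uses all speakers and noise sources.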
Semantic Segmentation of the Human Eye for Driver Monitoring
Extension of the Lottery Ticket Hypothesis for Saving Computational Cost and Energy
Many state-of-the-art neural networks have millions of parameters; e.g. VGG’s smallest configuration has 133 million parameters [1]. They achieve high training and test accuracies but incur a high computational cost. Since powerful hardware exists, coping with that computational cost is possible but very inefficient. To counteract these inefficiencies, network pruning can be applied to decrease the size of the neural networks. For many years, however, the accuracies after pruning did not match those of the original network, while hardware became more powerful and cheaper. Pruning was therefore not considered optimal, and the trend went towards constructing big networks that perform consistently well on high-performance hardware.
In 2019, the Lottery Ticket Hypothesis (LTH) [2] was introduced as a new approach for pruning neural networks. Using a binary mask, the weights with the lowest magnitudes are selectively set to zero, which removes the corresponding connections from the network. The hypothesis suggests that fully-connected and convolutional neural networks can be iteratively pruned into a sparse subnetwork such that the parameter count is reduced by over 90%, while the number of training iterations is at most as high as the original network’s and the test accuracy meets or even exceeds the original one. This paper opened up a large area of discussion: on the one hand, some papers do not find improvements of the LTH over random initialization [3]; on the other hand, some provide insights into why the approach works well [4].
A drawback of the lottery ticket hypothesis, however, is that the network’s structure of neurons stays the same: no neurons are removed, so the computational cost does not decrease. The goal of this thesis is to investigate whether neural network pruning by reducing the number of neurons, based on the idea of the lottery ticket hypothesis, is possible. Additional goals would be to compute the amount of energy savings [5], to compare masks and structures created by different datasets and optimizers for the same network in order to gain potentially deeper insights, and to see whether the idea of a supermask [4] also exists for the approach of neuron pruning.
The thesis aims to achieve the following goals:
• Extending the lottery ticket hypothesis by actually removing neurons, instead of using a binary mask.
• Comparing the accuracies and network size to the original lottery ticket hypothesis on different datasets.
Additional investigations should cover:
• Comparing the network structures and masks of different datasets and optimizers.
• Computing the amount of energy savings [5].
• Incorporating the idea of a supermask [4] into the approach of neuron pruning.
• Investigating pruning procedures aimed at removing neurons, instead of cutting connections.
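The difference between masking connections and removing neurons can be made concrete on a single weight matrix. The sketch below treats each row as the incoming weights of one neuron. The binary mask follows the magnitude criterion described above, whereas ranking neurons by the L1 norm of their weights is merely one possible criterion chosen for illustration and is not prescribed by the thesis. Function names are hypothetical.

```python
def magnitude_mask(weights, prune_fraction):
    """Binary mask that zeroes the prune_fraction smallest-magnitude weights.

    The matrix keeps its shape; ties at the threshold are all pruned.
    """
    flat = sorted(abs(w) for row in weights for w in row)
    n_prune = int(len(flat) * prune_fraction)
    if n_prune == 0:
        return [[1 for _ in row] for row in weights]
    threshold = flat[n_prune - 1]
    return [[0 if abs(w) <= threshold else 1 for w in row] for row in weights]


def prune_neurons(weights, n_remove):
    """Remove the n_remove rows (neurons) with the smallest L1 norm.

    Returns the smaller matrix and the indices of the neurons that were kept.
    The L1-norm score is an assumption made for this sketch.
    """
    scores = [sum(abs(w) for w in row) for row in weights]
    order = sorted(range(len(weights)), key=lambda i: scores[i])
    removed = set(order[:n_remove])
    kept = [i for i in range(len(weights)) if i not in removed]
    return [weights[i] for i in kept], kept


weights = [[0.9, -0.1], [0.05, 0.02], [-0.7, 0.4]]
mask = magnitude_mask(weights, 0.5)
smaller, kept = prune_neurons(weights, 1)
print(mask, smaller, kept)
```

Only the second function shrinks the matrix and therefore the number of multiply-accumulate operations. In a multi-layer network, the columns of the following layer that correspond to removed neurons would have to be dropped as well.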
[1] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2015.
[2] Jonathan Frankle and Michael Carbin. The lottery ticket hypothesis: Training pruned neural networks. CoRR, abs/1803.03635, 2018.
[3] Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, and Trevor Darrell. Rethinking the value of network pruning. CoRR, abs/1810.05270, 2018.
[4] Hattie Zhou, Janice Lan, Rosanne Liu, and Jason Yosinski. Deconstructing lottery tickets: Zeros, signs, and the supermask. CoRR, abs/1905.01067, 2019.
[5] Tien-Ju Yang, Yu-Hsin Chen, and Vivienne Sze. Designing energy-efficient convolutional neural networks using energy-aware pruning. CoRR, abs/1611.05128, 2016.
Multi-task Learning for Historical Handwritten Document Classification
In the Competition on Image Retrieval for Historical Handwritten Documents 2019 [1], several methods
have been proposed to identify the writer of a document. Although most of the proposed methods
are based on feature descriptors and traditional machine learning techniques, deep learning based
methods are emerging in the field of historical document classification. At the same time, other deep
learning based methods dominate image retrieval and classification tasks.
Multi-task learning (MTL) is an approach to learning multiple tasks at the same time using one neural
network. This approach has not yet been used for historical handwritten document classification. Over
the years, MTL has been applied to many fields, not only to computer vision but also to speech
processing, bioinformatics, etc., to boost performance [2]. In the context of computer vision, MTL has been
used to detect facial landmarks in order to improve the performance of expression recognition [3]. Furthermore,
a convolutional neural network has been proposed for pose estimation with auxiliary tasks such as
body part detection [4].
In this work, we will investigate the approach with neural networks using multi-task learning for
historical handwritten document classification. We will use the online published datasets from previous
competitions for training and testing.
We will implement two multi-task neural networks. The first should focus on writer identification with
binarization as the auxiliary task. Its performance will be evaluated
using the datasets from the ICDAR2017 Competition on Historical Document Writer Identification
(Historical-WI) [5]. The second should focus on dating and style classification;
its performance will be evaluated using the datasets from the
ICDAR2017 Competition on the Classification of Medieval Handwritings in Latin Script [6].
The thesis consists of the following milestones:
• Review of the related work and methods of historical handwritten document classification
• Implementation of two neural networks for multi-task learning for:
– writer identification and binarization
– date classification and script type classification
• Evaluation of the results of these two multi-task neural networks
• Comparison of this approach to current document classification approaches
• Examination and discussion about whether a multi-task neural network is useful for document
classification.
The implementation should be done in PyTorch.
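In multi-task learning, the tasks typically share one network and are trained with a combined objective. The following framework-free sketch shows one common way to form such an objective for the first network: a cross-entropy term for writer identification plus a weighted binary cross-entropy term for the auxiliary binarization task. The weighting factor `aux_weight`, the choice of loss functions and all function names are assumptions made for illustration; the thesis does not specify them, and in PyTorch the corresponding built-in loss modules would be used instead.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class given class probabilities."""
    return -math.log(probs[label])


def binary_cross_entropy(preds, targets, eps=1e-12):
    """Mean binary cross-entropy over pixels; preds are clipped for stability."""
    total = 0.0
    for p, t in zip(preds, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(preds)


def multi_task_loss(writer_probs, writer_label, pixel_preds, pixel_targets,
                    aux_weight=0.5):
    """Main-task loss plus weighted auxiliary loss (weight is a free choice)."""
    main = cross_entropy(writer_probs, writer_label)
    aux = binary_cross_entropy(pixel_preds, pixel_targets)
    return main + aux_weight * aux
```

Because both terms are computed from outputs of the same shared layers, the gradient of the auxiliary term also updates the features used for writer identification, which is the mechanism by which the auxiliary task is expected to help.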
[1] Vincent Christlein, Anguelos Nicolaou, Mathias Seuret, Dominique Stutzmann, and Andreas Maier. ICDAR
2019 competition on image retrieval for historical handwritten documents. In International Conference on
Document Analysis and Recognition (ICDAR), 2019.
[2] Yu Zhang and Qiang Yang. A survey on multi-task learning. arXiv preprint arXiv:1707.08114, 2017.
[3] Terrance Devries, Kumar Biswaranjan, and Graham W. Taylor. Multi-task learning of facial landmarks and
expression. In 2014 Canadian Conference on Computer and Robot Vision, pages 98–103. IEEE, 2014.
[4] Sijin Li, Zhi-Qiang Liu, and Antoni B. Chan. Heterogeneous multi-task learning for human pose estimation
with deep convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition Workshops, pages 482–489, 2014.
[5] Stefan Fiel, Florian Kleber, Markus Diem, Vincent Christlein, Georgios Louloudis, Nikos Stamatopoulos,
and Basilis Gatos. ICDAR2017 competition on historical document writer identification (Historical-WI). In
2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), volume 1,
pages 1377–1382. IEEE, 2017.
[6] Florence Cloppet, Veronique Eglin, Marlene Helias-Baron, Cuong Kieu, Nicole Vincent, and Dominique
Stutzmann. ICDAR2017 competition on the classification of medieval handwritings in Latin script. In 2017
14th IAPR International Conference on Document Analysis and Recognition (ICDAR), volume 1, pages
1371–1376. IEEE, 2017.