Character Height Estimation in Historical Document Images

During the past decades, the field of Document Image Analysis and Recognition (DIAR) has been the subject of extensive research due to its wide range of applications. DIAR can be applied to printed or handwritten, textual or graphical document images with the purpose of automatically analyzing their contents in order to retrieve useful information [1, 2]. Applications of DIAR arise in different fields, such as the storage and indexing of cultural heritage through the analysis of historical manuscripts. Text detection and recognition in imagery are two key components of most DIAR techniques [3, 4]. Since existing methods for text detection rely on texture estimation [5] or edge detection [6], as stated by Wolf et al. [7], the characteristics of the text may affect the document analysis. For this reason, text recognition pipelines typically resize text lines to the specific height they were trained on.
In this thesis, the influence of text height on document analysis is investigated. Resizing documents to a specific text height will be inserted as the first step of several DIAR methods, and experiments will be run to measure its effect. The thesis consists of the following milestones:
• Producing a data set with labeled text heights for a sufficient number of ancient books and manuscripts [8, 9].
• Developing a system which detects text in the documents and resizes it to a predetermined height in pixels.
• Running various experiments to determine whether this improves the results of different DIAR methods.
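As a sketch of the second milestone, resizing a detected text line to a predetermined pixel height while preserving its aspect ratio could look as follows. This is a minimal nearest-neighbour version using only NumPy; in practice a library such as OpenCV or Pillow would provide better interpolation.

```python
import numpy as np

def resize_to_height(line_img: np.ndarray, target_height: int) -> np.ndarray:
    """Rescale a text-line image to a fixed height, preserving the aspect
    ratio, via nearest-neighbour sampling of rows and columns."""
    h, w = line_img.shape[:2]
    scale = target_height / h
    target_width = max(1, round(w * scale))
    # Map each output row/column back to its nearest source row/column.
    rows = (np.arange(target_height) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(target_width) / scale).astype(int).clip(0, w - 1)
    return line_img[rows][:, cols]
```

With a fixed `target_height` applied to every document, all downstream DIAR methods would then see text of a uniform scale.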

[1] Deepika Ghai and Neelu Jain. Text extraction from document images – a review. International Journal of Computer Applications, 84(3), 2013.
[2] Vikas Yadav and Nicolas Ragot. Text extraction in document images: Highlight on using corner points. In 2016 12th IAPR Workshop on Document Analysis Systems (DAS), pages 281–286, 2016.
[3] Xinyu Zhou, Cong Yao, He Wen, Yuzhi Wang, Shuchang Zhou, Weiran He, and Jiajun Liang. EAST: An efficient and accurate scene text detector. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5551–5560, 2017.
[4] Adam Coates, Blake Carpenter, Carl Case, Sanjeev Satheesh, Bipin Suresh, Tao Wang, David J Wu, and Andrew Y Ng. Text detection and character recognition in scene images with unsupervised feature learning. In 2011 International Conference on Document Analysis and Recognition, pages 440–445. IEEE, 2011.
[5] Bangalore S Manjunath and Wei-Ying Ma. Texture features for browsing and retrieval of image data. IEEE Transactions on pattern analysis and machine intelligence, 18(8):837–842, 1996.
[6] Chung-Ching Chen et al. Fast boundary detection: A generalization and a new algorithm. IEEE Transactions on computers, 100(10):988–998, 1977.
[7] Christian Wolf and Jean-Michel Jolion. Model based text detection in images and videos: a learning approach. Technical report, Laboratoire d'InfoRmatique en Images et Systèmes d'information (LIRIS), INSA de Lyon, 2004.
[8] Vincent Christlein, Anguelos Nicolaou, Mathias Seuret, Dominique Stutzmann, and Andreas Maier. ICDAR 2019 competition on image retrieval for historical handwritten documents. In 2019 International Conference on Document Analysis and Recognition (ICDAR), pages 1505–1509. IEEE, 2019.
[9] https://lme.tf.fau.de/competitions/icdar-2021-competition-on-historical-document-classification.

Investigating the class-imbalance problem using deep learning techniques on real industry printed circuit board data

Quality control tasks in industry provide an ideal environment for the application of machine learning due to the large volumes of machine-generated data. However, some of the collected data is heavily imbalanced or even unlabelled, since labelling data is very labour- and cost-intensive. The objective of this work is to investigate and apply deep learning methods to address this problem. For this purpose, a real industry printed circuit board data set provided by Continental will be utilized.
The first part of this work is a literature review of available methods for overcoming the class-imbalance problem. The review is organized into three parts: The first group of methods deals with synthetic over-sampling, a mechanism that generates new data points within the convex hull of the underrepresented classes. A concrete example is the Polarity-GAN approach proposed in [1]. In order to make use of unlabelled data, semi-supervised learning approaches are examined next. The starting point for this will be the work of Hyun et al. [2], who investigated semi-supervised deep learning methods for class-imbalanced data. The last part deals with classical deep learning methods addressing the class-imbalance problem, such as the Focal Loss [3].
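To illustrate the last group: the Focal Loss down-weights well-classified examples so that training focuses on hard, often minority-class, samples. Below is a minimal NumPy sketch of the binary variant from Lin et al. [3]; the thesis implementation would instead use a PyTorch loss module operating on logits.

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).
    The modulating factor (1 - p_t)^gamma shrinks the loss of easy
    (well-classified) examples, focusing training on hard ones."""
    probs = np.clip(probs, 1e-7, 1 - 1e-7)
    p_t = np.where(targets == 1, probs, 1 - probs)        # prob. of the true class
    alpha_t = np.where(targets == 1, alpha, 1 - alpha)    # class-balancing weight
    return np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t))
```

With `gamma = 0` and `alpha = 1` this reduces to the ordinary cross-entropy, which makes the down-weighting effect of `gamma > 0` easy to verify experimentally.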
After the review, the investigated methods will be implemented and applied to the real industry use case. For this purpose, the data pre-processing and sampling will be fixed to ensure reproducibility across all experiments. Furthermore, the baseline against which all experimental results are compared will be a ResNet50 architecture [4]. With a fixed framing and baseline, the performance of all selected methods will be evaluated on the real industry data. In addition, ways of combining the various methods to make the classification performance more robust will be explored.
The thesis consists of the following milestones:
• Literature review to acquire possible methods regarding the class-imbalance problem
• Implement fixed machine learning pipeline to ensure reproducible experiments
• Apply found methods to the fixed framing and evaluate performance
• Evaluate performance against a ResNet50
The implementation will be done in Python with the help of PyTorch [5].
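The convex-hull over-sampling idea from the review can be illustrated with a simplified SMOTE-style interpolation scheme: synthetic minority samples are drawn on line segments between random pairs of real minority points. This is a hand-written NumPy sketch for intuition only; the Polarity-GAN of [1] replaces this fixed interpolation with a learned generator.

```python
import numpy as np

def smote_like_oversample(X_minority, n_new, seed=None):
    """Generate n_new synthetic samples as convex combinations
    x_i + lam * (x_j - x_i) of random minority-class pairs (i, j),
    so every synthetic point lies in the minority class's convex hull."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(X_minority), n_new)
    j = rng.integers(0, len(X_minority), n_new)
    lam = rng.random((n_new, 1))  # interpolation weight per sample
    return X_minority[i] + lam * (X_minority[j] - X_minority[i])
```

Appending such samples to the minority class before training is the basic over-sampling baseline against which the GAN-based variants would be compared.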

References
[1] Kumari Deepshikha and Anugunj Naman. Removing class imbalance using polarity-gan: An uncertainty
sampling approach, 2020.
[2] Minsung Hyun, Jisoo Jeong, and Nojun Kwak. Class-imbalanced semi-supervised learning. CoRR,
abs/2002.06815, 2020.
[3] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object
detection, 2018.
[4] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition,
2015.
[5] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen,
Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary
DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and
Soumith Chintala. Pytorch: An imperative style, high-performance deep learning library. In H. Wallach,
H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural
Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc., 2019.

Digitization of Handwritten Rey Osterrieth Complex Figure Test Score Sheets

The Rey Osterrieth Complex Figure Test (ROCF) is a neuropsychological test used to detect cognitive impairments. As the scoring is mostly done by hand by experts, the goal is to automate ROCF scoring by means of machine learning.
The whole project consists of four milestones:
1. State-of-the-art literature research
2. Development of an OCR-based algorithm to digitize the handwritten score sheets into a machine-readable structured format for training an automatic algorithm
3. Development of a deep learning algorithm for automatically scoring ROCFs based on the 36-point scoring system
4. Evaluation of the algorithm on the data and publication of the results
This thesis will mainly examine the first two steps.
The scoring sheets share an identical structure, and only the score itself is handwritten; therefore, only digits have to be recognized. The idea is to use networks already trained on the MNIST database (e.g. [1], [2], [3]) and to determine which yields the best performance on this task.
Some preprocessing of the scanned scoring sheets, such as detecting areas of interest, binarization, or rotation correction, will therefore be necessary to match the input requirements of the specific algorithms and to improve performance.
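As an illustration of the binarization step, Otsu's method picks the threshold that maximizes the between-class variance of the grey-level histogram. The following is a minimal NumPy sketch; in practice OpenCV's `cv2.threshold` with the `cv2.THRESH_OTSU` flag would be used.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method for an 8-bit greyscale image: return the threshold
    that maximises the between-class variance of the histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # class-0 (background) probability
    mu = np.cumsum(prob * np.arange(256))    # class-0 cumulative mean
    mu_t = mu[-1]                            # global mean
    with np.errstate(invalid="ignore", divide="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    return int(np.nanargmax(sigma_b))
```

The binarized sheet (`gray > threshold`) then separates ink from paper before the digit regions are cropped and fed to the MNIST-trained networks.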
Other preprocessing options could be template matching or taking advantage of the Hu moments [4]. Text detection, i.e. finding areas of interest, is one of the steps typically performed in any text processing pipeline [5].
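For intuition on the Hu moments: they are combinations of normalized central image moments that are invariant to translation, scale, and rotation. The sketch below computes only the first invariant, h1, from scratch in NumPy; OpenCV's `cv2.HuMoments(cv2.moments(img))` returns all seven.

```python
import numpy as np

def hu1(img):
    """First Hu moment invariant h1 = eta20 + eta02, computed from
    central moments about the intensity centroid. Invariant under
    translation (central moments) and rotation (sum of eta20, eta02)."""
    img = img.astype(float)
    m00 = img.sum()
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00
    mu20 = ((x - xc) ** 2 * img).sum()
    mu02 = ((y - yc) ** 2 * img).sum()
    return (mu20 + mu02) / m00 ** 2   # normalisation for scale invariance
```

Such invariants could serve as cheap shape descriptors for matching the fixed layout elements of the score sheet regardless of scan rotation or offset.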
Furthermore, the algorithms and their weights will be modified to achieve different outcomes, which can then be compared with respect to their performance.
The implementation should be done in Python.

References
[1] Gargi Jha. MNIST handwritten digit recognition using neural network, Sep 2020.
[2] Muhammad Ardi. Simple neural network on MNIST handwritten digit dataset, Sep 2020.
[3] Dan Claudiu Ciresan, Ueli Meier, Luca Maria Gambardella, and Jürgen Schmidhuber. Deep big simple
neural nets excel on handwritten digit recognition. CoRR, abs/1003.0358, 2010.
[4] Zengshi Chen, Emmanuel Lopez-Neri, Snezana Zekovich, and Milan Tuba. Hu moments based handwritten
digits recognition algorithm. In Recent advances in knowledge engineering and systems science: Proceedings
of the 12TH international conference on artificial intelligence, knowledge engineering and data bases, page
98–104. WSEAS Press, 2013.
[5] Simon Hofmann, Martin Gropp, David Bernecker, Christopher Pollin, Andreas Maier, and Vincent Christlein.
Vesselness for text detection in historical document images. In 2016 IEEE International Conference on
Image Processing (ICIP), pages 3259–3263, 2016.

Glioma growth prediction via machine learning

Fully Automated Classification of Anatomical Variants of the Coronary Arteries from Cardiac Computed Tomography Angiography

Radiomics, Delta- and Dose-Radiomics in brain metastases

Writer Identification using Transformer-based Deep Neural Networks

Writer identification is an application of biometric identification based on handwriting. In conventional machine-learning-based methods, hand-crafted features are extracted to compute a global embedding of the handwriting images [1]. State-of-the-art deep learning techniques, in which features are extracted automatically by convolutional layers, have shown comparable performance in writer identification tasks [2][3]. A deep-learning-based writer identification method for historical documents typically follows this pipeline: First, the locations containing the handwriting of a historical document are chosen. Then, the network extracts local feature descriptors. Afterwards, the normalized local descriptors are encoded and aggregated into a global descriptor. Finally, the similarity of each pair of global descriptors is computed. However, CNN-based methods keep only the parts extracted by the filters and fail to encode the spatial relations between these learned features.
Recently, Dosovitskiy et al. [4] proposed the Vision Transformer (ViT) for image classification. Unlike CNNs, ViT uses self-attention to learn the relations between image patches and feeds the entire image information to the model. In Dosovitskiy et al.'s study, ViT attained superior performance with fewer computational resources during training compared to state-of-the-art CNNs.
In this work, a new Transformer-based approach will be deployed to create the global descriptor of a historical handwritten document and identify its author. We will compare the new approach with state-of-the-art methods to investigate the effect of introducing Transformers into writer identification tasks.
The thesis consists of the following milestones:
• Creating a global embedding of the document image and identifying writers using a Transformer-based neural network.
• Evaluating performance on the ICDAR17 competition dataset on historical document writer identification.
• Evaluating different loss functions.
• Comparing with the Vector of Linearly Aggregated Descriptors (VLAD) encoding.
• Experimenting with other network architectures, comparing training speed and performance.
The implementation should be done in Python.
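For the VLAD comparison milestone: the Vector of Locally Aggregated Descriptors sums, per cluster centre, the residuals of the local descriptors assigned to that centre, then L2-normalizes the concatenation. A minimal NumPy sketch (the cluster centres would normally come from k-means on training descriptors):

```python
import numpy as np

def vlad(local_desc, centers):
    """VLAD encoding: hard-assign each local descriptor to its nearest
    centre, accumulate the residuals (x - c_k) per centre, flatten,
    and L2-normalise into a single global descriptor."""
    d2 = ((local_desc[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(1)                 # nearest centre per descriptor
    K, D = centers.shape
    v = np.zeros((K, D))
    for k in range(K):
        if (assign == k).any():
            v[k] = (local_desc[assign == k] - centers[k]).sum(0)
    v = v.ravel()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```

Cosine similarity between such global descriptors then gives the writer-retrieval ranking against which the Transformer-based embeddings would be compared.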

Deep learning-based respiratory navigation for abdominal MRI

In Magnetic Resonance Imaging (MRI) of the abdomen, breathing motion
and cardiac motion are the main confounding factors introducing
artifacts and causing diminished image quality. Different strategies to
minimize the susceptibility to (breathing) motion-related artifacts have
been developed over the last decades, the most routinely used ones being
breath-held acquisitions, retrospective gating, and prospective
triggering. Breath-held techniques are sampling efficient but may not be
applicable in seriously ill patients and pediatric patients. In
addition, MRI techniques such as 3D high-resolution Magnetic Resonance
Cholangiopancreatography (MRCP) require parameter sets that make it
extremely difficult to perform the exam in a single or multiple
breath-hold fashion. Under the assumption that breathing patterns are
stable and regular, triggered acquisition schemes aim to acquire data in
certain states/sectors of the breathing cycle, typically during the
relatively stable end-expiratory phase. This technique is less sampling
efficient and has additional challenges in irregular breathers.
The aim of this thesis is to analyze existing techniques for breathing
trigger-point detection and breathing pattern analysis, to explore
whether neural networks are suitable for deriving optimal trigger points
and adapting triggering schemes to changing patient conditions, and to
investigate whether breathing irregularities can be predicted from
previous breathing cycles.
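As a toy illustration of the rule-based baseline, end-expiratory trigger points can be approximated as the local minima of a 1-D respiratory navigator signal. This NumPy sketch on simulated data is only the fixed rule that the learned, adaptive triggering investigated in the thesis would replace.

```python
import numpy as np

def end_expiration_triggers(signal):
    """Return indices of interior local minima of a 1-D respiratory
    signal, a simple stand-in for end-expiratory trigger points."""
    s = np.asarray(signal, dtype=float)
    # A sample is a minimum if it is below its left neighbour and
    # not above its right neighbour (the <= breaks flat ties).
    is_min = (s[1:-1] < s[:-2]) & (s[1:-1] <= s[2:])
    return np.where(is_min)[0] + 1
```

On a regular breather this fires once per cycle; irregular breathing is exactly where such a fixed rule fails and a predictive model could help.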

AI-based classification of diffuse liver disease

Research Project: Development of Process Workflows for Research Collaboration in Data-Driven, Cross-Institutional Research Projects

This article addresses the development of process workflows in cross-institutional, data-driven research projects. It examines whether the process chains can be standardized, to what extent governance structures of the Medizininformatik-Initiative (MII, the German Medical Informatics Initiative) can be incorporated into data-driven research projects, and, finally, whether this increases the confidence with which participants can act. To this end, the actual workflows of completed collaborations were compared with the standards recommended by the MII and, drawing on the know-how of the staff involved, transformed into process chains. In this way, process workflows were developed that, through cascading process chains, explanations, and checklists, form a standardized guideline for collaborative projects. The resulting documents can also help to avoid errors within the individual process elements in the future and allow collaborative projects to be carried out more simply, in a more goal-oriented way, and more transparently.