Index
Character Height Estimation in Historical Document Images
Over the past decades, the field of Document Image Analysis and Recognition (DIAR) has been the subject of much research due to its wide range of applications. DIAR can be applied to printed or handwritten, textual or graphical document images with the purpose of automatically analyzing their contents in order to retrieve useful information [1, 2]. Its applications arise in different fields, such as the storage and indexing of cultural heritage through the analysis of historical manuscripts. Text detection and recognition in imagery are two key components of most DIAR techniques [3, 4]. Since existing text detection methods rely on texture estimation [5] or edge detection [6], as stated by Wolf et al. [7], the characteristics of the text may affect the document analysis. For this reason, text recognition pipelines typically resize text lines to the specific height on which they were trained.
In this thesis, the influence of the text height on document analysis is investigated. Resizing documents
to a specific text height will be inserted as the first step of several DIAR methods in order to run experiments. The thesis consists of the following milestones:
• Producing a data set with text height labeled for a sufficient amount of ancient books and
manuscripts [8, 9].
• Developing a system which detects text in the documents and resizes it to a predetermined height
in pixels.
• Running various experiments to determine whether this improves the results of different DIAR
methods.
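The resizing step from the second milestone can be sketched as follows. This is a minimal illustration rather than the thesis implementation: it assumes the text height of a page has already been measured, represents the image as a plain list of pixel rows, and uses nearest-neighbour sampling purely for simplicity. The names `scale_factor` and `resize_nearest` and the pixel values are hypothetical.

```python
def scale_factor(measured_height_px: float, target_height_px: float) -> float:
    """Factor by which a page must be scaled so its text reaches the target height."""
    if measured_height_px <= 0:
        raise ValueError("measured text height must be positive")
    return target_height_px / measured_height_px


def resize_nearest(image, factor):
    """Resize a 2D image (list of rows) by `factor` using nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, round(src_h * factor))
    dst_w = max(1, round(src_w * factor))
    return [
        [
            # Map each destination pixel back to its source pixel, clamped to the border.
            image[min(src_h - 1, int(y / factor))][min(src_w - 1, int(x / factor))]
            for x in range(dst_w)
        ]
        for y in range(dst_h)
    ]


# Hypothetical page whose characters were measured at 16 px; the target is 32 px.
page = [[0, 255], [255, 0]]
factor = scale_factor(measured_height_px=16, target_height_px=32)
resized = resize_nearest(page, factor)
```

In practice an interpolating resize from an image library would likely be preferable to nearest-neighbour sampling; the point here is only that a single scalar, derived from the estimated text height, drives the whole step.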
[1] Deepika Ghai and Neelu Jain. Text extraction from document images-a review. International Journal of Computer Applications, 84(3), 2013.
[2] Vikas Yadav and Nicolas Ragot. Text extraction in document images: Highlight on using corner points. In 2016 12th IAPR Workshop on Document Analysis Systems (DAS), pages 281–286, 2016.
[3] Xinyu Zhou, Cong Yao, He Wen, Yuzhi Wang, Shuchang Zhou, Weiran He, and Jiajun Liang. EAST: an efficient and accurate scene text detector. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pages 5551–5560, 2017.
[4] Adam Coates, Blake Carpenter, Carl Case, Sanjeev Satheesh, Bipin Suresh, Tao Wang, David J Wu, and Andrew Y Ng. Text detection and character recognition in scene images with unsupervised feature learning. In 2011 International Conference on Document Analysis and Recognition, pages 440–445. IEEE, 2011.
[5] Bangalore S Manjunath and Wei-Ying Ma. Texture features for browsing and retrieval of image data. IEEE Transactions on pattern analysis and machine intelligence, 18(8):837–842, 1996.
[6] Chung-Ching Chen et al. Fast boundary detection: A generalization and a new algorithm. IEEE Transactions on computers, 100(10):988–998, 1977.
[7] Christian Wolf, Jean-Michel Jolion, and LIRIS INSA de Lyon. Model based text detection in images and videos: a learning approach. Laboratoire d'InfoRmatique en Images et Systèmes d'information, Palmas, TO, 2004.
[8] Vincent Christlein, Anguelos Nicolaou, Mathias Seuret, Dominique Stutzmann, and Andreas Maier. ICDAR 2019 competition on image retrieval for historical handwritten documents. In 2019 International Conference on Document Analysis and Recognition (ICDAR), pages 1505–1509. IEEE, 2019.
[9] https://lme.tf.fau.de/competitions/icdar-2021-competition-on-historical-document-classification.
Investigating the class-imbalance problem using deep learning techniques on real industry printed circuit board data
Quality control tasks in industry provide the ideal environment for the application of machine learning
due to large volumes of machine-generated data. However, some of the collected data is heavily
unbalanced or even unlabelled, since labelling the data is very labour- and cost-intensive. The objective
of this work is to investigate and apply deep learning methods to address this problem. For this
purpose, a real industry printed circuit board data set will be utilized, which is provided by Continental
corporation.
The first part of this work is a literature review of available methods to overcome the
class-imbalance problem described above. The review focuses on three groups of methods:
The first group of methods will deal with synthetic over-sampling, which is a mechanism to enforce
the generation of data points in the convex hull of the intended underrepresented classes. A concrete
application to achieve this would be the Polarity-GAN approach proposed in [1]. In order to make use
of unlabelled data, semi-supervised learning approaches are examined next. The starting point for this
will be the work of Hyun et al. [2], who looked into available semi-supervised deep learning methods
for class-imbalances. The last subsection will deal with classical deep learning methods addressing the
class-imbalance problem, such as the Focal Loss [3].
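As an illustration of the last group, the focal loss of Lin et al. [3] for a single sample can be written as FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The sketch below uses plain Python instead of PyTorch to stay self-contained; the function name and the default values of `gamma` and `alpha` are illustrative assumptions, not settings chosen for this thesis.

```python
import math


def focal_loss(p_true_class: float, gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Focal loss for one sample, given the predicted probability of its true class."""
    eps = 1e-12  # guard against log(0)
    p = min(max(p_true_class, eps), 1.0)
    # The factor (1 - p)^gamma shrinks the loss of well-classified samples,
    # so hard (often minority-class) samples dominate the gradient.
    return -alpha * (1.0 - p) ** gamma * math.log(p)


easy = focal_loss(0.9)                   # confidently correct sample
hard = focal_loss(0.1)                   # badly misclassified sample
plain_ce = focal_loss(0.5, gamma=0.0)    # gamma = 0 reduces to cross-entropy
```

With `gamma = 0` the modulating factor disappears and the expression is ordinary cross-entropy, which makes the baseline comparison straightforward.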
After a detailed review, the investigated methods will be implemented and applied to the real industry
use case. For this purpose, the data pre-processing and sampling will be fixed to ensure reproducibility
across all experiments. Furthermore, the baseline against which all experiment results are compared
will be a ResNet50 architecture [4]. With a fixed framing and baseline, the performance of all acquired
methods will be evaluated using the real industry data. In addition, a possibility will be sought to
combine the various methods in such a way that the classification performance will become more
robust.
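One way to fix the sampling so that every experiment sees the same data is a seeded, stratified split. The sketch below is an assumption about how this could look, not the pipeline of the thesis: the function name, the label strings, the split fraction, and the seed are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) with per-class proportions preserved."""
    rng = random.Random(seed)  # local generator, independent of global random state
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for label in sorted(by_class):  # sorted for a deterministic class order
        indices = by_class[label]
        rng.shuffle(indices)
        # At least one test sample per class, so rare classes are always evaluated.
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical imbalanced label list: 95 intact boards, 5 defective ones.
labels = ["ok"] * 95 + ["defect"] * 5
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
```

Note that a class with a single sample would end up entirely in the test set under this rule, so very rare classes need separate handling.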
The thesis consists of the following milestones:
• Literature review to acquire possible methods regarding the class-imbalance problem
• Implement fixed machine learning pipeline to ensure reproducible experiments
• Apply found methods to the fixed framing and evaluate performance
• Evaluate performance against a ResNet50
The implementation will be done in Python with the help of PyTorch [5].
References
[1] Kumari Deepshikha and Anugunj Naman. Removing class imbalance using polarity-gan: An uncertainty
sampling approach, 2020.
[2] Minsung Hyun, Jisoo Jeong, and Nojun Kwak. Class-imbalanced semi-supervised learning. CoRR,
abs/2002.06815, 2020.
[3] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object
detection, 2018.
[4] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition,
2015.
[5] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen,
Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary
DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and
Soumith Chintala. Pytorch: An imperative style, high-performance deep learning library. In H. Wallach,
H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural
Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc., 2019.
Digitization of Handwritten Rey Osterrieth Complex Figure Test Score Sheets
The Rey Osterrieth Complex Figure Test (ROCF) is a neuropsychological test to detect cognitive
impairments.
As the scoring is mostly performed by hand by experts, the goal is to automate the scoring of the ROCF by
means of machine learning.
The whole project consists of four milestones:
1. State-of-the-art literature research
2. Development of an OCR-based algorithm to digitize the handwritten score sheet into a machine-readable,
structured format for training an automatic algorithm
3. Development of a deep learning algorithm for automatically scoring ROCFs based on the 36-point
scoring system
4. Evaluation of the algorithm based on the data and publication of the results
This thesis will mainly examine the first two steps.
The scoring sheets share an identical structure, and only the score itself is handwritten.
Therefore, only digits have to be recognized.
The idea is to use networks already trained on the MNIST database (e.g. [1], [2], [3]) and to achieve the
best possible performance on the described task.
To this end, some preprocessing of the scanned scoring sheets, such as detecting areas of interest,
binarization, or rotation, will be necessary to match the input requirements of the specific algorithms
as well as to improve performance.
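As an example of the binarization step, the sketch below applies Otsu's method, which picks the grey-level threshold that maximises the between-class variance. Otsu's method is an assumption made here for illustration; the text above does not commit to a specific binarization technique, and the function name and sample pixel values are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the grey level (0..255) that maximises the between-class variance."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1
    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += histogram[level]
        sum_dark += level * histogram[level]
        weight_bright = total - weight_dark
        if weight_dark == 0:
            continue  # no pixels in the dark class yet
        if weight_bright == 0:
            break     # all pixels are in the dark class
        mean_dark = sum_dark / weight_dark
        mean_bright = (total_sum - sum_dark) / weight_bright
        variance = weight_dark * weight_bright * (mean_dark - mean_bright) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold


# Hypothetical flattened crop of a score cell: dark ink on bright paper.
cell = [10, 12, 11, 200, 210, 205]
threshold = otsu_threshold(cell)
ink_mask = [1 if value <= threshold else 0 for value in cell]
```

MNIST digits are bright strokes on a dark background, so a mask of this kind (ink marked as 1) is closer to what MNIST-trained networks expect than the raw scan.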
Other options for preprocessing could be template matching or taking advantage of Hu moments
[4]. Text detection, i.e. finding areas of interest, is one of the typically performed steps in any
text processing pipeline [5].
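To make the idea behind Hu moments concrete, the sketch below computes only the first of the seven invariants, phi_1 = eta_20 + eta_02, from the normalised central moments of a binary image. It is a simplified illustration in plain Python; the function name and the tiny test shape are hypothetical, and a full implementation would compute all seven invariants.

```python
def first_hu_moment(image):
    """First Hu invariant (eta20 + eta02) of a 2D image given as a list of rows."""
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            m00 += value
            m10 += x * value
            m01 += y * value
    if m00 == 0:
        raise ValueError("image contains no foreground mass")
    cx, cy = m10 / m00, m01 / m00  # centroid

    mu20 = mu02 = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            mu20 += (x - cx) ** 2 * value
            mu02 += (y - cy) ** 2 * value
    # For second-order moments the scale normalisation is m00 ** 2.
    return (mu20 + mu02) / m00 ** 2


# Hypothetical 3x3 stroke, the same stroke shifted by one pixel, and its transpose.
shape = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]
shifted = [[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 1, 1]]
transposed = [list(column) for column in zip(*shape)]
phi = first_hu_moment(shape)
```

The value is unchanged when the stroke is translated or its axes are swapped, which is what makes such features attractive for digits that are not perfectly aligned within their cells.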
Furthermore, algorithms and weights will be modified to achieve different outcomes, which can then
be compared with respect to their performance.
The implementation should be done in Python.
References
[1] Gargi Jha. Mnist handwritten digit recognition using neural network, Sep 2020.
[2] Muhammad Ardi. Simple neural network on mnist handwritten digit dataset, Sep 2020.
[3] Dan Claudiu Ciresan, Ueli Meier, Luca Maria Gambardella, and Jürgen Schmidhuber. Deep big simple
neural nets excel on handwritten digit recognition. CoRR, abs/1003.0358, 2010.
[4] Zengshi Chen, Emmanuel Lopez-Neri, Snezana Zekovich, and Milan Tuba. Hu moments based handwritten
digits recognition algorithm. In Recent advances in knowledge engineering and systems science: Proceedings
of the 12th international conference on artificial intelligence, knowledge engineering and data bases, pages
98–104. WSEAS Press, 2013.
[5] Simon Hofmann, Martin Gropp, David Bernecker, Christopher Pollin, Andreas Maier, and Vincent Christlein.
Vesselness for text detection in historical document images. In 2016 IEEE International Conference on
Image Processing (ICIP), pages 3259–3263, 2016.
Glioma Growth Prediction Using Reaction-Diffusion Modelling and Machine Learning
Fully Automated Classification of Anatomical Variants of the Coronary Arteries from Cardiac Computed Tomography Angiography
Radiomics, Delta- and Dose-Radiomics in brain metastases
Writer Identification using Transformer-based Deep Neural Networks
Deep learning-based respiratory navigation for abdominal MRI
In Magnetic Resonance Imaging (MRI) of the abdomen, breathing motion
and cardiac motion are the main confounding factors introducing
artifacts and causing diminished image quality. Different strategies to
minimize the susceptibility to (breathing) motion-related artifacts have
been developed over the last decades, the most routinely used ones being
breath-held acquisitions, retrospective gating, and prospective
triggering. Breath-held techniques are sampling efficient but may not be
applicable in seriously ill patients and pediatric patients. In
addition, MRI techniques such as 3D high-resolution Magnetic Resonance
Cholangiopancreatography (MRCP) require parameter sets making it extremely
difficult to perform the exam in a single breath-hold or multi
breath-hold fashion. Under the assumption that breathing patterns are
stable and regular, triggered acquisition schemes aim to acquire data in
certain states/sectors of the breathing cycle, typically during the
relatively stable end-expiratory phase. This technique is less sampling
efficient and has additional challenges in irregular breathers.
The aim of this thesis is to analyze existing techniques for breathing
trigger point detection and breathing pattern analysis and to explore if
neural networks are suitable to derive optimal trigger points, adapt
triggering schemes to changing patient conditions, and investigate
whether breathing irregularities can be predicted from previous
breathing cycles.
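As a point of reference for trigger point detection, a simple non-learned baseline can be sketched as follows. It assumes a one-dimensional respiratory signal whose value rises during inspiration, so end-expiration corresponds to local minima; this convention, the synthetic sinusoidal trace, the sampling rate, and the function name are all assumptions made for illustration.

```python
import math


def end_expiration_indices(signal, min_distance):
    """Indices of local minima that are at least `min_distance` samples apart."""
    triggers = []
    for i in range(1, len(signal) - 1):
        is_minimum = signal[i] < signal[i - 1] and signal[i] <= signal[i + 1]
        if is_minimum and (not triggers or i - triggers[-1] >= min_distance):
            triggers.append(i)
    return triggers


# Synthetic, perfectly regular breathing trace: 4 s period sampled at 10 Hz.
sample_rate_hz = 10
period_s = 4.0
trace = [math.sin(2 * math.pi * t / (sample_rate_hz * period_s)) for t in range(120)]

triggers = end_expiration_indices(trace, min_distance=20)
# Cycle lengths in seconds; their spread is a crude measure of breathing irregularity.
cycle_lengths_s = [(b - a) / sample_rate_hz for a, b in zip(triggers, triggers[1:])]
```

This detector is retrospective, since it needs the sample after the minimum, and it has no notion of irregular breathing. A prospective trigger has to decide before the minimum is confirmed, which is where a predictive model could help.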
AI-based classification of diffuse liver disease
Cerebral Vessel Tree Estimation from Non-Contrast CT using Deep Learning Methods
Thesis Description
Globally, the World Health Organization (WHO) ranks stroke as the second leading cause of death and the third leading cause of disability [1, 2]. In the United States, on average, someone has a stroke every 40 seconds, as statistics from the American Heart Association (AHA) demonstrate. 87% of these strokes are ischemic; the rest are hemorrhagic [3].
The primary first-line neuroimaging technique that is applied in case of a suspected stroke is a non-contrast CT (NCCT) scan. Based on this scan one can differentiate between an ischemic and a hemorrhagic stroke [4]. In case of an acute ischemic stroke a reperfusion can be accomplished either by intravenous thrombolytic drug treatment or with endovascular mechanical thrombectomy. Whereas for thrombolysis the short treatment window and the risk of symptomatic intracranial hemorrhage are limitations, endovascular treatment by using stent retrievers is only securely feasible for large proximal vessel occlusions. Given that, thrombectomy is the preferred method for eligible patients [5, 4]. The decision for or against recanalization by thrombectomy requires the localization of the thrombus on artery-level, which is done using further angiographic imaging [6].
In practice, computed tomography angiography (CTA) and magnetic resonance angiography (MRA) are the most important modalities for cerebral angiography. Even though both are in principle suitable for the task, long acquisition times and high operational costs are major drawbacks when it comes to the practical application of MRA [7]. For CTA, however, the patient is exposed to X-rays and an intravenous contrast agent. This contrast agent bears the risk of allergic reactions, contrast-induced nephropathy and thyrotoxicosis [8]. While an angiographic scan is still necessary for thrombus localization, it would be beneficial to also gather as much additional information as possible from the previously acquired NCCT scan. Providing an estimation of the cerebral vessel tree, which is the goal of this thesis, could, for instance, be useful when developing methods for the automatic detection of hyperdense artery signs that can indicate a clot. Such insights can be used to improve decision-making in examination and treatment and serve as a verification for CTA or MRA results. Due to the lack of contrast agent, the vasculature is typically barely visible in NCCT scans, which renders the diagnosis of cerebral ischemia a challenging task [9].
Since the rise of deep learning in medical image analysis, its applications for image segmentation have been a prominent field of research. Especially the U-Net architecture that is designed for fast training on small datasets has gathered huge attention [10]. Previous work addressing cerebral angiography segmentation from NCCT was presented by Klimont et al. [11]. Their approach to generate cerebral angiographies from NCCT scans suffers from several limitations: 1) The segmentation algorithm used to generate the target samples from CTA scans is seen as subpar. 2) The use of 3D-U-Net architecture was not possible due to limited computational resources. Therefore the U-Net is only trained on slices of the scan, which results in a lack of context for the axial dimension. 3) A CycleGAN, which is a deep learning method for image-to-image translation, achieved unsatisfactory results for generating realistic CTA scans. They assume that using an adequate loss function will produce better results [11].
To tackle their first limitation, an already developed, enhanced segmentation algorithm on CTA is applied to provide a better ground truth. Furthermore, partitioning the data into patches to enable training on volumetric data with a 3D U-Net is targeted in the first step of this thesis [12]. Then a corresponding CTA should be added as an auxiliary target, which is optimized by extending the previous architecture with a discriminator. This should lead to a more realistic cerebral vessel tree segmentation by the U-Net [13]. A similar architecture has already been applied successfully to a task in which synthetic non-contrast images were generated from CTA scans [14]. Time permitting, a further, optional goal is to train the model to predict separate masks for specific brain vessels (instance segmentation). The evaluation is done on a dataset of 150 patients. For each patient there is an NCCT scan, a CTA scan, and a segmentation mask generated from the given CTA.
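The patch partitioning mentioned above can be sketched as the computation of patch corner coordinates on a regular, overlapping grid. The volume shape, patch shape, and stride below are hypothetical values chosen for illustration, as are the function names; the actual sizes would depend on the data and the available GPU memory.

```python
from itertools import product


def patch_starts(size, patch, stride):
    """Start offsets along one axis so that patches of length `patch` cover [0, size)."""
    if patch >= size:
        return [0]  # a single patch; the volume would need padding along this axis
    starts = list(range(0, size - patch + 1, stride))
    if starts[-1] != size - patch:
        starts.append(size - patch)  # shift the last patch back so it ends at the border
    return starts


def patch_corners(volume_shape, patch_shape, stride_shape):
    """All (z, y, x) corner coordinates of the patches covering a 3D volume."""
    axes = [
        patch_starts(size, patch, stride)
        for size, patch, stride in zip(volume_shape, patch_shape, stride_shape)
    ]
    return list(product(*axes))


# Hypothetical NCCT volume of 160 slices with 512 x 512 pixels each.
corners = patch_corners(
    volume_shape=(160, 512, 512),
    patch_shape=(64, 128, 128),
    stride_shape=(48, 96, 96),
)
```

Because each patch spans several slices, the network sees context along the axial dimension, which is what slice-wise 2D training lacks. Overlapping strides also allow predictions to be averaged near patch borders at inference time.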
The thesis will comprise the following work items:
- Literature research on related work
- Design, implementation and parametrization of the segmentation model
- Cerebral vessel tree segmentation with 3D-U-Net architecture
- Addition of CTA as auxiliary target and extension of U-Net with Discriminator
- Possibly: Multi-class prediction to predict individual vessel masks
- Quantitative evaluation of the implemented system on real-world data (150 samples)
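For the quantitative evaluation, one commonly used overlap measure for segmentation masks is the Dice coefficient, 2|A ∩ B| / (|A| + |B|). The work items above do not name a metric, so its use here is an assumption; the function name and the toy masks are hypothetical.

```python
def dice_coefficient(prediction, target):
    """Dice overlap between two flattened binary masks of equal length."""
    if len(prediction) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(prediction, target) if p and t)
    total = sum(1 for p in prediction if p) + sum(1 for t in target if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example: predicted vessel voxels versus the CTA-derived mask.
predicted_mask = [1, 1, 0, 0, 1]
reference_mask = [1, 0, 0, 1, 1]
score = dice_coefficient(predicted_mask, reference_mask)
```

For thin, tree-like structures such as vessels, Dice can be dominated by the larger proximal segments, so a centerline-based measure may be a useful complement.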
References
[1] Global Health Estimates 2019: Deaths by Cause, Age, Sex, by Country and by Region, 2000-2019. Technical report, World Health Organization, Geneva, 2020.
[2] Global Health Estimates 2019: Disease burden by Cause, Age, Sex, by Country and by Region, 2000-2019. Technical report, World Health Organization, Geneva, 2020.
[3] Salim S. Virani, Alvaro Alonso, Emelia J. Benjamin, Marcio S. Bittencourt, Clifton W. Callaway, April P.
Carson, Alanna M. Chamberlain, Alexander R. Chang, Susan Cheng, Francesca N. Delling, Luc Djousse,
Mitchell S.V. Elkind, Jane F. Ferguson, Myriam Fornage, Sadiya S. Khan, Brett M. Kissela, Kristen L.
Knutson, Tak W. Kwan, Daniel T. Lackland, Tené T. Lewis, Judith H. Lichtman, Chris T. Longenecker, Matthew Shane Loop, Pamela L. Lutsey, Seth S. Martin, Kunihiro Matsushita, Andrew E. Moran,
Michael E. Mussolino, Amanda Marma Perak, Wayne D. Rosamond, Gregory A. Roth, Uchechukwu K.A.
Sampson, Gary M. Satou, Emily B. Schroeder, Svati H. Shah, Christina M. Shay, Nicole L. Spartano,
Andrew Stokes, David L. Tirschwell, Lisa B. VanWagner, and Connie W. Tsao. Heart Disease and Stroke Statistics—2020 Update: A Report From the American Heart Association. Circulation,
141(9):e139–e596, March 2020.
[4] C. Zerna, Z. Assis, C. D. d’Esterre, B. K. Menon, and M. Goyal. Imaging, Intervention, and Workflow in
Acute Ischemic Stroke: The Calgary Approach. American Journal of Neuroradiology, 37(6):978–984, June
2016.
[5] Salwa El Tawil and Keith W Muir. Thrombolysis and thrombectomy for acute ischaemic stroke. Clinical
Medicine, 17(2):161–165, April 2017.
[6] Murugan Palaniswami and Bernard Yan. Mechanical Thrombectomy Is Now the Gold Standard for Acute Ischemic Stroke: Implications for Routine Clinical Practice. Interventional Neurology, 4(1-2):18–29, 2015.
[7] D. A. Katz, M. P. Marks, S. A. Napel, P. M. Bracci, and S. L. Roberts. Circle of Willis: evaluation with
spiral CT angiography, MR angiography, and conventional angiography. Radiology, 195(2):445–449, May
1995.
[8] James V. Rawson and Allen L. Pelletier. When to Order a Contrast-Enhanced CT. American Family
Physician, 88(5):312–316, September 2013.
[9] Tom van Seeters, Geert Jan Biessels, Joris M. Niesten, Irene C. van der Schaaf, Jan Willem Dankbaar,
Alexander D. Horsch, Willem P. T. M. Mali, L. Jaap Kappelle, Yolanda van der Graaf, Birgitta K. Velthuis,
and on behalf of the Dust Investigators. Reliability of Visual Assessment of Non-Contrast CT, CT Angiography Source Images and CT Perfusion in Patients with Suspected Ischemic Stroke. PLOS ONE,
8(10):e75615, August 2013.
[10] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional Networks for Biomedical
Image Segmentation. In Nassir Navab, Joachim Hornegger, William M. Wells, and Alejandro F. Frangi,
editors, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Lecture Notes in
Computer Science, pages 234–241, Cham, 2015. Springer International Publishing.
[11] Michał Klimont, Agnieszka Oronowicz-Jaśkowiak, Mateusz Flieger, Jacek Rzeszutek, Robert Juszkat, and Katarzyna Jończyk-Potoczna. Deep learning for cerebral angiography segmentation from non-contrast computed tomography. PLOS ONE, page 15, July 2020.
[12] Özgün Çiçek, Ahmed Abdulkadir, Soeren S. Lienkamp, Thomas Brox, and Olaf Ronneberger. 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation. In Sebastien Ourselin, Leo Joskowicz, Mert R. Sabuncu, Gozde Unal, and William Wells, editors, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016, Lecture Notes in Computer Science, pages 424–432, Cham, 2016. Springer International Publishing.
[13] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron
Courville, and Yoshua Bengio. Generative adversarial networks. Communications of the ACM, 63(11):139–
144, October 2020.
[14] Florian Thamm, Oliver Taubmann, Felix Denzinger, Markus Jürgens, Hendrik Ditt, and Andreas Maier.
SyNCCT: Synthetic Non-Contrast Images of the Brain from Single-Energy Computed Tomography Angiography. volume 12907 of Lecture Notes in Computer Science. Springer, Cham, September 2021.