Multi-task Learning for Historical Handwritten Document Classification

Type: BA thesis

Status: finished

Date: May 7, 2020 - October 7, 2020

Supervisors: Vincent Christlein, Tino Haderlein, Andreas Maier

In the ICDAR 2019 Competition on Image Retrieval for Historical Handwritten Documents [1], several methods
were proposed to identify the writer of a document. Although most of these methods are based on feature
descriptors and traditional machine learning techniques, deep learning based methods are emerging in the
field of historical document classification, and deep learning already dominates general image retrieval
and classification tasks.
Multi-task learning (MTL) is an approach in which a single neural network learns several tasks at the same
time. This approach has not yet been applied to historical handwritten document classification. Over the
years, MTL has been used to boost performance in many fields, not only in computer vision but also in
speech processing, bioinformatics, and others [2]. In computer vision, for example, facial landmark
detection has been learned jointly with expression recognition to improve the recognition performance [3].
Furthermore, a convolutional neural network has been proposed for pose estimation with auxiliary tasks
such as body part detection [4].
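In its most common form, hard parameter sharing, the tasks share the lower layers of the network and each task has its own output layers; training then minimizes a weighted sum of the task losses. The following is a sketch of that objective under the assumption of fixed, hand-chosen weights $\lambda_k$ (they can also be tuned or learned). Here $\theta_s$ denotes the shared parameters, $\theta_k$ the parameters specific to task $k$, and $\mathcal{L}_k$ the loss of task $k$:

```latex
$$
\mathcal{L}(\theta_s, \theta_1, \dots, \theta_K)
  = \sum_{k=1}^{K} \lambda_k \, \mathcal{L}_k(\theta_s, \theta_k),
  \qquad \lambda_k \ge 0 .
$$
```

For a main task with one auxiliary task ($K = 2$), one can fix $\lambda_1 = 1$ for the main task; $\lambda_2$ then controls how strongly the auxiliary task influences the shared parameters $\theta_s$.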
In this work, we will investigate multi-task neural networks for historical handwritten document
classification. The publicly available datasets from previous competitions will be used for training and
testing.
We will implement two multi-task neural networks. The first focuses on writer identification, with
binarization as an auxiliary task; its performance will be evaluated on the datasets of the ICDAR2017
Competition on Historical Document Writer Identification (Historical-WI) [5]. The second focuses on dating
and script type classification; its performance will be evaluated on the datasets of the ICDAR2017
Competition on the Classification of Medieval Handwritings in Latin Script [6].
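The first network could, for example, use hard parameter sharing: a shared convolutional encoder feeds a writer-classification head (main task) and a small decoder that predicts a per-pixel ink/background mask (auxiliary task). The PyTorch sketch below is only an illustration; the names `WriterBinarizationNet` and `multitask_loss`, the layer sizes, the 64x64 patch size, and the auxiliary weight `aux_weight` are hypothetical, and it assumes a (pseudo) ground-truth binarization mask is available for every training patch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WriterBinarizationNet(nn.Module):
    """Hypothetical hard-parameter-sharing network: one shared convolutional
    encoder, a writer-classification head and a per-pixel binarization head."""

    def __init__(self, num_writers: int, channels: int = 32):
        super().__init__()
        # Shared encoder: downsamples a grayscale patch by a factor of 4.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(channels, 2 * channels, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Main task head: global average pooling + linear writer classifier.
        self.writer_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(2 * channels, num_writers),
        )
        # Auxiliary task head: upsample back to input size, one logit per pixel.
        self.binarization_head = nn.Sequential(
            nn.ConvTranspose2d(2 * channels, channels, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(channels, 1, 2, stride=2),
        )

    def forward(self, x):
        features = self.encoder(x)  # shared by both tasks
        return self.writer_head(features), self.binarization_head(features)


def multitask_loss(writer_logits, mask_logits, writer_ids, masks, aux_weight=0.5):
    """Weighted sum of the main (writer) and auxiliary (binarization) losses."""
    writer_loss = F.cross_entropy(writer_logits, writer_ids)
    binarization_loss = F.binary_cross_entropy_with_logits(mask_logits, masks)
    return writer_loss + aux_weight * binarization_loss


# Example: four random 64x64 grayscale patches with dummy labels and masks.
torch.manual_seed(0)
model = WriterBinarizationNet(num_writers=10)
patches = torch.rand(4, 1, 64, 64)
writer_ids = torch.randint(0, 10, (4,))
masks = (torch.rand(4, 1, 64, 64) > 0.5).float()
writer_logits, mask_logits = model(patches)
loss = multitask_loss(writer_logits, mask_logits, writer_ids, masks)
loss.backward()  # gradients of both tasks flow into the shared encoder
```

Because both losses are backpropagated through `model.encoder`, the binarization task can shape the shared features; `aux_weight` sets how strongly it does so relative to writer identification.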
The thesis consists of the following milestones:
• Review of related work and methods for historical handwritten document classification
• Implementation of two multi-task neural networks for:
– writer identification and binarization
– date classification and script type classification
• Evaluation of the results of these two multi-task neural networks
• Comparison of this approach with current document classification approaches
• Examination and discussion of whether a multi-task neural network is useful for document
classification.
The implementation should be done in PyTorch.
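The second network (dating and script type classification) can be sketched in PyTorch as a shared backbone with two linear classification heads, trained on the weighted sum of the two cross-entropy losses. The names `DateScriptNet` and `train_step`, the architecture, the class counts, the equal loss weights, and the optimizer settings are illustrative assumptions, not choices made in this thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DateScriptNet(nn.Module):
    """Hypothetical two-head classifier: a shared CNN backbone feeding one
    linear head for the date class and one for the script type."""

    def __init__(self, num_dates: int, num_scripts: int, width: int = 32):
        super().__init__()
        # Shared backbone: two conv blocks, then global average pooling.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, width, 3, padding=1), nn.BatchNorm2d(width), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(width, 2 * width, 3, padding=1), nn.BatchNorm2d(2 * width), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Task-specific heads on top of the shared feature vector.
        self.date_head = nn.Linear(2 * width, num_dates)
        self.script_head = nn.Linear(2 * width, num_scripts)

    def forward(self, x):
        shared = self.backbone(x)
        return self.date_head(shared), self.script_head(shared)


def train_step(model, optimizer, images, date_labels, script_labels,
               date_weight=1.0, script_weight=1.0):
    """One optimization step on the weighted sum of both cross-entropy losses."""
    model.train()
    optimizer.zero_grad()
    date_logits, script_logits = model(images)
    loss = (date_weight * F.cross_entropy(date_logits, date_labels)
            + script_weight * F.cross_entropy(script_logits, script_labels))
    loss.backward()
    optimizer.step()
    return loss.item()


# Example with random data; the class counts are placeholders,
# not the label sets of the competition datasets.
torch.manual_seed(0)
model = DateScriptNet(num_dates=5, num_scripts=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
images = torch.rand(8, 1, 64, 64)
date_labels = torch.randint(0, 5, (8,))
script_labels = torch.randint(0, 4, (8,))
losses = [train_step(model, optimizer, images, date_labels, script_labels)
          for _ in range(20)]
```

Keeping the two heads as plain linear layers means almost all capacity sits in the shared backbone, so any benefit from learning date and script type jointly has to come from the shared features.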

[1] Vincent Christlein, Anguelos Nicolaou, Mathias Seuret, Dominique Stutzmann, and Andreas Maier. ICDAR
2019 competition on image retrieval for historical handwritten documents. In International Conference on
Document Analysis and Recognition (ICDAR), 2019.
[2] Yu Zhang and Qiang Yang. A survey on multi-task learning. arXiv preprint arXiv:1707.08114, 2017.
[3] Terrance DeVries, Kumar Biswaranjan, and Graham W. Taylor. Multi-task learning of facial landmarks and
expression. In 2014 Canadian Conference on Computer and Robot Vision, pages 98–103. IEEE, 2014.
[4] Sijin Li, Zhi-Qiang Liu, and Antoni B. Chan. Heterogeneous multi-task learning for human pose estimation
with deep convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition Workshops, pages 482–489, 2014.
[5] Stefan Fiel, Florian Kleber, Markus Diem, Vincent Christlein, Georgios Louloudis, Nikos Stamatopoulos,
and Basilis Gatos. ICDAR2017 competition on historical document writer identification (Historical-WI). In
2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), volume 1,
pages 1377–1382. IEEE, 2017.
[6] Florence Cloppet, Veronique Eglin, Marlene Helias-Baron, Cuong Kieu, Nicole Vincent, and Dominique
Stutzmann. ICDAR2017 competition on the classification of medieval handwritings in Latin script. In 2017
14th IAPR International Conference on Document Analysis and Recognition (ICDAR), volume 1, pages
1371–1376. IEEE, 2017.