Start, follow, read, stop: Incorporating new steps into end-to-end full-page handwriting recognition method

Type: BA thesis

Status: running

Date: June 1, 2020 - November 1, 2020

Supervisors: Vincent Christlein, Christian Bergler, Andreas Maier

In this work, new steps are incorporated into a known offline recognition method [1] in an attempt to
improve the transcription of degraded and poor-quality historical documents. The previously proposed
model consists of three components:
1. Start-of-line (SOL)
This network predicts the starting points of lines, together with an indication of the size and
direction of the handwriting.
2. Line-follower (LF)
Given a starting point, the LF network follows the handwriting line in incremental steps and
outputs a dewarped line image that is suitable for text recognition purposes.
3. Handwriting recognition (HWR)
The normalized line images produced by the LF network are fed to a CNN-LSTM HWR network [2],
which produces transcriptions of the detected lines.
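The line-following step can be illustrated with a minimal Python sketch. It is not the implementation from [1]: the names `LineStart` and `follow_line` are hypothetical, a constant `turn_per_step` stands in for the per-step prediction that the LF network would make, and the step length is assumed to equal the size predicted by the SOL network.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LineStart:
    """Hypothetical SOL output: position, handwriting size and direction."""
    x: float
    y: float
    size: float   # approximate text height in pixels
    angle: float  # writing direction in radians, 0 = left to right


def follow_line(start: LineStart, n_steps: int,
                turn_per_step: float = 0.0) -> List[Tuple[float, float]]:
    """Trace a line path from a start point in fixed increments.

    Assumption: a constant turn_per_step replaces the network's
    per-step prediction of where the line continues.
    """
    x, y, angle = start.x, start.y, start.angle
    path = [(x, y)]
    for _ in range(n_steps):
        angle += turn_per_step
        # Assumption: the step length is tied to the size predicted
        # once by SOL and is never updated along the line.
        x += start.size * math.cos(angle)
        y += start.size * math.sin(angle)
        path.append((x, y))
    return path
```

Calling `follow_line(LineStart(0.0, 0.0, 10.0, 0.0), 3)` traces a straight horizontal path of four points. Because `size` is fixed at the start, the sketch also shows why such a follower cannot react to changes in handwriting size along a line.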
The method performed well on warped lines and has the advantage of outputting polygonal regions
instead of bounding boxes [3], but it still has several shortcomings, especially on
documents where unrelated pieces of information are frequently horizontally adjacent to one another.
It also cannot detect or adapt to changes in handwriting size, since it relies solely on the initial
prediction made by the SOL network to extract lines.
The network architecture of the model is to be modified in order to address these
shortcomings. The thesis consists of the following milestones:
• Extending the SOL network architecture in order to include End-of-Line (EOL) detection.
• Modifying the LF network architecture to capture variations in handwriting size.
• Applying the LF network backwards from EOL predictions and finding an effective way of
merging the forward and backward line information.
• Evaluating performance on historical full-page datasets.
• Further experiments regarding procedure and network architecture.
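
One conceivable way of merging the forward and backward line information is sketched below in Python. This is only an illustrative assumption, not a method specified in this proposal or in [1]: the function `merge_paths` is hypothetical, both paths are assumed to be sampled with the same number of points, and a linear weighting is assumed that trusts the forward path near the line start and the backward path near the line end.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def merge_paths(forward: List[Point], backward: List[Point]) -> List[Point]:
    """Blend a forward path (from SOL) with a backward path (from EOL).

    Assumption: both paths have the same number of points (at least two).
    """
    if len(forward) != len(backward) or len(forward) < 2:
        raise ValueError("paths must have equal length of at least 2")
    aligned = backward[::-1]  # make the backward path run start-to-end
    n = len(forward)
    merged = []
    for i, ((fx, fy), (bx, by)) in enumerate(zip(forward, aligned)):
        w = i / (n - 1)  # 0 at the line start, 1 at the line end
        merged.append(((1 - w) * fx + w * bx, (1 - w) * fy + w * by))
    return merged
```

With a forward path `[(0, 0), (10, 0), (20, 0)]` and a backward path `[(20, 4), (10, 4), (0, 4)]`, the merged path starts at the forward start point, ends at the backward start point, and lies halfway between the two in the middle. Other strategies, such as choosing the path with the higher recognition confidence, are equally plausible and would be part of the experiments.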

The implementation should be done in Python.

[1] Wigington C., Tensmeyer C., Davis B., Barrett W., Price B., Cohen S. Start, follow, read: End-to-end
full-page handwriting recognition. Computer Vision – European Conference on Computer Vision 2018 (ECCV),
pp. 372-388, 2018.
[2] Wigington C., Stewart S., Davis B., Barrett W., Price B., Cohen S. Data augmentation for recognition of
handwritten words and lines using a CNN-LSTM network. 14th International Conference on Document Analysis
and Recognition (ICDAR), pp. 639–645, 2017.
[3] Moysset B., Kermorvant C., Wolf C. Full-page text recognition: Learning where to start and when to stop.
14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 2017.