Pattern Recognition Lab

Font Group Recognition for Improved OCR

(Third Party Funds Single)

Overall project:
Project leader:
Project members:
Start date: August 1, 2021
End date: August 1, 2023
Acronym:
Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)
URL:

Abstract

Although OCR-D made huge progress in the last project phase in providing OCR for early printed books, it still faces two major problems. The huge variety of the material makes it extremely challenging to use generic OCR models. Yet selecting specific models is not possible, as the sheer amount of material prevents a fully automatic workflow. This situation is further complicated by the lack of appropriate OCR training data. Current data sets consist overwhelmingly of texts in Fraktur, especially from the 19th century. This completely neglects the large typographic variety displayed by printing in the three previous centuries. Therefore, and in response to the demand from SLUB Dresden and ULB Halle, we propose to improve the current situation significantly by 1) fine-tuning our font group recognition system to such a degree that it can be used at character level; 2) transcribing more specific OCR training data for the 16th-18th centuries, covering popular fonts such as Schwabacher, other bastarda types and old Fraktur styles; 3) training font-specific OCR models as well as integrated models that recognise both typeface and text simultaneously. In other contexts, this joint approach has made the network perform better on both individual tasks, because it reduces overfitting during training. This project will improve OCR quality significantly, especially for books in non-Fraktur fonts. It will also provide a training data set of very high quality that can be reused in the long term. Finally, the project will provide a more fine-grained font recognition tool that, beyond enabling font-specific OCR, also has important applications in text attribute recognition and layout analysis.
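
The idea behind the integrated models in point 3 can be illustrated with a minimal sketch of a joint training objective. This is not the project's implementation: it assumes, for simplicity, that the network emits one probability vector over characters and one over font groups for each character position, and it combines the two cross-entropy terms with a hypothetical weight `font_weight`. A real OCR system would more likely use a sequence loss for the text part; the function names and the weighting scheme here are illustrative assumptions only.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target_index])


def joint_loss(char_probs, font_probs, char_targets, font_targets, font_weight=0.5):
    """Average per-character loss combining the text and font-group objectives.

    font_weight is a hypothetical hyperparameter, not a value from the project.
    """
    n = len(char_targets)
    if n == 0:
        raise ValueError("at least one character position is required")
    if not (len(char_probs) == len(font_probs) == len(font_targets) == n):
        raise ValueError("all inputs must cover the same character positions")

    total = 0.0
    for cp, fp, ct, ft in zip(char_probs, font_probs, char_targets, font_targets):
        # Both tasks share the same character position, so one term per task is added.
        total += cross_entropy(cp, ct) + font_weight * cross_entropy(fp, ft)
    return total / n
```

Because both terms are computed from the same shared representation, the font-group term acts as an additional training signal for the text recogniser, which is one plausible reading of why joint training can reduce overfitting on each task.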

Friedrich-Alexander-Universität
Erlangen-Nürnberg

Schlossplatz 4
91054 Erlangen