Friedrich-Alexander-Universität Pattern Recognition Lab PRL

Font Group Recognition for Improved OCR

(Third Party Funds Single)

Project leader: Vincent Christlein
Start date: August 1, 2021
End date: August 1, 2023
Funding source: DFG individual funding / research grant (DFG-Einzelförderung / Sachbeihilfe, EIN-SBH)

Abstract

Although OCR-D made considerable progress in its previous project phase in providing OCR for early printed books, two major problems remain. First, the enormous variety of the material makes it extremely difficult to use generic OCR models, yet selecting specific models by hand is not feasible, since the sheer amount of material calls for a fully automatic workflow. Second, appropriate OCR training data is lacking: current datasets consist overwhelmingly of texts in Fraktur, especially from the 19th century, and thus completely neglect the large typographic variety of printing in the three preceding centuries.

Therefore, and in response to demand from SLUB Dresden and ULB Halle, we propose to improve the current situation significantly by:
1) fine-tuning our font group recognition system to the point where it can be used at the character level;
2) transcribing additional OCR training data specific to the 16th to 18th centuries, covering popular fonts such as Schwabacher, other bastarda types, and old Fraktur styles;
3) training font-specific OCR models as well as integrated models that recognize typeface and text simultaneously.

In other contexts, such a joint approach has led the network to perform better on each of the individual tasks, because learning both tasks together reduces overfitting during training.

This project will significantly improve OCR quality, especially for books printed in non-Fraktur fonts. It will also provide a very high-quality training dataset that can be reused in the long term. Finally, it will provide a more fine-grained font recognition tool that, beyond enabling font-specific OCR, has important applications in text attribute recognition and layout analysis.
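The abstract argues that character-level font group recognition would allow font-specific OCR models to be chosen automatically instead of by hand. The following is a minimal sketch of such a routing step, not the project's actual tool. It assumes a recognizer that emits one font group label per character of a text line; the labels, the model names (`ocr-fraktur`, `ocr-schwabacher`, `ocr-generic`) and the 0.8 threshold are hypothetical placeholders.

```python
from collections import Counter
from typing import Dict, Sequence

# Hypothetical mapping from font group label to a font-specific OCR model name.
# Labels and model names are illustrative assumptions, not project artifacts.
FONT_SPECIFIC_MODELS: Dict[str, str] = {
    "fraktur": "ocr-fraktur",
    "schwabacher": "ocr-schwabacher",
}
GENERIC_MODEL = "ocr-generic"  # hypothetical fallback model


def select_model(char_font_labels: Sequence[str], min_share: float = 0.8) -> str:
    """Pick an OCR model for one text line from per-character font predictions.

    If one font group covers at least `min_share` of the characters and a
    font-specific model exists for it, that model is returned; otherwise the
    line falls back to the generic model (e.g. mixed-font or unknown fonts).
    """
    if not char_font_labels:
        return GENERIC_MODEL
    # Majority vote over the per-character labels of the line.
    label, count = Counter(char_font_labels).most_common(1)[0]
    if count / len(char_font_labels) >= min_share and label in FONT_SPECIFIC_MODELS:
        return FONT_SPECIFIC_MODELS[label]
    return GENERIC_MODEL


if __name__ == "__main__":
    print(select_model(["schwabacher"] * 9 + ["fraktur"]))      # ocr-schwabacher
    print(select_model(["fraktur"] * 5 + ["schwabacher"] * 5))  # ocr-generic
```

A line-level majority vote is only one possible design; because the predictions are per character, a line with a font change could instead be split at that point and each segment routed separately.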

    Friedrich-Alexander-Universität Erlangen-Nürnberg
    Lehrstuhl für Mustererkennung (Informatik 5)

    Martensstr. 3
    91058 Erlangen