ICDAR2023 Competition on Detection and Recognition of Greek Letters on Papyri

Image: Homer, Ilias. © Staatliche Museen zu Berlin, Ägyptisches Museum und Papyrussammlung

This competition investigates the performance of glyph detection and recognition on a very challenging type of historical document: Greek papyri. The detection and recognition of Greek letters on papyri is a preliminary step for the computational analysis of handwriting, which can lead to major advances in our understanding of this major source of information on Antiquity. It can be done manually by trained papyrologists, but it is a time-consuming task that calls for automation. We provide two different tasks: localization and classification, or classification only. The document images are provided by several institutions and are representative of the diversity of book hands on papyri (a millennium-long time span, various script styles, provenances, states of preservation, means of digitization and resolutions).

Competition Report

The competition results were presented at ICDAR 2023 and are available in the following paper:

https://link.springer.com/chapter/10.1007/978-3-031-41679-8_29

Tasks

  • Glyph (character) localization
  • Character classification

Timeline

The competition is over, but the data is of course still available (see the Data section below), and submissions on Codalab remain open for easy comparison. The competition followed this timeline:

  • April 1st, 0h01 GMT+1: publication of the test data
    • We will use CodaLab for the evaluation and leaderboard
    • Five submissions per day are allowed
    • Results available at the deadline were used for the ranking announced at ICDAR
  • April 16th, 15h00 GMT+1:
    • submission of the results
    • submission of a brief (~1 paragraph) description of the method

Results should be submitted in the same format as the example given in the “Baseline” section.

Competition results were announced at the conference.

Data

The dataset (training and test subsets) is available at the following address: https://zenodo.org/records/13825619

Important: for the evaluation to work on Codalab, the COCO format has to be used.

Here is an example of a detection result entry following this format:

    {
        "image_id": 1,
        "category_id": 119,
        "bbox": [
            339,
            461,
            114,
            102
        ],
        "score": 0.9871184825897217
    }
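For illustration, the following sketch shows one way to convert the output of a torchvision-style detector into this results format. The function name and the structure of the prediction dictionary are assumptions for the example, not part of the official evaluation code.

    import json

    def to_coco_results(image_id, prediction):
        """Convert one torchvision-style detection dict into COCO result entries.

        The prediction is assumed to contain 'boxes' in [x1, y1, x2, y2] pixel
        coordinates, together with 'labels' and 'scores', as returned by a
        torchvision Faster R-CNN model in eval mode.
        """
        entries = []
        for (x1, y1, x2, y2), label, score in zip(
            prediction["boxes"].tolist(),
            prediction["labels"].tolist(),
            prediction["scores"].tolist(),
        ):
            entries.append({
                "image_id": int(image_id),
                "category_id": int(label),
                # COCO bounding boxes are [x, y, width, height].
                "bbox": [x1, y1, x2 - x1, y2 - y1],
                "score": float(score),
            })
        return entries

    # Collect the entries for all test images into one list and save it as JSON:
    # results = sum((to_coco_results(i, p) for i, p in predictions.items()), [])
    # with open("predictions.json", "w") as f:
    #     json.dump(results, f)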

Baseline

You can implement your system from scratch, or use this baseline for an easier initial development phase:

https://faubox.rrze.uni-erlangen.de/getlink/fi6WyMcdhECDQQDQsh4WUT/baseline.zip

We used the PyTorch object detection tutorial and made minimal modifications to run it on our data. The main modification is that instead of downscaling input images and processing them as a whole, which makes the letters too small to be detected, the input is processed patch-wise, without overlap.
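As an illustration of this patch-wise strategy (not the baseline code itself), the following sketch runs a torchvision-style detector on non-overlapping patches and shifts the predicted boxes back into full-image coordinates; the patch size is an arbitrary placeholder.

    import torch

    @torch.no_grad()
    def detect_patchwise(model, image, patch_size=800):
        """Detect glyphs on non-overlapping patches of a CxHxW image tensor.

        The model is assumed to be a torchvision-style detector in eval mode,
        taking a list of image tensors and returning one dict per image with
        'boxes', 'labels' and 'scores'. The patch size is illustrative only.
        """
        _, height, width = image.shape
        boxes, labels, scores = [], [], []
        for top in range(0, height, patch_size):
            for left in range(0, width, patch_size):
                patch = image[:, top:top + patch_size, left:left + patch_size]
                out = model([patch])[0]
                # Shift patch-local boxes back to full-image coordinates.
                offset = torch.tensor([left, top, left, top],
                                      dtype=out["boxes"].dtype)
                boxes.append(out["boxes"] + offset)
                labels.append(out["labels"])
                scores.append(out["scores"])
        return {
            "boxes": torch.cat(boxes),
            "labels": torch.cat(labels),
            "scores": torch.cat(scores),
        }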

The baseline also contains evaluation code, and a trained model.
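As a rough local check (this is neither the baseline's evaluation script nor necessarily what CodaLab runs), predictions in the results format shown above can be scored with pycocotools, assuming a ground-truth file in COCO annotation format; the file names below are placeholders.

    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    # Placeholder file names: COCO-format ground truth and predictions.
    coco_gt = COCO("ground_truth.json")
    coco_dt = coco_gt.loadRes("predictions.json")

    # Standard COCO bounding-box evaluation (mAP over IoU thresholds).
    evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
    evaluator.evaluate()
    evaluator.accumulate()
    evaluator.summarize()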

Moreover, for information purposes only, we made a random split of the training set and validated the provided model on it; you can find this subset and the predictions in:

https://faubox.rrze.uni-erlangen.de/getlink/fiJY4iYrYrtdpTzjBWL11s/baseline_validation.zip

Registration

As the competition is over, registration is closed. Submissions on Codalab, for comparison purposes, remain open.

Organizers

  • Isabelle Marthot-Santaniello
  • Stephen White
  • Olga Serbaeva Saraogi
  • Dalia Rodriguez-Salas
  • Guillaume Carrière
  • Vincent Christlein
  • Mathias Seuret

Prizes for EELISA members (not incl. FAU)

The best submissions from EELISA European University partners will be awarded a one-week visit to the Pattern Recognition Lab at FAU in Erlangen. For this lab visit, EELISA FAU will cover the costs for flight/train and hotel incl. breakfast.