ICDAR2023 Competition on Detection and Recognition of Greek Letters on Papyri

Homer, Ilias. © Staatliche Museen zu Berlin, Ägyptisches Museum und Papyrussammlung

This competition investigates the performance of glyph detection and recognition on a very challenging type of historical document: Greek papyri. Detecting and recognizing Greek letters on papyri is a preliminary step for the computational analysis of handwriting, which can considerably advance our understanding of this major source of information on Antiquity. Trained papyrologists can do this manually, but it is a time-consuming task that calls for automation. We provide two tasks: localization combined with classification, or classification only. The document images are provided by several institutions and are representative of the diversity of book hands on papyri (a time span of a millennium, various script styles, provenances, states of preservation, means of digitization, and resolutions).

Homer, Ilias 1, 54–64, 71–104, 114–123, 131–164, 412–433, 456–465, 494–534, 537–590, 602–609. 1st–2nd century CE.

Tasks

  • Glyph (character) localization
  • Character classification

Timeline

  • April 1st, 0h01 GMT+1: publication of the test data
    • We will use CodaLab for the evaluation and leaderboard
    • Five submissions per day are allowed
    • The results as they stand at the deadline will be used for the ranking announced at ICDAR
  • April 9th, 15h00 GMT+1:
    • submission of the results
    • submission of a brief (~1 paragraph) description of the method

Results should be submitted either in the same format as the training ground truth, or in a CSV format along with a text file describing each column.
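For the CSV option, a minimal sketch of how predictions could be serialized is shown here. The column names are hypothetical and chosen only for illustration: no specific CSV layout is prescribed, which is why the accompanying text file describing each column is needed. The input dictionaries follow the usual COCO detection-result layout (image_id, category_id, bbox as [x, y, width, height], score).

```python
import csv
import io

# Hypothetical column layout; any layout is acceptable as long as the
# accompanying text file describes each column.
COLUMNS = ["image_id", "category_id", "x", "y", "width", "height", "score"]


def predictions_to_csv(predictions):
    """Serialize COCO-style detection results to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for pred in predictions:
        x, y, w, h = pred["bbox"]  # COCO boxes are [x, y, width, height]
        writer.writerow({
            "image_id": pred["image_id"],
            "category_id": pred["category_id"],
            "x": x,
            "y": y,
            "width": w,
            "height": h,
            "score": pred["score"],
        })
    return buffer.getvalue()


# Toy prediction, not taken from the competition data.
csv_text = predictions_to_csv([
    {"image_id": 1, "category_id": 2, "bbox": [140, 118, 28, 42], "score": 0.9},
])
```

The returned text can be written to a file next to a short plain-text description of the columns.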

Competition results will be announced at the conference.

Data

The training data can be downloaded at the following address:

https://faubox.rrze.uni-erlangen.de/getlink/fi9GsXoxUHagRAjApzgk7ywo/HomerCompTraining.zip

The data consists of images and JSON files in the COCO format. Localization is evaluated using the average precision (AP) as defined by COCO, and classification using the percentage of correct answers.
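As an illustration of the data layout and of the classification metric, the following sketch groups COCO annotations by image and computes the percentage of correct answers. It assumes the standard COCO keys (images, annotations, categories, image_id, category_id, bbox) and, as a further assumption, that predictions are matched to the ground truth by annotation id. The toy dictionary stands in for a JSON file loaded with json.load. The AP computation for localization is not reproduced here, since it is defined by the COCO evaluation protocol.

```python
from collections import defaultdict


def group_annotations(coco):
    """Map each image id to the list of its annotations."""
    by_image = defaultdict(list)
    for ann in coco["annotations"]:
        by_image[ann["image_id"]].append(ann)
    return dict(by_image)


def classification_accuracy(truth, predicted):
    """Percentage of annotation ids whose predicted category matches the truth."""
    if not truth:
        return 0.0
    correct = sum(
        1 for ann_id, category in truth.items()
        if predicted.get(ann_id) == category
    )
    return 100.0 * correct / len(truth)


# Toy stand-in for a COCO file; real files would be read with json.load.
coco = {
    "images": [{"id": 1, "file_name": "example.jpg", "width": 1000, "height": 800}],
    "categories": [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [100, 120, 30, 40]},
        {"id": 11, "image_id": 1, "category_id": 2, "bbox": [140, 118, 28, 42]},
    ],
}

annotations_by_image = group_annotations(coco)
truth = {ann["id"]: ann["category_id"] for ann in coco["annotations"]}
predicted = {10: 1, 11: 1}  # hypothetical predictions: one right, one wrong
accuracy = classification_accuracy(truth, predicted)
```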

Baseline

You can implement your system from scratch, or use this baseline for an easier initial development phase:

https://faubox.rrze.uni-erlangen.de/getlink/fi6WyMcdhECDQQDQsh4WUT/baseline.zip

We used the PyTorch tutorial on object detection and made the minimal modifications needed to run it on our data. The main modification is that, instead of downscaling input images to process them as a whole, which makes the letters too small to be detected, the input is processed patch-wise without overlap.
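The patch-wise idea can be sketched as follows. This is not the baseline's code: the patch size of 512 pixels and both function names are assumptions made for illustration. The first function tiles an image into non-overlapping patches, and the second shifts a box detected inside a patch back into full-image coordinates.

```python
def patch_grid(width, height, patch=512):
    """Yield (left, top, right, bottom) boxes that tile the image without overlap.

    Patches on the right and bottom borders are cropped to the image size.
    """
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            yield (left, top, min(left + patch, width), min(top + patch, height))


def to_image_coords(box, patch_box):
    """Shift an [x, y, width, height] box from patch to full-image coordinates."""
    x, y, w, h = box
    left, top = patch_box[0], patch_box[1]
    return [x + left, y + top, w, h]


# Hypothetical image of 1000 x 800 pixels.
patches = list(patch_grid(1000, 800, patch=512))

# A detection found at (10, 20) inside the last patch.
shifted = to_image_coords([10, 20, 30, 40], patches[3])
```

One consequence of tiling without overlap is that a letter lying across a patch border is split between two patches, which may affect its detection.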

The baseline also contains evaluation code, and a trained model.

Moreover, for information purposes only, we made a random split of the training set and validated the provided model on it. You can find this subset and the predictions in:

https://faubox.rrze.uni-erlangen.de/getlink/fiJY4iYrYrtdpTzjBWL11s/baseline_validation.zip

Registration

We ask those intending to participate to register for organizational purposes. Should the training data be modified, we will inform the participants.

To register, either send an e-mail to mathias.seuret AT fau DOT de, or fill in the following form:

https://forms.gle/4Xmk9qZRd5qRs8zM9

Organizers

  • Isabelle Marthot-Santaniello
  • Stephen White
  • Olga Serbaeva Saraogi
  • Dalia Rodriguez-Salas
  • Guillaume Carrière
  • Vincent Christlein
  • Mathias Seuret

Prizes for EELISA members (not including FAU)

The best submissions from EELISA European University partners will be awarded the opportunity of a one-week visit to the Pattern Recognition Lab at FAU in Erlangen. For this visit, EELISA FAU will cover the costs of travel (flight or train) and hotel, including breakfast.