HisFragIR20 – ICFHR 2020 Competition on Image Retrieval for Historical Handwritten Fragments

This competition investigates the performance of large-scale retrieval of historical document fragments based on writer recognition. Analyzing historical fragments is a difficult task that is usually carried out by trained humanists. To simulate fragments, we extract random text patches from historical document images. The goal is then to find similar patches of the same page or manuscript. The document images are provided by several institutions and cover different genres (manuscripts, letters, charters).

Task

The task is to find, using a document fragment as the query, all fragments that come from (a) the same page and (b) the same writer.

Timeline

Feb 15 Homepage running, registration possible (please just send V. Christlein an email).
April 1 Providing official training set with 100k realistic fragments. Due to a hardware failure, we lost our fragments. We are recreating them and will share the dataset as soon as possible. We apologize for any inconvenience.
See ‘Dataset’ below for the currently available datasets, which you can already use to evaluate your method.
May 1 Providing evaluation test set (20k fragments) and evaluation method. A baseline system will also be provided.
May 15 Competition deadline.

Dataset

For now, you can train your system on random crops of the following datasets:
1. Historical-IR19, https://zenodo.org/record/3262372
2. Historical-WI, https://zenodo.org/record/1324999
3. Other datasets listed at https://clamm.irht.cnrs.fr/
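
Until the official fragments are available, random crops can be produced with a few lines of code. The following is only a minimal sketch, not the organizers' fragment generator: the function name random_crops, the size limits (64 to 256 pixels), and the plain rectangular patch shape are all assumptions made for illustration.

```python
import numpy as np


def random_crops(image, n_crops, min_size=64, max_size=256, rng=None):
    """Cut n_crops random rectangular patches from an image array (H x W or H x W x C).

    min_size and max_size are illustrative defaults, not competition settings.
    """
    rng = np.random.default_rng(rng)
    height, width = image.shape[:2]
    crops = []
    for _ in range(n_crops):
        # Draw the patch size per axis, clipped so it never exceeds the page.
        h = int(rng.integers(min(min_size, height), min(max_size, height) + 1))
        w = int(rng.integers(min(min_size, width), min(max_size, width) + 1))
        # Draw the top-left corner so the patch lies fully inside the page.
        top = int(rng.integers(0, height - h + 1))
        left = int(rng.integers(0, width - w + 1))
        crops.append(image[top:top + h, left:left + w].copy())
    return crops


# Example with a synthetic blank page; replace with a loaded document image.
page = np.zeros((1000, 800), dtype=np.uint8)
patches = random_crops(page, n_crops=5, rng=0)
```

Keeping the page and writer identifiers of the source image with each patch makes it possible to evaluate at both levels later.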

Evaluation

The evaluation will be done using a leave-one-image-out cross-validation approach. This means that every image in the test set is used once as a query, and all other test images have to be ranked against it.
The competition will be evaluated in two ways:
1. At the writer level, i.e. the goal is to find fragments by the same writer.
2. At the page level, i.e. the goal is to find fragments from the same page.
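
The leave-one-image-out ranking described above can be tried out locally with a small script. This is a sketch under assumptions: the official evaluation method is not specified here, so mean average precision (mAP) and top-1 accuracy are used only because they are common retrieval measures, and the function name retrieval_scores is hypothetical.

```python
import numpy as np


def retrieval_scores(distances, labels):
    """Return (mAP, top-1 accuracy) for a square query-by-gallery distance matrix.

    labels holds one identifier per fragment: writer IDs for the writer-level
    score, page IDs for the page-level score.
    """
    distances = np.asarray(distances, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)
    average_precisions, top1_hits = [], []
    for q in range(n):
        # Leave the query itself out of its own ranking.
        others = np.delete(np.arange(n), q)
        order = others[np.argsort(distances[q, others], kind="stable")]
        relevant = labels[order] == labels[q]
        if not relevant.any():
            continue  # no other fragment shares this label; skip the query
        hits = np.cumsum(relevant)      # relevant items found up to each rank
        ranks = np.arange(1, n)         # ranks 1 .. n-1 of the remaining items
        average_precisions.append(float((hits[relevant] / ranks[relevant]).mean()))
        top1_hits.append(bool(relevant[0]))
    return float(np.mean(average_precisions)), float(np.mean(top1_hits))
```

Calling the function twice on the same distance matrix, once with writer labels and once with page labels, gives the two scores.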

Submission

  • Please send us your 20 000 x 20 000 CSV file, in which the first row and first column denote the query and gallery files. The remaining entries are the distances (the smaller, the more similar).
  • Please also send us a half-page to one-page scientific description of the method used.
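
A distance matrix with file names in the first row and first column could be written as follows. This is only a sketch: the comma delimiter, the empty top-left cell, the six-decimal formatting, the identical file order for rows and columns, and the function name write_distance_csv are assumptions, not a format confirmed by the organizers.

```python
import csv


def write_distance_csv(path, names, distances):
    """Write a square distance matrix with file names as header row and first column."""
    if len(distances) != len(names) or any(len(row) != len(names) for row in distances):
        raise ValueError("distances must be a square matrix matching names")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        # Header row: empty corner cell (assumed), then one file name per column.
        writer.writerow([""] + list(names))
        for name, row in zip(names, distances):
            # Each data row starts with its file name, followed by the distances.
            writer.writerow([name] + [f"{value:.6f}" for value in row])
```

For the full test set, names would hold the 20 000 fragment file names and distances the corresponding 20 000 x 20 000 matrix.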