The aim of this thesis is to present a fairer and more practical comparison between traditional and deep learning-based binarization methods. Models from both categories will be applied to historical document datasets and evaluated against the ground truth. The work focuses on practical application: it measures each method’s impact on binarization quality and OCR performance, and it develops a transparent and usable framework for evaluating and comparing binarization methods. This will make it possible to compare each method’s results directly in terms of text recognition quality.
Experimental Setup and Resources
Traditional Methods:
- Otsu: A global thresholding method.
- Sauvola: A local adaptive thresholding method.
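To make the two traditional baselines concrete, the following is a minimal pure-Python sketch, not the implementation planned for the thesis. It assumes 8-bit grayscale pixels given as a flat list, and the function names (`otsu_threshold`, `sauvola_threshold`, `binarize`) are illustrative. The Sauvola defaults `k=0.2` and `R=128` are assumed values that would need tuning per dataset, and the local mean and standard deviation are taken as inputs rather than computed over a sliding window.

```python
def otsu_threshold(pixels):
    """Return the global threshold that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels with value <= t
    sum_bg = 0  # intensity sum of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no pixels in the lower class yet
        w_fg = total - w_bg
        if w_fg == 0:
            break     # all pixels are in the lower class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def sauvola_threshold(local_mean, local_std, k=0.2, R=128):
    """Sauvola's local threshold T = m * (1 + k * (s / R - 1)).

    k and R are assumed defaults, not values fixed by the thesis.
    """
    return local_mean * (1 + k * (local_std / R - 1))


def binarize(pixels, threshold):
    """Map pixels to 1 (ink, at or below the threshold) or 0 (background)."""
    return [1 if p <= threshold else 0 for p in pixels]
```

Otsu picks a single threshold for the whole image, while Sauvola computes one threshold per pixel from its neighbourhood statistics, which is why the latter tends to cope better with uneven illumination and stains.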
Deep Learning-Based Models:
- SAE (Selectional Autoencoders): Used to evaluate how well a compact encoder-decoder model learns pixel-level binarization for historical documents.
- DeepOtsu: A U-Net-based enhancement model followed by Otsu thresholding for final binarization.
- ROBIN (U-Net Variant): A representative of U-Net-based segmentation models to assess their performance in direct binary mask prediction.
Datasets
- DIBCO (2009–2022): Benchmark dataset comprising printed and handwritten degraded documents.
- HisDB: Historical manuscript datasets with realistic degradation patterns.
Technologies and Tools
Evaluation Metrics:
- F-measure
- PSNR (peak signal-to-noise ratio)
- DRD (distance reciprocal distortion)
- NRM (negative rate metric)
- OCR accuracy, for practical evaluation
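As an illustration of how the pixel-level metrics relate to each other, here is a hedged sketch of F-measure, PSNR, and NRM for binary images. It assumes both images are flat lists of 0/1 values with 1 marking ink, and all function names are illustrative. DRD is left out because it requires a weighted neighbourhood matrix that does not fit a short example, and F-measure is returned as a fraction rather than a percentage.

```python
import math


def confusion_counts(pred, truth):
    """Count true/false positives and negatives, treating 1 (ink) as positive."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f_measure(pred, truth):
    """Harmonic mean of precision and recall on ink pixels."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def psnr(pred, truth):
    """PSNR in dB; the peak value is 1 because the images are binary."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)
    return math.inf if mse == 0 else 10 * math.log10(1.0 / mse)


def nrm(pred, truth):
    """Negative rate metric: mean of the false negative and false positive rates."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    nr_fn = fn / (fn + tp) if (fn + tp) else 0.0
    nr_fp = fp / (fp + tn) if (fp + tn) else 0.0
    return (nr_fn + nr_fp) / 2
```

Higher F-measure and PSNR indicate a better match to the ground truth, whereas lower NRM is better, so the three should not be averaged into one score without first aligning their directions.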
Milestones
1. Literature Review: Study the binarization techniques listed in the models section,
focusing on their application to historical documents.
2. Dataset Preparation: Collect and preprocess publicly available datasets. Align ground
truth masks and normalize formats for consistent evaluation.
3. Model Implementation and Integration: Implement or modify binarization models
using PyTorch; create a single pipeline that includes both deep learning and traditional
models.
4. Evaluation and Comparison: Compare results using common metrics (F-measure,
PSNR, DRD, NRM) and check visual inspection of the outputs against the quantitative
results. The binarized images are then fed into an OCR system to measure each method's
effect on OCR accuracy.
5. Analysis and Interpretation: Discuss the strengths and limitations of each technique for
different types of degradation.
6. Documentation and Reporting: Compile all documentation (results and analysis) into a
report.
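The OCR evaluation in milestone 4 could, for example, be scored with a character error rate (CER). The sketch below is one possible formulation, assuming the OCR output and a reference transcription are available as plain strings. The proposal does not fix the OCR accuracy measure, so CER and the function names here are assumptions made for illustration.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ocr_text, reference):
    """Edit distance normalized by the reference length; lower is better."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ocr_text, reference) / len(reference)
```

Running the same OCR engine on the output of every binarization method and comparing the resulting CER values keeps the OCR system fixed, so differences can be attributed to the binarization step.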