Robust and Scalable Deep Learning for Historical Document Analysis in Adverse Archival Conditions

Authors

Rasyidi, Hanif

Abstract

Historical Document Analysis and Recognition (HisDAR) remains technically challenging due to the degraded condition of source materials and the scarcity of annotated data. Although handcrafted methods perform well on specific datasets, they lack scalability and generalisation. This thesis demonstrates that deep learning methods, when adapted to the unique characteristics of historical documents, can achieve robust performance across multiple HisDAR tasks while reducing reliance on handcrafted, task-specific solutions.

First, binarization is treated as a semantic segmentation task that recovers meaningful text while suppressing background degradation. We propose an atrous convolutional encoder-decoder model that captures detailed features without excessive downsampling. This design, enhanced by multi-scale decoding, produces binary masks that preserve handwriting structure. To improve generalisation, a style augmentation module introduces visual variability using transformations inspired by style transfer, while a custom Pseudo-F loss enhances structural fidelity. Evaluations on nine DIBCO competition datasets confirm the model's robustness across domains, with more than a 10% improvement in Pseudo-F-Measure over standard encoder-decoder architectures.

Second, we examine historical writer identification (HWI), particularly in zero-shot scenarios where test writers are absent from the training data. We assess pre-processing techniques, including binarization and text area-of-interest (Text-AOI) selection; backbone networks (pre-trained CNNs and SwinV2 with LoRA fine-tuning); and post-processing methods such as patch sampling and feature pooling. Our results highlight the value of consistent patch representations and of discriminative features learned through the ArcFace loss. We propose a simplified end-to-end pipeline that achieves 97.40% Top-1 accuracy on ICDAR2013-WI, surpassing the handcrafted competition winner (95.60%), and 97.15% Top-1 accuracy on zero-shot historical writer identification (ICDAR2017), compared with 76.40% for handcrafted approaches.

Third, to overcome the scarcity of annotated historical data, we develop HistoriCraft, a lightweight document generation pipeline that converts modern datasets into historical-style outputs. The framework combines handwriting generation and visual style adaptation using a hybrid of an encoder-decoder CNN and a vision transformer. The model outperforms existing style transfer methods on perceptual quality metrics (SSIM: 0.8172 vs 0.5355 for neural style transfer) while preserving the original annotations. The generated samples successfully augment training data for segmentation and classification tasks in low-resource domains.

Together, these contributions form a unified deep learning approach to historical document analysis spanning segmentation, classification, and generation. The key insight is that successfully applying modern neural networks to historical documents requires more than adopting the latest architecture: it demands design choices that account for degraded image quality, limited annotated data, and the need for cross-domain generalisation. By addressing these limitations through targeted adaptations, this work demonstrates that deep learning offers a practical and scalable alternative to handcrafted methods for digital heritage preservation.
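To make the binarization design above concrete, the following is a minimal PyTorch sketch of an atrous (dilated) encoder-decoder with multi-scale decoding. Channel widths, dilation rates, and the number of branches are illustrative assumptions, not the thesis configuration.

```python
# Minimal sketch of an atrous encoder-decoder for document binarization.
# All layer sizes are assumptions for illustration only.
import torch
import torch.nn as nn

class AtrousBinarizer(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: dilated convolutions grow the receptive field
        # without the heavy downsampling of a standard encoder.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=4, dilation=4), nn.ReLU(inplace=True),
        )
        # Multi-scale decoding: branches with different dilation rates
        # are fused into a single foreground/background prediction.
        self.branches = nn.ModuleList([
            nn.Conv2d(64, 16, 3, padding=d, dilation=d) for d in (1, 2, 4)
        ])
        self.head = nn.Conv2d(16 * 3, 1, 1)  # 1-channel text-mask logits

    def forward(self, x):
        feats = self.encoder(x)
        fused = torch.cat([b(feats) for b in self.branches], dim=1)
        return self.head(fused)

model = AtrousBinarizer()
logits = model(torch.randn(1, 1, 256, 256))  # grayscale patch in, mask logits out
```

Because every convolution uses padding equal to its dilation, the output mask keeps the input's spatial resolution, which is the point of avoiding aggressive downsampling for fine handwriting strokes.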
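The Pseudo-F loss can be read as a differentiable relaxation of DIBCO's Pseudo-F-Measure. The sketch below is an assumption-laden reconstruction rather than the thesis's exact formulation: it implements a generic soft F-measure with optional per-pixel recall and precision weight maps (in DIBCO these are derived from the ground-truth stroke skeleton and its neighbourhood); with uniform weights it reduces to a plain soft F1 loss.

```python
# Soft F-measure loss with optional per-pixel weight maps.
# recall_w / precision_w mimic the role of DIBCO's pseudo-recall and
# pseudo-precision weights; None means uniform weighting (plain soft F1).
import torch

def pseudo_f_loss(logits, target, recall_w=None, precision_w=None, eps=1e-6):
    """logits: raw model output; target: {0,1} text mask (1 = text)."""
    p = torch.sigmoid(logits)
    rw = torch.ones_like(target) if recall_w is None else recall_w
    pw = torch.ones_like(target) if precision_w is None else precision_w
    recall = (rw * p * target).sum() / ((rw * target).sum() + eps)
    precision = (pw * p * target).sum() / ((pw * p).sum() + eps)
    f = 2 * recall * precision / (recall + precision + eps)
    return 1.0 - f  # minimising the loss maximises the soft pseudo-F
```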
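For the SwinV2 backbone, LoRA fine-tuning keeps the pre-trained weights frozen and trains only a low-rank update. A minimal self-contained version of the idea, applied to a single linear layer, might look like the following; the rank and scaling values are common published defaults, assumed here.

```python
# Minimal LoRA sketch: frozen pre-trained linear layer plus a trainable
# low-rank update, as in LoRA fine-tuning of transformer backbones.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # pre-trained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
        self.scale = alpha / r

    def forward(self, x):
        # W x + (alpha/r) * B A x — only A and B receive gradients.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

Because B starts at zero, the wrapped layer initially behaves exactly like the pre-trained one, and only a small fraction of parameters is updated during fine-tuning.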
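The ArcFace loss mentioned above learns discriminative writer embeddings by adding an angular margin to each sample's true class on the unit hypersphere. A compact sketch follows; the scale s and margin m are common published defaults, assumed here rather than taken from the thesis.

```python
# ArcFace-style classification head for writer-embedding training.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    def __init__(self, embed_dim, num_writers, s=64.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_writers, embed_dim))
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        # Cosine similarity between L2-normalised embeddings and class centres.
        cos = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the angular margin m only to each sample's true class.
        target = F.one_hot(labels, cos.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.m), cos)
        return F.cross_entropy(self.s * logits, labels)
```

At inference the classification head is discarded and unseen writers are matched by cosine similarity between embeddings, which is what makes the zero-shot setting workable.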
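Finally, the perceptual comparison reported for HistoriCraft uses SSIM. One way such scores can be computed is with the torchmetrics library, as sketched below; the tensors are random stand-ins for a generated page and its reference, not the thesis's evaluation data.

```python
# Computing SSIM between a generated page and a reference page.
import torch
from torchmetrics.image import StructuralSimilarityIndexMeasure

ssim = StructuralSimilarityIndexMeasure(data_range=1.0)
generated = torch.rand(1, 3, 256, 256)   # stand-in for a HistoriCraft output
reference = torch.rand(1, 3, 256, 256)   # stand-in for the comparison page
print(ssim(generated, reference))        # abstract reports 0.8172 vs 0.5355
```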
