The goal of this thesis is to develop a deep learning model that accurately detects tampered text in document images. The manipulations considered include word replacement, copy-paste edits, and layout-based alterations. The focus is on a multimodal architecture that combines visual layout features with semantic textual content, with the aim of improving detection accuracy and robustness across diverse document types and manipulation styles.