Improving AFFGANwriting by Exploring Deep Learning Models for Style Encoders and Image Generation of Sentence-Level Handwriting

Type: MA thesis

Status: running

Supervisors: Mathias Seuret

Handwriting generation is a fundamental task at the intersection of computer vision and natural language processing, with applications such as personalized content generation. The AFFGANwriting model is a generative framework that synthesizes word-level handwritten images by fusing multi-style features in a GAN-based architecture with a VGG-based style encoder. However, its scope is limited in two ways:
• It generates only individual word images.
• It relies on a fixed VGG backbone, which may not capture style semantics as effectively as more modern CNN and transformer architectures (e.g. EfficientNet, ResNet, DINO).

With increasing demand for personalized handwriting synthesis across longer text spans, there is clear motivation to explore whether more advanced backbone models can improve style feature extraction. In addition, the generative capacity should be extended from words to full sentences, ideally exposed through a user-friendly interactive system.
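To make the backbone comparison concrete, the following is a minimal sketch of how the candidate encoders could be wrapped behind a common interface that maps style sample images to a fixed-size style vector. The projection head, the embedding size style_dim, and the single-image input protocol are illustrative assumptions, not part of the original AFFGANwriting architecture:

```python
# Sketch of swappable style encoders, assuming the encoder's job is to map
# style sample images to a fixed-size style vector. style_dim and the linear
# projection are illustrative choices, not AFFGANwriting's actual design.
import torch
import torch.nn as nn
import torchvision.models as tvm

class StyleEncoder(nn.Module):
    """Wraps a pretrained backbone and projects pooled features to a style vector."""
    def __init__(self, backbone: str = "vgg19", style_dim: int = 256):
        super().__init__()
        if backbone == "vgg19":                      # baseline family used by AFFGANwriting
            net, feat_dim = tvm.vgg19(weights="DEFAULT").features, 512
        elif backbone == "resnet50":                 # drop avgpool and fc, keep conv trunk
            net = nn.Sequential(*list(tvm.resnet50(weights="DEFAULT").children())[:-2])
            feat_dim = 2048
        elif backbone == "efficientnet_b0":
            net, feat_dim = tvm.efficientnet_b0(weights="DEFAULT").features, 1280
        elif backbone == "dino_vits16":              # self-supervised ViT via torch.hub
            net = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
            feat_dim = 384
        else:
            raise ValueError(f"unknown backbone: {backbone}")
        self.backbone, self.is_vit = net, backbone.startswith("dino")
        self.pool = nn.AdaptiveAvgPool2d(1)          # collapse spatial dims for CNN features
        self.proj = nn.Linear(feat_dim, style_dim)   # common interface for the generator

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.is_vit:
            feats = self.backbone(x)                 # DINO forward returns the CLS embedding
        else:
            feats = self.pool(self.backbone(x)).flatten(1)
        return self.proj(feats)

# Example: encode a batch of 3-channel style samples with a ResNet-50 encoder.
enc = StyleEncoder("resnet50")
style = enc(torch.randn(4, 3, 224, 224))             # -> (4, 256) style vectors
```

Keeping the output shape identical across backbones would allow the rest of the generator to stay unchanged while only the encoder is swapped and compared.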

Research questions

• Can more recent CNN and transformer feature extractors (e.g. EfficientNet, ResNet, DINO) outperform VGG at capturing style-relevant features for handwriting generation?
• Which architectural or training modifications are required to extend AFFGANwriting from word-level to sentence-level image synthesis? (A naive stitching baseline is sketched after this list.)
• How can the model be integrated into an intuitive web application that allows users to select a writing style and input arbitrary text for sentence-level generation?
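The sentence-level mechanism is an open question of this thesis; as a lower bound to compare against, one could simply run the word-level generator per word and stitch the outputs horizontally. The sketch below does exactly that. generate_word is a hypothetical stand-in for AFFGANwriting's word generator, and the white-background and spacing assumptions are labeled in the comments:

```python
# Naive sentence-level baseline: generate each word separately and concatenate.
# `generate_word(style, word) -> (C, H, W) tensor` is a hypothetical stand-in
# for the word-level generator; pixel values are assumed in [0, 1] with a
# white background encoded as 1. A real sentence-level model would also need
# consistent baselines, spacing, and style across words.
import torch

def generate_sentence(generate_word, style: torch.Tensor, text: str,
                      gap_px: int = 12) -> torch.Tensor:
    """Stitch per-word images (C, H, W) into one sentence image."""
    words = [generate_word(style, w) for w in text.split()]
    c, h = words[0].shape[0], max(w.shape[1] for w in words)
    canvas = []
    for img in words:
        pad = torch.ones(c, h - img.shape[1], img.shape[2])  # white padding on top,
        canvas.append(torch.cat([pad, img], dim=1))          # aligning word bottoms
        canvas.append(torch.ones(c, h, gap_px))              # white inter-word gap
    return torch.cat(canvas[:-1], dim=2)                     # drop the trailing gap
```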

Goal

To enhance AFFGANwriting’s quality and flexibility in handwriting image generation by:
• Upgrading the style encoder
• Enabling sentence-level synthesis
• Deploying the system as a web app for user interaction
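As a sketch of the planned deployment, a single Flask endpoint could accept a style selection and arbitrary text and return the rendered sentence image. Everything here is an assumption about the eventual interface: load_model, model.generate_word, and model.styles are hypothetical helpers, and the route and form field names are illustrative.

```python
# Minimal sketch of the planned web interface. `load_model` is a hypothetical
# checkpoint loader; `generate_sentence` is the stitching baseline sketched
# above. Routes and form fields are illustrative, not a fixed API.
import io
import torch
from flask import Flask, request, send_file
from torchvision.utils import save_image

app = Flask(__name__)
model = load_model("affganwriting.pt")   # hypothetical: loads the trained generator

@app.post("/generate")
def generate():
    style_id = request.form["style_id"]              # writer style chosen in the UI
    text = request.form["text"]                      # arbitrary user-provided text
    with torch.no_grad():
        img = generate_sentence(model.generate_word, model.styles[style_id], text)
    buf = io.BytesIO()
    save_image(img, buf, format="PNG")               # serialize the tensor to PNG
    buf.seek(0)
    return send_file(buf, mimetype="image/png")

if __name__ == "__main__":
    app.run(debug=True)
```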