RAG-Enhanced Low-Cost Vision-Language Models for Diabetic Retinopathy Classification and Automated Reporting

Diabetic Retinopathy (DR) affects over 160 million people globally, a figure projected to reach 180 million by 2030, even though 90% of DR-related blindness is preventable through early detection [1]. Current AI models achieve strong classification performance but do not generate interpretable clinical reports, which limits their adoption in low-resource settings. Vision-Language Models (VLMs) can unify diagnosis and report generation. However, fundus image captioning lags significantly behind captioning for other imaging modalities [2,3], and state-of-the-art VLMs remain computationally expensive. Retrieval-Augmented Generation (RAG) has improved accuracy in medical imaging tasks [4]. Yet no prior study has integrated DR severity grading, lesion-aware reporting, and evidence retrieval within a single low-cost, clinically deployable VLM.