Dynamic Query Routing: Adaptive RAG Systems Leveraging Hallucination Risk and Specialization Affordance

Type: MA thesis

Status: running

Date: March 10, 2025 - September 10, 2025

Supervisor: Mathias Seuret

Large Language Model (LLM) responses are often generic and may contain false information, so-called hallucinations. Retraining an LLM on updated data requires substantial resources. Individual organizations can therefore pair an LLM with a Retrieval-Augmented Generation (RAG) system to mitigate these drawbacks and tailor the model to their own domain. A RAG system retrieves information relevant to the user's query from a private knowledge base and feeds it to the LLM to generate detailed, context-rich responses. However, a RAG system may introduce conflicts between the LLM's previously learned (parametric) memory and the in-context memory supplied by the newly retrieved resources.
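The sketch below illustrates the basic retrieve-then-generate loop described above; it is not part of the thesis itself. It assumes a sentence-transformers embedding model for retrieval, while `llm_generate` is a hypothetical placeholder for whatever LLM backend is available.

```python
# Minimal RAG sketch: embed knowledge-base chunks, retrieve the most similar
# ones for a query, and build an augmented prompt for the LLM.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # small, CPU-friendly embedder

def llm_generate(prompt: str) -> str:
    """Placeholder: swap in any LLM API or local model call."""
    raise NotImplementedError

def build_index(chunks: list[str]) -> np.ndarray:
    """Embed all knowledge-base chunks once; rows are L2-normalised vectors."""
    return np.asarray(encoder.encode(chunks, normalize_embeddings=True))

def retrieve(query: str, chunks: list[str], index: np.ndarray, k: int = 3) -> list[str]:
    """Return the k chunks with the highest cosine similarity to the query."""
    q = encoder.encode([query], normalize_embeddings=True)[0]
    scores = index @ q                       # cosine similarity (vectors are normalised)
    top = np.argsort(-scores)[:k]
    return [chunks[i] for i in top]

def answer(query: str, chunks: list[str], index: np.ndarray) -> str:
    """Augment the query with retrieved context and hand it to the LLM."""
    context = "\n\n".join(retrieve(query, chunks, index))
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
    return llm_generate(prompt)
```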

A RAG system can be optimized through the way its external knowledge base is utilized. Possible strategies include key-phrase extraction from the query, structuring the knowledge base, iterative or recursive search for relevant information, and re-ranking mechanisms based on knowledge density and diversity. One strategy analyzes whether the user's query is simple, requiring only a small context or no external knowledge base at all, or complex, requiring more context to obtain an accurate answer; a minimal sketch of such a router follows below. The retrieval process can then be adapted dynamically to the user's needs and the context window adjusted accordingly. The system can further refine its performance from user interaction and feedback by prioritizing resources that have previously provided accurate and relevant answers.
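The following sketch shows a complexity-based query router in the spirit of Adaptive-RAG: simple queries skip retrieval or use a single retrieval step, complex ones trigger iterative retrieval with a wider context window. The heuristic classifier is a stand-in for the learned complexity classifier the thesis would investigate; the labels, thresholds, and retrieval settings are illustrative assumptions only.

```python
# Sketch of a complexity-based query router; the heuristic below would be
# replaced by a trained classifier in a real system.
from enum import Enum

class Complexity(Enum):
    NO_RETRIEVAL = "no_retrieval"   # answerable from the LLM's parametric memory
    SINGLE_STEP = "single_step"     # one retrieval pass, small context
    MULTI_STEP = "multi_step"       # iterative retrieval, larger context

def classify_query(query: str) -> Complexity:
    """Placeholder heuristic based on query length and question type."""
    words = query.lower().split()
    if len(words) <= 6 and not any(w in words for w in ("why", "compare", "explain")):
        return Complexity.NO_RETRIEVAL
    if len(words) <= 15:
        return Complexity.SINGLE_STEP
    return Complexity.MULTI_STEP

def route(query: str) -> dict:
    """Map the predicted complexity to retrieval settings (k = chunks to fetch)."""
    level = classify_query(query)
    settings = {
        Complexity.NO_RETRIEVAL: {"retrieve": False, "k": 0, "iterations": 0},
        Complexity.SINGLE_STEP:  {"retrieve": True,  "k": 3, "iterations": 1},
        Complexity.MULTI_STEP:   {"retrieve": True,  "k": 8, "iterations": 3},
    }
    return {"complexity": level.value, **settings[level]}

print(route("What is backpropagation?"))
print(route("Compare dynamic programming and greedy search and explain when each of them fails."))
```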

This thesis will focus on developing such an approach, comparing it with previous related work and with conventional RAG methods, and improving performance where possible. Recent models such as the GPT series, BERT, or T5, which provide strong foundational capabilities for text generation and can be fine-tuned for key-extraction tasks, may be used depending on the available computational resources. The research will use a digital copy of one of the course books from the Master's in AI program as the primary source for the external knowledge base, together with query and evaluation datasets (a mix of straightforward queries and conceptual questions) derived from the book. Recent similar works include the following:

  • Jeong, S., Baek, J., & Cho, S. (2024). Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity. https://arxiv.org/pdf/2403.14403v2
  • Wang, X., Sen, P., & Li, R. (2024). Adaptive Retrieval-Augmented Generation for Conversational Systems. https://arxiv.org/pdf/2407.21712