Retrieval-Augmented Generation (RAG)

Understanding RAG: connecting frozen LLMs to external, dynamic knowledge.

I. The Core Problem

Large Language Models (LLMs) operate much like a knowledgeable friend trying to answer your questions. However, even a smart friend can make mistakes:

  1. Stale Information: They might rely on outdated facts (e.g., “The Queen is the monarch of England”).
  2. Unverifiable Sources: We don’t know where the information came from, so we can’t check it.

This happens because their knowledge is limited to what they learned during training. If I want to give you a correct answer, I should look up current references rather than rely solely on my memory.

The same applies to LLMs! Their training data is static: it’s essentially “last year’s data.” To fix this, we need to give the LLM access to external tools.


II. The Solution: Vector Database

To provide the LLM with up-to-date information, we connect it to a Vector Database. A vector database stores content from many sources (PDFs, text files, API responses) as vector embeddings: numerical representations of the semantic meaning of the text.

Before answering a user’s question, the system searches this database for verifiable facts. If relevant information exists, it is retrieved and fed to the LLM.
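To make this concrete, here is a minimal sketch of similarity search over a toy in-memory store. The documents and their three-dimensional embedding vectors are invented for the example; a real system would obtain high-dimensional vectors from an embedding model:

```python
import math

# Toy corpus: each document is paired with a hand-made embedding vector.
# In practice these vectors come from an embedding model, not by hand.
DOCUMENTS = {
    "The new monarch of England is King Charles III.": [0.9, 0.1, 0.0],
    "Our latest product manual covers model X-200.":   [0.1, 0.8, 0.2],
    "Quarterly sales grew 12% year over year.":        [0.0, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vector, top_k=1):
    """Return the top_k documents whose embeddings are closest to the query."""
    scored = sorted(
        DOCUMENTS.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [doc for doc, _ in scored[:top_k]]

# A query embedding near the "monarch" vector retrieves the monarchy document.
print(retrieve([0.85, 0.15, 0.05]))
```

The same idea scales up: production databases index millions of vectors and use approximate nearest-neighbor search instead of this exhaustive scan.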


III. The Workflow

The key difference between a standard LLM interaction and RAG is the intermediate search step.

Without RAG

The user asks a question, and the model answers directly from its frozen training weights.

User Prompt $\rightarrow$ Answer from Parametric Memory

With RAG

The system instructs the model to look at the database and combine that information with the user’s prompt.

  1. Instruction: “Look at the database and combine the retrieved context with the user’s prompt to answer.”
  2. Search: The system embeds the user’s prompt and finds the closest matching vectors in the database.
  3. Augmentation: The retrieved context + the original prompt are sent to the LLM.
  4. Generation: The LLM generates an answer based on both its training and the new source.

Result: New Information + Known Source.
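The four steps above can be sketched end to end in a few lines. Everything here is a toy: the retriever scores documents by word overlap instead of comparing embedding vectors, and `call_llm` is a hypothetical stub rather than a real model API:

```python
def search_vector_db(query, documents):
    """Step 2 (toy retriever): score each document by word overlap with
    the prompt. A real system would compare embedding vectors instead."""
    query_words = set(query.lower().split())
    return max(documents, key=lambda d: len(query_words & set(d.lower().split())))

def call_llm(prompt):
    """Hypothetical stand-in for a real LLM API call."""
    return f"Answer based on: {prompt}"

def rag_answer(user_prompt, documents):
    context = search_vector_db(user_prompt, documents)  # 2. Search
    augmented = (                                       # 3. Augmentation
        "Use the context below to answer the question.\n"
        f"Context: {context}\n"
        f"Question: {user_prompt}"
    )
    return call_llm(augmented)                          # 4. Generation

docs = [
    "King Charles III is the current monarch of England.",
    "The X-200 product manual was updated in March.",
]
print(rag_answer("Who is the monarch of England?", docs))
```

Note that the instruction, retrieved context, and original question all travel to the model in a single augmented prompt; the model’s weights are never modified.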

```mermaid
graph LR
    A[User Prompt] --> B(Search Vector DB)
    C[Documents] --> B
    B --> D[Retrieved Context]
    D --> E(LLM Input)
    A --> E
    E --> F[Answer with Source]
```
RAG Workflow: Augmenting generation with retrieval.

IV. Why RAG?

The primary advantages are efficiency and cost.

As new information arises (e.g., new product manuals, daily news), you do not have to retrain the LLM, which is computationally expensive and slow. You only need to update the Vector Database, which is cheap and near-instant.
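For example, ingesting new knowledge is just an append to the store; the model’s weights never change. The `VectorDB` class below is a hypothetical in-memory stand-in for a real vector database, with invented documents and embeddings:

```python
class VectorDB:
    """Hypothetical in-memory stand-in for a real vector database."""

    def __init__(self):
        self.entries = []  # list of (document, embedding) pairs

    def add(self, document, embedding):
        # Ingesting new knowledge is a cheap append: no retraining involved.
        self.entries.append((document, embedding))

    def count(self):
        return len(self.entries)

db = VectorDB()
db.add("2023 product manual for model X-200", [0.1, 0.8, 0.2])
db.add("2024 product manual for model X-300", [0.2, 0.7, 0.3])
# The LLM's weights are untouched; only the database grew.
print(db.count())  # → 2
```

Compare this to full retraining or fine-tuning, which requires GPU time and a new deployment every time the knowledge changes.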