Understanding RAG: Connecting frozen LLMs to external dynamic knowledge.
Large Language Models (LLMs) operate much like a knowledgeable friend answering your questions. But even a smart friend can make mistakes: their knowledge is limited to what they learned in the past. To give you a correct answer, that friend should search the web rather than rely solely on memory.
The same goes for LLMs! Their training data is frozen at a cutoff date, so it is essentially "last year's data." To fix this, we need to give the LLM access to external knowledge.
To provide the LLM with up-to-date information, we connect it to a Vector Database. A vector database stores content from various sources (PDFs, text files, API responses) as vector embeddings: numerical representations of the text's semantic meaning.
Before answering a user’s question, the system searches this database for verifiable facts. If relevant information exists, it is retrieved and fed to the LLM.
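The retrieval step above can be sketched in a few lines. This is a minimal, illustrative implementation: the hashed bag-of-words `embed` function is a toy stand-in for a real learned embedding model, and names like `VectorStore` are invented for this example.

```python
import math
import zlib
from collections import Counter

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy embedding: hash each word into a fixed-size vector.
    Real systems use a learned embedding model instead."""
    vec = [0.0] * dim
    for word, count in Counter(text.lower().split()).items():
        vec[zlib.crc32(word.encode()) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

class VectorStore:
    """Stores (text, embedding) pairs and retrieves by similarity."""

    def __init__(self) -> None:
        self.docs: list[tuple[str, list[float]]] = []

    def add(self, text: str) -> None:
        self.docs.append((text, embed(text)))

    def search(self, query: str, k: int = 1) -> list[str]:
        # Rank stored documents by cosine similarity to the query.
        q = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

A query like `store.search("printer duplex")` would return the stored document whose embedding is most similar to the query's, which is then handed to the LLM as context.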
The key difference between a standard LLM interaction and RAG is the intermediate search step.
Without RAG: the user asks a question, and the model answers directly from its frozen training weights.
User Prompt $\rightarrow$ Answer from Parametric Memory
With RAG: the system first retrieves relevant documents from the database and combines them with the user's prompt.
User Prompt + Retrieved Context $\rightarrow$ Grounded Answer
Result: new information, backed by a known source.
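The "combine with the user's prompt" step usually means assembling a single augmented prompt. Here is one minimal sketch of such a template; the exact wording and the function name are illustrative assumptions, not a standard.

```python
def build_rag_prompt(question: str, retrieved: list[str]) -> str:
    """Combine retrieved facts with the user's question so the LLM
    answers from the supplied context, not from parametric memory alone."""
    context = "\n".join(f"- {doc}" for doc in retrieved)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

The resulting string is what actually gets sent to the LLM: the model never sees the database directly, only the retrieved snippets pasted into its prompt.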
The primary advantage is cost and speed.
As new information arrives (e.g., new product manuals, daily news), you do not have to retrain the LLM, which is computationally expensive and slow. You only need to update the Vector Database, which is comparatively cheap and fast.
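To make the contrast concrete, here is a hypothetical sketch (all names invented for illustration): keeping knowledge current is just an append to the index, while the model's weights never change.

```python
# Hypothetical knowledge index; in practice this would be a vector database.
knowledge_index: list[str] = [
    "2023 manual: the device charges over USB-C.",
]

def ingest(document: str) -> None:
    # One index insert replaces an entire retraining run:
    # the LLM's weights are untouched.
    knowledge_index.append(document)

ingest("2024 update: the device also supports wireless charging.")
```

After `ingest` runs, the next retrieval can already surface the 2024 fact, with no gradient update anywhere in sight.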