Retrieval-Augmented Generation
Overview
Retrieval-Augmented Generation (RAG) combines a language model with a retrieval system to ground outputs in specific, up-to-date documents. Instead of relying solely on parametric knowledge baked into model weights, RAG dynamically fetches relevant passages from a vector store or search index and passes them as context to the LLM.
Key Concepts
- Query encoding: embeds the user question into a vector
- Retrieval: finds semantically similar documents from a vector database
- Context injection: appends retrieved passages to the LLM prompt
- Generation: the LLM answers grounded in retrieved evidence
- Reranking (optional): reorders retrieved passages after retrieval to improve precision
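The steps above can be sketched end to end. This is a minimal, illustrative toy: the corpus, the bag-of-words "embedding," and the prompt template are all assumptions standing in for a real embedding model, vector database, and LLM call.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real system would store learned dense
# embeddings in a vector database rather than word counts in memory.
DOCS = [
    "RAG grounds language model answers in retrieved documents",
    "BM25 is a keyword-based ranking function",
    "Vector databases store dense embeddings for similarity search",
]

def embed(text):
    """Toy 'query encoding': a bag-of-words term-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    """Retrieval: return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, passages):
    """Context injection: prepend retrieved passages to the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

query = "how does RAG ground answers"
passages = retrieve(query, DOCS)
prompt = build_prompt(query, passages)  # this prompt would go to the LLM
print(prompt)
```

The generation step itself is just the LLM call with this assembled prompt; grounding comes entirely from what the retrieval step placed in the context.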
Key Facts
- RAG was introduced by Lewis et al. at Facebook AI Research in 2020
- It reduces hallucinations by providing verifiable source material
- Hybrid RAG combines dense retrieval with keyword search (BM25)
- RAG is widely used in enterprise knowledge bases, legal research, and medical AI
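One common way to combine dense and keyword results in hybrid RAG is reciprocal rank fusion (RRF), which merges two ranked lists without needing comparable scores. The sketch below assumes the dense and BM25 rankings have already been computed; the document IDs are illustrative placeholders.

```python
# Hypothetical pre-computed rankings: real systems would obtain these
# from a dense retriever and a BM25 index respectively.
dense_ranking = ["doc_a", "doc_b", "doc_c"]
bm25_ranking = ["doc_b", "doc_d", "doc_a"]

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each list adds 1/(k + rank) to a document's score,
    so documents ranked highly by either retriever rise to the top."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = reciprocal_rank_fusion([dense_ranking, bm25_ranking])
print(fused)  # doc_b ranks first: it scores well in both lists
```

The constant k (60 is a conventional default) damps the influence of top ranks so that a document appearing in both lists beats one that tops only a single list.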