Retrieval-Augmented Generation
Overview
Retrieval-Augmented Generation (RAG) combines a language model with a retrieval system to ground outputs in specific, up-to-date documents. Instead of relying solely on parametric knowledge baked into model weights, RAG dynamically fetches relevant passages from a vector store or search index and passes them as context to the LLM.
Key Concepts
- Query encoding: embeds the user question into a vector
- Retrieval: finds semantically similar documents from a vector database
- Context injection: appends retrieved passages to the LLM prompt
- Generation: the LLM answers grounded in retrieved evidence
- Reranking (optional): reorders retrieved passages after retrieval to improve precision
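The steps above can be sketched end to end. This is a minimal, illustrative toy: the corpus, the bag-of-words "embedding," and the prompt template are all assumptions standing in for a real embedding model, vector database, and LLM call.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real system would store learned dense
# embeddings in a vector database rather than word counts in memory.
DOCS = [
    "RAG grounds language model answers in retrieved documents",
    "BM25 is a keyword-based ranking function",
    "Vector databases store dense embeddings for similarity search",
]

def embed(text):
    """Toy 'query encoding': a bag-of-words term-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    """Retrieval: return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, passages):
    """Context injection: prepend retrieved passages to the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

query = "how does RAG ground answers"
passages = retrieve(query, DOCS)
prompt = build_prompt(query, passages)  # this prompt would go to the LLM
print(prompt)
```

The generation step itself is just the LLM call with this assembled prompt; grounding comes entirely from what the retrieval step placed in the context.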
Key Facts
- RAG was introduced by Lewis et al. at Facebook AI Research in 2020
- It reduces hallucinations by providing verifiable source material
- Hybrid RAG combines dense retrieval with keyword search (BM25)
- RAG is widely used in enterprise knowledge bases, legal research, and medical AI
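One common way to combine dense and keyword results in hybrid RAG is reciprocal rank fusion (RRF), which merges two ranked lists without needing comparable scores. The sketch below assumes the dense and BM25 rankings have already been computed; the document IDs are illustrative placeholders.

```python
# Hypothetical pre-computed rankings: real systems would obtain these
# from a dense retriever and a BM25 index respectively.
dense_ranking = ["doc_a", "doc_b", "doc_c"]
bm25_ranking = ["doc_b", "doc_d", "doc_a"]

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each list adds 1/(k + rank) to a document's score,
    so documents ranked highly by either retriever rise to the top."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = reciprocal_rank_fusion([dense_ranking, bm25_ranking])
print(fused)  # doc_b ranks first: it scores well in both lists
```

The constant k (60 is a conventional default) damps the influence of top ranks so that a document appearing in both lists beats one that tops only a single list.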