What is RAG?

Retrieval-augmented generation (RAG) is a technique where a language model fetches relevant passages from your own data at query time and uses them to ground its answer, so responses cite real, current sources instead of relying only on what the model memorised in training.

Why it matters

If you want an AI assistant that answers from your product docs, policies, or support history, and stays current as those change, RAG is usually the right starting point. It keeps answers grounded in sources you control, which is what makes the output trustworthy enough to put in front of customers.

How it works, briefly

A retrieval step searches your indexed content for the passages most relevant to the question, then passes those passages to the LLM alongside the prompt. The model answers using that supplied context, so it can reference specifics it was never trained on.
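
As a rough illustration of that flow, here is a minimal sketch in Python. It assumes a tiny in-memory list of made-up passages and uses word-count cosine similarity as a stand-in for a real embedding model and vector index; the names `retrieve`, `build_prompt`, and `PASSAGES` are hypothetical, and the call to the LLM itself is left out.

```python
import math
import re
from collections import Counter

# Hypothetical sample content; in practice this would be your indexed docs.
PASSAGES = [
    "Refunds are available within 30 days of purchase.",
    "Our support team is online on weekdays from 9 to 17.",
    "The Pro plan includes priority support and API access.",
]

def vectorize(text):
    """Turn text into a bag-of-words vector (a stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, passages, k=2):
    """Return the k passages most similar to the question."""
    q = vectorize(question)
    ranked = sorted(passages, key=lambda p: cosine(q, vectorize(p)), reverse=True)
    return ranked[:k]

def build_prompt(question, passages):
    """Place the retrieved passages alongside the question for the model."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below and cite passage numbers.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

question = "Are refunds available after purchase?"
prompt = build_prompt(question, retrieve(question, PASSAGES, k=2))
print(prompt)  # this string is what would be sent to the LLM
```

Swapping `vectorize` for a real embedding model and the sorted list for a vector index changes the quality of the results, not the shape of the pipeline: retrieve first, then hand the passages to the model with the question.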

Where teams get it wrong

The model is rarely the bottleneck; retrieval quality is. Most disappointing systems fail at chunking, embedding, or ranking, not at generation. Fix retrieval before reaching for a bigger model or fine-tuning.
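
One way to make "fix retrieval first" concrete is to measure retrieval on its own, before any generation happens. The sketch below assumes a small hand-labelled set of questions paired with the passage that should come back, and reports the share of questions where that passage appears in the top k results. The corpus, the evaluation set, and the deliberately naive `keyword_retrieve` function are all hypothetical placeholders for your own retriever and data.

```python
# Hypothetical corpus keyed by passage id.
CORPUS = {
    "refunds": "Refunds are available within 30 days of purchase.",
    "hours": "Support is online on weekdays from 9 to 17.",
    "plans": "The Pro plan includes priority support and API access.",
}

# Hypothetical labelled examples: (question, id of the passage that answers it).
EVAL_SET = [
    ("Are refunds available after purchase?", "refunds"),
    ("What does the Pro plan include?", "plans"),
    ("When can I reach someone?", "hours"),
]

def keyword_retrieve(question, k):
    """Naive retriever: rank passage ids by the number of words shared with the question."""
    words = set(question.lower().replace("?", "").split())

    def overlap(pid):
        return len(words & set(CORPUS[pid].lower().rstrip(".").split()))

    return sorted(CORPUS, key=overlap, reverse=True)[:k]

def hit_rate_at_k(eval_set, retrieve, k=3):
    """Fraction of questions whose expected passage id is in the top k results."""
    if not eval_set:
        return 0.0
    hits = sum(1 for question, expected in eval_set if expected in retrieve(question, k))
    return hits / len(eval_set)

for k in (1, 2, 3):
    print(k, hit_rate_at_k(EVAL_SET, keyword_retrieve, k=k))
```

In this toy data the third question shares no words with the passage that answers it, so the keyword retriever misses it at k=1. That is the kind of failure a bigger model cannot repair, because the right passage never reaches it.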

When we reach for it

For most "chatbot over our knowledge base" briefs, we start with RAG and only consider fine-tuning once retrieval is genuinely solid and a gap remains.

Used in

RAG vs fine-tuning: which one does your AI feature need?
Teams reach for fine-tuning when they usually mean grounding. Here is the honest difference, and why most products want retrieval first.
