Retrieve, then ground
A bare model answers your refund question from whatever it absorbed in training — a number that sounds right, with nothing behind it. Ask it twice and you might get two different windows, and neither comes with a source you can check. That's the failure RAG is built to fix.
A RAG agent runs three moves in order. First it retrieves: it searches your own documents and pulls back the snippet most relevant to the question. Then it grounds: it writes the answer using the words of that snippet, not its own memory. Finally it cites: it tags the answer with the id of the doc it came from, so the claim is checkable. The knowledge isn't in the model — it's in the documents, and the model's job is to find the right one and stay faithful to it.
There's a fourth move that separates a trustworthy RAG agent from a confident one: when retrieval comes back empty — nothing in the corpus actually matches — the honest answer is "I don't know." A bare model fills the gap with a guess; a grounded agent refuses. Refusing on no-match is the whole reason grounding is worth more than fluency.
Watch this agent search the handbook, read back the matching snippet, and answer with a citation instead of a guess.
Now build that mechanism yourself. Below is a tiny handbook (one doc holds the
refund window) and the question "What is our refund window?". The starter cheats:
it answers 14 days from memory, with no retrieval and no citation. Implement
retrieve(query, corpus) so it scores each doc by keyword overlap and returns the
best match; print retrieved: <id> (score N); then ground the answer in that
snippet and cite its id — or refuse with "Not in the docs" if the best
score is 0.
Grounding is what makes an answer checkable; refusing-on-no-match is what makes it trustworthy. A RAG agent earns belief by showing its source — or admitting it has none.