Meaning as vectors
The keyword retriever you just built has one fatal blind spot: it matches the words you typed, nothing more. Ask it about a "canine" and the chunk about "the dog" scores zero — same idea, zero shared strings. Real users paraphrase constantly, so a string-matcher misses the right chunk and the agent answers from a worse one (or refuses when it shouldn't). Semantic search closes that gap by matching meaning.
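To see the blind spot concretely, here is a minimal sketch of a word-overlap scorer. The function name `keyword_score` and the sample chunk are hypothetical stand-ins, not the retriever from the previous step; any matcher that compares literal words behaves the same way.

```python
def keyword_score(query: str, chunk: str) -> int:
    """Count the distinct lowercase words shared by query and chunk."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    return len(query_words & chunk_words)


# Hypothetical chunk text, used only for illustration.
chunk = "the dog ran across the yard"

print(keyword_score("dog", chunk))     # 1: the literal word is present
print(keyword_score("canine", chunk))  # 0: same idea, no shared word
```

The second query means the same thing as the first, yet it scores zero because no word matches.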
Meaning lives in an embedding: a model converts a chunk into a vector — a
direction in space — where things that mean the same thing point nearly the same
way, regardless of which words they use. "The dog ran" and "the canine sprinted"
land close together; "fish swim deep" points elsewhere. To rank, you measure the
cosine similarity between the query's vector and each chunk's vector: the cosine
of the angle between them, which is 1 when they point the same way, 0 when they
are perpendicular, and negative once they point in opposing directions.
Crucially, cosine looks at direction, not length: a long vector and a short one
that point the same way score identically, because dividing by each vector's
magnitude normalizes the size away.
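A short sketch can make the "direction, not length" claim tangible. The helper names `dot`, `magnitude`, and `cosine` follow the formula used in this lesson; the `stretched` vector is an assumption made up for the demonstration (the same direction as [0.9, 0.1, 0.0], ten times longer).

```python
import math


def dot(a, b):
    """Sum of the element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def magnitude(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))


def cosine(a, b):
    """Cosine of the angle between a and b (undefined for a zero vector)."""
    return dot(a, b) / (magnitude(a) * magnitude(b))


query = [0.8, 0.3, 0.0]
short = [0.9, 0.1, 0.0]
stretched = [x * 10 for x in short]  # same direction, ten times the length

print(round(cosine(query, short), 2))      # 0.97
print(round(cosine(query, stretched), 2))  # 0.97: the length cancels out
print(cosine([1, 0], [0, 1]))              # 0.0: perpendicular vectors
print(cosine([1, 0], [-1, 0]))             # -1.0: opposite directions
```

Scaling a vector multiplies both the dot product and the magnitude by the same factor, so the ratio stays put.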
Work the example. The query points roughly along [0.8, 0.3, 0.0]. Chunk c1 at
[0.9, 0.1, 0.0] points almost the same way → cosine ≈ 0.97, the top hit. c4
at [0.6, 0.5, 0.1] is close but tilts → ≈ 0.94. But c2 at [0.2, 0.9, 0.1]
leans into the second axis, roughly 57° away from the query → ≈ 0.54, a much
lower score even though it sits second in storage order. Ranking by similarity,
not by where a chunk happened to be stored, is the entire point.
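The arithmetic for the three chunks named above can be checked with a few lines. This is only a sketch under stated assumptions: the `chunks` dictionary holds just the three vectors quoted in the example (the lesson's full chunk set is not shown here), and its insertion order stands in for storage order.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(y * y for y in b))
    return dot / (mag_a * mag_b)


query = [0.8, 0.3, 0.0]

# Assumed storage order: c1, then c2, then c4.
chunks = {
    "c1": [0.9, 0.1, 0.0],
    "c2": [0.2, 0.9, 0.1],
    "c4": [0.6, 0.5, 0.1],
}

# Score every chunk, then sort by score, highest first.
ranked = sorted(
    ((chunk_id, cosine(query, vector)) for chunk_id, vector in chunks.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for chunk_id, score in ranked:
    print(f"{chunk_id}: {score:.2f}")
# c1: 0.97
# c4: 0.94
# c2: 0.54
```

c2 is stored second but ranks last, which is the behavior the example describes.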
Why it matters: production vector databases do exactly this at scale. They embed every chunk once, then embed each incoming query and find the nearest directions. That catches the paraphrases keyword search drops, at the cost of a model call to build each vector. No API calls here: the vectors are precomputed so you can focus on the ranking.
Fill in cosine(a, b) as dot(a, b) / (magnitude(a) * magnitude(b)), score every
chunk against the query, sort highest-first, and print the top 2 as <id>: <score>
rounded to two decimals. Done means c1: 0.97 then c4: 0.94 — and c2 nowhere in
sight.
A retriever doesn't fetch the chunk you stored first — it fetches the chunk whose meaning points most nearly toward your question.