Re-rank the Results

A sharper second pass

Your keyword retriever from earlier is fast and honest, but blunt: it scores a chunk by counting query-word hits, regardless of whether those words land together. Ask it about the return policy and the top candidate it hands back, c1, is a chunk that happens to sprinkle return, policy, and refund across a sentence about shipping: five scattered word hits, keywordScore 5. Meanwhile c2, the chunk that literally opens "Our return policy...", exactly what the user asked for, sits one rank below at keywordScore 4. First-pass retrieval optimized for recall, pulling in everything plausibly relevant. It did not optimize for precision, which means putting the single best chunk on top.

That's the job of a re-ranker: a cheap second pass that re-scores the candidates the first pass already found, using a signal too expensive to run over the whole corpus or too strict to retrieve with on its own. You never re-search; you only reorder the short list you already have. Here the sharper signal is the exact phrase: a chunk that contains the literal string return policy is almost certainly more on-topic than one that merely scatters those words, so you give it a bonus. Combined score = keywordScore + (contains exact phrase ? 2 : 0). Re-sort by that, and the ranking changes.
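A minimal Python sketch of the combined score, assuming a plain case-sensitive substring check is what "contains the exact phrase" means. The function name `combined_score`, the constant `EXACT_PHRASE_BONUS`, and both sample sentences are hypothetical; only the +2 bonus and the keyword scores 4 and 5 come from the lesson.

```python
EXACT_PHRASE_BONUS = 2  # hypothetical name for the lesson's +2 bonus


def combined_score(keyword_score: int, text: str, phrase: str) -> int:
    """Return keywordScore plus a flat bonus when the literal phrase appears in text."""
    # Plain substring test: case-sensitive, no stemming, no tokenization.
    has_phrase = phrase in text
    return keyword_score + (EXACT_PHRASE_BONUS if has_phrase else 0)


# Hypothetical chunk texts standing in for c2 and c1.
print(combined_score(4, "Our return policy lets you send items back.", "return policy"))  # 6
print(combined_score(5, "Return labels, refund timing, and our policy on shipping delays.", "return policy"))  # 5
```

The second call keeps its 5 because "return" and "policy" never appear side by side, which is exactly the scattered-words case the bonus is meant to penalize relative to c2.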

Walk it. c1 has keywordScore 5 but no exact phrase → stays 5. c2 has keywordScore 4 and contains "return policy" → 4 + 2 = 6. c3 has keywordScore 3 and also contains the phrase → 3 + 2 = 5. c4, about store hours, has keywordScore 1 and no phrase → stays 1. Sort descending (ties keep original order) and the top three become c2 (6), c1 (5), c3 (5). The chunk a human would have picked is now rank 1. Notice that c1 didn't vanish; it was simply outranked by the chunk that says the exact thing.
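The walkthrough can be checked end to end with a short Python sketch. The ids and keyword scores are the lesson's; the chunk texts are hypothetical placeholders written only so that c2 and c3 contain "return policy" while c1 and c4 do not (the keyword scores are taken as given, not recomputed from these texts). The `Candidate` type and `rerank` function are illustrative names. The tie rule relies on Python's `sorted`, which is documented as stable even with `reverse=True`.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    id: str
    keyword_score: int  # the first-pass keywordScore, taken as given
    text: str


# Ids and scores from the lesson; texts are hypothetical placeholders.
CANDIDATES = [
    Candidate("c1", 5, "Return labels, refund timing, and our policy on shipping delays."),
    Candidate("c2", 4, "Our return policy lets you send items back."),
    Candidate("c3", 3, "Check the return policy page before mailing anything."),
    Candidate("c4", 1, "Store hours are nine to five on weekdays."),
]


def rerank(candidates: list[Candidate], phrase: str, bonus: int = 2) -> list[tuple[str, int]]:
    """Re-score each candidate and sort by combined score, highest first."""
    scored = [(c.id, c.keyword_score + (bonus if phrase in c.text else 0)) for c in candidates]
    # sorted() is stable, and stays stable with reverse=True, so ties keep input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for n, (cid, score) in enumerate(rerank(CANDIDATES, "return policy")[:3], start=1):
    print(f"rank {n}: {cid} (score {score})")
# rank 1: c2 (score 6)
# rank 2: c1 (score 5)
# rank 3: c3 (score 5)
```

c1 and c3 tie at 5, and c1 stays ahead only because it came first in the input list; an unstable sort could legally swap them.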

Why it matters: this two-stage shape — wide cheap recall, then a sharp cheap re-rank — is how real retrieval systems earn both coverage and a good top result without running the expensive signal over millions of documents. The first pass decides what's in the running; the re-ranker decides what wins.
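To make the two-stage shape concrete, here is a toy Python sketch of both stages over a tiny corpus. Everything in it is hypothetical: the `retrieve` and `rerank` names, the regex tokenizer, the cutoff `k`, and the five documents. Stage 1 scores every document with a blunt count of query-word hits; stage 2 touches only the survivors and adds the +2 exact-phrase bonus, assuming the whole query serves as the phrase.

```python
import re

# Hypothetical toy corpus; ids and texts are made up for illustration.
CORPUS = {
    "a": "Return labels ship free; our policy on return fees follows the shipping policy.",
    "b": "Our return policy: return any unworn item.",
    "c": "Store hours are nine to five.",
    "d": "Gift cards never expire.",
    "e": "Privacy policy updates.",
}


def tokenize(text: str) -> list[str]:
    """Lowercase and split into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_score(text: str, query: str) -> int:
    """Stage 1 signal: count every occurrence of any query word, ignoring order."""
    query_words = set(tokenize(query))
    return sum(1 for tok in tokenize(text) if tok in query_words)


def retrieve(corpus: dict[str, str], query: str, k: int) -> list[tuple[str, int]]:
    """Stage 1: cheap, wide recall over the whole corpus; keep the top k hits."""
    scored = [(doc_id, keyword_score(text, query)) for doc_id, text in corpus.items()]
    scored = [pair for pair in scored if pair[1] > 0]  # drop documents with no hits
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep corpus order
    return scored[:k]


def rerank(
    corpus: dict[str, str], shortlist: list[tuple[str, int]], phrase: str, bonus: int = 2
) -> list[tuple[str, int]]:
    """Stage 2: sharper signal, applied only to the short list from stage 1."""
    rescored = [
        (doc_id, score + (bonus if phrase in corpus[doc_id] else 0))
        for doc_id, score in shortlist
    ]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored


shortlist = retrieve(CORPUS, "return policy", k=3)
print(shortlist)  # [('a', 4), ('b', 3), ('e', 1)]
print(rerank(CORPUS, shortlist, "return policy"))  # [('b', 5), ('a', 4), ('e', 1)]
```

Document a wins the first pass on raw hits, but b, the one that actually states the return policy, takes the top slot after re-ranking. Documents c and d never reach stage 2, so in a larger system a much costlier stage-2 signal would only ever be paid for on the short list.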

Below you get four candidates with their first-pass keywordScore and the query return policy. The print loop is wired; the re-rank is yours. Re-score each candidate as keywordScore + 2 when its text contains the exact phrase (keywordScore alone otherwise), sort by that combined score (highest first, ties keep original order), and print the top three as rank <n>: <id> (score <combined score>). You're done when c2 sits at rank 1, which shows the re-rank changed the order that raw keywordScore alone produced.

Recall decides what makes the short list; a re-ranker decides what tops it. The cheapest precision win is often a second pass over candidates you already have.


This is 1 of 50 lessons.