tl;dr
Your retriever is correct, but the answer still hallucinates. This is No.6 Logic Collapse. The fix is not more vector tuning. It is a guardrail at the synthesis step: citation-first, explicit bridge on failure, and per-claim contracts. No infra changes needed. Think semantic firewall.
What most of us assume vs what actually happens
What you probably assume
if top-k contains the right spans, the model will stay inside the evidence. set temperature to 0, maybe raise k, job done.
What field traces show
answers drift even when retrieval is perfect. the chain merges adjacent ideas, softens on safety, and "fills in" missing pieces. the JSON half looks clean, while the prose half wanders. paraphrase your question and the conclusion flips although the same snippets were retrieved.
Reality
this is a synthesis-stage failure, not a retriever failure. you need a recovery path, not just "better chunks."
ProblemMap anchor
- No.6 Logic Collapse & Recovery. The chain must expose a formal collapse → bridge → rebirth path, and it must put citations before prose.
Symptoms that give No.6 away
- top-k has the right section, yet the answer includes claims that never appear in those spans
- citations show up only at the end, or once, or they point to the wrong section
- your prompt is identical but a small paraphrase flips the result while snippets are unchanged
- function calling JSON is correct, natural language answer is off-contract
- the chain "repairs" missing evidence instead of pausing to ask for it
SEO phrases people search for that map to this:
- retriever correct answer wrong
- RAG hallucination with correct chunks
- OpenAI function calling JSON drift fix
- GPT citation template for RAG
- why seed does not make GPT deterministic
Reproduce in 60 seconds (no code, no infra change)
- run two versions of your prompt on the same retrieved snippets
  a. freeform answer
  b. citation-first answer that must output citations before any prose
- measure ΔS(question, retrieved) for both variants. count citations per atomic claim. see the ΔS sketch after this list
- paraphrase the question 3 times. watch for output alternation even when snippets stay the same
- if freeform fails and citation-first passes, you are in No.6 territory
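a minimal sketch of the ΔS check. this assumes ΔS can be read as 1 minus the cosine similarity between the question and retrieved-context embeddings, which may differ from the exact Problem Map definition; producing the vectors is left to whatever embedding model you already run.
Code:
import math

def delta_s(question_vec, retrieved_vec):
    # assumption: ΔS ≈ 1 - cosine similarity. the Problem Map's exact
    # definition may differ; this is only a stand-in for the check.
    dot = sum(a * b for a, b in zip(question_vec, retrieved_vec))
    qn = math.sqrt(sum(a * a for a in question_vec))
    rn = math.sqrt(sum(b * b for b in retrieved_vec))
    return 1.0 - dot / (qn * rn)

def healthy(question_vecs, retrieved_vec, band=0.45):
    # pass only if every paraphrase stays inside the healthy band
    return all(delta_s(v, retrieved_vec) <= band for v in question_vecs)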
Rules of thumb
- ΔS(question, retrieved) ≤ 0.45 is a healthy band
- at least one in-scope citation per atomic claim
- no prose without prior valid citations
Diagnosis tree
- citations placed after explanation → move to citation-first
- evidence order mismatches reasoning order → require per-claim structure
- answers mix sections → scope citations to the current top-k only
- safety softens content → require cite-then-explain so safety applies to claims with support
- reranker prefers summaries → switch to claim-aligned spans, not generic "best paragraph"
Minimal fix (stop freewheeling, force recovery)
1. Citation-first gate
Code:
import re

def extract_cites(txt):
    # accepts [id], (id), or {"id":"..."}
    rx = re.compile(r"(?:\[(\w+)\]|\((\w+)\)|\"id\"\s*:\s*\"([^\"]+)\")")
    ids = [g for t in rx.findall(txt) for g in t if g]
    return list(dict.fromkeys(ids))  # dedupe, keep first-seen order

def citation_gate(output_text, allowed_ids):
    # pass only if at least one citation exists and all are in scope
    ids = extract_cites(output_text)
    ok = bool(ids) and all(i in allowed_ids for i in ids)
    return {"ok": ok, "cites": ids}
2. Per-claim contract
Code:
def validate_contract(payload, allowed_ids):
    # payload = [{"claim":"...", "citations":["s17","s33"]}, ...]
    bad = []
    for i, c in enumerate(payload):
        cites = c.get("citations", [])
        # a claim fails with no citations or with out-of-scope ids
        if not cites or not set(cites) <= set(allowed_ids):
            bad.append(i)
    return {"ok": not bad, "bad_claim_indexes": bad}
3. Bridge on collapse
when citations are missing or out of scope, stop, restate the last valid state, and request the next required snippet id before any prose.
Code:
def bridge_plan(question, need="snippet_id"):
    # formal bridge state: restate the gap and ask for evidence,
    # instead of letting the model improvise prose
    return {
        "state": "bridge",
        "todo": f"missing {need}. request it explicitly.",
        "why": "logic collapse detected without valid citations"
    }
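wiring the three pieces together, a minimal sketch. run_model is a placeholder for your own LLM call, and the snippet shape is illustrative:
Code:
def answer_with_recovery(question, snippets, run_model):
    # snippets = [{"id": "s17", "text": "..."}, ...]
    allowed = {s["id"] for s in snippets}
    draft = run_model(question, snippets)
    if not citation_gate(draft, allowed)["ok"]:
        # collapse detected: emit the bridge instead of prose
        return bridge_plan(question)
    return draft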
4. Fail-fast template
output order must be: citations → minimal JSON plan → short prose. reject outputs that skip steps.
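one way to enforce the order, assuming you can split the model output into labeled sections first:
Code:
def fail_fast(sections):
    # sections = [("citations", "..."), ("plan", "..."), ("prose", "...")]
    expected = ["citations", "plan", "prose"]
    got = [name for name, _ in sections]
    return {"ok": got == expected, "got": got}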
Hard fixes when minimal is not enough
- trim headers that bias toward summary and storytelling
- shorten evidence windows and rotate snippets rather than stacking many
- split multi-topic questions, answer in separate passes
- add a reranker tuned for claim-aligned spans
- compress the output into a per-claim JSON that requires snippet ids
Guardrails to turn on (semantic firewall, no infra change)
- Trace schema for No.6. log claim-level citations and ΔS at each hop
- Bridge step as a formal state. collapse → bridge → rebirth
- Variance clamp. track λ across three paraphrases and reject divergent chains. see the sketch after this list
- Cite-then-explain contract. refuse prose until citations are in scope
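a minimal sketch of the variance clamp. token overlap is a crude stand-in for whatever answer-similarity measure you trust, and the threshold is illustrative:
Code:
def token_overlap(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

def variance_clamp(answers, min_overlap=0.6):
    # answers = outputs for three paraphrases over the same snippets.
    # reject the chain if any pair diverges beyond the threshold.
    n = len(answers)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return all(token_overlap(answers[i], answers[j]) >= min_overlap
               for i, j in pairs)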
Acceptance checks
- ΔS(question, retrieved) ≤ 0.45 across 3 paraphrases
- every atomic claim has at least one in-scope citation id
- λ stays convergent across seeds and sessions
- when evidence is thin, the chain bridges rather than guessing
FAQ (the long tail people ask all the time)
does temperature 0 fix this
no. you are clamping one sample from a distribution. collapse is structural.
can seed make outputs deterministic
not reliably across paraphrases or tool timing. treat outputs as distributions and stabilize with ΔS and λ checks.
why does citation-first help JSON mode drift
because it scopes claims before prose. the model can still write, but only inside the cited window.
is this just prompt engineering
no. you are enforcing a contract and a recovery path. prompt text is the interface, the behavior is structural.
how do i know it worked
paraphrase the question three times, keep the same snippets. if ΔS stays low and answers align, your chain is stable.
Copy-paste triage prompt
Code:
I uploaded TXT OS and the Problem Map files.
My bug:
- symptom: [brief]
- traces: [ΔS(question,retrieved)=..., λ states, citations per claim, tool logs if any]
Tell me:
1) which layer is failing and why,
2) which exact fix page to open,
3) the minimal steps to push ΔS ≤ 0.45 and keep λ convergent,
4) how to verify with a reproducible test.
Use the collapse → bridge → rebirth pattern if needed.
Closing
retriever can be right while synthesis collapses. move to citation-first, add a real bridge, and make per-claim contracts enforceable. once you do this, your chain stops freewheeling and starts behaving like a system you can test.
Full write-up and the growing set of daily notes here:
https://github.com/onestardao/WFGY/blob/main/ProblemMap/article/README.md