Symptom
When multiple AI agents query the same PDF or vector database at the same time, you get semantic contamination instead of collaboration: answers drift, citations don't match, and retrieval coverage mutates depending on which agent touched the index first.
Common Failure Patterns in Multi-Agent Pipelines
- Two agents ingest the same document concurrently → their traces overwrite each other.
- Retrieval results differ depending on run order, even with identical queries.
- Citations point to spans that only one agent saw, while the other invents filler.
- Embedding counts don't match the corpus size because each agent tokenized differently.
- Logs show answers that change unpredictably across sessions, leading to "ghost context."
These are classic multi-agent concurrency bugs in retrieval-augmented generation (RAG) systems.
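The simplest reproduction is two writers sharing one trace with no agent identifiers. Below is a minimal sketch, assuming an in-memory list stands in for the shared index; all names are illustrative, not a real API.
Code:
import threading

shared_trace = []                      # one shared log, no per-agent IDs

def ingest(doc, n_chunks):
    # each concurrent agent appends to the same trace
    for i in range(n_chunks):
        shared_trace.append(f"chunk {i} of {doc}")

t1 = threading.Thread(target=ingest, args=("PDF1", 3))
t2 = threading.Thread(target=ingest, args=("PDF1", 3))
t1.start(); t2.start()
t1.join(); t2.join()

print(shared_trace)       # interleaved entries, impossible to attribute
print(len(shared_trace))  # 6 entries for a 3-chunk document: counts mismatch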
ProblemMap Reference
- No.13 Multi-agent chaos: this failure mode occurs when pipelines allow parallel agents on shared resources (vector stores, indexes, traces) without isolation. Instead of reasoning independently, the agents pollute each other's context.
Quick 60-second Diagnostic
- Isolation probe: Run two agents on the same PDF. If traces merge or overwrite each other, contamination is confirmed.
- Index collision: Let both agents build embeddings in parallel. If token counts differ or coverage jumps, the vector store is not isolated.
- Cross-contamination test: Ask Agent A about fact X, then Agent B about fact Y. If B's answer contains A's context, the pipeline leaked (see the sketch after this list).
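To make the cross-contamination test concrete, here is a minimal sketch in which two agents are wired (incorrectly) to one shared context store; SharedContext and Agent are hypothetical stand-ins, not a real library API.
Code:
class SharedContext:
    # simulates a context store that agents incorrectly share
    def __init__(self):
        self.chunks = []

class Agent:
    def __init__(self, name, context):
        self.name = name
        self.context = context           # shared store -> leak by construction

    def ingest(self, fact):
        self.context.chunks.append((self.name, fact))

    def answer(self):
        # an isolated agent would only see its own chunks
        return [fact for owner, fact in self.context.chunks]

shared = SharedContext()
a, b = Agent("A", shared), Agent("B", shared)
a.ingest("fact X")
b.ingest("fact Y")
print("B leaked A's context:", "fact X" in b.answer())  # True -> leaked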
Checklist for Diagnosis
- Interleaved ingestion logs (no separation between agents)
- Retrieval results fluctuate even when corpus is stable
- Hallucinations correlate with concurrency, not corpus difficulty
- Embedding stats mismatch expected document size
- Trace logs lack per-agent identifiers
Minimal Fixes
The immediate goal is to enforce single-source trace and index isolation.
- Separate traces per agent → each run must log independently.
- Isolate index access → agents use read-only mode or build local caches.
- Lock ingestion → no simultaneous writes on the same document.
- Explicit agent IDs → tag all chunks with the originating agent (all four fixes are sketched after this list).
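As a rough illustration of the four fixes together, the sketch below uses a process-local lock and plain dicts; IsolatedAgent and ingest_lock are hypothetical names, and a production system would need a distributed lock instead.
Code:
import threading

ingest_lock = threading.Lock()           # lock ingestion: no simultaneous writes

class IsolatedAgent:
    def __init__(self, name):
        self.name = name
        self.trace = []                  # separate trace per agent

    def ingest(self, doc_id, text):
        with ingest_lock:                # serialized writes on shared documents
            chunk = {"agent": self.name, "doc": doc_id, "text": text}
            self.trace.append(chunk)     # logged independently
            return chunk                 # explicitly tagged with the agent ID

a = IsolatedAgent("A")
b = IsolatedAgent("B")
a.ingest("PDF1", "page 1 text")
b.ingest("PDF1", "page 1 text")
assert all(c["agent"] == "A" for c in a.trace)   # no cross-writes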
Hard Fixes for Production
- Multi-tenant vectorstore partitions (per agent / per task)
- Ingestion validators to reject mixed-agent writes
- Evaluation gates (coverage ≥ 0.7 before allowing merge; sketched below)
- A coordination/orchestration layer to serialize agent requests
These are necessary for scalable multi-agent frameworks where concurrency is unavoidable.
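A minimal sketch of a partitioned store with an evaluation gate follows, assuming an in-memory dict in place of a real vector store; PartitionedStore and COVERAGE_THRESHOLD are illustrative names, not a real API.
Code:
COVERAGE_THRESHOLD = 0.7

class PartitionedStore:
    def __init__(self):
        self.partitions = {}             # one namespace per agent/task

    def write(self, agent_id, chunk):
        # ingestion validator: writes land only in the caller's partition
        self.partitions.setdefault(agent_id, []).append(chunk)

    def coverage(self, agent_id, corpus_size):
        # fraction of the corpus this agent actually embedded
        return len(self.partitions.get(agent_id, [])) / corpus_size

    def merge(self, agent_id, corpus_size, target):
        # evaluation gate: refuse to merge under-covered partitions
        cov = self.coverage(agent_id, corpus_size)
        if cov < COVERAGE_THRESHOLD:
            raise ValueError(f"{agent_id}: coverage {cov:.2f} below gate")
        target.extend(self.partitions[agent_id])

store = PartitionedStore()
store.write("agent-A", "chunk-1")
merged = []
try:
    store.merge("agent-A", corpus_size=10, target=merged)  # 0.10 < 0.7
except ValueError as e:
    print("merge blocked:", e)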
Guardrails from WFGY
- Trace isolation → per-agent semantic tree logging
- Index fences → embedding contracts per agent before merging
- Retrieval playbook → enforce consistency across paraphrases before sharing results
- Audit logs → intake → embedding → retrieval per agent, visible in traces (generic sketch below)
This shifts the failure from "silent contamination" to an observable, debuggable process.
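WFGY's internals are not reproduced here; purely as an illustration of what a per-agent audit record along the intake → embedding → retrieval path might look like, consider this sketch (the audit function and field names are hypothetical):
Code:
import time

def audit(log, agent_id, stage, detail):
    # every record carries the agent ID, so contamination is observable
    log.append({"t": time.time(), "agent": agent_id,
                "stage": stage, "detail": detail})

log = []
audit(log, "A", "intake",    "PDF1, pages 1-10")
audit(log, "A", "embedding", "120 chunks")
audit(log, "A", "retrieval", "query='fact X', top_k=5")
assert all(rec["agent"] == "A" for rec in log)   # attributable end to end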
Tiny Sanity Script
Code:
class Agent:
    def __init__(self, name):
        self.name = name
        self.trace = []                  # per-agent trace, never shared

    def ingest(self, doc):
        self.trace.append(f"{self.name} saw {doc}")

A = Agent("A")
B = Agent("B")
A.ingest("PDF1")
B.ingest("PDF1")
print(A.trace)  # ['A saw PDF1']
print(B.trace)  # ['B saw PDF1']
# independent traces -> no cross-contamination
Acceptance Checks
- Each agent's trace log is reproducible and independent (checked by the script after this list)
- Retrieval coverage stable across concurrent runs
- No hallucinations tied to query order or concurrency
- Merges only allowed after validation per agent
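One way to automate the first two checks is to replay the same ingestion in both orders and compare traces; run_session is a hypothetical harness, sketched under the assumption that ingestion is deterministic per agent.
Code:
def run_session(order):
    agents = {"A": [], "B": []}          # per-agent traces
    for name, doc in order:
        agents[name].append(f"{name} saw {doc}")
    return agents

run1 = run_session([("A", "PDF1"), ("B", "PDF1")])
run2 = run_session([("B", "PDF1"), ("A", "PDF1")])   # reversed order

# traces must not depend on which agent touched the corpus first
assert run1["A"] == run2["A"]
assert run1["B"] == run2["B"]
print("acceptance checks passed: order-independent, isolated traces")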
TL;DR
Multi-agent chaos happens when multiple agents share the same intake or index without proper isolation. Always enforce per-agent fences before merging. Otherwise, your RAG pipeline ends up with semantic contamination and unpredictable drift. Call it ProblemMap No.13.

