// zacplischka ~/work/clarence
§ cs-01 case study · cat clarence.md 2025 · xephyr · production

Clarence

A production agent knowledge platform — persistent, governed organisational memory for Claude Code agents, serving 12 client tenants over MCP.

tl;dr

Consulting work kept re-paying a ramp-up tax: every engagement rebuilt context that already existed somewhere in the firm. I architected and shipped a two-tier knowledge platform — a pgvector tier for semantic recall and a Graphiti/FalkorDB temporal knowledge graph for entities and dated facts — unified behind a fail-closed Agent-Key gateway and consumed by agents over MCP. Strict multi-tenant isolation across both tiers, proven by a credential-free authorization harness before every deploy.

C — Context

Xephyr is a data consultancy: many concurrent client engagements, each with its own stack, decisions, and history. Knowledge lived in silos — Slack threads, Confluence pages, GitHub PRs, people's heads — so every new project and every new agent session started from zero. Worse, the surface that was meant to fix this was reachable on the VPN without authentication.

The brief I set: give Claude Code agents persistent, governed organisational memory — where "governed" means a client's knowledge is structurally incapable of leaking to another client's agents.

R — Role

Architect and lead engineer, end to end: the architecture and its 14 ADRs, the TypeScript services, the AWS infrastructure as Terraform, the test discipline, and the platform's executive overview and roadmap — the record every future agent at the firm inherits.

A — Action

./architecture.txt · two tiers · one front door
        claude code agents ──── MCP ────┐
                                        ▼
┌──────────────────────────────────────────────────┐
│ agent-key gateway · the single front door        │
│ sha-256 token → verified principal               │
│ fail-closed: db outage ⇒ 503, never anonymous    │
└───────────┬─────────────────────────┬────────────┘
     edge authorization        json-rpc rewrite
            ▼                         ▼
┌──────────────────────┐  ┌──────────────────────┐
│ vector tier          │  │ graph tier           │
│ postgres + pgvector  │  │ graphiti + falkordb  │
│ titan v2 · 1024-dim  │  │ 10-entity ontology   │
│ hnsw cosine          │  │ temporal facts       │
└──────────▲───────────┘  └──────────▲───────────┘
           │     dual-write conveyor │
           └────────────┬────────────┘
   slack · confluence · github — content-hash sync,
   watermarks, email-keyed identity resolution
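
The gateway rule in the diagram (sha-256 token → verified principal, db outage ⇒ 503) can be illustrated with a minimal sketch. This is not the production gateway: the `Principal` shape, the `KeyLookup` signature, the synchronous lookup, and the 401 for a missing or unknown key are all assumptions made for illustration. Only the SHA-256 hashing and the 503-on-outage behaviour come from the diagram.

```typescript
import { createHash } from "node:crypto";

// Hypothetical principal shape; the real schema is not described in this case study.
interface Principal {
  tenantId: string;
  agentId: string;
}

type Decision =
  | { status: 200; principal: Principal }
  | { status: 401 } // assumption: missing or unknown keys are rejected with 401
  | { status: 503 }; // from the diagram: key-store outage, never anonymous

// Hypothetical lookup keyed by the token digest, so the raw key is never stored.
// A real store would be asynchronous; it is synchronous here to keep the sketch short.
type KeyLookup = (tokenHash: string) => Principal | undefined;

function hashToken(token: string): string {
  return createHash("sha256").update(token, "utf8").digest("hex");
}

function authorize(token: string | undefined, lookup: KeyLookup): Decision {
  if (!token) return { status: 401 };
  let principal: Principal | undefined;
  try {
    principal = lookup(hashToken(token));
  } catch {
    // Fail closed: if the store cannot answer, refuse the request
    // instead of letting it through unauthenticated.
    return { status: 503 };
  }
  return principal ? { status: 200, principal } : { status: 401 };
}
```

The point of the shape is that there is no code path from "lookup failed" to "request proceeds": every branch either returns a verified principal or a refusal.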
    

I — Impact

G — Growth

The sharpest lesson came from a silent, total ingestion outage: every graph write returned 421 Misdirected Request. The root cause was two defaults colliding: the graph server's DNS-rebinding guard required Host: localhost, and the Node SDK's default undici fetch silently drops a Host-header override. A local run had passed everything while prod was broken. The fix was small, but the lesson was structural. I added an in-process test that simulates the prod guard and a check:prod-parity deploy gate, so "passes locally" and "works in prod" can no longer diverge unnoticed.
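
A rough sketch of what an in-process parity check of this kind might look like, under stated assumptions. `guardStatus`, `startGuardedServer`, `statusFor`, and `checkParity` are hypothetical names, the guard is a stand-in that only mimics the "Host must be localhost, else 421" behaviour described above, and the client uses `node:http`'s `request`, which sends an explicitly supplied `Host` header. It is not the project's actual test or its `check:prod-parity` gate.

```typescript
import { createServer, request, Server } from "node:http";
import type { AddressInfo } from "node:net";

// Hypothetical stand-in for the prod DNS-rebinding guard:
// any Host other than localhost is answered with 421.
// Assumption: a port suffix (localhost:8000) is tolerated.
function guardStatus(hostHeader: string | undefined): 200 | 421 {
  const hostname = (hostHeader ?? "").split(":")[0];
  return hostname === "localhost" ? 200 : 421;
}

// Start the guard in-process on an ephemeral loopback port.
function startGuardedServer(): Promise<Server> {
  return new Promise((resolve) => {
    const server = createServer((req, res) => {
      res.statusCode = guardStatus(req.headers.host);
      res.end();
    });
    server.listen(0, "127.0.0.1", () => resolve(server));
  });
}

// Send one POST and report the status; hostHeader overrides Host when given.
function statusFor(port: number, hostHeader?: string): Promise<number> {
  return new Promise((resolve, reject) => {
    const req = request(
      {
        host: "127.0.0.1",
        port,
        method: "POST",
        path: "/",
        agent: false, // one-off connection, so nothing lingers after the check
        headers: hostHeader ? { Host: hostHeader } : {},
      },
      (res) => {
        res.resume(); // drain the (empty) body
        resolve(res.statusCode ?? 0);
      },
    );
    req.on("error", reject);
    req.end();
  });
}

// Compare a write with and without the Host override against the simulated guard.
async function checkParity(): Promise<{ withoutOverride: number; withOverride: number }> {
  const server = await startGuardedServer();
  const { port } = server.address() as AddressInfo;
  try {
    return {
      withoutOverride: await statusFor(port),
      withOverride: await statusFor(port, "localhost"),
    };
  } finally {
    server.close();
  }
}
```

A deploy gate could run `checkParity()` against the same client code path the ingestion workers use and fail the deploy unless the overridden request is accepted by the simulated guard.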

Tradeoffs

Limitations