§ cs-01 case study · cat clarence.md 2025 · xephyr · production

Clarence

A production agent knowledge platform — persistent, governed organisational memory for Claude Code agents, serving 12 client tenants over MCP.

tl;dr

Consulting work kept re-paying a ramp-up tax: every engagement rebuilt context that already existed somewhere in the firm. I architected and shipped a two-tier knowledge platform — a pgvector tier for semantic recall and a Graphiti/FalkorDB temporal knowledge graph for entities and dated facts — unified behind a fail-closed Agent-Key gateway and consumed by agents over MCP. Strict multi-tenant isolation across both tiers, proven by a credential-free authorization harness before every deploy.

12 client tenants
509 tests · 61 files
5 governed MCP tools
14 ADRs
1-command deploys
fail-closed gateway

C — Context

Xephyr is a data consultancy: many concurrent client engagements, each with its own stack, decisions, and history. Knowledge lived in silos — Slack threads, Confluence pages, GitHub PRs, people's heads — so every new project and every new agent session started from zero. Worse, the surface that was meant to fix this was reachable on the VPN without authentication.

The brief I set: give Claude Code agents persistent, governed organisational memory — where "governed" means a client's knowledge is structurally incapable of leaking to another client's agents.

R — Role

Architect and lead engineer, end to end: the architecture and its 14 ADRs, the TypeScript services, the AWS infrastructure as Terraform, the test discipline, and the platform's executive overview and roadmap — the record every future agent at the firm inherits.

A — Action

./architecture.txt · two tiers · one front door

        claude code agents ──── MCP ────┐
                                        ▼
┌──────────────────────────────────────────────────┐
│ agent-key gateway · the single front door        │
│ sha-256 token → verified principal               │
│ fail-closed: db outage ⇒ 503, never anonymous    │
└───────────┬─────────────────────────┬────────────┘
     edge authorization        json-rpc rewrite
            ▼                         ▼
┌──────────────────────┐  ┌──────────────────────┐
│ vector tier          │  │ graph tier           │
│ postgres + pgvector  │  │ graphiti + falkordb  │
│ titan v2 · 1024-dim  │  │ 10-entity ontology   │
│ hnsw cosine          │  │ temporal facts       │
└──────────▲───────────┘  └──────────▲───────────┘
           │     dual-write conveyor │
           └────────────┬────────────┘
   slack · confluence · github — content-hash sync,
   watermarks, email-keyed identity resolution

One front door. A fail-closed Agent-Key gateway authenticates an opaque bearer token (stored only as a SHA-256 hash, instant revoke) against Postgres on every request, stamps a verified principal header inward, strips the raw token, and overwrites any forged inbound principal. If the key store is down it returns 503 — an outage never downgrades to unauthenticated.
Five governed MCP tools, one envelope. Every tool registers through a single Tool Envelope adapter that runs parse-then-authorize tenant checks structurally before any handler — authorization cannot drift per-tool because there is only one path.
Isolation mechanism chosen by ownership. The in-house vector service enforces tenancy at the edge of every handler; the third-party Graphiti image we don't own is governed from outside via MCP JSON-RPC argument-rewriting — injecting and validating the tenant group on writes, intersecting reads with the allow-list, and denying unscopable calls fail-closed.
RAG enrichment with a deliberately split error contract. Entries are embedded with Amazon Titan V2 (1024-dim) into an HNSW cosine index and metadata-tagged by Claude Haiku 4.5, both routed LiteLLM → AWS Bedrock for Australian data residency. Embedding failures throw, so no unsearchable row is ever stored; metadata extraction never throws, so capture still succeeds.
A 10-entity ontology that models the business, not its tools. Client, Project, Person, Decision, Risk… — a Jira issue or Slack thread is provenance on an episode, not a node. The type cap protects LLM extraction quality.
No-SSH operations. The full AWS platform is Terraform IaC; GitHub Actions assumes a role via OIDC and deploys over SSM in one command, with a guard that refuses any plan destroying live prod infrastructure.
Observability as a contract. Structured JSON logs to CloudWatch, correlated by W3C trace-id across in-house services and black-box images alike, PII-redacted — plus a durable Postgres audit log of every authorization decision.
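
To make the front-door behaviour at the top of this list concrete, here is a minimal sketch of the fail-closed decision, not the production code. Every name in it (`authenticate`, `KeyStore`, `findByHash`, the `x-verified-principal` header, the reason strings) is a hypothetical placeholder, and the key store is a synchronous in-memory stand-in for the Postgres lookup, which would be asynchronous in a real service.

```typescript
import { createHash } from "node:crypto";

// All names below are hypothetical; none are taken from the private codebase.
type Principal = { agentId: string; tenantId: string };

// Stand-in for the Postgres key store. It holds token hashes only and may
// throw when the database is unreachable. (A real lookup would be async.)
type KeyStore = { findByHash(tokenHash: string): Principal | null };

type GatewayResult =
  | { status: 200; headers: Record<string, string> }
  | { status: 401 | 503; reason: string };

const PRINCIPAL_HEADER = "x-verified-principal"; // assumed header name

function hashToken(token: string): string {
  return createHash("sha256").update(token).digest("hex");
}

// Header keys are assumed to be lower-cased before they reach this function.
function authenticate(
  inbound: Record<string, string>,
  store: KeyStore,
): GatewayResult {
  const auth = inbound["authorization"] ?? "";
  const token = auth.startsWith("Bearer ") ? auth.slice(7).trim() : "";
  if (token === "") return { status: 401, reason: "missing_or_malformed_token" };

  let principal: Principal | null;
  try {
    principal = store.findByHash(hashToken(token));
  } catch {
    // Fail closed: a store outage is a 503, never an anonymous pass-through.
    return { status: 503, reason: "key_store_unavailable" };
  }
  if (principal === null) return { status: 401, reason: "unknown_or_revoked_token" };

  // Strip the raw token and overwrite any principal header the caller forged.
  const outbound: Record<string, string> = { ...inbound };
  delete outbound["authorization"];
  outbound[PRINCIPAL_HEADER] = JSON.stringify(principal);
  return { status: 200, headers: outbound };
}
```

Because the principal header is always written after verification, a caller-supplied value for it cannot survive, and the only way to reach a 200 is a successful hash lookup.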

I — Impact

In production serving 12 client tenants — per-project ramp-up replaced with persistent, governed memory.
The previously unauthenticated, VPN-reachable surface is sealed behind the gateway.
509 tests across 61 files, plus a credential-free authz harness that spins up a throwaway pgvector container and proves the entire tenant-isolation boundary over real HTTP — the 401 reason matrix, 503 fail-closed, principal-forgery defence, exact audit-row counts — before every deploy.
Deploys went from manual SSH work to a one-command push.

G — Growth

The sharpest lesson came from a silent total ingestion outage: every graph write returned 421 Misdirected Request. Root cause — the graph server's DNS-rebinding guard required Host: localhost, and the Node SDK's default undici fetch silently drops a Host-header override. A green local run had passed everything while prod was broken. The fix was small; the lesson was structural: I added an in-process test that simulates the prod guard and a check:prod-parity deploy gate, so "passes locally" and "works in prod" can no longer diverge silently.
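
The shape of that in-process parity test can be sketched as follows. This is a model under stated assumptions, not the real test: `rebindingGuard` stands in for the graph server's guard, the two transports only model a client that drops or honours a Host override (neither calls undici), and every identifier, including the `graph.internal:8000` address in the usage example, is hypothetical.

```typescript
type HeaderMap = Record<string, string>;

// A transport is modelled as "given a target host and the headers the caller
// asked for, return the headers that actually reach the server".
type Transport = (targetHost: string, requested: HeaderMap) => HeaderMap;

// Stand-in for the prod guard: anything but an allowed Host gets 421.
function rebindingGuard(
  received: HeaderMap,
  allowedHosts: string[] = ["localhost"],
): number {
  const hostname = (received["host"] ?? "").split(":")[0] ?? "";
  return allowedHosts.includes(hostname) ? 200 : 421;
}

// Models a client that derives Host from the target and ignores overrides.
const overrideDroppingTransport: Transport = (targetHost) => ({
  host: targetHost,
});

// Models a client that lets the caller's Host header win.
const overrideHonouringTransport: Transport = (targetHost, requested) => ({
  host: targetHost,
  ...requested,
});

// The parity check: send the request the way prod would, through the guard.
function checkProdParity(
  transport: Transport,
  targetHost: string,
): { ok: boolean; status: number } {
  const received = transport(targetHost, { host: "localhost" });
  const status = rebindingGuard(received);
  return { ok: status === 200, status };
}
```

With these stand-ins, `checkProdParity(overrideDroppingTransport, "graph.internal:8000")` reports status 421 while the honouring transport reports 200, which is the divergence a purely local run never exercised.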

Tradeoffs

Two isolation mechanisms instead of one. Edge authorization where we own the code; JSON-RPC rewriting where we don't. Two paths to test — but the alternative was forking a third-party image and carrying that patch forever.
Fail-closed over available. A database outage takes knowledge access down (503) rather than failing open. For a multi-tenant trust boundary, availability is the right thing to sacrifice.
Type-capped ontology. Ten entity types is deliberately few — expressiveness traded for LLM extraction quality and tense/promotion rules that stay enforceable at ingest.
Dual-write, graph-primary. The conveyor writes each record into both tiers — pgvector is a near-free second sink — accepting write amplification for unified recall.
Human-gated cross-tenant distillation (ADR-0013). Knowledge recurring across engagements is promoted only through an approved Postgres review ledger — the only write path across the tenant boundary is a human-approved row. Throughput traded for making an unreviewed leak structurally impossible.
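
The JSON-RPC rewriting path from the first tradeoff can be illustrated with a small sketch. It is an assumption-laden outline rather than the shipped gateway: the tool names, the `group_id` and `group_ids` argument names, and the `Principal` shape are placeholders for whatever the upstream image actually expects.

```typescript
// Hypothetical shapes and names throughout; nothing here is the real gateway.
type ToolCall = {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params: { name: string; arguments: Record<string, unknown> };
};
type Principal = { tenantId: string; allowedGroups: string[] };
type Rewrite =
  | { allow: true; call: ToolCall }
  | { allow: false; reason: string };

const WRITE_TOOLS = new Set(["add_memory"]); // assumed tool names
const READ_TOOLS = new Set(["search_nodes", "search_facts"]);

function withArgs(call: ToolCall, args: Record<string, unknown>): ToolCall {
  return { ...call, params: { ...call.params, arguments: args } };
}

function rewriteCall(call: ToolCall, principal: Principal): Rewrite {
  const tool = call.params.name;
  const args = call.params.arguments;

  if (WRITE_TOOLS.has(tool)) {
    // Validate any group the caller supplied, then inject the verified one.
    const requested = args["group_id"];
    if (requested !== undefined && requested !== principal.tenantId) {
      return { allow: false, reason: "tenant_mismatch" };
    }
    return { allow: true, call: withArgs(call, { ...args, group_id: principal.tenantId }) };
  }

  if (READ_TOOLS.has(tool)) {
    // Intersect the requested groups with the allow-list; default to the list.
    const raw = args["group_ids"];
    const requested: string[] = Array.isArray(raw)
      ? raw.filter((g): g is string => typeof g === "string")
      : principal.allowedGroups;
    const scoped = requested.filter((g) => principal.allowedGroups.includes(g));
    if (scoped.length === 0) return { allow: false, reason: "no_permitted_groups" };
    return { allow: true, call: withArgs(call, { ...args, group_ids: scoped }) };
  }

  // Anything the gateway cannot scope to a tenant is denied, fail-closed.
  return { allow: false, reason: "unscopable_tool" };
}
```

The default branch is the important design choice: a tool the gateway does not recognise is refused instead of forwarded, so a new upstream tool cannot become an unscoped path by accident.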

Limitations

The cross-engagement distillation pipeline is designed, not shipped — ADR-0013 is accepted; the leadership-only partition is not yet in the live roster.
Single-box runtime: a 6-service Docker Compose stack on one EC2 host. Right-sized for the workload, but there is no high-availability story yet.
This is proprietary client-platform work: the code is private, so this narrative (and the resume it adapts) is the public artifact. The fail-closed-auth and tenant-isolation concepts are demonstrated in public code in custodian-mcp, a synthetic-data companion repo with the gateway, tool envelope, isolation matrix, audit log, and a real-HTTP authz harness.