Clarence
A production agent knowledge platform — persistent, governed organisational memory for Claude Code agents, serving 12 client tenants over MCP.
Consulting work kept re-paying a ramp-up tax: every engagement rebuilt context that already existed somewhere in the firm. I architected and shipped a two-tier knowledge platform (a pgvector tier for semantic recall, plus a Graphiti/FalkorDB temporal knowledge graph for entities and dated facts) unified behind a fail-closed Agent-Key gateway and consumed by agents over MCP. Multi-tenant isolation is strict across both tiers and is proven by a credential-free authorization harness before every deploy.
- 12 client tenants
- 509 tests · 61 files
- 5 governed MCP tools
- 14 ADRs
- 1-command deploys
- fail-closed gateway
C — Context
Xephyr is a data consultancy: many concurrent client engagements, each with its own stack, decisions, and history. Knowledge lived in silos — Slack threads, Confluence pages, GitHub PRs, people's heads — so every new project and every new agent session started from zero. Worse, the surface that was meant to fix this was reachable on the VPN without authentication.
The brief I set: give Claude Code agents persistent, governed organisational memory — where "governed" means a client's knowledge is structurally incapable of leaking to another client's agents.
R — Role
Architect and lead engineer, end to end: the architecture and its 14 ADRs, the TypeScript services, the AWS infrastructure as Terraform, the test discipline, and the platform's executive overview and roadmap — the record every future agent at the firm inherits.
A — Action
claude code agents ──── MCP ────┐
                                ▼
┌──────────────────────────────────────────────────┐
│  agent-key gateway · the single front door       │
│  sha-256 token → verified principal              │
│  fail-closed: db outage ⇒ 503, never anonymous   │
└───────────┬─────────────────────────┬────────────┘
   edge authorization         json-rpc rewrite
            ▼                         ▼
 ┌──────────────────────┐  ┌──────────────────────┐
 │ vector tier          │  │ graph tier           │
 │ postgres + pgvector  │  │ graphiti + falkordb  │
 │ titan v2 · 1024-dim  │  │ 10-entity ontology   │
 │ hnsw cosine          │  │ temporal facts       │
 └──────────▲───────────┘  └──────────▲───────────┘
            │   dual-write conveyor   │
            └────────────┬────────────┘
  slack · confluence · github: content-hash sync,
    watermarks, email-keyed identity resolution
- One front door. A fail-closed Agent-Key gateway authenticates an opaque bearer token (stored only as a SHA-256 hash, instant revoke) against Postgres on every request, stamps a verified principal header inward, strips the raw token, and overwrites any forged inbound principal. If the key store is down it returns 503 — an outage never downgrades to unauthenticated.
- Five governed MCP tools, one envelope. Every tool registers through a single Tool Envelope adapter that runs parse-then-authorize tenant checks structurally before any handler — authorization cannot drift per-tool because there is only one path.
- Isolation mechanism chosen by ownership. The in-house vector service enforces tenancy at the edge of every handler; the third-party Graphiti image we don't own is governed from outside via MCP JSON-RPC argument-rewriting — injecting and validating the tenant group on writes, intersecting reads with the allow-list, and denying unscopable calls fail-closed.
- RAG enrichment with a deliberately split error contract. Entries are embedded with Amazon Titan V2 (1024-dim, HNSW) and metadata-tagged by Claude Haiku 4.5, both routed LiteLLM → AWS Bedrock for Australian data residency. Embedding failures throw (no unsearchable rows); metadata extraction never throws (capture still succeeds).
- A 10-entity ontology that models the business, not its tools. Client, Project, Person, Decision, Risk… — a Jira issue or Slack thread is provenance on an episode, not a node. The type cap protects LLM extraction quality.
- No-SSH operations. The full AWS platform is Terraform IaC; GitHub Actions assumes a role via OIDC and deploys over SSM in one command, with a guard that refuses any plan destroying live prod infrastructure.
- Observability as a contract. Structured JSON logs to CloudWatch, correlated by W3C trace-id across in-house services and black-box images alike, PII-redacted — plus a durable Postgres audit log of every authorization decision.
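The single-path idea behind the Tool Envelope can be sketched roughly as follows. This is a minimal illustration, not the platform's code: `ToolEnvelope`, `ToolSpec`, the `tenant` argument, and the `search_knowledge` tool are all hypothetical names. The point it tries to show is that a handler is only reachable through a wrapper that parses and then authorizes, so a new tool cannot skip the tenant check.

```typescript
// Hypothetical names throughout; a sketch of the single-path idea only.
type Principal = { agentId: string; tenants: ReadonlySet<string> };
type ToolResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: "invalid_args" | "forbidden" };

interface ToolSpec<A extends { tenant: string }, R> {
  name: string;
  parse(raw: unknown): A | undefined; // undefined means malformed input
  handle(args: A, principal: Principal): R;
}

class ToolEnvelope {
  private tools = new Map<string, (raw: unknown, p: Principal) => ToolResult<unknown>>();

  // Registration is the only way in, so every tool gets the same checks.
  register<A extends { tenant: string }, R>(spec: ToolSpec<A, R>): void {
    this.tools.set(spec.name, (raw, principal) => {
      const args = spec.parse(raw); // 1. parse
      if (!args) return { ok: false, error: "invalid_args" };
      if (!principal.tenants.has(args.tenant)) {
        return { ok: false, error: "forbidden" }; // 2. authorize
      }
      return { ok: true, value: spec.handle(args, principal) }; // 3. handle
    });
  }

  call(toolName: string, raw: unknown, principal: Principal): ToolResult<unknown> {
    const tool = this.tools.get(toolName);
    return tool ? tool(raw, principal) : { ok: false, error: "invalid_args" };
  }
}

// Example registration of a hypothetical search tool.
const envelope = new ToolEnvelope();
envelope.register({
  name: "search_knowledge",
  parse(raw: unknown) {
    if (typeof raw !== "object" || raw === null) return undefined;
    const { tenant, query } = raw as Record<string, unknown>;
    return typeof tenant === "string" && typeof query === "string"
      ? { tenant, query }
      : undefined;
  },
  handle: (args) => `results for "${args.query}" in ${args.tenant}`,
});
```

Because the handler never sees raw input or an unauthorized principal, a per-tool review only has to confirm that `parse` extracts the right tenant.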
I — Impact
- In production serving 12 client tenants — per-project ramp-up replaced with persistent, governed memory.
- The previously unauthenticated, VPN-reachable surface is sealed behind the gateway.
- 509 tests across 61 files, plus a credential-free authz harness that spins up a throwaway pgvector container and proves the entire tenant-isolation boundary over real HTTP — the 401 reason matrix, 503 fail-closed, principal-forgery defence, exact audit-row counts — before every deploy.
- Deploys went from manual SSH work to a one-command push.
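The gateway behaviour that the harness exercises (401 for bad tokens, 503 when the key store is unreachable, forged principals overwritten) can be sketched as a small function. This is an illustration under stated assumptions: the `x-verified-principal` header name, the reason strings, and the `KeyStore` shape are invented for the example, and the lookup is synchronous here where the real store is Postgres.

```typescript
import { createHash } from "node:crypto";

// All names below are illustrative assumptions, not the platform's real API.
type Principal = { agentId: string; tenants: string[] };
// The real store is Postgres and asynchronous; a synchronous lookup keeps this short.
type KeyStore = { findByHash(tokenHash: string): Principal | undefined };
type GatewayResult =
  | { status: 200; headers: Record<string, string> }
  | { status: 401 | 503; reason: string };

const PRINCIPAL_HEADER = "x-verified-principal"; // assumed header name

function sha256Hex(token: string): string {
  return createHash("sha256").update(token, "utf8").digest("hex");
}

function authenticate(inbound: Record<string, string>, store: KeyStore): GatewayResult {
  // Lower-case header names, then discard any principal the caller supplied.
  const headers: Record<string, string> = {};
  for (const [key, value] of Object.entries(inbound)) headers[key.toLowerCase()] = value;
  delete headers[PRINCIPAL_HEADER];

  // Extract the bearer token and remove it so it never travels inward.
  const token = /^Bearer (\S+)$/.exec(headers["authorization"] ?? "")?.[1];
  delete headers["authorization"];
  if (!token) return { status: 401, reason: "missing_or_malformed_token" };

  let principal: Principal | undefined;
  try {
    principal = store.findByHash(sha256Hex(token)); // only the hash is compared
  } catch {
    // Fail closed: an unreachable key store is an outage, never anonymous access.
    return { status: 503, reason: "key_store_unavailable" };
  }
  if (!principal) return { status: 401, reason: "unknown_or_revoked_token" };

  headers[PRINCIPAL_HEADER] = JSON.stringify(principal);
  return { status: 200, headers };
}
```

In this shape, revoking a key is just removing its hash from the store: the next request with that token gets a 401 because no row matches.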
G — Growth
The sharpest lesson came from a silent total ingestion outage: every graph write returned 421 Misdirected Request. The root cause was that the graph server's DNS-rebinding guard required Host: localhost, and the Node SDK's default undici fetch drops a Host-header override without warning. A green local run had passed everything while prod was broken. The fix was small; the lesson was structural: I added an in-process test that simulates the prod guard and a check:prod-parity deploy gate, so "passes locally" and "works in prod" can no longer diverge silently.
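One way to picture the in-process guard simulation is the sketch below. It is a simplified model rather than the actual test: `rebindingGuard`, the two transports, and `passesProdParity` are hypothetical names, header names are assumed to be lower-cased, and a "transport" here simply returns the headers as they would arrive at the server.

```typescript
// Hypothetical model of the failure mode; none of these names come from the platform.
type HeaderMap = Record<string, string>;
type Transport = (url: string, overrides: HeaderMap) => HeaderMap;

// Stand-in for the graph server's DNS-rebinding guard: reject unexpected hosts.
function rebindingGuard(received: HeaderMap, allowedHosts: string[]): number {
  const hostname = (received["host"] ?? "").split(":")[0] ?? "";
  return allowedHosts.includes(hostname) ? 200 : 421; // 421 Misdirected Request
}

// A client that ignores a Host override and derives Host from the URL.
const droppingTransport: Transport = (url, overrides) => ({
  ...overrides,
  host: new URL(url).host,
});

// A client that lets an explicit Host override win.
const honouringTransport: Transport = (url, overrides) => ({
  host: new URL(url).host,
  ...overrides,
});

// The parity check: push the override through the transport prod will use
// and require that the simulated guard accepts what actually arrives.
function passesProdParity(transport: Transport, url: string): boolean {
  const received = transport(url, { host: "localhost" });
  return rebindingGuard(received, ["localhost"]) === 200;
}
```

A deploy gate built on this idea fails when the configured transport behaves like `droppingTransport`, which is the condition a green local run could not see.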
Tradeoffs
- Two isolation mechanisms instead of one. Edge authorization where we own the code; JSON-RPC rewriting where we don't. Two paths to test — but the alternative was forking a third-party image and carrying that patch forever.
- Fail-closed over available. A database outage takes knowledge access down (503) rather than failing open. For a multi-tenant trust boundary, availability is the right thing to sacrifice.
- Type-capped ontology. Ten entity types is deliberately few — expressiveness traded for LLM extraction quality and tense/promotion rules that stay enforceable at ingest.
- Dual-write, graph-primary. The conveyor writes each record into both tiers — pgvector is a near-free second sink — accepting write amplification for unified recall.
- Human-gated cross-tenant distillation (ADR-0013). Knowledge recurring across engagements is promoted only through an approved Postgres review ledger — the only write path across the tenant boundary is a human-approved row. Throughput traded for making an unreviewed leak structurally impossible.
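The JSON-RPC rewriting half of the first tradeoff might look something like the sketch below. The tool names (`add_memory`, `search_nodes`, `search_facts`) and argument keys (`group_id`, `group_ids`) are assumptions made for illustration and should be checked against the Graphiti MCP image actually deployed. Handshake methods such as `initialize` would need their own pass-through rule, which is omitted here.

```typescript
// Tool and argument names are assumptions for illustration only.
type JsonRpcCall = {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
};
type Rewrite =
  | { allow: true; call: JsonRpcCall }
  | { allow: false; reason: string };

const WRITE_TOOLS = new Set(["add_memory"]);
const READ_TOOLS = new Set(["search_nodes", "search_facts"]);

function rewriteForTenants(call: JsonRpcCall, allowed: string[]): Rewrite {
  const toolName = call.method === "tools/call" ? (call.params?.name ?? "") : "";
  const args: Record<string, unknown> = { ...(call.params?.arguments ?? {}) };

  if (WRITE_TOOLS.has(toolName)) {
    // Writes: validate a supplied group, or inject the only permitted one.
    const supplied = args["group_id"];
    const group =
      typeof supplied === "string" ? supplied
      : allowed.length === 1 ? allowed[0]
      : undefined;
    if (!group || !allowed.includes(group)) {
      return { allow: false, reason: "write_outside_tenant" };
    }
    args["group_id"] = group;
  } else if (READ_TOOLS.has(toolName)) {
    // Reads: intersect the requested groups with the allow-list.
    const supplied = args["group_ids"];
    const requested = Array.isArray(supplied)
      ? supplied.filter((g): g is string => typeof g === "string")
      : allowed;
    const scoped = requested.filter((g) => allowed.includes(g));
    if (scoped.length === 0) return { allow: false, reason: "no_permitted_group" };
    args["group_ids"] = scoped;
  } else {
    // Anything that cannot be tenant-scoped is denied rather than forwarded.
    return { allow: false, reason: "unscopable_call" };
  }
  return { allow: true, call: { ...call, params: { name: toolName, arguments: args } } };
}
```

The upstream image never learns about tenants in this arrangement. It only ever receives calls whose group arguments have already been narrowed to what the caller may touch.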
Limitations
- The cross-engagement distillation pipeline is designed, not shipped — ADR-0013 is accepted; the leadership-only partition is not yet in the live roster.
- Single-box runtime: a 6-service Docker Compose stack on one EC2 host. Right-sized for the workload, but there is no high-availability story yet.
- This is proprietary client-platform work — the code is private, so this narrative (and the resume it adapts) is the public artifact. The fail-closed-auth and tenant-isolation concepts ARE demonstrated in public code: custodian-mcp — a synthetic-data companion repo with the gateway, tool envelope, isolation matrix, audit log, and a real-HTTP authz harness.