Tens of thousands of chunks of fuzzy-searchable knowledge the agent never has to load whole. The piece of Agent OS that turns "did we figure this out before?" into a sub-second answer.
A vector database fronted by an MCP server. Plain Python, plain HTTP, plain Server-Sent Events. The reference implementation runs Chroma inside a small container; the MCP layer exposes a handful of tools the agent can call.
One persistent client. A frozen read collection (curated corpus, baked into the image) plus a sidecar write collection (lazy-created at first remember()). Same DB file, no schema migration to add memories.
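The frozen-plus-sidecar layout can be sketched without any database at all. The block below is a minimal stand-in, not the reference implementation: the `Brain` class, the `embed` and `cosine` helpers, and the bag-of-words "embedding" are all assumptions made for illustration, and a real deployment would use Chroma collections and a proper embedding model. It shows only the two behaviours described above: the write collection does not exist until the first `remember()`, and recall merges both collections at query time.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption); a real brain uses an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class Brain:
    """Hypothetical in-memory stand-in for the frozen + sidecar collections."""

    def __init__(self, frozen_chunks):
        # Frozen read collection: built once, never mutated afterwards.
        self._frozen = tuple((text, embed(text)) for text in frozen_chunks)
        # Sidecar write collection: does not exist until the first remember().
        self._sidecar = None

    def remember(self, text):
        if self._sidecar is None:  # lazy creation on first write
            self._sidecar = []
        self._sidecar.append((text, embed(text)))

    def recall(self, query, n=8):
        q = embed(query)
        # Merge both collections at query time, then rank by similarity.
        candidates = list(self._frozen) + (self._sidecar or [])
        ranked = sorted(candidates, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:n]]
```

Because the frozen tuple is never touched, adding memories needs no migration of the curated corpus; the sidecar simply grows next to it.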
One container, one URL, one auth header. Any MCP-aware agent can connect: VS Code, Claude Desktop, Cursor, custom harnesses. No per-seat database file. No drift between teammates.
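To make "one URL, one auth header" concrete, here is a rough sketch of the request a custom harness might build for a tool call. MCP carries tool invocations as JSON-RPC 2.0 messages with the method `tools/call`; everything else here is an assumption: the endpoint URL, the bearer-token scheme, and the `build_tool_call` helper are hypothetical. The request is only constructed, never sent, and a real MCP session would perform an initialization handshake first (normally handled by the MCP client library).

```python
import json
import urllib.request

BRAIN_URL = "https://brain.example.internal/mcp"  # hypothetical endpoint
AUTH_TOKEN = "replace-me"  # hypothetical secret; load from the environment in practice


def build_tool_call(tool, arguments, request_id=1):
    """Build (but do not send) a JSON-RPC 2.0 'tools/call' request."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return urllib.request.Request(
        BRAIN_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: the server expects a bearer token.
            "Authorization": f"Bearer {AUTH_TOKEN}",
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


request = build_tool_call("recall", {"query": "auth refresh flow", "n": 8})
```

Every teammate's agent would build the same request against the same URL, which is why there is no per-seat database file to drift.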
Same image works on a laptop with docker compose up or in Azure Container Apps / Cloud Run / Fly / wherever. Persistent volume for the Chroma file is the only stateful piece.
The agent doesn't query SQL. It calls these like any other tool:

    # fuzzy semantic recall across the whole corpus + every memory ever written
    recall(query="how does the auth refresh flow handle expired tokens", n=8)

    # write a durable memory; goes into the sidecar collection, queryable forever
    remember(text="index migration must run before the schema validator...",
             source="S-20260507-deploy-pipeline",
             tags=["deploy", "migrations"])

    # lightweight liveness + recent-activity ping at session start
    pulse()

    # save working state mid-session so the next agent can pick up
    checkpoint(summary="...short status...")

    # write a structured reflection after a thorny investigation
    reflect(topic="why the cache layer kept returning stale rows", lessons=[...])

Exact tool names and signatures live in the reference repo. The shape stays the same across implementations: read fuzzily, write durably, ping cheaply.
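One way to picture the "read fuzzily, write durably, ping cheaply" shape is as an interface plus a typical session flow. The sketch below is illustrative only: the `BrainTools` protocol, its return types, the `FakeBrain` stand-in, and `run_session` are assumptions, not the reference repo's signatures.

```python
from typing import Protocol


class BrainTools(Protocol):
    """Assumed shape of the tool surface; real signatures live in the reference repo."""

    def recall(self, query: str, n: int = 8) -> list[str]: ...
    def remember(self, text: str, source: str, tags: list[str]) -> None: ...
    def pulse(self) -> dict: ...


class FakeBrain:
    """In-memory stand-in so the flow below runs without a server."""

    def __init__(self):
        self.memories = []

    def recall(self, query, n=8):
        # Crude word-overlap match standing in for semantic search.
        words = set(query.lower().split())
        hits = [m for m in self.memories if words & set(m["text"].lower().split())]
        return [m["text"] for m in hits[:n]]

    def remember(self, text, source, tags):
        self.memories.append({"text": text, "source": source, "tags": tags})

    def pulse(self):
        return {"ok": True, "memories": len(self.memories)}


def run_session(brain: BrainTools, task: str, finding: str, source: str) -> list[str]:
    brain.pulse()                                              # ping cheaply
    prior = brain.recall(task, n=5)                            # read fuzzily
    brain.remember(finding, source=source, tags=["session"])   # write durably
    return prior
```

Any implementation that satisfies the protocol can be swapped in behind the same session flow, which is the sense in which the shape stays constant.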
recall("last time we touched X") and get the dossier, drawn from the curated corpus and every earlier remember() call.
The brain and /memories/ are both persistent. Both survive between sessions. They serve different jobs:
| | /memories/ files | The brain |
|---|---|---|
| Access | Exact text. The agent greps, reads, edits. | Fuzzy semantic. The agent asks a question, gets the closest N chunks back. |
| Size | Hundreds of KB. Hand-curated. Auto-loaded into context (the top tier). | Tens of thousands of chunks. Never loaded whole. Queried just-in-time. |
| Edit shape | The agent rewrites files in place. Diffs visible in git. | Append-only by convention. Old memories aren't deleted, just outvoted by newer relevant ones. |
| Job | "What do I always need to know?" — routing rules, preferences, top-of-mind facts. | "Has anyone seen this before?" — cross-session, cross-project, cross-domain recall. |
| Failure mode | Files get long. Agent gets distracted. Triage with the local-model compressor. | Stale memories slowly drift. Worst case: a recall pulls something that's a year out of date. The audit trail in /memories/ stays canonical. |
Rule of thumb: if you'd want to load it every turn, it's a memory. If you want to recall it later, that's a brain entry. Memory is triage. Brain is the library.
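The table says old brain entries are "outvoted" rather than deleted, without spelling out how. One plausible mechanism, shown here purely as an assumption, is to discount similarity by age at ranking time. The `rank` function, the exponential decay, and the 180-day half-life are all hypothetical choices, not something the reference implementation is known to do.

```python
from datetime import date


def rank(hits, today, half_life_days=180.0):
    """Order hits best-first by similarity discounted for age.

    hits: list of (text, similarity, written_on) tuples.
    Nothing is dropped; older entries only sink in the ordering.
    """
    def score(hit):
        _, similarity, written_on = hit
        age_days = (today - written_on).days
        # Assumed decay: the score halves every `half_life_days`.
        return similarity * 0.5 ** (age_days / half_life_days)

    return [text for text, _, _ in sorted(hits, key=score, reverse=True)]


hits = [
    ("old fix", 0.90, date(2025, 1, 1)),
    ("new fix", 0.85, date(2026, 1, 1)),
]
ordered = rank(hits, today=date(2026, 2, 1))
```

Here the slightly less similar but much newer entry ranks first, while the older one is still returned for anyone who wants the history.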
The brain is the only Agent OS pillar that's a service rather than a file pattern. That's deliberate. Once it's a service, every agent on every machine can hit the same recall surface — no per-seat duplication, no re-embedding, no drift between teammates.
The architecture detail of how a multi-agent setup shares one brain — frozen+sidecar merge at query time, coordination primitives, audit-trail-as-ground-truth — lives on the architecture page. This page is about what's in the brain. That page is about how it scales.
A small container app instance + a few hundred MB of vector data costs roughly $0.30/month in cloud compute on the documented setup. The vault pillar handles backup: weekly snapshot of the Chroma file to vault/brain/, two newest kept hot, plus a sealed local .SEALED.zip with sentinel files preventing the agent from mistaking it for live data.
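The "two newest kept hot" rule is simple enough to sketch. The snippet below assumes snapshots are named `brain-YYYY-MM-DD.sqlite3`, a hypothetical convention chosen because ISO dates sort correctly as plain strings; the `split_hot_cold` helper is likewise an illustration, and what happens to the cold set (sealing, sentinel files) is left out.

```python
import tempfile
from pathlib import Path


def split_hot_cold(snapshot_dir, keep_hot=2):
    """Return (hot, cold) snapshot paths, newest first.

    Assumes file names embed an ISO date, so lexicographic order is date order.
    """
    snapshots = sorted(Path(snapshot_dir).glob("brain-*.sqlite3"), reverse=True)
    return snapshots[:keep_hot], snapshots[keep_hot:]


# Small demonstration against a throwaway directory.
with tempfile.TemporaryDirectory() as vault:
    for day in ("2026-04-19", "2026-04-26", "2026-05-03"):
        (Path(vault) / f"brain-{day}.sqlite3").touch()
    hot, cold = split_hot_cold(vault)
    hot_names = [p.name for p in hot]
    cold_names = [p.name for p in cold]
```

After three weekly snapshots, the two most recent stay hot and the oldest becomes a candidate for the sealed archive.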
Restoration is three steps: rename, expand, point the container at the restored volume. See architecture for the cold-storage tier diagram and hardware reality for the cost breakdown.
Without the brain, the agent's long-term memory is whatever fits in the auto-loaded files. That works for routing rules and preferences. It does not work for "we hit this exact bug eight months ago, here's what we tried, here's what finally worked."
With the brain, the agent walks into every session with access to every prior session's findings, every piece of vendor documentation you ever ingested, every anti-pattern you ever flagged. Cold-start stops being cold. Repeated mistakes stop repeating. The compounding effect is the entire reason this layer exists.