What it actually is

A vector database fronted by an MCP server. Plain Python, plain HTTP, plain Server-Sent Events. The reference implementation runs Chroma inside a small container; the MCP layer exposes a handful of tools the agent can call.

Storage

Chroma vector DB

One persistent client. A frozen read collection (curated corpus, baked into the image) plus a sidecar write collection (lazy-created at first remember()). Same DB file, no schema migration to add memories.
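The frozen-plus-sidecar layout can be sketched in plain Python. This is a stand-in, not the reference implementation: it uses in-memory dicts instead of Chroma, and the names BrainStore, frozen, sidecar, and all_documents are hypothetical. It only illustrates a read-only curated collection next to a write collection that is created lazily on the first remember().

```python
from __future__ import annotations

from types import MappingProxyType


class BrainStore:
    """Toy model of one store holding a frozen and a sidecar collection."""

    def __init__(self, curated: dict[str, str]):
        # Frozen read collection: a read-only view, fixed at build time.
        self.frozen = MappingProxyType(dict(curated))
        # Sidecar write collection: does not exist until the first write.
        self.sidecar: dict[str, str] | None = None

    def remember(self, memory_id: str, text: str) -> None:
        # Lazy creation: the sidecar appears on the first remember() call.
        if self.sidecar is None:
            self.sidecar = {}
        self.sidecar[memory_id] = text

    def all_documents(self) -> dict[str, str]:
        # Reads see both collections; adding memories needs no migration step.
        merged = dict(self.frozen)
        merged.update(self.sidecar or {})
        return merged


store = BrainStore({"doc-1": "auth refresh flow reference"})
store.remember("mem-1", "index migration must run before the schema validator")
print(sorted(store.all_documents()))
```

In a real deployment both collections would presumably live in the same Chroma database file, as described above; the dicts here only model the read/write split.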

Surface

MCP server over HTTP/SSE

One container, one URL, one auth header. Any MCP-aware agent can connect: VS Code, Claude Desktop, Cursor, custom harnesses. No per-seat database file. No drift between teammates.
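As a rough illustration of "one URL, one auth header", the snippet below builds (without sending) the kind of HTTP request an SSE client would open. The URL, the token, and the Bearer scheme are assumptions made for the example; the actual endpoint path and header name depend on the deployment.

```python
import urllib.request

# Hypothetical values: substitute whatever the deployment actually uses.
BRAIN_URL = "https://brain.example.internal/sse"
BRAIN_TOKEN = "replace-me"


def build_sse_request(url: str, token: str) -> urllib.request.Request:
    """Build, but do not send, a request for a Server-Sent Events stream."""
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Accept": "text/event-stream",       # standard SSE media type
        },
    )


request = build_sse_request(BRAIN_URL, BRAIN_TOKEN)
print(request.full_url, request.get_header("Accept"))
```

Every teammate's agent would carry the same two values, which is what removes the per-seat database file.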

Hosting

Local Docker or cloud container app

Same image works on a laptop with docker compose up or in Azure Container Apps / Cloud Run / Fly / wherever. Persistent volume for the Chroma file is the only stateful piece.

The tool surface

The agent doesn't query SQL. It calls these like any other tool:

# fuzzy semantic recall across the whole corpus + every memory ever written
recall(query="how does the auth refresh flow handle expired tokens", n=8)

# write a durable memory; goes into the sidecar collection, queryable forever
remember(text="index migration must run before the schema validator...",
         source="S-20260507-deploy-pipeline",
         tags=["deploy", "migrations"])

# lightweight liveness + recent-activity ping at session start
pulse()

# save working state mid-session so the next agent can pick up
checkpoint(summary="...short status...")

# write a structured reflection after a thorny investigation
reflect(topic="why the cache layer kept returning stale rows", lessons=[...])

Exact tool names and signatures live in the reference repo. The shape stays the same across implementations: read fuzzily, write durably, ping cheaply.
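To make "read fuzzily, write durably, ping cheaply" concrete, here is a toy version of three of the tools. It is a sketch under heavy assumptions: a bag-of-words cosine score stands in for a real embedding model, Python lists stand in for the Chroma collections, and nothing is persisted. Only the tool names and the recall(query, n) shape come from the examples above.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


corpus = ["the auth refresh flow rotates expired tokens"]  # frozen, curated
memories = []                                               # sidecar, written at runtime


def remember(text: str) -> None:
    # Write: a real server would persist this; the sketch keeps it in memory.
    memories.append(text)


def recall(query: str, n: int = 8) -> list:
    # Read fuzzily: rank corpus and memories together by similarity.
    q = embed(query)
    ranked = sorted(corpus + memories, key=lambda t: cosine(q, embed(t)), reverse=True)
    return ranked[:n]


def pulse() -> dict:
    # Ping cheaply: no embedding work, just counters.
    return {"ok": True, "memories": len(memories)}


remember("index migration must run before the schema validator")
print(recall("how are expired tokens handled", n=1))
```

The query shares no exact phrase with the stored text, only a couple of words, which is the point of fuzzy recall: closest match wins, not exact match.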

What goes in it

Curated corpus

Domain documentation (frozen read collection)
SDK references, API specs, agency rule sets, forum scrapes: whatever the agent needs to answer questions in your domain without re-googling.

Codebase chunks (frozen read collection)
Optional. Code split into semantic chunks at build time so the agent can fuzzy-find prior implementations.

External corpora (frozen read collection)
Anything you ingest once and never touch again: RFCs, standards, vendor whitepapers, previous-employer documentation you're allowed to keep.

Live memory

Audit-trail rows (sidecar write collection)
Every closed session writes one row, tagged with session id and topic. Together the rows become a searchable history of every decision ever made.

Session journals (sidecar write collection)
Each REVIVAL doc gets ingested at close. The agent that picks up tomorrow can recall("last time we touched X") and get the dossier.

Ad-hoc remember() calls (sidecar write collection)
Whenever the agent finds something worth keeping mid-session (a gotcha, an anti-pattern, a pricing footgun), it lands here immediately.
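
The split above can be summarized as a small routing table, plus a helper that shapes a closed session into arguments for remember(). This is a sketch: the kind names, the "audit-trail" tag, and the audit_row helper are hypothetical, and only the text/source/tags parameters come from the remember() example earlier on this page.

```python
# Which collection each kind of content lands in (kind names are illustrative).
COLLECTION_FOR = {
    "domain_docs": "frozen",
    "codebase_chunks": "frozen",
    "external_corpora": "frozen",
    "audit_trail": "sidecar",
    "session_journal": "sidecar",
    "ad_hoc_memory": "sidecar",
}


def audit_row(session_id: str, topic: str, summary: str) -> dict:
    """Shape one closed session into keyword arguments for remember()."""
    return {
        "text": summary,
        "source": session_id,            # session id, as in the example above
        "tags": ["audit-trail", topic],  # assumed tagging convention
    }


row = audit_row(
    "S-20260507-deploy-pipeline",
    "deploy",
    "index migration must run before the schema validator",
)
print(COLLECTION_FOR["audit_trail"], row["tags"])
```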

How it's different from /memories/

Both are persistent. Both survive between sessions. They serve different jobs:

Access
/memories/ files: Exact text. The agent greps, reads, edits.
The brain: Fuzzy semantic. The agent asks a question, gets the closest N chunks back.

Size
/memories/ files: Hundreds of KB. Hand-curated. Auto-loaded into context (the top tier).
The brain: Tens of thousands of chunks. Never loaded whole. Queried just-in-time.

Edit shape
/memories/ files: The agent rewrites files in place. Diffs visible in git.
The brain: Append-only by convention. Old memories aren't deleted, just outvoted by newer relevant ones.

Job
/memories/ files: "What do I always need to know?" Routing rules, preferences, top-of-mind facts.
The brain: "Has anyone seen this before?" Cross-session, cross-project, cross-domain recall.

Failure mode
/memories/ files: Files get long. Agent gets distracted. Triage with the local-model compressor.
The brain: Stale memories slowly drift. Worst case: a recall pulls something that's a year out of date. The audit trail in /memories/ stays canonical.

Rule of thumb: if you'd want to load it every turn, it's a memory. If you want to recall it later, that's a brain entry. Memory is triage. Brain is the library.
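
One possible reading of "outvoted by newer relevant ones" is a recency-weighted ranking, where nothing is deleted but older hits need a higher similarity to win. The page does not say how the reference implementation ranks results, so treat the decay formula, the half_life_days parameter, and the rank helper below as assumptions for illustration only.

```python
def rank(hits, half_life_days: float = 180.0):
    """Order hits by similarity discounted by age.

    hits: iterable of (text, similarity in [0, 1], age in days).
    The exponential half-life decay is an assumed scoring rule.
    """
    def score(hit):
        _, similarity, age_days = hit
        return similarity * 0.5 ** (age_days / half_life_days)

    return [text for text, _, _ in sorted(hits, key=score, reverse=True)]


hits = [
    ("old note: cache TTL is 60s", 0.90, 365.0),
    ("new note: cache TTL raised to 300s", 0.85, 7.0),
]
print(rank(hits)[0])
```

The older note is slightly more similar to the query but a year old, so the newer note ranks first while the old one stays in the store.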

Networked pattern

The brain is the only Agent OS pillar that's a service rather than a file pattern. That's deliberate. Once it's a service, every agent on every machine can hit the same recall surface — no per-seat duplication, no re-embedding, no drift between teammates.

The details of how a multi-agent setup shares one brain (frozen+sidecar merge at query time, coordination primitives, audit-trail-as-ground-truth) live on the architecture page. This page is about what's in the brain. That page is about how it scales.

Cost & backup

A small container app instance plus a few hundred MB of vector data costs roughly $0.30/month in cloud compute on the documented setup. The vault pillar handles backup: a weekly snapshot of the Chroma file goes to vault/brain/, the two newest snapshots are kept hot, and a sealed local .SEALED.zip carries sentinel files that prevent the agent from mistaking it for live data.
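
The "two newest kept hot" rule can be sketched as a pure function over snapshot names. The naming scheme (brain-YYYYMMDD.zip, so that sorting by name is sorting by date) and the split_snapshots helper are hypothetical; the sketch also leaves open what happens to the older snapshots, since the page does not say.

```python
def split_snapshots(names, keep: int = 2):
    """Return (hot, older): the `keep` newest snapshots and the rest.

    Assumes date-stamped names, so lexical order equals chronological order.
    """
    ordered = sorted(names, reverse=True)
    return ordered[:keep], ordered[keep:]


names = ["brain-20260503.zip", "brain-20260510.zip", "brain-20260426.zip"]
hot, older = split_snapshots(names)
print(hot, older)
```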

Restoration is three steps: rename, expand, point the container at the restored volume. See architecture for the cold-storage tier diagram and hardware reality for the cost breakdown.
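
A minimal sketch of those three steps, under stated assumptions: that "rename" means dropping the .SEALED marker from the archive name, that "expand" means unzipping into the directory that backs the container volume, and that pointing the container at it happens in the container configuration rather than in code. The restore function and the file names are hypothetical, and handling of the sentinel files is not modeled.

```python
import zipfile
from pathlib import Path


def restore(sealed: Path, volume_dir: Path) -> Path:
    # Step 1, rename (assumed): drop the .SEALED marker from the archive name.
    archive = sealed.with_name(sealed.name.replace(".SEALED.zip", ".zip"))
    sealed.rename(archive)

    # Step 2, expand: unpack into the directory that will back the volume.
    volume_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(volume_dir)

    # Step 3, point the container at volume_dir: done in the container
    # configuration (volume mount), not in this function.
    return volume_dir
```

For example, restore(Path("vault/brain/brain.SEALED.zip"), Path("restored-volume")) would leave the unpacked files in restored-volume, ready to be mounted.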

Why this matters

Without the brain, the agent's long-term memory is whatever fits in the auto-loaded files. That works for routing rules and preferences. It does not work for "we hit this exact bug eight months ago, here's what we tried, here's what finally worked."

With the brain, the agent walks into every session with access to every prior session's findings, every piece of vendor documentation you ever ingested, every anti-pattern you ever flagged. Cold-start stops being cold. Repeated mistakes stop repeating. The compounding effect is the entire reason this layer exists.