Six composable pieces. None of them load-bearing alone. The integration is the point. This is documentation of what worked through mid-2026 — sharing it as knowledge to pass on, with the honest expectation that parts of it will be obsolete a year from now as the AI tooling landscape moves.
User memory auto-loads every turn (~200 lines, ~30KB). Reference memory loads on demand. Session memory is scratch. Repo memory holds verified facts. A semantic vector brain handles fuzzy recall across tens of thousands of chunks.
A local open-weights model on a consumer GPU compresses verbose memory, distills brain recalls, generates cold-start briefs, and writes rolling summaries that pre-empt host-side compaction — all without spending a token of paid context. The expensive model thinks; the free model summarizes.
Nightly object-storage sync of every workspace byte. Time-based immutability blocks any delete, overwrite, even with the master key. Cool tier auto-tiers to Archive after 30 days. ~200 GB stored at well under a dollar a month.
Kill-switch sentinel + spend-gate script + cloud budget alert. Every billable script must clear all three before touching money. Default ceiling configurable. Failure mode = abort, never silently overspend.
Weekly snapshot to vault/brain/. Two newest kept hot. Plus a sealed .SEALED.zip on local disk with sentinel files preventing the agent from mistaking it for live data. Restoration: rename + expand, three steps.
Per-session journals + a one-line audit trail + the local-model distiller produce a ~3KB curated brief on demand. The successor agent reads it instead of guessing from a lossy auto-summary. Context survives compaction, restart, model swap.
Each pillar can run alone — but the leverage comes from the combination:
For the architecture diagrams of how the pieces wire together, see the architecture page. For an honest take on what each profile of hardware actually needs, see hardware reality.