The local-compute pillar carries weight, and "consumer GPU" hides a wide range. Here's what to expect at each profile, and what to do if you don't have one.
Everything in Agent OS besides local-model distillation (memory tiers, vault, guardrails, audit-trail, continuity discipline) works on any machine that runs your editor. The table below is specifically about the distillation pillar.
| Profile | Local model | Distill speed | Verdict |
|---|---|---|---|
| Workstation, 16 GB+ VRAM RTX 4080/5080/5090, M3/M4 Max, similar |
14B coder model, full quality | ~3–15 sec / summary | Reference-implementation experience. |
| Mid-range laptop, 8 GB VRAM RTX 3060/4060, M2 Pro, etc. |
7B coder model, q4 quantized | ~10–30 sec / summary | Workable. Summaries shorter, slightly noisier. |
| Integrated graphics no discrete GPU |
3B model on CPU, or hosted small model | ~30–90 sec / summary | Slow but functional. Or skip local entirely and use a cheap hosted model — the spend gate keeps it bounded. |
| Phone-class compute | — | — | Not the target. The pattern needs some available compute for distillation. |
The five non-compute pillars don't care about the GPU. If you can run an editor and reach object storage, you can run Memory + Vault + Guardrails + Brain backup + Continuity unchanged. Local compute is the cherry on top, not the load-bearing wall.
Reference implementation, on the documented setup, has run for months at well under $1/month in cloud spend. The line items:
The reference spend-gate script ships with a $25/month ceiling as the default circuit breaker. It's not a target spend or a recommendation — it's a number that makes a runaway script hit a wall instead of a billing email. Replace it with whatever number you want; the gate just needs to fail-closed before it touches money.