Default OpenClaw burns $50–150/week. These 8 optimizations stack together to get you under $5/week. Most take under 5 minutes.
Each layer targets a different cost driver. The order matters — start at Layer 1 for the biggest instant savings.
| # | Layer | Savings | Time |
|---|---|---|---|
| 1 | Kill Thinking Mode | 10–50x per call | 30 seconds |
| 2 | Cap Context Window | 20–40% | 1 config line |
| 3 | Model Routing | 50–80% | 5 min config |
| 4 | Session Reset Discipline | 40–60% | Behavioral |
| 5 | Lean Session Init | 80% on startup | Prompt rule |
| 6 | Heartbeat → Ollama | 100% on heartbeats | 10 min setup |
| 7 | Prompt Caching | 90% on static content | Config flag |
| 8 | Subagent Isolation | 3.5x → 1x on multi-agent | Workflow change |
Extended thinking is the single biggest hidden cost in OpenClaw. When enabled, every API call generates thousands of “reasoning” tokens before the actual response. Most routine tasks don’t need it.
Option A — Disable in config:
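A sketch of the config change, assuming your settings live in `openclaw.json` (the exact nesting may differ by version):

```json
{
  "thinking": {"type": "disabled"}
}
```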
Option B — Disable mid-session:
Run `/reasoning off`. If a task genuinely needs deep analysis, turn `/reasoning on` temporarily, then turn it back off.
OpenClaw can load up to 200K+ tokens of context per call. Most tasks need a fraction of that. Hard-capping the context window prevents bloat.
A 100K context costs 10x more than a 10K context. Every call. Every time. This one line puts a ceiling on it.
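The cap is a single key in `openclaw.json` — 50K tokens is a sensible middle ground for most workloads:

```json
{
  "contextTokens": 50000
}
```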
Out of the box, OpenClaw uses Sonnet for everything. Sonnet is excellent but overkill for checking file status, running commands, or routine monitoring. Haiku handles these perfectly at 12x less cost.
Add this routing rule to your system prompt:
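Example wording — adapt the model names and task categories to your setup:

```
Model routing:
- Default to Haiku for file checks, shell commands, status reports,
  and other routine work.
- Escalate to Sonnet only for architecture decisions, security review,
  or multi-step reasoning.
- When in doubt, start with Haiku and escalate if the answer is weak.
```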
You’re not limited to Anthropic models. The landscape has expanded:
| Model | Price / 1M Tokens | Best For |
|---|---|---|
| Gemini Flash | Free tier available | Heartbeats, classification, simple routing |
| DeepSeek V3 | $0.53 | Code generation, reasoning at fraction of cost |
| Llama 3.2 (Ollama) | Free (local) | Heartbeats, status checks, simple tasks |
| Kimi K2.5 (Nvidia) | Free (Nvidia API) | Content drafting, first drafts |
| Claude Haiku | $0.25 | General default for 80% of tasks |
| Claude Sonnet | $3.00 | Architecture, security, complex reasoning |
Long conversations accumulate tokens. A session that started at 8KB can balloon to 500KB+ after 35 messages. Every subsequent call re-sends the entire history.
Diagnose bloated sessions:
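One way to spot them — a sketch that assumes sessions are stored as files under `~/.openclaw/sessions/` (check where your install actually keeps them):

```shell
# List session files by size, largest first (sizes in KB)
du -k ~/.openclaw/sessions/* | sort -rn | head
```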
Anything over 500KB is costing you money on every single call.
Fix it: archive the decisions and state you actually need into a small notes file, then start a fresh session that loads only that file.
Default behavior: load 50KB of history on every message. Optimized: load only the ~8KB of it that actually matters.
Heartbeats run every few minutes to check if your agent is alive. By default, these hit your paid API. Route them to a free local model instead.
Step 1: Install Ollama & pull a lightweight model
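Ollama's standard install script, plus a pull of the 3B Llama model — small enough for heartbeat-class work:

```shell
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.2:3b
```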
Step 2: Configure heartbeat routing
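A sketch of the routing in `openclaw.json` — the key names here (`heartbeat`, `model`) are assumptions; match them to your build's config schema:

```json
{
  "heartbeat": {
    "model": "ollama/llama3.2:3b"
  }
}
```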
Step 3: Verify
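Run `session_status` after the next heartbeat fires and confirm the heartbeat call hit the local model. The output format below is illustrative, not exact:

```
$ session_status
context: 9.2KB
default model: claude-haiku
heartbeat model: ollama/llama3.2:3b   <- local, $0.00 per beat
```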
Your system prompt, SOUL.md, and reference materials get sent with every API call. Prompt caching charges only 10% for reused content within a 5-minute window.
What to cache vs. what not to: cache content that is byte-identical on every call — the system prompt, SOUL.md, reference docs. Don't cache anything that changes per call (timestamps, dynamic state); caching matches on an exact prefix, so one changed byte invalidates everything after it.
Monitor cache effectiveness:
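The cache economics in one place — plain arithmetic (not an OpenClaw API) for the blended per-call input cost when cached tokens bill at 10% of the normal rate:

```python
def blended_input_cost(total_tokens: int, cached_tokens: int,
                       price_per_mtok: float) -> float:
    """Per-call input cost when cached tokens bill at 10% of list price."""
    fresh = total_tokens - cached_tokens
    return (fresh + 0.10 * cached_tokens) * price_per_mtok / 1_000_000

# 30K-token prompt, 25K of it static and cached, at Sonnet's $3.00/MTok:
full = blended_input_cost(30_000, 0, 3.00)        # no caching
cached = blended_input_cost(30_000, 25_000, 3.00)  # static prefix cached
print(f"${full:.4f} -> ${cached:.4f}")  # → $0.0900 -> $0.0225
```

The bigger your static prefix relative to the whole prompt, the closer you get to the 90% ceiling.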
Multi-agent coordination is expensive — 3.5x more tokens than single-agent setups. The fix: isolate sub-tasks with /spawn so each agent runs in a clean context.
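For example — the task wording is illustrative and the exact argument syntax varies by build; `/spawn` is the isolation mechanism:

```
/spawn researcher "Compare the three candidate libraries and report back
a one-paragraph recommendation."
```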
Also: scope tool access per agent.
Every tool in the allowlist adds tokens to the prompt. A researcher doesn’t need deployment tools. A writer doesn’t need database access. Strip tools to the minimum each agent actually needs.
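A sketch of per-agent allowlists — the `agents`/`tools` keys and tool names here are assumptions, not a documented schema:

```json
{
  "agents": {
    "researcher": {"tools": ["web_search", "read_file"]},
    "writer":     {"tools": ["read_file", "write_file"]}
  }
}
```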
Even with all 8 layers, runaway automation can still burn tokens. Add these guardrails to your system prompt:
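Example guardrail wording for the system prompt — tune the numbers to your budget:

```
Cost guardrails:
- If a task fails twice in a row, stop and ask instead of retrying.
- Never re-read a file you already have in context.
- Before looping over more than 10 items, estimate token cost and ask.
- If a single task exceeds ~$0.50 in estimated spend, pause and report.
```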
Each layer multiplies the benefit of the others. Here’s the cumulative effect per API call:
| Layer Applied | Cost Per Call | Cumulative Savings |
|---|---|---|
| Baseline (no optimization) | $0.47 | — |
| + Kill Thinking Mode | $0.12 | 74% |
| + Cap Context Window | $0.08 | 83% |
| + Model Routing (Haiku) | $0.05 | 89% |
| + Session Reset | $0.03 | 94% |
| + Lean Session Init | $0.02 | 96% |
| + Heartbeat → Ollama | $0.02 | 96% |
| + Prompt Caching | $0.015 | 97% |
| + Subagent Isolation | $0.012 | 97.4% |
Print this. Do them in order. Check them off.

- [ ] Disable thinking: `"thinking": {"type": "disabled"}` in `openclaw.json`
- [ ] Cap context: `"contextTokens": 50000`
- [ ] Route routine work to Haiku; reserve Sonnet for complex reasoning
- [ ] Reset any session over 500KB
- [ ] Trim session init to the ~8KB that matters
- [ ] Point heartbeats at local Ollama (`llama3.2:3b`)
- [ ] Turn on prompt caching for static content
- [ ] Isolate subagents with `/spawn` and scope their tools
- [ ] Verify with `session_status` — context under 10KB, model shows Haiku