⚡ Cost Optimization

The 8-Layer Stack to Cut Token Costs by 97%

A default OpenClaw config burns $50–150 a month. These 8 optimizations stack together to bring that down to around $10. Most take under 5 minutes.

Default config: $150/mo → Optimized stack: $10/mo

The Complete Stack at a Glance

Each layer targets a different cost driver. The order matters — start at Layer 1 for the biggest instant savings.

| # | Layer | Savings | Time |
|---|-------|---------|------|
| 1 | Kill Thinking Mode | 10–50x per call | 30 seconds |
| 2 | Cap Context Window | 20–40% | 1 config line |
| 3 | Model Routing | 50–80% | 5 min config |
| 4 | Session Reset Discipline | 40–60% | Behavioral |
| 5 | Lean Session Init | 80% on startup | Prompt rule |
| 6 | Heartbeat → Ollama | 100% on heartbeats | 10 min setup |
| 7 | Prompt Caching | 90% on static content | Config flag |
| 8 | Subagent Isolation | 3.5x → 1x on multi-agent | Workflow change |
01. Kill Thinking Mode
10–50x reduction · 30 seconds

Extended thinking is the single biggest hidden cost in OpenClaw. When enabled, every API call generates thousands of “reasoning” tokens before the actual response. Most routine tasks don’t need it.

This one optimization alone can save more than all the others combined. If you only do one thing from this guide, do this.

Option A — Disable in config:

~/.openclaw/openclaw.json
{ "thinking": { "type": "disabled" } }

Option B — Disable mid-session:

/reasoning off
When to re-enable: Architecture decisions, complex debugging, multi-step planning. Use /reasoning on temporarily, then turn it back off.

Before

  • Every call generates reasoning tokens
  • Simple file reads cost 10–50x more
  • $2–5 burned on routine tasks daily

After

  • Reasoning only when explicitly enabled
  • File reads, status checks cost pennies
  • $0.10–0.50 on routine tasks daily
02. Cap Context Window
20–40% reduction · 1 config line

OpenClaw can load up to 200K+ tokens of context per call. Most tasks need a fraction of that. Hard-capping the context window prevents bloat.

~/.openclaw/openclaw.json
{ "contextTokens": 50000 }
50K is the sweet spot. Enough for complex multi-file tasks, cheap enough to not blow your budget. Drop to 25K for simple automation agents.

A 100K context costs 10x more than a 10K context. Every call. Every time. This one line puts a ceiling on it.
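The scaling claim is easy to see in numbers. A minimal sketch, assuming a Haiku-class input rate of $0.25 per 1M tokens (illustrative; check your provider's current price sheet):

```python
# Input-side cost scales linearly with context size, on every call.
PRICE_PER_INPUT_TOKEN = 0.25 / 1_000_000  # assumed Haiku-class rate, $/token

def call_cost(context_tokens: int) -> float:
    """Dollar cost of the input side of one call that re-sends the full context."""
    return context_tokens * PRICE_PER_INPUT_TOKEN

for ctx in (10_000, 50_000, 100_000):
    print(f"{ctx:>7} tokens -> ${call_cost(ctx):.4f} per call")
```

The 100K row comes out 10x the 10K row, and that multiplier applies to every single call in the session.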

03. Model Routing — Haiku Default
50–80% reduction · 5 min config

Out of the box, OpenClaw uses Sonnet for everything. Sonnet is excellent but overkill for checking file status, running commands, or routine monitoring. Haiku handles these perfectly at 12x less cost.

~/.openclaw/openclaw.json
{
  "agents": {
    "defaults": {
      "model": { "primary": "anthropic/claude-haiku-4-5" },
      "models": {
        "anthropic/claude-sonnet-4-5": { "alias": "sonnet" },
        "anthropic/claude-haiku-4-5": { "alias": "haiku" }
      }
    }
  }
}

Add this routing rule to your system prompt:

System Prompt Rule
MODEL SELECTION RULE:
Default: Always use Haiku.
Switch to Sonnet ONLY when:
- Architecture decisions
- Production code review
- Security analysis
- Complex debugging / reasoning
- Strategic multi-project decisions
When in doubt: Try Haiku first.

Before

  • Sonnet for everything
  • ~$0.003 per 1K tokens
  • $50–70/month on models

After

  • Haiku by default, Sonnet on demand
  • ~$0.00025 per 1K tokens
  • $5–10/month on models

Free & Ultra-Cheap Alternatives

You’re not limited to Anthropic models. The landscape has expanded:

| Model | Price / 1M Tokens | Best For |
|-------|-------------------|----------|
| Gemini Flash | Free tier available | Heartbeats, classification, simple routing |
| DeepSeek V3 | $0.53 | Code generation, reasoning at a fraction of the cost |
| Llama 3.2 (Ollama) | Free (local) | Heartbeats, status checks, simple tasks |
| Kimi K2.5 (Nvidia) | Free (Nvidia API) | Content drafting, first drafts |
| Claude Haiku | $0.25 | General default for 80% of tasks |
| Claude Sonnet | $3.00 | Architecture, security, complex reasoning |
Power user strategy: Use Opus/Sonnet for initial strategy, route first drafts to Kimi (free via Nvidia), and use Haiku for everything else. Three tiers, one config.
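The three-tier idea can be expressed as a routing table. A sketch only: the model identifiers for the non-Anthropic tiers are hypothetical placeholders, and how you classify a task is up to your own prompt rules.

```python
# Three-tier routing table (sketch; "nvidia/kimi-k2.5" is a hypothetical
# identifier standing in for whatever alias your config gives the free tier)
TIERS = {
    "strategy": "anthropic/claude-sonnet-4-5",  # architecture, security, review
    "draft": "nvidia/kimi-k2.5",                # first drafts, free via Nvidia API
    "default": "anthropic/claude-haiku-4-5",    # everything else
}

def pick_model(task_kind: str) -> str:
    """Route a task category to a model; unknown categories fall through to Haiku."""
    return TIERS.get(task_kind, TIERS["default"])
```

The key property is the fallthrough: anything you haven't explicitly flagged as expensive lands on the cheap default.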
04. Session Reset Discipline
40–60% reduction · Behavioral

Long conversations accumulate tokens. A session that started at 8KB can balloon to 500KB+ after 35 messages. Every subsequent call re-sends the entire history.

Diagnose bloated sessions:

du -h ~/.openclaw/agents/main/sessions/*.jsonl | sort -h

Anything over 500KB is costing you money on every single call.
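If you want the 500KB threshold applied automatically rather than eyeballing `du` output, a small script can do it. A sketch, assuming the session path shown above:

```python
import os
from pathlib import Path

LIMIT_BYTES = 500 * 1024  # the 500KB threshold from the rule above

def bloated_sessions(session_dir: str, limit: int = LIMIT_BYTES):
    """Return (size_bytes, path) pairs for .jsonl sessions over the limit, largest first."""
    files = Path(session_dir).glob("*.jsonl")
    hits = [(f.stat().st_size, f) for f in files if f.stat().st_size > limit]
    return sorted(hits, reverse=True)

if __name__ == "__main__":
    sessions = os.path.expanduser("~/.openclaw/agents/main/sessions")
    for size, path in bloated_sessions(sessions):
        print(f"{size // 1024:>6} KB  {path}")
```

Run it before a work session; anything it prints is a candidate for `/compact` or a fresh session.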

Fix it:

# Check current usage
/status
# Compact the session (keeps context, reduces tokens)
/compact
# Or start fresh
openclaw session new
Rule of thumb: Compact every 30 minutes or after each major task — whichever comes first. This single habit prevents 90% of session bloat.
05. Lean Session Initialization
80% on startup · Prompt rule

Default behavior: load 50KB of history on every message. Optimized: load only 8KB of what actually matters.

System Prompt Rule — Paste This
SESSION INITIALIZATION RULE:
On every session start:
1. Load ONLY these files:
   - SOUL.md
   - USER.md
   - IDENTITY.md
   - memory/YYYY-MM-DD.md (if it exists)
2. DO NOT auto-load:
   - MEMORY.md
   - Session history
   - Prior messages
   - Previous tool outputs
3. When the user asks about prior context:
   - Use memory_search() on demand
   - Pull only the relevant snippet
   - Don't load the whole file
4. Update memory/YYYY-MM-DD.md at end of session with:
   - What you worked on
   - Decisions made
   - Blockers
   - Next steps

Before

  • 50KB context on startup
  • 2–3M tokens wasted per session
  • $0.40 per session
  • History bloat over time

After

  • 8KB context on startup
  • Only loads what’s needed
  • $0.05 per session
  • Clean daily memory files
06. Heartbeat → Ollama (Free)
100% on heartbeats · 10 min setup

Heartbeats run every few minutes to check if your agent is alive. By default, these hit your paid API. Route them to a free local model instead.

Step 1: Install Ollama & pull a lightweight model

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull a small model (2GB, fast)
ollama pull llama3.2:3b

Step 2: Configure heartbeat routing

~/.openclaw/openclaw.json
{
  "heartbeat": {
    "every": "1h",
    "model": "ollama/llama3.2:3b",
    "session": "main",
    "target": "slack",
    "prompt": "Check: Any blockers, opportunities, or progress updates needed?"
  }
}

Step 3: Verify

ollama serve
# In another terminal:
ollama run llama3.2:3b "respond with OK"
Cost impact: If heartbeat was running every minute on the paid API, that’s 1,440 API calls/day — $5–15/month just for “are you alive?” checks. Now it’s $0.
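The per-minute scenario above is easy to model. A back-of-envelope sketch, where the token count per heartbeat is an assumption (a small status prompt plus a short reply):

```python
# Rough heartbeat bill if routed to a paid API (token count is an assumption)
CALLS_PER_DAY = 24 * 60   # one heartbeat per minute
TOKENS_PER_CALL = 500     # small status prompt + short reply, assumed
RATE_PER_M = 0.25         # $/1M tokens, Haiku-class input rate

monthly_tokens = CALLS_PER_DAY * 30 * TOKENS_PER_CALL
monthly_cost = monthly_tokens * RATE_PER_M / 1_000_000
print(f"{monthly_tokens / 1e6:.1f}M tokens/month -> ${monthly_cost:.2f}")
```

Even at Haiku rates with tiny prompts this lands around the low end of the $5–15 range; Sonnet rates or fatter heartbeat prompts push it toward the top. Routed to local Ollama, the same traffic costs $0.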
07. Prompt Caching
90% on static content · Config flag

Your system prompt, SOUL.md, and reference materials get sent with every API call. Prompt caching charges only 10% for reused content within a 5-minute window.

~/.openclaw/openclaw.json
{
  "agents": {
    "defaults": {
      "cache": { "enabled": true, "ttl": "5m", "priority": "high" },
      "models": {
        "anthropic/claude-sonnet-4-5": { "alias": "sonnet", "cache": true },
        "anthropic/claude-haiku-4-5": { "alias": "haiku", "cache": false }
      }
    }
  }
}

What to cache vs. what not to:

Cache These (Stable)

  • System prompts (SOUL.md)
  • User context (USER.md)
  • Tool documentation (TOOLS.md)
  • Reference materials & specs
  • Project templates

Don’t Cache (Dynamic)

  • Daily memory files
  • Recent user messages
  • Tool outputs
  • MEMORY.md (changes frequently)
Maximize cache hits: Batch requests within 5-minute windows. Keep system prompts stable — don’t update SOUL.md mid-session. Changes invalidate cache.

Monitor cache effectiveness:

openclaw shell session_status
# Look for: Cache hits: 45/50 (90%)
# Target: > 80% hit rate
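The arithmetic behind the headline figure can be sketched directly. The sizes are assumptions, and this ignores the cache-write surcharge some providers add on the first call, so treat it as an upper-bound estimate:

```python
# Effective cost of static content with caching (sizes/rates are assumptions)
STATIC_TOKENS = 20_000      # system prompt + SOUL.md + reference docs, assumed
CALLS = 10                  # calls landing inside one 5-minute cache window
CACHED_READ_RATE = 0.10     # cached reads billed at ~10% of the base input rate

uncached = STATIC_TOKENS * CALLS  # full price on every call
cached = STATIC_TOKENS * (1 + (CALLS - 1) * CACHED_READ_RATE)  # 1 write, 9 reads
print(f"token-cost reduction: {1 - cached / uncached:.0%}")
```

The more calls you batch inside one window, the closer the blended rate gets to the 90% discount on reads.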
08. Subagent Isolation
3.5x → 1x · Workflow change

Multi-agent coordination is expensive — 3.5x more tokens than single-agent setups. The fix: isolate sub-tasks with /spawn so each agent runs in a clean context.

# Instead of asking your main agent to do everything:
/spawn researcher "Find the top 5 competitors in this space"
# The spawned agent gets a clean context
# Your main agent's context stays lean
This is critical for the Three Roles architecture. Your Architect stays lightweight. Managers coordinate. Workers execute in isolation. Each role gets only the context it needs — not the combined history of everything.

Also: scope tool access per agent.

Reduce prompt overhead per agent
{
  "tools": {
    "subagents": {
      "tools": {
        "allow": ["read", "exec", "process", "write", "edit", "apply_patch"]
      }
    }
  }
}

Every tool in the allowlist adds tokens to the prompt. A researcher doesn’t need deployment tools. A writer doesn’t need database access. Strip tools to the minimum each agent actually needs.

Rate Limits & Budget Guardrails

Even with all 8 layers, runaway automation can still burn tokens. Add these guardrails to your system prompt:

System Prompt Rule — Paste This
RATE LIMITS:
- 5 seconds minimum between API calls
- 10 seconds between web searches
- Max 5 searches per batch, then a 2-minute break
- Batch similar work (one request for 10 leads, not 10 requests)
- If you hit a 429 error: STOP, wait 5 minutes, retry

DAILY BUDGET: $5 (warning at 75%)
MONTHLY BUDGET: $200 (warning at 75%)
OpenClaw has no built-in hard spend caps. These prompt-level rules are your insurance policy. Without them, a single research loop can burn $20+ overnight.
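If you drive OpenClaw from your own scripts, the same guardrails can be enforced client-side. A minimal sketch, not an OpenClaw feature: a wrapper that throttles call spacing and refuses to proceed once a daily spend cap is hit.

```python
import time

class BudgetGuard:
    """Client-side throttle and spend cap (sketch; OpenClaw has no built-in
    hard cap, so the caller must enforce one before each API call)."""

    def __init__(self, min_interval: float = 5.0, daily_budget: float = 5.0):
        self.min_interval = min_interval  # seconds between API calls
        self.daily_budget = daily_budget  # dollars per day
        self.spent = 0.0
        self._last = 0.0

    def before_call(self) -> None:
        """Block until the rate limit allows a call; refuse if over budget."""
        if self.spent >= self.daily_budget:
            raise RuntimeError("daily budget exhausted; stopping")
        wait = self.min_interval - (time.monotonic() - self._last)
        if wait > 0:
            time.sleep(wait)
        self._last = time.monotonic()

    def record(self, cost: float) -> None:
        """Log the cost of a completed call; warn at 75% of budget."""
        self.spent += cost
        if self.spent >= 0.75 * self.daily_budget:
            print(f"WARNING: {self.spent / self.daily_budget:.0%} of daily budget used")
```

Call `before_call()` ahead of every request and `record()` after it; a runaway loop then stops itself instead of burning $20 overnight.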

All 8 Layers Stacked

Each layer multiplies the benefit of the others. Here’s the cumulative effect per API call:

| Layer Applied | Cost Per Call | Cumulative Savings |
|---------------|---------------|--------------------|
| Baseline (no optimization) | $0.47 | |
| + Kill Thinking Mode | $0.12 | 74% |
| + Cap Context Window | $0.08 | 83% |
| + Model Routing (Haiku) | $0.05 | 89% |
| + Session Reset | $0.03 | 94% |
| + Lean Session Init | $0.02 | 96% |
| + Heartbeat → Ollama | $0.02 | 96% |
| + Prompt Caching | $0.015 | 97% |
| + Subagent Isolation | $0.012 | 97.4% |
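The savings column is each row's per-call cost measured against the $0.47 baseline, so it can be reproduced from the cost column alone (figures match the table to within rounding):

```python
# Reproduce the cumulative-savings column from the per-call costs above
BASELINE = 0.47
per_call = [
    ("Kill Thinking Mode",    0.120),
    ("Cap Context Window",    0.080),
    ("Model Routing (Haiku)", 0.050),
    ("Session Reset",         0.030),
    ("Lean Session Init",     0.020),
    ("Heartbeat -> Ollama",   0.020),
    ("Prompt Caching",        0.015),
    ("Subagent Isolation",    0.012),
]
for layer, cost in per_call:
    print(f"+ {layer:<22} ${cost:.3f}  {1 - cost / BASELINE:.1%} saved")
```

The final row works out to 97.4%, the headline number for the full stack.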

Implementation Checklist

Print this. Do them in order. Check them off.

☐ Layer 1: Disable thinking mode in openclaw.json
☐ Layer 2: Cap contextTokens at 50000
☐ Layer 3: Set Haiku as the default model and add the routing rule
☐ Layer 4: Compact or reset sessions every 30 minutes
☐ Layer 5: Paste the lean session initialization rule
☐ Layer 6: Route heartbeats to a local Ollama model
☐ Layer 7: Enable prompt caching for stable content
☐ Layer 8: Isolate sub-tasks with /spawn and trim tool allowlists

Signs It’s Working

Something’s Wrong If...

  • Context size shows 50KB+ on /status
  • Default model shows Sonnet everywhere
  • Heartbeat still hitting paid API
  • Daily costs above $1.00
  • Sessions over 500KB on disk

You’re Good If...

  • Context shows 2–8KB on startup
  • Default model shows Haiku
  • Heartbeat shows Ollama/local
  • Daily costs $0.10–0.50
  • Cache hit rate above 80%