⚡ Cost Optimization

The 8-Layer Stack to Cut Token Costs by 97%

A default OpenClaw config burns $50–150 a month. These 8 optimizations stack together to bring that down to around $10. Most take under 5 minutes.

Default config: $150/mo → Optimized stack: $10/mo

The Complete Stack at a Glance

Each layer targets a different cost driver. The order matters — start at Layer 1 for the biggest instant savings.

| # | Layer | Savings | Time |
|---|-------|---------|------|
| 1 | Kill Thinking Mode | 10–50x per call | 30 seconds |
| 2 | Cap Context Window | 20–40% | 1 config line |
| 3 | Model Routing | 50–80% | 5 min config |
| 4 | Session Reset Discipline | 40–60% | Behavioral |
| 5 | Lean Session Init | 80% on startup | Prompt rule |
| 6 | Heartbeat → Ollama | 100% on heartbeats | 10 min setup |
| 7 | Prompt Caching | 90% on static content | Config flag |
| 8 | Subagent Isolation | 3.5x → 1x on multi-agent | Workflow change |
01. Kill Thinking Mode
10–50x reduction · 30 seconds

Extended thinking is the single biggest hidden cost in OpenClaw. When enabled, every API call generates thousands of “reasoning” tokens before the actual response. Most routine tasks don’t need it.

This one optimization alone can save more than all the others combined. If you only do one thing from this guide, do this.

Option A — Disable in config:

~/.openclaw/openclaw.json
{ "thinking": { "type": "disabled" } }

Option B — Disable mid-session:

/reasoning off
When to re-enable: Architecture decisions, complex debugging, multi-step planning. Use /reasoning on temporarily, then turn it back off.

Before

  • Every call generates reasoning tokens
  • Simple file reads cost 10–50x more
  • $2–5 burned on routine tasks daily

After

  • Reasoning only when explicitly enabled
  • File reads, status checks cost pennies
  • $0.10–0.50 on routine tasks daily
02. Cap Context Window
20–40% reduction · 1 config line

OpenClaw can load up to 200K+ tokens of context per call. Most tasks need a fraction of that. Hard-capping the context window prevents bloat.

~/.openclaw/openclaw.json
{ "contextTokens": 50000 }
50K is the sweet spot. Enough for complex multi-file tasks, cheap enough to not blow your budget. Drop to 25K for simple automation agents.

A 100K context costs 10x more than a 10K context. Every call. Every time. This one line puts a ceiling on it.
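The scaling claim is easy to see in numbers. A minimal sketch, assuming a Haiku-class input rate of $0.25 per 1M tokens (illustrative; check your provider's current price sheet):

```python
# Input-side cost scales linearly with context size, on every call.
PRICE_PER_INPUT_TOKEN = 0.25 / 1_000_000  # assumed Haiku-class rate, $/token

def call_cost(context_tokens: int) -> float:
    """Dollar cost of the input side of one call that re-sends the full context."""
    return context_tokens * PRICE_PER_INPUT_TOKEN

for ctx in (10_000, 50_000, 100_000):
    print(f"{ctx:>7} tokens -> ${call_cost(ctx):.4f} per call")
```

The 100K row comes out 10x the 10K row, and that multiplier applies to every single call in the session.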

03. Model Routing — Haiku Default
50–80% reduction · 5 min config

Out of the box, OpenClaw uses Sonnet for everything. Sonnet is excellent but overkill for checking file status, running commands, or routine monitoring. Haiku handles these perfectly at 12x less cost.

~/.openclaw/openclaw.json
{
  "agents": {
    "defaults": {
      "model": { "primary": "anthropic/claude-haiku-4-5" },
      "models": {
        "anthropic/claude-sonnet-4-5": { "alias": "sonnet" },
        "anthropic/claude-haiku-4-5": { "alias": "haiku" }
      }
    }
  }
}

Add this routing rule to your system prompt:

System Prompt Rule
MODEL SELECTION RULE:
Default: Always use Haiku.
Switch to Sonnet ONLY when:
- Architecture decisions
- Production code review
- Security analysis
- Complex debugging / reasoning
- Strategic multi-project decisions
When in doubt: Try Haiku first.

Before

  • Sonnet for everything
  • ~$0.003 per 1K tokens
  • $50–70/month on models

After

  • Haiku by default, Sonnet on demand
  • ~$0.00025 per 1K tokens
  • $5–10/month on models

Free & Ultra-Cheap Alternatives

You’re not limited to Anthropic models. The landscape has expanded:

| Model | Price / 1M Tokens | Best For |
|-------|-------------------|----------|
| Gemini Flash | Free tier available | Heartbeats, classification, simple routing |
| DeepSeek V3 | $0.53 | Code generation, reasoning at a fraction of the cost |
| Llama 3.2 (Ollama) | Free (local) | Heartbeats, status checks, simple tasks |
| Kimi K2.5 (Nvidia) | Free (Nvidia API) | Content drafting, first drafts |
| Claude Haiku | $0.25 | General default for 80% of tasks |
| Claude Sonnet | $3.00 | Architecture, security, complex reasoning |
Power user strategy: Use Opus/Sonnet for initial strategy, route first drafts to Kimi (free via Nvidia), and use Haiku for everything else. Three tiers, one config.
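The three-tier idea can be expressed as a routing table. A sketch only: the model identifiers for the non-Anthropic tiers are hypothetical placeholders, and how you classify a task is up to your own prompt rules.

```python
# Three-tier routing table (sketch; "nvidia/kimi-k2.5" is a hypothetical
# identifier standing in for whatever alias your config gives the free tier)
TIERS = {
    "strategy": "anthropic/claude-sonnet-4-5",  # architecture, security, review
    "draft": "nvidia/kimi-k2.5",                # first drafts, free via Nvidia API
    "default": "anthropic/claude-haiku-4-5",    # everything else
}

def pick_model(task_kind: str) -> str:
    """Route a task category to a model; unknown categories fall through to Haiku."""
    return TIERS.get(task_kind, TIERS["default"])
```

The key property is the fallthrough: anything you haven't explicitly flagged as expensive lands on the cheap default.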
04. Session Reset Discipline
40–60% reduction · Behavioral

Long conversations accumulate tokens. A session that started at 8KB can balloon to 500KB+ after 35 messages. Every subsequent call re-sends the entire history.

Diagnose bloated sessions:

du -h ~/.openclaw/agents/main/sessions/*.jsonl | sort -h

Anything over 500KB is costing you money on every single call.
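If you want the 500KB threshold applied automatically rather than eyeballing `du` output, a small script can do it. A sketch, assuming the session path shown above:

```python
import os
from pathlib import Path

LIMIT_BYTES = 500 * 1024  # the 500KB threshold from the rule above

def bloated_sessions(session_dir: str, limit: int = LIMIT_BYTES):
    """Return (size_bytes, path) pairs for .jsonl sessions over the limit, largest first."""
    files = Path(session_dir).glob("*.jsonl")
    hits = [(f.stat().st_size, f) for f in files if f.stat().st_size > limit]
    return sorted(hits, reverse=True)

if __name__ == "__main__":
    sessions = os.path.expanduser("~/.openclaw/agents/main/sessions")
    for size, path in bloated_sessions(sessions):
        print(f"{size // 1024:>6} KB  {path}")
```

Run it before a work session; anything it prints is a candidate for `/compact` or a fresh session.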

Fix it:

# Check current usage
/status
# Compact the session (keeps context, reduces tokens)
/compact
# Or start fresh
openclaw session new
Rule of thumb: Compact every 30 minutes or after each major task — whichever comes first. This single habit prevents 90% of session bloat.
05. Lean Session Initialization
80% on startup · Prompt rule

Default behavior: load 50KB of history on every message. Optimized: load only 8KB of what actually matters.

System Prompt Rule — Paste This
SESSION INITIALIZATION RULE:
On every session start:
1. Load ONLY these files:
   - SOUL.md
   - USER.md
   - IDENTITY.md
   - memory/YYYY-MM-DD.md (if it exists)
2. DO NOT auto-load:
   - MEMORY.md
   - Session history
   - Prior messages
   - Previous tool outputs
3. When the user asks about prior context:
   - Use memory_search() on demand
   - Pull only the relevant snippet
   - Don't load the whole file
4. Update memory/YYYY-MM-DD.md at end of session with:
   - What you worked on
   - Decisions made
   - Blockers
   - Next steps

Before

  • 50KB context on startup
  • 2–3M tokens wasted per session
  • $0.40 per session
  • History bloat over time

After

  • 8KB context on startup
  • Only loads what’s needed
  • $0.05 per session
  • Clean daily memory files
06. Heartbeat → Ollama (Free)
100% on heartbeats · 10 min setup

Heartbeats run every few minutes to check if your agent is alive. By default, these hit your paid API. Route them to a free local model instead.

Step 1: Install Ollama & pull a lightweight model

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull a small model (2GB, fast)
ollama pull llama3.2:3b

Step 2: Configure heartbeat routing

~/.openclaw/openclaw.json
{
  "heartbeat": {
    "every": "1h",
    "model": "ollama/llama3.2:3b",
    "session": "main",
    "target": "slack",
    "prompt": "Check: Any blockers, opportunities, or progress updates needed?"
  }
}

Step 3: Verify

ollama serve
# In another terminal:
ollama run llama3.2:3b "respond with OK"
Cost impact: If heartbeat was running every minute on the paid API, that’s 1,440 API calls/day — $5–15/month just for “are you alive?” checks. Now it’s $0.
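The per-minute scenario above is easy to model. A back-of-envelope sketch, where the token count per heartbeat is an assumption (a small status prompt plus a short reply):

```python
# Rough heartbeat bill if routed to a paid API (token count is an assumption)
CALLS_PER_DAY = 24 * 60   # one heartbeat per minute
TOKENS_PER_CALL = 500     # small status prompt + short reply, assumed
RATE_PER_M = 0.25         # $/1M tokens, Haiku-class input rate

monthly_tokens = CALLS_PER_DAY * 30 * TOKENS_PER_CALL
monthly_cost = monthly_tokens * RATE_PER_M / 1_000_000
print(f"{monthly_tokens / 1e6:.1f}M tokens/month -> ${monthly_cost:.2f}")
```

Even at Haiku rates with tiny prompts this lands around the low end of the $5–15 range; Sonnet rates or fatter heartbeat prompts push it toward the top. Routed to local Ollama, the same traffic costs $0.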
07. Prompt Caching
90% on static content · Config flag

Your system prompt, SOUL.md, and reference materials get sent with every API call. Prompt caching charges only 10% for reused content within a 5-minute window.

~/.openclaw/openclaw.json
{
  "agents": {
    "defaults": {
      "cache": { "enabled": true, "ttl": "5m", "priority": "high" },
      "models": {
        "anthropic/claude-sonnet-4-5": { "alias": "sonnet", "cache": true },
        "anthropic/claude-haiku-4-5": { "alias": "haiku", "cache": false }
      }
    }
  }
}

What to cache vs. what not to:

Cache These (Stable)

  • System prompts (SOUL.md)
  • User context (USER.md)
  • Tool documentation (TOOLS.md)
  • Reference materials & specs
  • Project templates

Don’t Cache (Dynamic)

  • Daily memory files
  • Recent user messages
  • Tool outputs
  • MEMORY.md (changes frequently)
Maximize cache hits: Batch requests within 5-minute windows. Keep system prompts stable — don’t update SOUL.md mid-session. Changes invalidate cache.

Monitor cache effectiveness:

openclaw shell session_status
# Look for: Cache hits: 45/50 (90%)
# Target: > 80% hit rate
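The arithmetic behind the headline figure can be sketched directly. The sizes are assumptions, and this ignores the cache-write surcharge some providers add on the first call, so treat it as an upper-bound estimate:

```python
# Effective cost of static content with caching (sizes/rates are assumptions)
STATIC_TOKENS = 20_000      # system prompt + SOUL.md + reference docs, assumed
CALLS = 10                  # calls landing inside one 5-minute cache window
CACHED_READ_RATE = 0.10     # cached reads billed at ~10% of the base input rate

uncached = STATIC_TOKENS * CALLS  # full price on every call
cached = STATIC_TOKENS * (1 + (CALLS - 1) * CACHED_READ_RATE)  # 1 write, 9 reads
print(f"token-cost reduction: {1 - cached / uncached:.0%}")
```

The more calls you batch inside one window, the closer the blended rate gets to the 90% discount on reads.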
08. Subagent Isolation
3.5x → 1x · Workflow change

Multi-agent coordination is expensive — 3.5x more tokens than single-agent setups. The fix: isolate sub-tasks with /spawn so each agent runs in a clean context.

# Instead of asking your main agent to do everything:
/spawn researcher "Find the top 5 competitors in this space"
# The spawned agent gets a clean context
# Your main agent's context stays lean
This is critical for the Three Roles architecture. Your Architect stays lightweight. Managers coordinate. Workers execute in isolation. Each role gets only the context it needs — not the combined history of everything.

Also: scope tool access per agent.

Reduce prompt overhead per agent
{
  "tools": {
    "subagents": {
      "tools": {
        "allow": ["read", "exec", "process", "write", "edit", "apply_patch"]
      }
    }
  }
}

Every tool in the allowlist adds tokens to the prompt. A researcher doesn’t need deployment tools. A writer doesn’t need database access. Strip tools to the minimum each agent actually needs.

Rate Limits & Budget Guardrails

Even with all 8 layers, runaway automation can still burn tokens. Add these guardrails to your system prompt:

System Prompt Rule — Paste This
RATE LIMITS:
- 5 seconds minimum between API calls
- 10 seconds between web searches
- Max 5 searches per batch, then a 2-minute break
- Batch similar work (one request for 10 leads, not 10 requests)
- If you hit a 429 error: STOP, wait 5 minutes, retry

DAILY BUDGET: $5 (warning at 75%)
MONTHLY BUDGET: $200 (warning at 75%)
OpenClaw has no built-in hard spend caps. These prompt-level rules are your insurance policy. Without them, a single research loop can burn $20+ overnight.
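If you drive OpenClaw from your own scripts, the same guardrails can be enforced client-side. A minimal sketch, not an OpenClaw feature: a wrapper that throttles call spacing and refuses to proceed once a daily spend cap is hit.

```python
import time

class BudgetGuard:
    """Client-side throttle and spend cap (sketch; OpenClaw has no built-in
    hard cap, so the caller must enforce one before each API call)."""

    def __init__(self, min_interval: float = 5.0, daily_budget: float = 5.0):
        self.min_interval = min_interval  # seconds between API calls
        self.daily_budget = daily_budget  # dollars per day
        self.spent = 0.0
        self._last = 0.0

    def before_call(self) -> None:
        """Block until the rate limit allows a call; refuse if over budget."""
        if self.spent >= self.daily_budget:
            raise RuntimeError("daily budget exhausted; stopping")
        wait = self.min_interval - (time.monotonic() - self._last)
        if wait > 0:
            time.sleep(wait)
        self._last = time.monotonic()

    def record(self, cost: float) -> None:
        """Log the cost of a completed call; warn at 75% of budget."""
        self.spent += cost
        if self.spent >= 0.75 * self.daily_budget:
            print(f"WARNING: {self.spent / self.daily_budget:.0%} of daily budget used")
```

Call `before_call()` ahead of every request and `record()` after it; a runaway loop then stops itself instead of burning $20 overnight.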

All 8 Layers Stacked

Each layer multiplies the benefit of the others. Here’s the cumulative effect per API call:

| Layer Applied | Cost Per Call | Cumulative Savings |
|---------------|---------------|--------------------|
| Baseline (no optimization) | $0.47 | |
| + Kill Thinking Mode | $0.12 | 74% |
| + Cap Context Window | $0.08 | 83% |
| + Model Routing (Haiku) | $0.05 | 89% |
| + Session Reset | $0.03 | 94% |
| + Lean Session Init | $0.02 | 96% |
| + Heartbeat → Ollama | $0.02 | 96% |
| + Prompt Caching | $0.015 | 97% |
| + Subagent Isolation | $0.012 | 97.4% |
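The savings column is each row's per-call cost measured against the $0.47 baseline, so it can be reproduced from the cost column alone (figures match the table to within rounding):

```python
# Reproduce the cumulative-savings column from the per-call costs above
BASELINE = 0.47
per_call = [
    ("Kill Thinking Mode",    0.120),
    ("Cap Context Window",    0.080),
    ("Model Routing (Haiku)", 0.050),
    ("Session Reset",         0.030),
    ("Lean Session Init",     0.020),
    ("Heartbeat -> Ollama",   0.020),
    ("Prompt Caching",        0.015),
    ("Subagent Isolation",    0.012),
]
for layer, cost in per_call:
    print(f"+ {layer:<22} ${cost:.3f}  {1 - cost / BASELINE:.1%} saved")
```

The final row works out to 97.4%, the headline number for the full stack.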

Implementation Checklist

Print this. Do them in order. Check them off.

☐ Layer 1: Disable thinking mode in openclaw.json
☐ Layer 2: Cap contextTokens at 50000
☐ Layer 3: Set Haiku as the default model and add the routing rule
☐ Layer 4: Compact or reset sessions every 30 minutes
☐ Layer 5: Paste the lean session initialization rule
☐ Layer 6: Route heartbeats to a local Ollama model
☐ Layer 7: Enable prompt caching for stable content
☐ Layer 8: Isolate sub-tasks with /spawn and trim tool allowlists

Signs It’s Working

Something’s Wrong If...

  • Context size shows 50KB+ on /status
  • Default model shows Sonnet everywhere
  • Heartbeat still hitting paid API
  • Daily costs above $1.00
  • Sessions over 500KB on disk

You’re Good If...

  • Context shows 2–8KB on startup
  • Default model shows Haiku
  • Heartbeat shows Ollama/local
  • Daily costs $0.10–0.50
  • Cache hit rate above 80%