Token Efficient Agent
by @foinbo
clawhub install token-efficient-agent
name: token-efficient-agent
description: Advanced techniques for minimizing token consumption in OpenClaw operations while maintaining or improving response quality. Includes memory optimization, document processing strategies, tool call efficiency, and contextual awareness methods specifically designed for the OpenClaw architecture.
Token-Efficient Agent
Overview
This skill provides advanced, battle-tested techniques for minimizing token consumption in OpenClaw operations. Unlike basic tips, these strategies are specifically tailored to OpenClaw's architecture, tool ecosystem, and memory system. By implementing these methods, you can reduce token usage by 60-80% while maintaining or improving response quality and contextual awareness.
Why Token Efficiency Matters in OpenClaw
OpenClaw's strength lies in its ability to access personal data, memories, and tools. However, each operation consumes tokens:
Without optimization, simple queries can consume thousands of tokens unnecessarily, leading to:
Core Architecture-Aware Principles
1. Leverage OpenClaw's Memory Hierarchy
OpenClaw has distinct memory layers with different access costs:
Strategy: Always start with the cheapest available context that might contain the answer.
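The cheapest-first strategy can be sketched as a loop over lookup layers sorted by cost. The layer names, cost numbers, and lookup callables below are hypothetical placeholders, not OpenClaw internals. The example only illustrates the ordering logic: consult cheap context first and stop at the first hit.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# (name, relative_cost, lookup); lookup returns an answer or None.
# Names and costs are hypothetical placeholders, not OpenClaw internals.
Layer = Tuple[str, int, Callable[[str], Optional[str]]]


def cheapest_first(query: str, layers: Sequence[Layer]) -> Tuple[Optional[str], List[str]]:
    """Consult layers in ascending cost order and stop at the first hit.

    Returns the answer (or None) plus the names of the layers consulted.
    """
    consulted: List[str] = []
    for name, _cost, lookup in sorted(layers, key=lambda layer: layer[1]):
        consulted.append(name)
        answer = lookup(query)
        if answer is not None:
            return answer, consulted
    return None, consulted


if __name__ == "__main__":
    session = {"deadline": "Friday"}  # context already loaded in this session
    layers = [
        ("full_document", 100, lambda q: "Friday (from the document)"),
        ("session", 0, session.get),
        ("memory_snippet", 10, lambda q: None),
    ]
    print(cheapest_first("deadline", layers))  # ('Friday', ['session'])
```

The expensive document layer is never touched here because the session layer already holds the answer.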
2. Exploit Tool Semantics, Not Just Interfaces
Each OpenClaw tool has specific optimization parameters. Understanding these allows precise data retrieval rather than brute-force fetching.
3. Apply Progressive Disclosure
Retrieve information in layers: first get metadata/summaries, then only dive deep when necessary based on initial results.
4. Cache and Reuse
OpenClaw sessions retain loaded data. Structure your workflow to maximize reuse of already-fetched information.
Advanced Techniques
Technique 1: Hierarchical Memory Querying
Instead of: Loading entire memory files or searching broadly
Use: A multi-stage search approach that minimizes data transfer
Stage 1: Broad Search with Minimal Results
memory_search(query="project deadline decision", maxResults=1, minScore=0.8)
Stage 2: Targeted Snippet Extraction
# If Stage 1 returns a relevant file:
memory_get(path="memory/2026-03-10.md", from=42, lines=8)
Stage 3: Cross-Reference Validation (Only if Needed)
# Only if Stage 2 is ambiguous:
memory_search(query="project deadline", file_path="memory/2026-03-10.md", maxResults=2)
Token Savings: Typically reduces memory loading by 70-90% compared to loading full daily files.
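Here is a minimal sketch of Stages 1 and 2. It assumes `search` and `get` are thin wrappers around `memory_search` and `memory_get`, and that each search result exposes a path, a matching line number, and a score. That result shape and the wrapper signatures are assumptions, not the documented tool output. Stage 3 would be a second, file-scoped search issued only when the returned window is ambiguous.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Hit:
    path: str
    line: int     # 1-based line of the match (assumed result shape)
    score: float


# Hypothetical wrapper signatures for memory_search / memory_get.
SearchFn = Callable[[str, int, float], List[Hit]]
GetFn = Callable[[str, int, int], str]


def staged_lookup(query: str, search: SearchFn, get: GetFn,
                  window: int = 8, min_score: float = 0.8) -> Optional[str]:
    # Stage 1: ask for a single high-confidence hit, not a broad result list.
    hits = search(query, 1, min_score)
    if not hits:
        return None
    hit = hits[0]
    # Stage 2: read only a small window of lines centered on the hit.
    start = max(1, hit.line - window // 2)
    return get(hit.path, start, window)


if __name__ == "__main__":
    notes = {"memory/2026-03-10.md": [f"line {i}" for i in range(1, 101)]}
    notes["memory/2026-03-10.md"][45] = "Decision: project deadline moved"  # toy data

    def fake_search(query: str, max_results: int, min_score: float) -> List[Hit]:
        # Toy matcher: any line containing "deadline" scores 0.9.
        hits = [Hit(path, i, 0.9)
                for path, lines in notes.items()
                for i, text in enumerate(lines, start=1) if "deadline" in text]
        return [h for h in hits if h.score >= min_score][:max_results]

    def fake_get(path: str, start: int, lines: int) -> str:
        return "\n".join(notes[path][start - 1:start - 1 + lines])

    print(staged_lookup("project deadline decision", fake_search, fake_get))
```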
Technique 2: Document Processing with Semantic Pagination
Instead of: Fetching entire documents then searching
Use: Offset-limited fetching combined with semantic boundary detection
For Long Documents (>5000 chars):
1. Initial Probe: Fetch first 1500 chars to determine document structure
feishu_fetch_doc(doc_id="doc_xxx", limit=1500)
2. Structural Analysis: Ask the model to identify likely sections (without loading the full doc)
3. Targeted Fetching: Only retrieve sections that appear relevant
# If conclusions are likely in the last 20% (offset=8000 assumes a ~10,000-char document):
feishu_fetch_doc(doc_id="doc_xxx", offset=8000, limit=1500)
For Known Section Documents:
Token Savings: Avoids loading irrelevant document portions, often saving 60-80% of document processing tokens.
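The offset arithmetic and boundary handling can be sketched with plain string helpers. This sketch assumes the fetched text uses markdown-style `#` headings and blank-line paragraph breaks. `fetch` is a hypothetical stand-in for an offset/limit fetch such as `feishu_fetch_doc`.

```python
import re
from typing import List, Tuple


def tail_window(total_chars: int, tail_percent: int = 20, limit: int = 1500) -> Tuple[int, int]:
    """(offset, limit) for a fetch starting tail_percent% before the end.

    Integer arithmetic keeps the offset exact: 10,000 chars at 20% gives 8000.
    """
    return total_chars * (100 - tail_percent) // 100, limit


def headings(probe: str) -> List[str]:
    """Markdown-style headings visible in the initial probe (assumes '#' headings)."""
    return re.findall(r"^#{1,6}[ \t]+(.+)$", probe, flags=re.MULTILINE)


def skip_partial_paragraph(chunk: str) -> str:
    """Drop the fragment before the first paragraph break of a mid-document
    window; returns the chunk unchanged if it contains no break."""
    cut = chunk.find("\n\n")
    return chunk[cut + 2:] if cut >= 0 else chunk


if __name__ == "__main__":
    doc = "# Intro\nContext.\n\n## Method\n" + "Details. " * 100 + "\n\n## Conclusion\nShip it.\n"

    def fetch(offset: int, limit: int) -> str:  # stand-in for an offset/limit fetch tool
        return doc[offset:offset + limit]

    print(headings(fetch(0, 1500)))  # probe: structure only
    offset, limit = tail_window(len(doc))
    print(skip_partial_paragraph(fetch(offset, limit)))  # tail window, starting at a boundary
```

Skipping the partial first paragraph of a tail window costs a few characters but avoids feeding the model a sentence fragment.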
Technique 3: Tool Call Fusion and Batching
Instead of: Making multiple separate tool calls for related data
Use: Combine operations where tools support it, or sequence calls to minimize context switching
Memory-Tool Fusion Pattern:
# Instead of:
1. memory_search() -> get file path
2. feishu_fetch_doc() -> load document
3. Process document
# Do:
1. memory_search() with doc-specific query to get both memory context AND doc hints
2. If memory contains sufficient summary, skip document fetch entirely
3. Only fetch document if memory search indicates high-value target
Web Search Optimization:
- count=3 instead of default 10 for initial searches
- freshness parameter when temporal relevance is known
Tool Savings: Reduces tool call overhead and eliminates redundant data processing by 40-60%.
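The fusion pattern's key decision is to skip the document when memory already suffices. It might look like this. The hit dictionary shape and both thresholds are illustrative assumptions. `fetch_doc` stands in for whatever document tool you use.

```python
from typing import Callable, Dict, List


def context_for_answer(memory_hits: List[Dict], fetch_doc: Callable[[str], str],
                       summary_score: float = 0.85, fetch_score: float = 0.5) -> str:
    """Return the cheapest context that can answer the question.

    Hits are assumed to look like {"score": float, "summary": str, "doc_id": str or None};
    this shape and both thresholds are illustrative, not an OpenClaw format.
    """
    if not memory_hits:
        return ""
    best = max(memory_hits, key=lambda hit: hit["score"])
    # Memory already holds a confident summary: skip the document fetch entirely.
    if best["score"] >= summary_score and best.get("summary"):
        return best["summary"]
    # Fetch only when memory points at a specific, reasonably promising document.
    if best.get("doc_id") and best["score"] >= fetch_score:
        return fetch_doc(best["doc_id"])
    # Otherwise fall back to whatever (weak) summary memory has.
    return best.get("summary") or ""
```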
Technique 4: Contextual Summarization Cascades
Instead of: Passing raw data to model for processing
Use: Progressive summarization where each stage reduces data size while preserving decision-relevant information
Three-Level Summary Cascade:
1. Extractive Summary (Tool-level): Use feishu_fetch_doc with smart offsets to get key portions
2. Abstractive Summary (Model-level): Brief prompt to condense extracted content
3. Decision-Focused Summary (Task-level): Further reduce to only information needed for current decision
Example Workflow for Document-Based Questions:
# Level 1: Get structurally important parts (headings, conclusions, tables)
section1 = feishu_fetch_doc(doc_id, offset=0, limit=800)  # Intro
# Conclusion (approx. end); assumes a negative offset counts back from the end.
# If it does not, use the total document length minus 1000 as the offset.
section2 = feishu_fetch_doc(doc_id, offset=-1000, limit=1000)
# Level 2: Ask for a thematic summary
summary_prompt = f"Provide a 3-sentence summary of the key points in this text: {section1[:400]}...{section2}"
summary = ask_model(summary_prompt)  # hypothetical helper that sends the prompt to the model
# Level 3: Task-specific reduction
final_prompt = f"Based on this summary: {summary}, answer ONLY: [specific question]"
Token Savings: Reduces document processing tokens by 75-90% while preserving answer quality.
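Level 1 relies on tool-side offsets. A complementary client-side extractive pass can further trim the fetched text before the Level 2 prompt. This is a swapped-in heuristic, not part of the cascade as described. It keeps headings, table rows, and lines containing words from a hypothetical keyword list.

```python
from typing import Iterable

# Illustrative keyword list; tune it to the documents you actually process.
KEY_TERMS = ("conclusion", "decision", "summary", "result")


def extractive_summary(text: str, max_lines: int = 20,
                       terms: Iterable[str] = KEY_TERMS) -> str:
    """Keep headings, table rows, and key-term lines, in their original order."""
    lowered = tuple(term.lower() for term in terms)
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        is_heading = stripped.startswith("#")
        is_table_row = stripped.startswith("|")
        has_term = any(term in stripped.lower() for term in lowered)
        if is_heading or is_table_row or has_term:
            kept.append(stripped)
            if len(kept) >= max_lines:
                break
    return "\n".join(kept)
```

The output can replace the raw `section1`/`section2` text in the Level 2 prompt, so the model summarizes a few dozen lines instead of the whole fetch.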
Technique 5: Predictive Context Preloading (Anticipatory Caching)
Instead of: Reactive loading after each user query
Use: Predictive loading based on conversation patterns and time/context cues
Implementation:
Prediction Signals:
Example: If the user always asks about project status at 10 AM, preload project-related memory snippets at 9:45 AM.
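Efficiency Gain: Converts high-cost reactive operations into proactive ones. The retrieval happens before the question arrives, so a correctly predicted question is answered from context that is already loaded. The preload itself still costs tokens, so this pays off only when predictions are usually right.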
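One way to arrive at the 9:45 AM preload in the example is to count, per topic, the distinct past days with a question inside the upcoming time window. This sketch assumes a query log of `(timestamp, topic)` pairs. How topics get labeled, the 15-minute lead, and the three-day threshold are all assumptions.

```python
from datetime import date, datetime, timedelta
from typing import Dict, List, Set, Tuple


def topics_to_preload(history: List[Tuple[datetime, str]], now: datetime,
                      lead: timedelta = timedelta(minutes=15),
                      min_days: int = 3) -> List[str]:
    """Topics asked about between now and now + lead (by time of day)
    on at least `min_days` distinct earlier days."""
    start, end = now.time(), (now + lead).time()
    days: Dict[str, Set[date]] = {}
    for asked_at, topic in history:
        t = asked_at.time()
        if start <= end:
            in_window = start <= t <= end
        else:  # window wraps past midnight
            in_window = t >= start or t <= end
        if in_window and asked_at.date() < now.date():
            days.setdefault(topic, set()).add(asked_at.date())
    return sorted(topic for topic, seen in days.items() if len(seen) >= min_days)


if __name__ == "__main__":
    log = [(datetime(2026, 3, d, 10, 0), "project status") for d in (9, 10, 11)]
    log.append((datetime(2026, 3, 11, 14, 0), "lunch"))
    print(topics_to_preload(log, now=datetime(2026, 3, 12, 9, 45)))  # ['project status']
```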
Technique 6: Tool Result Minimization and Transformation
Instead of: Using raw tool outputs
Use: Transform tool results to their minimal essential form before model consumption
Patterns:
- feishu_doc_comments with is_solved=true/false filters and page_size=1
- summary and start_time fields when possible, not full descriptions
- name and open_id, not full profile data
Implementation: Create wrapper functions that:
1. Call the tool with the minimal necessary parameters
2. Extract only essential fields from the response
3. Discard metadata unless it is specifically needed for downstream operations
Token Savings: Typically 50-80% reduction in tool result processing tokens.
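The three-step wrapper described above could be sketched as a generic field filter plus a higher-order wrapper. `fake_calendar` and its fields are toy stand-ins. The kept fields mirror the patterns listed (`summary`, `start_time`, `name`, `open_id`), but check your tool's actual response keys.

```python
from typing import Any, Callable, Iterable


def minimize(result: Any, fields: Iterable[str]) -> Any:
    """Keep only `fields` from a dict, or from each dict in a list."""
    wanted = tuple(fields)
    if isinstance(result, dict):
        return {key: result[key] for key in wanted if key in result}
    if isinstance(result, list):
        return [minimize(item, wanted) for item in result]
    return result


def minimal_tool(tool: Callable[..., Any], fields: Iterable[str],
                 **defaults: Any) -> Callable[..., Any]:
    """Wrap a tool: (1) call it with minimal default parameters,
    (2) extract only the essential fields, (3) drop everything else."""
    wanted = tuple(fields)

    def wrapped(**kwargs: Any) -> Any:
        return minimize(tool(**{**defaults, **kwargs}), wanted)

    return wrapped


if __name__ == "__main__":
    def fake_calendar(**params: Any) -> list:  # toy stand-in for a calendar tool
        return [{"summary": "Project review", "start_time": "10:00",
                 "description": "Long agenda text...", "attendees": ["a", "b"]}]

    events = minimal_tool(fake_calendar, ["summary", "start_time"])
    print(events())  # [{'summary': 'Project review', 'start_time': '10:00'}]
```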
Technique 7: Session Context Pruning and Compression
Instead of: Letting session history grow unbounded
Use: Active management of conversational context to maintain optimal token budget
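Here is a minimal sketch of budget-driven pruning. Once the history exceeds a token budget, it keeps the latest turns verbatim and folds older ones into a single summary turn. The 4-characters-per-token estimate is a rough heuristic rather than a real tokenizer. `summarize` is a hypothetical callback, for example a short model prompt.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (role, text)


def approx_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token for English text), not a real tokenizer.
    return max(1, len(text) // 4)


def prune_history(turns: List[Turn], budget: int,
                  summarize: Callable[[List[Turn]], str],
                  keep_recent: int = 4) -> List[Turn]:
    """Return turns unchanged while under `budget` approximate tokens; otherwise
    keep the last `keep_recent` turns verbatim and replace the rest with one summary turn."""
    total = sum(approx_tokens(text) for _, text in turns)
    if total <= budget or len(turns) <= keep_recent:
        return list(turns)
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    # The summary is not re-checked against the budget; a stricter version could loop.
    return [("system", "Summary of earlier turns: " + summarize(older))] + recent
```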
Strategies:
OpenClaw-Specific Implementation:
Advanced Workflow: The Token-Efficient Decision Tree
When faced with any request, follow this decision process:
START
│
├── Can the answer come from current session context?
│   ├─ Yes → Respond directly (0 additional tokens)
│   └─ No → Continue
│
├── Is the answer likely in recent memory (last 3 days)?
│   ├─ Yes → Use memory_search with tight constraints (maxResults=1, minScore=0.85)
│   │        ├─ If found, use memory_get for exact lines
│   │        └─ If not found or ambiguous, continue
│   └─ No → Continue
│
├── Does the answer require a specific document?
│   ├─ No → Use web_search with count=3, freshness if applicable
│   └─ Yes → Continue to document processing
│
├── Document Processing Decision:
│   │
│   └─ Is the document structured with known sections?
│      ├─ Yes → Fetch only likely relevant sections using offset/limit
│      └─ No →
│         └─ Is the document < 2000 chars?
│            ├─ Yes → Fetch entire document
│            └─ No →
│               ├─ Fetch first 1500 chars for structure analysis
│               ├─ Based on analysis, fetch only relevant portions
│               └─ Apply summarization cascade if still large
│
├── Apply result minimization: extract only essential fields
│
├── If result still large for model input, apply summarization
│
└── Formulate response using minimized context
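The tree can also be read as a routing function. In this sketch, each boolean field stands in for a judgment the agent makes at that branch. The `Request` fields and step strings are hypothetical labels. The thresholds (`maxResults=1`, `minScore=0.85`, 2000 and 1500 chars, `count=3`) come from the tree.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    # Each flag stands in for a judgment the agent makes; field names are hypothetical.
    answerable_from_session: bool = False
    likely_in_recent_memory: bool = False
    found_in_memory: bool = False
    needs_document: bool = False
    known_sections: bool = False
    doc_chars: int = 0


def plan(req: Request) -> List[str]:
    """Return the ordered steps the decision tree prescribes for a request."""
    if req.answerable_from_session:
        return ["respond from session context"]
    steps: List[str] = []
    if req.likely_in_recent_memory:
        steps.append("memory_search(maxResults=1, minScore=0.85)")
        if req.found_in_memory:
            return steps + ["memory_get exact lines", "respond"]
    if not req.needs_document:
        steps.append("web_search(count=3, freshness if applicable)")
    elif req.known_sections:
        steps.append("fetch relevant sections via offset/limit")
    elif req.doc_chars < 2000:
        steps.append("fetch entire document")
    else:
        steps += ["fetch first 1500 chars", "fetch relevant portions",
                  "summarization cascade if still large"]
    return steps + ["minimize results", "summarize if still large", "respond"]


if __name__ == "__main__":
    print(plan(Request(needs_document=True, doc_chars=12000)))
```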
Integration with OpenClaw Systems
Heartbeat Optimization
Use heartbeat cycles for:
Memory System Synergy
Tool-Specific Optimizations
Feishu Document Fetching:
- feishu_doc_comments with filters instead of fetching all comments
Web Operations:
- web_search over web_fetch when possible (returns already-processed snippets)
- extractMode="text" for non-formatting needs
- maxChars limits (1000-2000) unless the full content is essential
Calendar/Task Queries:
Measurement and Improvement
Track your efficiency with these metrics:
1. Tokens per Exchange: Monitor average token usage per conversation turn
2. Cache Hit Ratio: Percentage of answers found in already-loaded context
3. Tool Call Efficiency: Average useful data returned per tool call
4. Context Reuse Rate: How often loaded data is reused in subsequent operations
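The four metrics could be computed from a per-turn log like the one below. The `Exchange` fields are assumptions about what you record. Tool call efficiency is approximated as the share of calls whose results were used. Context reuse is approximated as reused items over loaded items. Both are proxies for the looser definitions above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Exchange:
    tokens: int                # total tokens used by the turn
    answered_from_cache: bool  # answer came from already-loaded context
    tool_calls: int
    useful_tool_calls: int     # calls whose results were actually used
    reused_items: int          # previously loaded items used again this turn
    loaded_items: int          # items loaded this turn


def efficiency_metrics(log: List[Exchange]) -> Dict[str, float]:
    if not log:
        return {}
    n = len(log)
    calls = sum(e.tool_calls for e in log)
    loaded = sum(e.loaded_items for e in log)
    return {
        "tokens_per_exchange": sum(e.tokens for e in log) / n,
        "cache_hit_ratio": sum(e.answered_from_cache for e in log) / n,
        "tool_call_efficiency": sum(e.useful_tool_calls for e in log) / calls if calls else 0.0,
        "context_reuse_rate": sum(e.reused_items for e in log) / loaded if loaded else 0.0,
    }
```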
Improvement Loop:
1. Baseline: Measure current token usage
2. Implement one technique at a time
3. Measure impact
4. Retain techniques that show >20% improvement
5. Combine complementary techniques
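Step 4's retention rule can be expressed as arithmetic on tokens per exchange. The 20% threshold is the document's. The helper names are illustrative.

```python
from typing import Dict, List


def improvement(baseline_tokens: float, new_tokens: float) -> float:
    """Fractional reduction in tokens per exchange versus the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return (baseline_tokens - new_tokens) / baseline_tokens


def retained(baseline_tokens: float, measured: Dict[str, float],
             threshold: float = 0.20) -> List[str]:
    """Techniques, each measured on its own, that beat the threshold (step 4)."""
    return [name for name, tokens in measured.items()
            if improvement(baseline_tokens, tokens) > threshold]
```

The comparison is strict, so a technique that lands exactly on 20% is not retained, matching the ">20%" wording.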
Advanced Examples
Example 1: Cross-Reference Historical Decision
Request: "How did our decision on vendor X in January compare to our current leaning toward vendor Y?"
Traditional Approach:
Token-Efficient Approach:
1. memory_search(query="vendor X decision January", maxResults=1, minScore=0.9, relative_time="last_month")
2. memory_get(path="memory/2026-01-15.md", from=87, lines=12) # Exact decision snippet
3. memory_search(query="vendor Y evaluation", maxResults=1, minScore=0.85) # Recent notes
4. memory_get(path="memory/2026-03-16.md", from=34, lines=8) # Current leaning
5. Feed only these 4 short snippets to the model for comparison
Token Usage: ~150 tokens vs ~2000+ for traditional approach
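Step 5 can be as simple as labeling each snippet with its source and sending nothing else. The `(path, first_line, text)` tuple shape is an assumption about how you hold the `memory_get` results.

```python
from typing import List, Tuple

Snippet = Tuple[str, int, str]  # (path, first_line, text): assumed shape


def comparison_prompt(question: str, snippets: List[Snippet]) -> str:
    """Build a prompt containing the question and the labeled snippets, nothing else."""
    parts = [f"[{path}:{line}]\n{text.strip()}" for path, line, text in snippets]
    return question + "\nUse only these notes:\n\n" + "\n\n".join(parts)
```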
Example 2: Meeting Preparation Briefing
Request: "Give me a briefing on the upcoming project review meeting."
Traditional Approach:
Token-Efficient Approach:
1. Calendar lookup: Get meeting time, title, attendees (essential fields only)
2. memory_search(query="project review", maxResults=2, relative_time="this_week")
3. Extract only action items and decisions from memory results
4. If documents mentioned, fetch only executive summaries/conclusions
5. Synthesize briefing from minimal context
Token Usage: ~300 tokens vs ~3000+ for traditional approach
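Step 3 of the briefing might be a line filter over the memory results. This sketch assumes notes mark items with prefixes such as `Decision:`, `Action:`, or a `- [ ]` checkbox. Adjust the pattern to match how your memory files are actually written.

```python
import re
from typing import List

# Assumed note conventions: "Decision: ...", "Action: ..." / "Action item: ...", or "- [ ] ...".
MARKER = re.compile(r"^\s*(?:action(?: item)?:|decision:|- \[ \])", re.IGNORECASE)


def action_items_and_decisions(notes: str) -> List[str]:
    """Return only the lines of a memory result that record decisions or action items."""
    return [line.strip() for line in notes.splitlines() if MARKER.match(line)]


if __name__ == "__main__":
    sample = ("Met with the team\nDecision: keep the review on Thursday\n"
              "- [ ] send slides\nAction item: book room")
    print(action_items_and_decisions(sample))
```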
Limitations and When Not to Apply
These techniques are less effective when:
In these cases, be transparent about the trade-offs and get explicit consent before applying optimization.
Conclusion
Token efficiency in OpenClaw isn't about cutting corners; it's about applying the right amount of computational effort to each task. By leveraging OpenClaw's specific architecture, memory system, and tool capabilities, you can dramatically reduce unnecessary token consumption while maintaining high-quality, contextually appropriate responses.
The key insight: Most user queries don't require comprehensive data review; they need precise, relevant information delivered efficiently. These techniques help you deliver exactly that.
Practice these methods consistently, measure their impact, and adapt them to your specific usage patterns. Over time, token-efficient operation will become second nature, allowing you to handle more complex tasks within the same computational budget.