Deep Research Pipeline
by @vardhineediganesh877-ui
Multi-stage deep research with reflection loops, multi-query retrieval, LLM chunk selection, and citation integrity. Use when: deep research, literature revi...
clawhub install deep-research-pipeline
About This Skill
name: deep-research
version: 2.0.1
description: "Multi-stage deep research with reflection loops, multi-query retrieval, LLM chunk selection, and citation integrity. Use when: deep research, literature review, topic investigation, multi-source analysis, fact-checking, competitive analysis, technology deep-dives."
Deep Research Pipeline
Deep Research Pipeline turns broad questions into cited, publication-quality reports through a staged research workflow: planning, multi-query retrieval, chunk selection, analysis, reflection, writing, and optional verification.
It is designed for research that should not be answered from memory or a single search result. The pipeline keeps claims tied to sources, surfaces contradictions, tracks gaps, and can resume from checkpoints.
Why Use It
- Provider-agnostic: works with LLM_API_KEY/LLM_API_BASE, OpenAI-compatible endpoints, or Z.AI GLM.
When to Use
Deep research, comprehensive analysis, literature reviews, competitive analysis, fact-checking, technology deep-dives: anything needing multiple sources, synthesis, and verified citations.
Quick Start
cd skills/deep-research
# Optional: configure any OpenAI-compatible provider
export LLM_API_KEY="your-key"
export LLM_API_BASE="https://api.example.com/v1"
export LLM_MODEL="your-model"
# Or use OpenAI-compatible env names
export OPENAI_API_KEY="your-key"
export OPENAI_BASE_URL="https://api.example.com/v1"
# Run a report
python3 scripts/research_pipeline.py \
"Compare Vercel, Netlify, and Cloudflare Pages in 2026" \
--max-cycles 2 \
--format report \
--output report.md
# Test without API calls
python3 scripts/research_pipeline.py "test question" --mock --output report.md
If no universal/OpenAI-compatible variables are set, the skill still supports Z.AI via ZAI_API_KEY and ZAI_API_ENDPOINT.
Architecture
ORCHESTRATOR (you)
│
├── Plan → Decompose question into research dimensions
│
├── REFLECTION LOOP (0-8 cycles)
│   ├── Researcher Agent (parallel) → multi-query search + chunk selection
│   ├── Analyst Agent → dedupe + themes + contradictions
│   └── Reflection → coverage check, gap analysis, continue decision
│
├── Writer Agent → polished report (report/summary/brief/json)
│
└── Verify (optional) → adversarial fact-check
Key principle: Orchestrator NEVER searches directly. Clean output flows between stages only.
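The control flow in the diagram can be sketched as follows. This is a minimal illustration under stated assumptions, not the skill's actual orchestrator: `run_pipeline`, the four callables passed to it, and the `should_continue` key are hypothetical stand-ins for the Researcher, Analyst, Reflection, and Writer stages.

```python
from typing import Callable, Dict, List


def run_pipeline(
    question: str,
    dimensions: List[str],
    research: Callable[[str, str], List[dict]],
    analyze: Callable[[List[dict]], dict],
    reflect: Callable[[dict, int], dict],
    write: Callable[[dict, str], str],
    max_cycles: int = 3,
) -> Dict[str, object]:
    """Hypothetical orchestrator: routes stage outputs and never searches itself."""
    findings: List[dict] = []
    analysis: dict = {}
    cycles = 0
    for cycle in range(1, max_cycles + 1):
        # Researcher stage: one call per dimension (sequential here for simplicity).
        for dim in dimensions:
            findings.extend(research(question, dim))
        # Analyst stage: dedupe, themes, contradictions.
        analysis = analyze(findings)
        cycles = cycle
        # Reflection stage decides whether another cycle is needed.
        reflection = reflect(analysis, cycle)
        if not reflection.get("should_continue", False):
            break
    # Writer stage runs once, on the final analyst output.
    return {"report": write(analysis, question), "cycles_completed": cycles}
```

Passing the stages in as callables keeps the orchestrator free of search logic, which mirrors the key principle above: only clean stage output moves between steps.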
Two Modes
Mode 1: Full Pipeline CLI (Recommended)
Use the enhanced research_pipeline.py for automated end-to-end research:
# Full research with all features
python3 scripts/research_pipeline.py "What is the state of quantum computing in 2026?" \
--max-cycles 3 \
--output report.md \
--format report
# Mock mode (no API calls, for testing)
python3 scripts/research_pipeline.py "test question" --mock --output report.md
# With budget limits
python3 scripts/research_pipeline.py "question" \
--max-cycles 3 --time-limit 300 --token-limit 40000
# Resume from checkpoint
python3 scripts/research_pipeline.py "question" \
--resume checkpoint.json --output report.md
# Explicit dimensions
python3 scripts/research_pipeline.py "question" \
--dimensions architecture benchmarks limitations \
--output report.md --format summary
CLI Flags:
| Flag | Default | Description |
|------|---------|-------------|
| --max-cycles | 3 | Max research cycles (1-8) |
| --mock | false | Use mock data, no API calls |
| --output / -o | stdout | Output file path |
| --format / -f | report | Output format: report, summary, brief, json |
| --time-limit | 900 | Max seconds for entire pipeline |
| --token-limit | 60000 | Max estimated tokens |
| --checkpoint | none | Save checkpoints to path |
| --resume | none | Resume from checkpoint file |
| --dimensions | auto | Explicit research dimensions |
| --no-parallel | false | Research dimensions sequentially |
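For readers who want to wrap the CLI, the flag table maps naturally onto an `argparse` parser. The sketch below is an assumption about how such a parser could look, built only from the table above; it is not copied from research_pipeline.py, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the documented flags and defaults."""
    p = argparse.ArgumentParser(prog="research_pipeline.py")
    p.add_argument("question")
    p.add_argument("--max-cycles", type=int, default=3, choices=range(1, 9))
    p.add_argument("--mock", action="store_true")
    p.add_argument("--output", "-o", default=None)  # None stands for stdout
    p.add_argument("--format", "-f", default="report",
                   choices=["report", "summary", "brief", "json"])
    p.add_argument("--time-limit", type=int, default=900)
    p.add_argument("--token-limit", type=int, default=60000)
    p.add_argument("--checkpoint", default=None)
    p.add_argument("--resume", default=None)
    p.add_argument("--dimensions", nargs="+", default=None)  # None stands for auto
    p.add_argument("--no-parallel", action="store_true")
    return p


args = build_parser().parse_args(
    ["question", "--dimensions", "architecture", "benchmarks", "--format", "summary"]
)
print(args.max_cycles, args.dimensions, args.format)
```

Because `--dimensions` accepts several values, the question should come before it on the command line, as it does in every example in this document.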
Output formats:
- report: full markdown with Executive Summary → Key Findings → Detailed Analysis → Contradictions → Gaps → Sources → Methodology
- summary: Executive summary + top 5 findings + sources
- brief: Bullet-point format for quick scanning
- json: Structured JSON with annotated findings and metadata
Mode 2: Orchestrated Sub-Agents (For complex research)
Use when you need fine-grained control over each stage or parallel dimension research with sub-agents.
Workflow (Orchestrated Mode)
Phase 1: Planning
- Create the memory/research// directory
- Write plan.md (the research plan with dimensions)
Phase 2: Research Cycle (repeat up to 8 times)
#### Step A: Spawn Researcher Agent(s)
Use sessions_spawn with a task brief (NOT the full query):
{
"dimension": "technical architecture",
"specific_questions": ["How does X work?", "What are Y's components?"],
"context_limit": 5000,
"max_sources": 10
}
Researcher agent does:
1. Multi-query generation → scripts/query_generator.py produces 3-5 variants
2. Parallel search → web_search for each variant
3. Content fetching → web_fetch for top results
4. LLM chunk selection → scripts/chunk_selector.py scores each chunk (≥0.7)
5. Context expansion → scripts/context_expander.py fetches surrounding content
6. Output: JSON findings with citations
Can spawn 2-3 researcher agents in parallel for different dimensions.
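The threshold filter in step 4 can be pictured with a few lines of Python. This is a sketch only: the chunk dictionaries and their `text`, `source_url`, and `score` keys are an assumed shape, `select_chunks` is a hypothetical helper, and the scores are hard-coded where chunk_selector.py would obtain them from an LLM.

```python
from typing import Dict, List


def select_chunks(scored_chunks: List[Dict], threshold: float = 0.7) -> List[Dict]:
    """Keep chunks whose relevance score meets the threshold, best first."""
    kept = [c for c in scored_chunks if c.get("score", 0.0) >= threshold]
    kept.sort(key=lambda c: c["score"], reverse=True)
    return kept


chunks = [
    {"text": "Retrievers are often dense encoders.",
     "source_url": "https://example.com/b", "score": 0.70},
    {"text": "Unrelated navigation text.",
     "source_url": "https://example.com/a", "score": 0.12},
    {"text": "RAG combines retrieval with generation.",
     "source_url": "https://example.com/a", "score": 0.91},
]
selected = select_chunks(chunks)
print([c["score"] for c in selected])  # [0.91, 0.7]
```

Each kept chunk carries its `source_url`, so the citation survives into the findings the researcher hands to the analyst.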
#### Step B: Spawn Analyst Agent
After researcher(s) complete, spawn analyst with their combined output:
1. Deduplicate overlapping findings
2. Flag contradictions (explicit + implicit)
3. Group into thematic clusters
4. Identify gaps
5. Output: Cleaned JSON + gap list
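As a rough picture of the deduplicate and group steps, the sketch below merges findings whose claims match after normalization and then buckets them by theme. The finding schema (`claim`, `theme`, `source_url`) and both function names are assumptions for illustration; analyst.py may match overlapping findings more loosely than this exact-match approach.

```python
from collections import defaultdict
from typing import Dict, List


def normalize(claim: str) -> str:
    """Lowercase and collapse whitespace so near-identical claims compare equal."""
    return " ".join(claim.lower().split())


def dedupe_and_group(findings: List[Dict]) -> Dict[str, List[Dict]]:
    """Merge findings with the same normalized claim, then group them by theme."""
    merged: Dict[str, Dict] = {}
    for f in findings:
        key = normalize(f["claim"])
        if key in merged:
            # Same claim seen again: keep one entry but collect every source.
            if f["source_url"] not in merged[key]["sources"]:
                merged[key]["sources"].append(f["source_url"])
        else:
            merged[key] = {"claim": f["claim"], "theme": f["theme"],
                           "sources": [f["source_url"]]}
    themes: Dict[str, List[Dict]] = defaultdict(list)
    for item in merged.values():
        themes[item["theme"]].append(item)
    return dict(themes)
```

Keeping every source URL on the merged entry means deduplication never drops a citation, only the repeated wording.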
#### Step C: Run Reflection
After analyst completes, run scripts/reflection.py:
1. What's covered? (themes + confidence scores)
2. What gaps remain? (unanswered questions)
3. What contradictions emerged?
4. New directions discovered?
5. Should continue? (coverage ≥ 0.8 + minor gaps → stop)
Save reflection to memory/research/
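The stop rule in item 5 can be written as a small predicate. This is a sketch, assuming that "minor gaps" means no gap is classified as major and that the cycle budget also ends the loop; `should_continue` and its parameters are hypothetical names, not the interface of reflection.py.

```python
def should_continue(coverage: float, major_gaps: int, cycle: int,
                    max_cycles: int = 3, coverage_target: float = 0.8) -> bool:
    """Return True when another research cycle is warranted."""
    if cycle >= max_cycles:
        return False  # cycle budget exhausted
    if coverage >= coverage_target and major_gaps == 0:
        return False  # coverage reached and only minor gaps remain
    return True


print(should_continue(coverage=0.85, major_gaps=0, cycle=1))  # False
print(should_continue(coverage=0.85, major_gaps=2, cycle=1))  # True
```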
#### Continue Decision
Phase 3: Write Report
Use the Writer Agent (scripts/writer.py) for publication-quality output:
# From Python
from writer import WriterAgent, OutputFormat, write_report
# Generate report using WriterAgent
agent = WriterAgent(use_llm=True)
result = agent.write_report(
    analyst_output,  # from analyst or run_analyst()
    question="What is RAG?",
    fmt=OutputFormat.REPORT,
)
# Or use convenience function
result = write_report(analyst_output, question, fmt="report")
# Save to file
from writer import save_report
save_report(result, "output/report.md")
Report features:
- [source_url] inline citations throughout
Phase 4: Verify (optional sub-agent)
Spawn adversarial verifier:
- Re-check cited sources with web_fetch
- Write the adversarial review to review.md
Phase 5: Deliver
- final.md
- provenance.md (date, cycles, sources, verification status)
Python API
import sys, os
sys.path.insert(0, os.path.expanduser("~/.openclaw/workspace/skills/deep-research/scripts"))
from research_pipeline import run_enhanced_pipeline

result = run_enhanced_pipeline(
    question="What is the state of quantum computing in 2026?",
    max_cycles=3,
    dimensions=["hardware", "algorithms", "applications", "challenges"],
    mock_mode=False,
    output_format="report",
    time_limit=900,
    token_limit=60000,
    checkpoint_path="checkpoint.json",  # auto-saves progress
    parallel_dimensions=True,  # parallel research per dimension
)
# result["report"] = markdown string
# result["cycles_completed"] = int
# result["final_coverage"] = float (0.0-1.0)
# result["metadata"] = dict with timing, findings count, etc.
Scripts
| Script | Purpose | Usage |
|--------|---------|-------|
| research_pipeline.py | Full pipeline orchestration | python3 scripts/research_pipeline.py "question" --max-cycles 3 |
| query_generator.py | Generate 3-5 search query variants | python3 scripts/query_generator.py -q "..." |
| chunk_selector.py | LLM scores chunks, filters by threshold | python3 scripts/chunk_selector.py -q "..." -c chunks.json |
| context_expander.py | Fetch surrounding context for incomplete chunks | python3 scripts/context_expander.py -s selected.json -q "..." |
| reflection.py | Mandatory gap/contradiction check | python3 scripts/reflection.py -q "..." -f findings.json -c 1 |
| writer.py | Publication-quality report generation | from writer import WriterAgent, write_report |
| analyst.py | Dedup + themes + contradictions (no API needed) | from analyst import analyze_findings |
| researcher.py | Multi-source research orchestration | from researcher import research, research_dimension |
| research_sources.py | Search adapters (web, GitHub, docs) | from research_sources import WebSearchSource |
| fact-checker.py | Claim extraction + source ranking | python3 scripts/fact-checker.py "text" --sources '["url1"]' |
All LLM-enabled scripts use the shared provider-agnostic llm_client.py.
Provider resolution order:
1. LLM_API_KEY + LLM_API_BASE + optional LLM_MODEL
2. OPENAI_API_KEY + OPENAI_API_BASE / OPENAI_BASE_URL + optional OPENAI_MODEL
3. ZAI_API_KEY + optional ZAI_API_ENDPOINT / GLM_MODEL
If no key is configured, use --mock for local pipeline testing or rely on scripts with rule-based fallbacks where available.
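The resolution order above can be sketched as a single function over the environment. This is one reading of the list, not the code in llm_client.py: it assumes tiers 1 and 2 need both a key and a base URL while tier 3 needs only the key, and `resolve_provider` and the returned dictionary shape are hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_provider(env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Return the first configured provider in the documented order, or None."""
    # Tier 1: universal variables.
    if env.get("LLM_API_KEY") and env.get("LLM_API_BASE"):
        return {"provider": "universal", "api_key": env["LLM_API_KEY"],
                "base_url": env["LLM_API_BASE"], "model": env.get("LLM_MODEL")}
    # Tier 2: OpenAI-compatible names; either base variable is accepted.
    openai_base = env.get("OPENAI_API_BASE") or env.get("OPENAI_BASE_URL")
    if env.get("OPENAI_API_KEY") and openai_base:
        return {"provider": "openai", "api_key": env["OPENAI_API_KEY"],
                "base_url": openai_base, "model": env.get("OPENAI_MODEL")}
    # Tier 3: Z.AI, where endpoint and model are optional.
    if env.get("ZAI_API_KEY"):
        return {"provider": "zai", "api_key": env["ZAI_API_KEY"],
                "base_url": env.get("ZAI_API_ENDPOINT"), "model": env.get("GLM_MODEL")}
    return None  # nothing configured: fall back to --mock or rule-based scripts


config = resolve_provider({"ZAI_API_KEY": "your-key"})
print(config["provider"] if config else "no provider configured")  # zai
```

Taking the environment as a parameter makes the order easy to check without touching real credentials.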
Examples
Example 1: Quick Competitive Analysis
python3 scripts/research_pipeline.py \
"Compare Vercel vs Netlify vs Cloudflare Pages features and pricing 2026" \
--max-cycles 2 \
--dimensions features pricing performance ecosystem \
--format summary \
--output competitive-analysis.md
Example 2: Deep Technology Research
python3 scripts/research_pipeline.py \
"What is the current state of AI agent frameworks?" \
--max-cycles 4 \
--time-limit 600 \
--token-limit 80000 \
--checkpoint /tmp/ai-agents-checkpoint.json \
--format report \
--output ai-agents-research.md
Example 3: Literature Review (mock mode for testing)
python3 scripts/research_pipeline.py \
"What does the research say about transformer architecture efficiency?" \
--mock \
--max-cycles 3 \
--format report \
--output literature-review.md
Example 4: Bullet Brief for Quick Scanning
python3 scripts/research_pipeline.py \
"What are the latest developments in Rust web frameworks?" \
--max-cycles 2 \
--format brief \
--output rust-web-brief.md
Example 5: JSON Output for Programmatic Use
python3 scripts/research_pipeline.py \
"What is the market size of edge computing?" \
--max-cycles 2 \
--format json \
--output edge-computing-data.json
Integration with Night Shift
To queue research plans for Night Shift execution:
1. Create a research plan file:
// memory/research/queued/.json
{
"question": "What is the state of quantum computing in 2026?",
"max_cycles": 3,
"dimensions": ["hardware", "algorithms", "applications"],
"output_format": "report",
"output_path": "memory/research/quantum-2026/final.md",
"time_limit": 600,
"created_at": "2026-04-25T06:00:00Z"
}
2. Night Shift picks up queued plans and runs them via:
python3 scripts/research_pipeline.py "$QUESTION" \
--max-cycles $MAX_CYCLES \
--dimensions $DIMENSIONS \
--format $FORMAT \
--output $OUTPUT_PATH \
--time-limit $TIME_LIMIT
3. Results are saved to memory/research/ with provenance metadata.
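A queued plan can also be turned into an argument list in Python, which avoids shell quoting problems with questions that contain spaces or quotes. The sketch below assumes the plan keys shown in step 1 and the CLI flags documented earlier; `plan_to_argv` is a hypothetical helper and not part of Night Shift.

```python
import json
from typing import List


def plan_to_argv(plan: dict) -> List[str]:
    """Translate a queued plan into an argument list for research_pipeline.py."""
    argv = ["python3", "scripts/research_pipeline.py", plan["question"]]
    argv += ["--max-cycles", str(plan.get("max_cycles", 3))]
    if plan.get("dimensions"):
        argv += ["--dimensions", *plan["dimensions"]]
    argv += ["--format", plan.get("output_format", "report")]
    if plan.get("output_path"):
        argv += ["--output", plan["output_path"]]
    if "time_limit" in plan:
        argv += ["--time-limit", str(plan["time_limit"])]
    return argv


plan = json.loads("""
{
  "question": "What is the state of quantum computing in 2026?",
  "max_cycles": 3,
  "dimensions": ["hardware", "algorithms", "applications"],
  "output_format": "report",
  "output_path": "memory/research/quantum-2026/final.md",
  "time_limit": 600
}
""")
print(plan_to_argv(plan))
```

The resulting list can be passed directly to `subprocess.run(argv, check=True)`, so no shell is involved and the question stays a single argument.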
File Layout
memory/research//
├── plan.md                    # Research plan with dimensions
├── reflection-cycle-1.md      # Reflection after each cycle
├── reflection-cycle-2.md
├── researcher-output-*.json   # Raw researcher findings
├── analyst-output.json        # Merged/deduped findings
├── draft.md                   # First draft
├── brief.md                   # Verified brief
├── review.md                  # Adversarial review (optional)
├── final.md                   # Final report
├── provenance.md              # Metadata + source verification status
└── checkpoint.json            # Pipeline checkpoint (auto-saved)
Quick Mode
Skip sub-agents and the full pipeline. Do 5-10 searches yourself. Still use evidence tables, verify URLs, cite sources. Shorter, inline in chat.
Integrity Commandments
1. Never fabricate a source: no URL = don't mention it
2. Never claim existence without checking
3. Never extrapolate unread details
4. Read before summarizing
5. No fake certainty: never say "verified" unless checked
6. Never invent numbers/benchmarks/comparisons
7. Separate observations from inferences
8. Every claim traces to a source: citation integrity is mandatory
9. Reflection is not optional: run it after every cycle
10. Stage separation: orchestrator never searches, researchers never see full plan
Scale Decision
See Also