
Autoresearch Agent

by @alirezarezvani

Autonomous experiment loop that optimizes any file by a measurable metric. Inspired by Karpathy's autoresearch. The agent edits a target file, runs a fixed eval command, and keeps or discards each change based on the metric.

Version: v2.1.1
Installs: 2
βš™οΈ Configuration

First Time: Create the Experiment

Run the setup script. The user decides where experiments live:

Project-level (inside repo, git-tracked, shareable with team):

python scripts/setup_experiment.py \
  --domain engineering \
  --name api-speed \
  --target src/api/search.py \
  --eval "pytest bench.py --tb=no -q" \
  --metric p50_ms \
  --direction lower \
  --scope project

User-level (personal, in ~/.autoresearch/):

python scripts/setup_experiment.py \
  --domain marketing \
  --name medium-ctr \
  --target content/titles.md \
  --eval "python evaluate.py" \
  --metric ctr_score \
  --direction higher \
  --evaluator llm_judge_content \
  --scope user

The --scope flag determines where .autoresearch/ lives (a minimal resolution sketch follows the list):

  • project (default) → .autoresearch/ in the repo root. Experiment definitions are git-tracked. Results are gitignored.
  • user → ~/.autoresearch/ in the home directory. Everything is personal.
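How that resolution might look in code, as a minimal sketch (resolve_root is a hypothetical helper, not part of the published skill; setup_experiment.py owns the real logic):

from pathlib import Path

def resolve_root(scope="project", repo_root=None):
    """Hypothetical helper: map --scope to the .autoresearch/ location."""
    if scope == "user":
        # Personal experiments live under the home directory.
        return Path.home() / ".autoresearch"
    # Default project scope: git-tracked inside the repo.
    return Path(repo_root or Path.cwd()) / ".autoresearch"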
What Setup Creates

.autoresearch/
├── config.yaml                        ← Global settings
├── .gitignore                         ← Ignores results.tsv, *.log
└── {domain}/{experiment-name}/
    ├── program.md                     ← Objectives, constraints, strategy
    ├── config.cfg                     ← Target, eval cmd, metric, direction
    ├── results.tsv                    ← Experiment log (gitignored)
    └── evaluate.py                    ← Evaluation script (if --evaluator used)
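Reading config.cfg back later might look like this minimal sketch (assumption: an INI-style file with an [experiment] section mirroring the CLI flags above; the real schema is whatever setup_experiment.py writes):

import configparser
from pathlib import Path

# Illustrative only: path and section name follow the engineering example above.
cfg = configparser.ConfigParser()
cfg.read(Path(".autoresearch/engineering/api-speed/config.cfg"))
exp = cfg["experiment"]
target, eval_cmd = exp["target"], exp["eval"]
metric, direction = exp["metric"], exp["direction"]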

results.tsv columns: commit | metric | status | description (a minimal append sketch follows this list)

  • commit: short git hash
  • metric: float value, or "N/A" for crashes
  • status: keep | discard | crash
  • description: what changed, or why it crashed
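Appending one row, as a minimal sketch (log_result is a hypothetical name; the skill's actual logging code is not shown on this page):

def log_result(results_path, commit, metric, status, description):
    """Hypothetical helper: append one tab-separated row to results.tsv."""
    value = "N/A" if metric is None else str(metric)  # "N/A" marks crashes
    with open(results_path, "a", encoding="utf-8") as f:
        f.write(f"{commit}\t{value}\t{status}\t{description}\n")

# Usage (placeholder values): log_result(path, "a1b2c3d", 41.7, "keep", "cache token lookups")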
Domains

| Domain | Use Cases |
|--------|-----------|
| engineering | Code speed, memory, bundle size, test pass rate, build time |
| marketing | Headlines, social copy, email subjects, ad copy, engagement |
| content | Article structure, SEO descriptions, readability, CTR |
| prompts | System prompts, chatbot tone, agent instructions |
| custom | Anything else with a measurable metric |

If program.md Already Exists

The user may have written their own program.md. If found in the experiment directory, read it. It overrides the template. Only ask for what's missing.
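That precedence could be as simple as this sketch (hypothetical; assumes the default template text is available as a string):

from pathlib import Path

def load_program(exp_dir, template_text):
    """Hypothetical precedence: a user-written program.md beats the template."""
    existing = Path(exp_dir) / "program.md"
    if existing.exists():
        return existing.read_text()  # user's file wins; ask only for what's missing
    return template_text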


🔒 Constraints

  • One change per experiment. Don't change 5 things at once; you won't know what worked.
  • Simplicity criterion. A small improvement that adds ugly complexity is not worth it. Equal performance with simpler code is a win. Removing code while getting the same results is the best outcome.
  • Never modify the evaluator. evaluate.py is the ground truth. Modifying it invalidates all comparisons. Hard stop if you catch yourself doing this.
  • Timeout. If a run exceeds 2.5× the time budget, kill it and treat it as a crash (see the sketch after this list).
  • Crash handling. If it's a typo or missing import, fix and re-run. If the idea is fundamentally broken, revert, log "crash", and move on. 5 consecutive crashes → pause and alert.
  • No new dependencies. Only use what's already available in the project.
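One way to enforce the timeout rule, as a minimal sketch (hypothetical runner; the 2.5× budget and the crash status come from the constraints above, everything else is assumed):

import subprocess

def run_eval(eval_cmd, budget_s):
    """Hypothetical runner: kill over-budget runs and report them as crashes."""
    try:
        proc = subprocess.run(eval_cmd, shell=True, capture_output=True,
                              text=True, timeout=2.5 * budget_s)
    except subprocess.TimeoutExpired:
        return "crash", "killed: exceeded 2.5x the time budget"
    if proc.returncode != 0:
        return "crash", proc.stderr  # a fixable typo, or a fundamentally broken idea
    return "ok", proc.stdout

# The outer loop would pause and alert after 5 consecutive "crash" results.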

Install from ClawHub:

clawhub install autoresearch-agent
