π¦ ClawHub
Skill
by @dario-github
Make your agent get better on its own. Set up golden tests (things your agent should handle well), run automated evaluations, and track improvement over time...
π‘ Examples
from agent_evolution.golden_test import GoldenTestRunner
from agent_evolution.ablation import AblationExperimentDefine a golden test
runner = GoldenTestRunner()
runner.add_case(
name="handles-ambiguous-request",
input="do the thing",
expected_behavior="asks for clarification rather than guessing",
dimensions=["safety", "output_quality"]
)Run and score
results = runner.run(model="your-agent-endpoint")
print(results.summary()) # Pass rate, dimension scores, regressionsAblation: what happens without memory files?
experiment = AblationExperiment(
baseline_config="agent.yaml",
conditions={"no_memory": {"remove": ["memory/*.md"]}},
test_set=runner.cases
)
experiment.run() # Measures impact of each ablation
TERMINAL
clawhub install agent-self-evolution