Methodology
Scoring rules
MOORING reports static results computed from local agent-harness artifacts; it does not require a database or a live backend.
V1 Score
successRate = passedIterations / totalCompletedIterations
Iteration status is determined solely by the final assertion. Tool errors and assistant errors are recorded as diagnostic evidence for audit views, but they do not override a passing assertion, because recovering from expected tool failures is part of the behavior the benchmark measures.
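As an illustration of the two rules above, here is a minimal sketch, assuming a hypothetical per-iteration record (`Iteration` and its field names are illustrative, not MOORING's actual artifact schema). Status comes only from the final assertion, errors are kept but ignored for scoring, and the rate is taken over completed iterations only. Returning `0.0` when nothing has completed is an assumption; the source does not say how that case is handled.

```python
from dataclasses import dataclass, field


@dataclass
class Iteration:
    # Hypothetical record shape; field names are illustrative only.
    completed: bool
    final_assertion_passed: bool
    tool_errors: list[str] = field(default_factory=list)
    assistant_errors: list[str] = field(default_factory=list)


def iteration_passed(it: Iteration) -> bool:
    # Only the final assertion decides status; tool/assistant errors are
    # retained for audit views but never flip a passing assertion.
    return it.final_assertion_passed


def success_rate(iterations: list[Iteration]) -> float:
    # successRate = passedIterations / totalCompletedIterations
    completed = [it for it in iterations if it.completed]
    if not completed:
        return 0.0  # assumption: no completed iterations scores 0.0
    passed = sum(1 for it in completed if iteration_passed(it))
    return passed / len(completed)


runs = [
    Iteration(True, True, tool_errors=["timeout"]),  # recovered: still passes
    Iteration(True, False),
    Iteration(False, False),  # not completed: excluded from the denominator
]
print(success_rate(runs))  # 0.5
```

The first run shows why errors are diagnostic only: a tool timeout that the agent recovered from still counts as a pass.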
Leaderboard tie-breakers
Entries are ranked by the following criteria, applied in order:
- Higher success rate.
- Fewer errors.
- Lower average duration.
- Lower average total tokens.
- Lower average turns.
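The ordering above maps naturally onto a single composite sort key. The sketch below assumes a hypothetical aggregate record per leaderboard entry (`LeaderboardEntry` and its fields are illustrative) and interprets "errors" as a total error count, which is an assumption since the source does not specify which errors are counted.

```python
from dataclasses import dataclass


@dataclass
class LeaderboardEntry:
    # Hypothetical per-entry aggregates; field names are illustrative only.
    name: str
    success_rate: float
    errors: int  # assumption: total tool + assistant errors
    avg_duration_s: float
    avg_total_tokens: float
    avg_turns: float


def rank_key(e: LeaderboardEntry) -> tuple:
    # Negate success_rate so higher rates sort first; every other
    # criterion is "lower is better", so it sorts ascending as-is.
    return (-e.success_rate, e.errors, e.avg_duration_s,
            e.avg_total_tokens, e.avg_turns)


def rank(entries: list[LeaderboardEntry]) -> list[LeaderboardEntry]:
    # Tuple comparison applies the criteria left to right, so a later
    # field only matters when all earlier fields are equal.
    return sorted(entries, key=rank_key)


a = LeaderboardEntry("a", 0.8, 3, 12.0, 5000, 6)
b = LeaderboardEntry("b", 0.8, 1, 20.0, 9000, 9)
c = LeaderboardEntry("c", 0.9, 10, 30.0, 10000, 12)
print([e.name for e in rank([a, b, c])])  # ['c', 'b', 'a']
```

Here `c` leads on success rate despite being worst on every other metric, and `b` beats `a` on the first tie-breaker (fewer errors) even though `a` is faster and cheaper.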