Methodology

Scoring rules

MOORING reports static results computed from local agent-harness artifacts; it needs no database or live backend.

V1 Score

successRate = passedIterations / totalCompletedIterations
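The ratio above can be sketched as follows. This is a minimal illustration, not MOORING's code: the `Iteration` record and its `completed`/`passed` fields are hypothetical. Only completed iterations enter the denominator. Returning `None` when nothing has completed is an assumption, because the formula is undefined in that case.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Iteration:
    # Hypothetical record shape; real harness artifacts may use different fields.
    completed: bool
    passed: bool


def success_rate(iterations: Sequence[Iteration]) -> Optional[float]:
    """passedIterations / totalCompletedIterations, ignoring incomplete iterations."""
    completed = [it for it in iterations if it.completed]
    if not completed:
        # Assumption: report no score rather than dividing by zero.
        return None
    passed = sum(1 for it in completed if it.passed)
    return passed / len(completed)


runs = [
    Iteration(completed=True, passed=True),
    Iteration(completed=True, passed=True),
    Iteration(completed=True, passed=False),
    Iteration(completed=False, passed=False),  # excluded from the denominator
]
print(success_rate(runs))  # 2 passed out of 3 completed
```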

Iteration status is determined by the final assertion. Tool errors and assistant errors are diagnostic evidence for audit views, but they do not override a passing assertion because recovery from expected tool failures is part of the benchmark behavior.
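The rule that errors are diagnostic and cannot override a passing assertion might look like the following sketch. The `IterationResult` shape and the `iteration_passed`/`audit_notes` helpers are hypothetical names used for illustration. Status is read only from the final assertion, while errors are collected separately for an audit view.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IterationResult:
    # Hypothetical shape: the outcome of the final assertion plus recorded errors.
    final_assertion_passed: bool
    tool_errors: List[str] = field(default_factory=list)
    assistant_errors: List[str] = field(default_factory=list)


def iteration_passed(result: IterationResult) -> bool:
    # Status depends only on the final assertion; errors are deliberately ignored here.
    return result.final_assertion_passed


def audit_notes(result: IterationResult) -> List[str]:
    # Errors are surfaced as diagnostic evidence without affecting the status.
    return [f"tool: {e}" for e in result.tool_errors] + [
        f"assistant: {e}" for e in result.assistant_errors
    ]


# The agent hit an expected tool failure, recovered, and the final assertion passed.
recovered = IterationResult(final_assertion_passed=True, tool_errors=["file not found"])
print(iteration_passed(recovered))  # True
print(audit_notes(recovered))       # ['tool: file not found']
```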

Leaderboard Tie-breakers

  1. Higher success rate.
  2. Fewer errors.
  3. Lower average duration.
  4. Lower average total tokens.
  5. Lower average turns.
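The tie-breakers above amount to a lexicographic sort key. In the sketch below, the `Entry` fields are hypothetical, and `errors` is assumed to be a single per-entry count. Success rate is negated so that higher values sort first, and every other key sorts ascending. Python compares tuples left to right, so each later key only matters when all earlier keys are equal.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    # Hypothetical per-agent leaderboard summary; field names are illustrative.
    name: str
    success_rate: float
    errors: int
    avg_duration_s: float
    avg_total_tokens: float
    avg_turns: float


def rank(entries: List[Entry]) -> List[Entry]:
    # Tuple order mirrors the tie-breaker list: rate (desc), then errors,
    # duration, tokens, and turns (all ascending).
    return sorted(
        entries,
        key=lambda e: (
            -e.success_rate,
            e.errors,
            e.avg_duration_s,
            e.avg_total_tokens,
            e.avg_turns,
        ),
    )


board = rank([
    Entry("a", 0.9, 3, 40.0, 12000, 8),
    Entry("b", 0.9, 1, 55.0, 15000, 9),
    Entry("c", 0.8, 0, 10.0, 5000, 4),
])
print([e.name for e in board])  # ['b', 'a', 'c']
```

Entries "a" and "b" tie on success rate, so "b" ranks higher because it has fewer errors. Because `sorted` is stable, entries that tie on every key keep their input order. That fallback is an assumption, since the rules above do not specify one.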