MOORING Bench

A reliability benchmark for small and tiny models on local agent harnesses. Currently tracking 6 model runs across 7 test cases.
