Leaderboard
Model rankings
Each model is listed with its best run. The table is updated when new runs are published.
| # | Model | Score | Basic File Reading | Basic Skills | Avg turns | Avg tools | Avg tokens | Context avg / max |
|---|---|---|---|---|---|---|---|---|
| 1 | qwen/qwen3.6-35b-a3b | 100.0% | 100.0% | 100.0% | 3.8 | 3.0 | 6,079 | 5.2% / 6.0% |
| 2 | qwen/qwen3.5-9b | 92.9% | 100.0% | 83.3% | 7.6 | 6.6 | 54,744 | 6.3% / 52.6% |
| 3 | granite-4.1-8b | 85.7% | 100.0% | 66.7% | 4.0 | 3.0 | 5,459 | 4.4% / 5.3% |
| 4 | google/gemma-4-e4b | 81.4% | 95.0% | 63.3% | 4.2 | 3.3 | 6,776 | 2.8% / 6.0% |
| 5 | google/gemma-4-e2b | 41.4% | 72.5% | 0.0% | 2.8 | 1.8 | 3,989 | 4.7% / 7.2% |
| 6 | lfm2.5-350m | 2.9% | 5.0% | 0.0% | 1.7 | 0.8 | 2,150 | 3.9% / 4.8% |
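
The page does not say how the Score column is derived from the two category columns. The figures in the table are consistent with a 4:3 weighted average of Basic File Reading and Basic Skills. The minimal Python sketch below checks that reading against every row. The weights and the names `weighted_score` and `max_deviation` are assumptions inferred from the numbers, not something the leaderboard documents.

```python
# Values copied from the table above: (Basic File Reading, Basic Skills, Score), all in percent.
ROWS = {
    "qwen/qwen3.6-35b-a3b": (100.0, 100.0, 100.0),
    "qwen/qwen3.5-9b": (100.0, 83.3, 92.9),
    "granite-4.1-8b": (100.0, 66.7, 85.7),
    "google/gemma-4-e4b": (95.0, 63.3, 81.4),
    "google/gemma-4-e2b": (72.5, 0.0, 41.4),
    "lfm2.5-350m": (5.0, 0.0, 2.9),
}

# ASSUMPTION (inferred from the data, not stated by the source):
# file reading counts 4 parts, skills counts 3 parts.
W_FILE, W_SKILLS = 4, 3


def weighted_score(file_reading: float, skills: float) -> float:
    """Weighted average of the two category percentages."""
    return (W_FILE * file_reading + W_SKILLS * skills) / (W_FILE + W_SKILLS)


def max_deviation(rows=ROWS) -> float:
    """Largest gap between the recomputed and the reported Score."""
    return max(
        abs(weighted_score(file_reading, skills) - reported)
        for file_reading, skills, reported in rows.values()
    )


if __name__ == "__main__":
    for model, (file_reading, skills, reported) in ROWS.items():
        print(f"{model}: recomputed {weighted_score(file_reading, skills):.1f}, reported {reported}")
```

If the assumption holds, every recomputed value matches the reported Score to within the rounding of the one-decimal inputs. A 4:3 ratio would be what a plain per-task average gives if the two categories contain tasks in that proportion, but the page does not state task counts.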