Leaderboard

Model rankings

Showing the best run for each model. Updated when new runs are published.

#	Model	Score	Basic File Reading	Basic Skills	Avg turns	Avg tools	Avg tokens	Context avg / max
1	qwen/qwen3.6-35b-a3b	100.0%	100.0%	100.0%	3.8	3.0	6,079	5.2% / 6.0%
2	qwen/qwen3.5-9b	92.9%	100.0%	83.3%	7.6	6.6	54,744	6.3% / 52.6%
3	granite-4.1-8b	85.7%	100.0%	66.7%	4.0	3.0	5,459	4.4% / 5.3%
4	google/gemma-4-e4b	81.4%	95.0%	63.3%	4.2	3.3	6,776	2.8% / 6.0%
5	google/gemma-4-e2b	41.4%	72.5%	0.0%	2.8	1.8	3,989	4.7% / 7.2%
6	lfm2.5-350m	2.9%	5.0%	0.0%	1.7	0.8	2,150	3.9% / 4.8%
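
The "best run per model" rule stated above the table could be sketched as follows. This is a minimal illustration, not the leaderboard's actual implementation: the `Run` record, the `best_run_per_model` helper, and the sample runs are all hypothetical, and it assumes "best" means the highest overall score, with ties going to the run seen first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One published benchmark run (hypothetical record layout)."""
    model: str
    score: float  # overall score, in percent


def best_run_per_model(runs):
    """Keep the highest-scoring run per model, ranked by score descending."""
    best = {}
    for run in runs:
        current = best.get(run.model)
        # Strict '>' keeps the first-seen run when two runs tie (an assumption).
        if current is None or run.score > current.score:
            best[run.model] = run
    # sorted() is stable, so models with equal scores keep insertion order.
    return sorted(best.values(), key=lambda r: r.score, reverse=True)


# Made-up runs for illustration only; these are not leaderboard results.
runs = [
    Run("model-a", 85.0),
    Run("model-b", 60.0),
    Run("model-a", 92.5),
]
for rank, run in enumerate(best_run_per_model(runs), start=1):
    print(rank, run.model, f"{run.score:.1f}%")
```

Under these assumptions, publishing a new run changes a model's row only when the new run beats that model's previous best.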