Case: use-skill

Success rate 55.0% across all recorded iterations.

Total: 60
Passed: 33
Failed: 27
Errors: 0
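The headline rate follows directly from these counts. Below is a minimal Python sketch. It assumes the rate is simply passed iterations divided by all iterations, with errors counted as non-passing (there are none here). The function name `success_rate` is hypothetical and is not part of the reporting tool.

```python
def success_rate(passed: int, failed: int, errors: int) -> float:
    """Return the pass percentage, assuming every iteration is passed, failed, or errored."""
    total = passed + failed + errors
    if total == 0:
        raise ValueError("no iterations recorded")
    return 100.0 * passed / total


# Counts taken from the summary above.
rate = success_rate(passed=33, failed=27, errors=0)
print(f"{rate:.1f}%")  # 55.0%
```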

Model Comparison

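| # | Model | Score | Basic File Reading | Basic Skills | Avg turns | Avg tools | Avg tokens | Context avg / max |
|---|-------|-------|--------------------|--------------|-----------|-----------|------------|-------------------|
| 1 | qwen/qwen3.6-35b-a3b | 100.0% | n/a | 100.0% | 3.0 | 2.0 | 4,653 | 4.9% / 5.0% |
| 2 | granite-4.1-8b | 100.0% | n/a | 100.0% | 3.0 | 2.0 | 4,045 | 4.3% / 4.3% |
| 3 | qwen/qwen3.5-9b | 70.0% | n/a | 70.0% | 23.4 | 22.4 | 331,052 | 10.6% / 52.6% |
| 4 | google/gemma-4-e4b | 60.0% | n/a | 60.0% | 3.1 | 2.4 | 4,980 | 2.7% / 5.2% |
| 5 | google/gemma-4-e2b | 0.0% | n/a | 0.0% | 2.0 | 1.0 | 2,737 | 4.4% / 5.3% |
| 6 | lfm2.5-350m | 0.0% | n/a | 0.0% | 1.2 | 0.2 | 1,527 | 3.9% / 4.0% |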
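The summary columns appear to be plain arithmetic means over each model's ten iterations. This is an inference from the numbers, not something the report states. The Python sketch below recomputes the qwen/qwen3.5-9b row from its per-iteration tools, tokens, context and status values, which are copied from the Iterations table. It also adds a median, which the report does not show, to make visible how far iteration 002 pulls the token mean.

```python
from statistics import mean, median

# Per-iteration values for qwen/qwen3.5-9b, copied from the Iterations table (001..010).
tools = [2, 190, 2, 2, 12, 2, 8, 2, 2, 2]
tokens = [4724, 3229064, 4725, 4674, 31396, 4789, 17111, 4673, 4671, 4696]
context_pct = [5.0, 52.6, 5.0, 5.0, 10.9, 5.1, 7.2, 5.0, 5.0, 5.0]
passed = [True, False, True, True, False, True, False, True, True, True]

# Score as the share of passing iterations (assumed definition).
score = 100.0 * sum(passed) / len(passed)

print(f"Score: {score:.1f}%")                   # Score: 70.0%
print(f"Avg tools: {mean(tools):.1f}")          # Avg tools: 22.4
print(f"Avg tokens: {mean(tokens):,.0f}")       # Avg tokens: 331,052
print(f"Median tokens: {median(tokens):,.1f}")  # Median tokens: 4,724.5
print(f"Context avg / max: {mean(context_pct):.1f}% / {max(context_pct):.1f}%")
# Context avg / max: 10.6% / 52.6%
```

With a single runaway iteration (190 tool calls, about 3.2 million tokens), the mean sits roughly 70 times above the median. The averages for this model therefore describe the outlier more than the typical run.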

Iterations

60 matching iterations

| Iteration | Model | Category | Variant | Status | Duration | Tools | Tokens | Context |
|-----------|-------|----------|---------|--------|----------|-------|--------|---------|
| use-skill / 001 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 301 ms | 0 | 1,299 | 4.0% |
| use-skill / 002 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 248 ms | 0 | 1,278 | 3.9% |
| use-skill / 003 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 234 ms | 1 | 2,522 | 3.9% |
| use-skill / 004 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 213 ms | 0 | 1,265 | 3.9% |
| use-skill / 005 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 219 ms | 0 | 1,267 | 3.9% |
| use-skill / 006 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 242 ms | 0 | 1,278 | 3.9% |
| use-skill / 007 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 302 ms | 1 | 2,555 | 4.0% |
| use-skill / 008 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 235 ms | 0 | 1,270 | 3.9% |
| use-skill / 009 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 204 ms | 0 | 1,258 | 3.8% |
| use-skill / 010 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 262 ms | 0 | 1,276 | 3.9% |
| use-skill / 001 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 1,459 ms | 2 | 4,045 | 4.3% |
| use-skill / 002 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 964 ms | 2 | 4,045 | 4.3% |
| use-skill / 003 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 987 ms | 2 | 4,045 | 4.3% |
| use-skill / 004 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 980 ms | 2 | 4,045 | 4.3% |
| use-skill / 005 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 981 ms | 2 | 4,045 | 4.3% |
| use-skill / 006 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 975 ms | 2 | 4,045 | 4.3% |
| use-skill / 007 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 987 ms | 2 | 4,045 | 4.3% |
| use-skill / 008 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 979 ms | 2 | 4,045 | 4.3% |
| use-skill / 009 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 966 ms | 2 | 4,045 | 4.3% |
| use-skill / 010 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 964 ms | 2 | 4,045 | 4.3% |
| use-skill / 001 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 4,584 ms | 1 | 3,037 | 5.3% |
| use-skill / 002 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 2,510 ms | 1 | 2,754 | 4.5% |
| use-skill / 003 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,426 ms | 1 | 2,588 | 4.0% |
| use-skill / 004 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 2,528 ms | 1 | 2,738 | 4.5% |
| use-skill / 005 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 3,857 ms | 1 | 2,950 | 5.1% |
| use-skill / 006 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 472 ms | 1 | 2,700 | 4.2% |
| use-skill / 007 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,854 ms | 1 | 2,712 | 4.2% |
| use-skill / 008 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,457 ms | 1 | 2,602 | 4.0% |
| use-skill / 009 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,618 ms | 1 | 2,652 | 4.1% |
| use-skill / 010 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,592 ms | 1 | 2,639 | 4.1% |
| use-skill / 001 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 2,252 ms | 2 | 3,832 | 2.1% |
| use-skill / 002 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | failed | 9,076 ms | 3 | 6,659 | 3.2% |
| use-skill / 003 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 2,593 ms | 2 | 3,909 | 2.2% |
| use-skill / 004 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 2,865 ms | 2 | 4,017 | 2.2% |
| use-skill / 005 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | failed | 14,226 ms | 3 | 7,366 | 3.9% |
| use-skill / 006 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | failed | 2,340 ms | 1 | 2,562 | 2.1% |
| use-skill / 007 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 2,559 ms | 2 | 3,914 | 2.2% |
| use-skill / 008 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | failed | 26,552 ms | 5 | 9,485 | 5.2% |
| use-skill / 009 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 2,375 ms | 2 | 3,886 | 2.1% |
| use-skill / 010 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 2,734 ms | 2 | 4,167 | 2.4% |
| use-skill / 001 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 8,971 ms | 2 | 4,658 | 5.0% |
| use-skill / 002 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 9,159 ms | 2 | 4,671 | 5.0% |
| use-skill / 003 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 9,685 ms | 2 | 4,662 | 5.0% |
| use-skill / 004 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 8,684 ms | 2 | 4,682 | 5.0% |
| use-skill / 005 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 8,604 ms | 2 | 4,646 | 4.9% |
| use-skill / 006 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 9,676 ms | 2 | 4,637 | 4.9% |
| use-skill / 007 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 8,800 ms | 2 | 4,628 | 4.9% |
| use-skill / 008 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 9,609 ms | 2 | 4,648 | 4.9% |
| use-skill / 009 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 10,513 ms | 2 | 4,627 | 4.9% |
| use-skill / 010 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 9,366 ms | 2 | 4,670 | 5.0% |
| use-skill / 001 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 2,866 ms | 2 | 4,724 | 5.0% |
| use-skill / 002 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | failed | 399,910 ms | 190 | 3,229,064 | 52.6% |
| use-skill / 003 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 2,607 ms | 2 | 4,725 | 5.0% |
| use-skill / 004 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 1,979 ms | 2 | 4,674 | 5.0% |
| use-skill / 005 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | failed | 22,314 ms | 12 | 31,396 | 10.9% |
| use-skill / 006 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 2,683 ms | 2 | 4,789 | 5.1% |
| use-skill / 007 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | failed | 9,388 ms | 8 | 17,111 | 7.2% |
| use-skill / 008 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 2,045 ms | 2 | 4,673 | 5.0% |
| use-skill / 009 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 2,566 ms | 2 | 4,671 | 5.0% |
| use-skill / 010 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 3,055 ms | 2 | 4,696 | 5.0% |
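The Score column in the model comparison can be reproduced by grouping these rows by model and counting passes. Below is a Python sketch that assumes each model's score is its pass count divided by its ten iterations. The status strings are transcribed from the table above, and the `scores` helper and `STATUSES` mapping are hypothetical names used only for illustration.

```python
from collections import Counter

# Status per iteration (001..010) for each model, transcribed from the Iterations table.
STATUSES = {
    "lfm2.5-350m": ["failed"] * 10,
    "granite-4.1-8b": ["passed"] * 10,
    "google/gemma-4-e2b": ["failed"] * 10,
    "google/gemma-4-e4b": ["passed", "failed", "passed", "passed", "failed",
                           "failed", "passed", "failed", "passed", "passed"],
    "qwen/qwen3.6-35b-a3b": ["passed"] * 10,
    "qwen/qwen3.5-9b": ["passed", "failed", "passed", "passed", "failed",
                        "passed", "failed", "passed", "passed", "passed"],
}


def scores(statuses: dict[str, list[str]]) -> dict[str, float]:
    """Return each model's pass percentage over its recorded iterations."""
    result = {}
    for model, runs in statuses.items():
        counts = Counter(runs)
        result[model] = 100.0 * counts["passed"] / len(runs)
    return result


per_model = scores(STATUSES)
# Highest score first; ties keep the insertion order above.
for model, score in sorted(per_model.items(), key=lambda kv: -kv[1]):
    print(f"{model:<22} {score:5.1f}%")

total_passed = sum(Counter(runs)["passed"] for runs in STATUSES.values())
total_runs = sum(len(runs) for runs in STATUSES.values())
print(f"Overall: {total_passed}/{total_runs}")  # Overall: 33/60
```

The per-model pass counts (10, 10, 7, 6, 0, 0) add up to the 33 passed iterations in the summary, which is consistent with the overall 55.0% rate.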