Case

use-skill-with-scripts

Success rate 46.7% across all recorded iterations.

Total: 60
Passed: 28
Failed: 32
Errors: 0
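
The headline success rate follows directly from the counts above. A minimal sketch of the arithmetic, assuming the rate is passed iterations divided by total iterations and rounded to one decimal place (the helper name `success_rate` is hypothetical, not part of the benchmark tooling):

```python
def success_rate(passed: int, total: int) -> float:
    """Return the pass percentage, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * passed / total, 1)


# Counts taken from the summary above.
total, passed, failed, errors = 60, 28, 32, 0

# Sanity check: the three outcome buckets should account for every iteration.
assert passed + failed + errors == total

print(f"{success_rate(passed, total)}%")  # 46.7%
```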

Model Comparison

# | Model | Score | Basic File Reading | Basic Skills | Avg turns | Avg tools | Avg tokens | Context avg / max
1 | qwen/qwen3.5-9b | 100.0% | n/a | 100.0% | 4.8 | 3.9 | 8,888 | 6.1% / 7.2%
2 | qwen/qwen3.6-35b-a3b | 100.0% | n/a | 100.0% | 4.1 | 3.1 | 7,235 | 5.8% / 6.0%
3 | google/gemma-4-e4b | 80.0% | n/a | 80.0% | 5.9 | 4.9 | 12,154 | 3.9% / 6.0%
4 | google/gemma-4-e2b | 0.0% | n/a | 0.0% | 2.0 | 1.0 | 2,908 | 4.7% / 5.1%
5 | granite-4.1-8b | 0.0% | n/a | 0.0% | 4.1 | 3.1 | 6,455 | 5.3% / 5.3%
6 | lfm2.5-350m | 0.0% | n/a | 0.0% | 1.3 | 0.3 | 1,883 | 4.4% / 4.8%

Iterations

60 matching iterations

Iteration | Model | Category | Variant | Status | Duration | Tools | Tokens | Context
use-skill-with-scripts / 001 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 361ms | 1 | 2,850 | 4.5%
use-skill-with-scripts / 002 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 294ms | 0 | 1,447 | 4.4%
use-skill-with-scripts / 003 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 291ms | 0 | 1,452 | 4.4%
use-skill-with-scripts / 004 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 224ms | 0 | 1,420 | 4.3%
use-skill-with-scripts / 005 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 294ms | 0 | 1,447 | 4.4%
use-skill-with-scripts / 006 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 241ms | 0 | 1,423 | 4.3%
use-skill-with-scripts / 007 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 340ms | 0 | 1,468 | 4.5%
use-skill-with-scripts / 008 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 207ms | 0 | 1,419 | 4.3%
use-skill-with-scripts / 009 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 487ms | 2 | 4,500 | 4.8%
use-skill-with-scripts / 010 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 199ms | 0 | 1,408 | 4.3%
use-skill-with-scripts / 001 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | failed | 3,072ms | 3 | 6,320 | 5.3%
use-skill-with-scripts / 002 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | failed | 1,012ms | 3 | 6,284 | 5.2%
use-skill-with-scripts / 003 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | failed | 2,722ms | 3 | 6,298 | 5.3%
use-skill-with-scripts / 004 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | failed | 2,311ms | 3 | 6,286 | 5.2%
use-skill-with-scripts / 005 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | failed | 2,360ms | 3 | 6,251 | 5.2%
use-skill-with-scripts / 006 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | failed | 2,163ms | 4 | 7,942 | 5.2%
use-skill-with-scripts / 007 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | failed | 2,681ms | 3 | 6,287 | 5.3%
use-skill-with-scripts / 008 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | failed | 2,484ms | 3 | 6,271 | 5.2%
use-skill-with-scripts / 009 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | failed | 2,848ms | 3 | 6,308 | 5.3%
use-skill-with-scripts / 010 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | failed | 1,435ms | 3 | 6,307 | 5.3%
use-skill-with-scripts / 001 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 2,668ms | 1 | 2,923 | 4.8%
use-skill-with-scripts / 002 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,288ms | 1 | 2,735 | 4.3%
use-skill-with-scripts / 003 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,760ms | 1 | 2,840 | 4.5%
use-skill-with-scripts / 004 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,862ms | 1 | 2,843 | 4.6%
use-skill-with-scripts / 005 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 2,778ms | 1 | 3,085 | 5.0%
use-skill-with-scripts / 006 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 2,653ms | 1 | 3,032 | 4.9%
use-skill-with-scripts / 007 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,933ms | 1 | 2,863 | 4.6%
use-skill-with-scripts / 008 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 2,921ms | 1 | 3,008 | 5.1%
use-skill-with-scripts / 009 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 2,313ms | 1 | 2,916 | 4.8%
use-skill-with-scripts / 010 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,850ms | 1 | 2,839 | 4.6%
use-skill-with-scripts / 001 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 8,585ms | 4 | 8,879 | 3.3%
use-skill-with-scripts / 002 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | failed | 25,019ms | 8 | 22,853 | 6.0%
use-skill-with-scripts / 003 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 13,123ms | 5 | 13,103 | 4.0%
use-skill-with-scripts / 004 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 9,657ms | 6 | 14,366 | 3.8%
use-skill-with-scripts / 005 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 6,038ms | 3 | 6,620 | 2.9%
use-skill-with-scripts / 006 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | failed | 23,233ms | 4 | 11,897 | 5.5%
use-skill-with-scripts / 007 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 11,184ms | 6 | 13,708 | 3.8%
use-skill-with-scripts / 008 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 5,981ms | 3 | 6,671 | 2.9%
use-skill-with-scripts / 009 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 13,454ms | 5 | 13,132 | 4.2%
use-skill-with-scripts / 010 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 7,175ms | 5 | 10,314 | 3.1%
use-skill-with-scripts / 001 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 12,112ms | 3 | 6,979 | 5.7%
use-skill-with-scripts / 002 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 13,989ms | 4 | 9,034 | 6.0%
use-skill-with-scripts / 003 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 11,542ms | 3 | 7,039 | 5.7%
use-skill-with-scripts / 004 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 12,473ms | 3 | 7,057 | 5.8%
use-skill-with-scripts / 005 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 11,634ms | 3 | 7,057 | 5.8%
use-skill-with-scripts / 006 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 12,524ms | 3 | 7,065 | 5.8%
use-skill-with-scripts / 007 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 11,748ms | 3 | 7,012 | 5.7%
use-skill-with-scripts / 008 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 11,056ms | 3 | 7,022 | 5.7%
use-skill-with-scripts / 009 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 10,844ms | 3 | 7,024 | 5.7%
use-skill-with-scripts / 010 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 11,057ms | 3 | 7,062 | 5.7%
use-skill-with-scripts / 001 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 14,763ms | 5 | 11,319 | 6.5%
use-skill-with-scripts / 002 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 13,114ms | 5 | 9,569 | 6.4%
use-skill-with-scripts / 003 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 20,223ms | 7 | 16,199 | 7.2%
use-skill-with-scripts / 004 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 10,893ms | 3 | 7,209 | 5.9%
use-skill-with-scripts / 005 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 12,794ms | 4 | 9,215 | 6.2%
use-skill-with-scripts / 006 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 11,070ms | 3 | 7,085 | 5.8%
use-skill-with-scripts / 007 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 14,124ms | 3 | 7,087 | 5.8%
use-skill-with-scripts / 008 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 15,098ms | 3 | 7,093 | 5.8%
use-skill-with-scripts / 009 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 14,012ms | 3 | 7,057 | 5.8%
use-skill-with-scripts / 010 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 15,047ms | 3 | 7,043 | 5.8%
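
The per-model averages in the Model Comparison table appear to be plain means over each model's ten iteration rows. A minimal sketch that recomputes the tool and token averages for qwen/qwen3.5-9b from the rows above, assuming an unweighted arithmetic mean (the dashboard's actual aggregation method is not stated, and the variable names here are illustrative):

```python
from statistics import mean

# (tools, tokens) for the ten qwen/qwen3.5-9b iterations, 001 through 010,
# copied from the iteration table.
rows = [
    (5, 11319), (5, 9569), (7, 16199), (3, 7209), (4, 9215),
    (3, 7085), (3, 7087), (3, 7093), (3, 7057), (3, 7043),
]

# Assumption: averages are unweighted means, rounded for display.
avg_tools = round(mean(tools for tools, _ in rows), 1)
avg_tokens = round(mean(tokens for _, tokens in rows))

print(avg_tools)   # 3.9
print(avg_tokens)  # 8888
```

Both values match the "Avg tools" and "Avg tokens" columns reported for that model, which supports the unweighted-mean reading.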