Case
use-skill-with-scripts
Success rate: 46.7% (28 of 60 passed) across all recorded iterations.
- Total: 60
- Passed: 28
- Failed: 32
- Errors: 0
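The headline success rate follows directly from the counts above. The sketch below assumes the rate is simply passed divided by total, rounded to one decimal place; the function name `success_rate` is hypothetical and not part of the benchmark tooling.

```python
def success_rate(passed: int, total: int) -> float:
    """Return the pass percentage, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * passed / total, 1)


# Counts copied from the summary above.
total, passed, failed, errors = 60, 28, 32, 0

# Assumption: every iteration ends in exactly one of passed, failed, or error.
assert passed + failed + errors == total

rate = success_rate(passed, total)
print(f"{rate}%")  # 46.7%
```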
Model Comparison
| # | Model | Score | Basic File Reading | Basic Skills | Avg turns | Avg tools | Avg tokens | Context avg / max |
|---|---|---|---|---|---|---|---|---|
| 1 | qwen/qwen3.5-9b | 100.0% | n/a | 100.0% | 4.8 | 3.9 | 8,888 | 6.1% / 7.2% |
| 2 | qwen/qwen3.6-35b-a3b | 100.0% | n/a | 100.0% | 4.1 | 3.1 | 7,235 | 5.8% / 6.0% |
| 3 | google/gemma-4-e4b | 80.0% | n/a | 80.0% | 5.9 | 4.9 | 12,154 | 3.9% / 6.0% |
| 4 | google/gemma-4-e2b | 0.0% | n/a | 0.0% | 2.0 | 1.0 | 2,908 | 4.7% / 5.1% |
| 5 | granite-4.1-8b | 0.0% | n/a | 0.0% | 4.1 | 3.1 | 6,455 | 5.3% / 5.3% |
| 6 | lfm2.5-350m | 0.0% | n/a | 0.0% | 1.3 | 0.3 | 1,883 | 4.4% / 4.8% |
Iterations
60 matching iterations
| Iteration | Model | Category | Variant | Status | Duration | Tools | Tokens | Context |
|---|---|---|---|---|---|---|---|---|
| use-skill-with-scripts / 001 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 361ms | 1 | 2,850 | 4.5% |
| use-skill-with-scripts / 002 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 294ms | 0 | 1,447 | 4.4% |
| use-skill-with-scripts / 003 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 291ms | 0 | 1,452 | 4.4% |
| use-skill-with-scripts / 004 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 224ms | 0 | 1,420 | 4.3% |
| use-skill-with-scripts / 005 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 294ms | 0 | 1,447 | 4.4% |
| use-skill-with-scripts / 006 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 241ms | 0 | 1,423 | 4.3% |
| use-skill-with-scripts / 007 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 340ms | 0 | 1,468 | 4.5% |
| use-skill-with-scripts / 008 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 207ms | 0 | 1,419 | 4.3% |
| use-skill-with-scripts / 009 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 487ms | 2 | 4,500 | 4.8% |
| use-skill-with-scripts / 010 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 199ms | 0 | 1,408 | 4.3% |
| use-skill-with-scripts / 001 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | failed | 3,072ms | 3 | 6,320 | 5.3% |
| use-skill-with-scripts / 002 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | failed | 1,012ms | 3 | 6,284 | 5.2% |
| use-skill-with-scripts / 003 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | failed | 2,722ms | 3 | 6,298 | 5.3% |
| use-skill-with-scripts / 004 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | failed | 2,311ms | 3 | 6,286 | 5.2% |
| use-skill-with-scripts / 005 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | failed | 2,360ms | 3 | 6,251 | 5.2% |
| use-skill-with-scripts / 006 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | failed | 2,163ms | 4 | 7,942 | 5.2% |
| use-skill-with-scripts / 007 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | failed | 2,681ms | 3 | 6,287 | 5.3% |
| use-skill-with-scripts / 008 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | failed | 2,484ms | 3 | 6,271 | 5.2% |
| use-skill-with-scripts / 009 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | failed | 2,848ms | 3 | 6,308 | 5.3% |
| use-skill-with-scripts / 010 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | failed | 1,435ms | 3 | 6,307 | 5.3% |
| use-skill-with-scripts / 001 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 2,668ms | 1 | 2,923 | 4.8% |
| use-skill-with-scripts / 002 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,288ms | 1 | 2,735 | 4.3% |
| use-skill-with-scripts / 003 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,760ms | 1 | 2,840 | 4.5% |
| use-skill-with-scripts / 004 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,862ms | 1 | 2,843 | 4.6% |
| use-skill-with-scripts / 005 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 2,778ms | 1 | 3,085 | 5.0% |
| use-skill-with-scripts / 006 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 2,653ms | 1 | 3,032 | 4.9% |
| use-skill-with-scripts / 007 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,933ms | 1 | 2,863 | 4.6% |
| use-skill-with-scripts / 008 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 2,921ms | 1 | 3,008 | 5.1% |
| use-skill-with-scripts / 009 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 2,313ms | 1 | 2,916 | 4.8% |
| use-skill-with-scripts / 010 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,850ms | 1 | 2,839 | 4.6% |
| use-skill-with-scripts / 001 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 8,585ms | 4 | 8,879 | 3.3% |
| use-skill-with-scripts / 002 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | failed | 25,019ms | 8 | 22,853 | 6.0% |
| use-skill-with-scripts / 003 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 13,123ms | 5 | 13,103 | 4.0% |
| use-skill-with-scripts / 004 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 9,657ms | 6 | 14,366 | 3.8% |
| use-skill-with-scripts / 005 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 6,038ms | 3 | 6,620 | 2.9% |
| use-skill-with-scripts / 006 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | failed | 23,233ms | 4 | 11,897 | 5.5% |
| use-skill-with-scripts / 007 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 11,184ms | 6 | 13,708 | 3.8% |
| use-skill-with-scripts / 008 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 5,981ms | 3 | 6,671 | 2.9% |
| use-skill-with-scripts / 009 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 13,454ms | 5 | 13,132 | 4.2% |
| use-skill-with-scripts / 010 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 7,175ms | 5 | 10,314 | 3.1% |
| use-skill-with-scripts / 001 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 12,112ms | 3 | 6,979 | 5.7% |
| use-skill-with-scripts / 002 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 13,989ms | 4 | 9,034 | 6.0% |
| use-skill-with-scripts / 003 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 11,542ms | 3 | 7,039 | 5.7% |
| use-skill-with-scripts / 004 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 12,473ms | 3 | 7,057 | 5.8% |
| use-skill-with-scripts / 005 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 11,634ms | 3 | 7,057 | 5.8% |
| use-skill-with-scripts / 006 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 12,524ms | 3 | 7,065 | 5.8% |
| use-skill-with-scripts / 007 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 11,748ms | 3 | 7,012 | 5.7% |
| use-skill-with-scripts / 008 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 11,056ms | 3 | 7,022 | 5.7% |
| use-skill-with-scripts / 009 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 10,844ms | 3 | 7,024 | 5.7% |
| use-skill-with-scripts / 010 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 11,057ms | 3 | 7,062 | 5.7% |
| use-skill-with-scripts / 001 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 14,763ms | 5 | 11,319 | 6.5% |
| use-skill-with-scripts / 002 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 13,114ms | 5 | 9,569 | 6.4% |
| use-skill-with-scripts / 003 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 20,223ms | 7 | 16,199 | 7.2% |
| use-skill-with-scripts / 004 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 10,893ms | 3 | 7,209 | 5.9% |
| use-skill-with-scripts / 005 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 12,794ms | 4 | 9,215 | 6.2% |
| use-skill-with-scripts / 006 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 11,070ms | 3 | 7,085 | 5.8% |
| use-skill-with-scripts / 007 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 14,124ms | 3 | 7,087 | 5.8% |
| use-skill-with-scripts / 008 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 15,098ms | 3 | 7,093 | 5.8% |
| use-skill-with-scripts / 009 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 14,012ms | 3 | 7,057 | 5.8% |
| use-skill-with-scripts / 010 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 15,047ms | 3 | 7,043 | 5.8% |
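The per-model averages in the Model Comparison table appear to be plain arithmetic means over the iteration rows. As a minimal check, assuming that is how they are computed, the sketch below recomputes "Avg tools" and "Avg tokens" for lfm2.5-350m from its ten rows. The variable names are illustrative only.

```python
from statistics import mean

# (tool calls, tokens) for the ten lfm2.5-350m iterations, copied from the table.
lfm_rows = [
    (1, 2850), (0, 1447), (0, 1452), (0, 1420), (0, 1447),
    (0, 1423), (0, 1468), (0, 1419), (2, 4500), (0, 1408),
]

tools = [t for t, _ in lfm_rows]
tokens = [tok for _, tok in lfm_rows]

# Assumption: averages are unweighted means, rounded as displayed in the summary.
avg_tools = round(mean(tools), 1)
avg_tokens = round(mean(tokens))

print(avg_tools, avg_tokens)  # 0.3 1883
```

Both values match the lfm2.5-350m row in the comparison table (0.3 tools, 1,883 tokens). "Avg turns" cannot be checked this way because the iteration table does not list turns.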