Case
use-skill
Success rate 55.0% across all recorded iterations.
- Total: 60
- Passed: 33
- Failed: 27
- Errors: 0
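The headline figure follows directly from the counts above. As a minimal sketch of that arithmetic (the helper name `success_rate` is hypothetical and not part of the benchmark tooling):

```python
# Counts copied from the summary above.
total, passed, failed, errors = 60, 33, 27, 0


def success_rate(passed: int, total: int) -> float:
    """Return the pass percentage, rounded to one decimal place."""
    if total == 0:
        return 0.0
    return round(100 * passed / total, 1)


# Sanity check: the three outcome buckets should account for every iteration.
assert passed + failed + errors == total

print(f"{success_rate(passed, total):.1f}%")  # prints "55.0%"
```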
Model Comparison
| # | Model | Score | Basic File Reading | Basic Skills | Avg turns | Avg tools | Avg tokens | Context avg / max |
|---|---|---|---|---|---|---|---|---|
| 1 | qwen/qwen3.6-35b-a3b | 100.0% | n/a | 100.0% | 3.0 | 2.0 | 4,653 | 4.9% / 5.0% |
| 2 | granite-4.1-8b | 100.0% | n/a | 100.0% | 3.0 | 2.0 | 4,045 | 4.3% / 4.3% |
| 3 | qwen/qwen3.5-9b | 70.0% | n/a | 70.0% | 23.4 | 22.4 | 331,052 | 10.6% / 52.6% |
| 4 | google/gemma-4-e4b | 60.0% | n/a | 60.0% | 3.1 | 2.4 | 4,980 | 2.7% / 5.2% |
| 5 | google/gemma-4-e2b | 0.0% | n/a | 0.0% | 2.0 | 1.0 | 2,737 | 4.4% / 5.3% |
| 6 | lfm2.5-350m | 0.0% | n/a | 0.0% | 1.2 | 0.2 | 1,527 | 3.9% / 4.0% |
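The per-model averages appear to be plain arithmetic means over each model's ten iterations, which makes them sensitive to outliers. The sketch below assumes that aggregation and uses the qwen/qwen3.5-9b tool and token counts from the Iterations table. It reproduces the 22.4 average tool calls and shows that the median run used only 2, so the average is dominated by the single 190-tool iteration.

```python
from statistics import mean, median

# (tools, tokens) for the ten qwen/qwen3.5-9b iterations,
# copied from the Iterations table (iterations 001 through 010).
rows = [
    (2, 4_724), (190, 3_229_064), (2, 4_725), (2, 4_674), (12, 31_396),
    (2, 4_789), (8, 17_111), (2, 4_673), (2, 4_671), (2, 4_696),
]

tools = [t for t, _ in rows]
tokens = [k for _, k in rows]

avg_tools = mean(tools)           # arithmetic mean, as assumed for the table
avg_tokens = round(mean(tokens))  # rounded to whole tokens
median_tools = median(tools)      # robust to the one runaway iteration

print(avg_tools, avg_tokens, median_tools)  # prints "22.4 331052 2.0"
```

Reporting a median alongside the mean would make runaway iterations like this one easier to spot in the comparison table.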
Iterations
60 matching iterations
| Iteration | Model | Category | Variant | Status | Duration | Tools | Tokens | Context |
|---|---|---|---|---|---|---|---|---|
| use-skill / 001 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 301ms | 0 | 1,299 | 4.0% |
| use-skill / 002 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 248ms | 0 | 1,278 | 3.9% |
| use-skill / 003 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 234ms | 1 | 2,522 | 3.9% |
| use-skill / 004 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 213ms | 0 | 1,265 | 3.9% |
| use-skill / 005 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 219ms | 0 | 1,267 | 3.9% |
| use-skill / 006 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 242ms | 0 | 1,278 | 3.9% |
| use-skill / 007 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 302ms | 1 | 2,555 | 4.0% |
| use-skill / 008 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 235ms | 0 | 1,270 | 3.9% |
| use-skill / 009 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 204ms | 0 | 1,258 | 3.8% |
| use-skill / 010 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 262ms | 0 | 1,276 | 3.9% |
| use-skill / 001 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 1,459ms | 2 | 4,045 | 4.3% |
| use-skill / 002 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 964ms | 2 | 4,045 | 4.3% |
| use-skill / 003 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 987ms | 2 | 4,045 | 4.3% |
| use-skill / 004 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 980ms | 2 | 4,045 | 4.3% |
| use-skill / 005 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 981ms | 2 | 4,045 | 4.3% |
| use-skill / 006 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 975ms | 2 | 4,045 | 4.3% |
| use-skill / 007 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 987ms | 2 | 4,045 | 4.3% |
| use-skill / 008 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 979ms | 2 | 4,045 | 4.3% |
| use-skill / 009 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 966ms | 2 | 4,045 | 4.3% |
| use-skill / 010 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 964ms | 2 | 4,045 | 4.3% |
| use-skill / 001 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 4,584ms | 1 | 3,037 | 5.3% |
| use-skill / 002 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 2,510ms | 1 | 2,754 | 4.5% |
| use-skill / 003 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,426ms | 1 | 2,588 | 4.0% |
| use-skill / 004 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 2,528ms | 1 | 2,738 | 4.5% |
| use-skill / 005 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 3,857ms | 1 | 2,950 | 5.1% |
| use-skill / 006 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 472ms | 1 | 2,700 | 4.2% |
| use-skill / 007 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,854ms | 1 | 2,712 | 4.2% |
| use-skill / 008 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,457ms | 1 | 2,602 | 4.0% |
| use-skill / 009 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,618ms | 1 | 2,652 | 4.1% |
| use-skill / 010 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,592ms | 1 | 2,639 | 4.1% |
| use-skill / 001 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 2,252ms | 2 | 3,832 | 2.1% |
| use-skill / 002 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | failed | 9,076ms | 3 | 6,659 | 3.2% |
| use-skill / 003 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 2,593ms | 2 | 3,909 | 2.2% |
| use-skill / 004 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 2,865ms | 2 | 4,017 | 2.2% |
| use-skill / 005 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | failed | 14,226ms | 3 | 7,366 | 3.9% |
| use-skill / 006 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | failed | 2,340ms | 1 | 2,562 | 2.1% |
| use-skill / 007 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 2,559ms | 2 | 3,914 | 2.2% |
| use-skill / 008 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | failed | 26,552ms | 5 | 9,485 | 5.2% |
| use-skill / 009 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 2,375ms | 2 | 3,886 | 2.1% |
| use-skill / 010 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 2,734ms | 2 | 4,167 | 2.4% |
| use-skill / 001 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 8,971ms | 2 | 4,658 | 5.0% |
| use-skill / 002 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 9,159ms | 2 | 4,671 | 5.0% |
| use-skill / 003 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 9,685ms | 2 | 4,662 | 5.0% |
| use-skill / 004 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 8,684ms | 2 | 4,682 | 5.0% |
| use-skill / 005 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 8,604ms | 2 | 4,646 | 4.9% |
| use-skill / 006 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 9,676ms | 2 | 4,637 | 4.9% |
| use-skill / 007 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 8,800ms | 2 | 4,628 | 4.9% |
| use-skill / 008 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 9,609ms | 2 | 4,648 | 4.9% |
| use-skill / 009 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 10,513ms | 2 | 4,627 | 4.9% |
| use-skill / 010 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 9,366ms | 2 | 4,670 | 5.0% |
| use-skill / 001 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 2,866ms | 2 | 4,724 | 5.0% |
| use-skill / 002 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | failed | 399,910ms | 190 | 3,229,064 | 52.6% |
| use-skill / 003 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 2,607ms | 2 | 4,725 | 5.0% |
| use-skill / 004 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 1,979ms | 2 | 4,674 | 5.0% |
| use-skill / 005 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | failed | 22,314ms | 12 | 31,396 | 10.9% |
| use-skill / 006 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 2,683ms | 2 | 4,789 | 5.1% |
| use-skill / 007 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | failed | 9,388ms | 8 | 17,111 | 7.2% |
| use-skill / 008 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 2,045ms | 2 | 4,673 | 5.0% |
| use-skill / 009 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 2,566ms | 2 | 4,671 | 5.0% |
| use-skill / 010 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 3,055ms | 2 | 4,696 | 5.0% |