Case
use-skill-with-refs
Success rate 55.0% across all recorded iterations.
- Total: 60
- Passed: 33
- Failed: 27
- Errors: 0
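The headline rate follows directly from the counts above. A minimal sketch of the arithmetic, assuming errored iterations would count toward the total (the variable names are illustrative and not taken from the benchmark harness):

```python
# Counts copied from the summary above.
total, passed, failed, errors = 60, 33, 27, 0

# Sanity check: every iteration lands in exactly one bucket.
assert passed + failed + errors == total

# Success rate as a percentage of all recorded iterations.
success_rate = 100 * passed / total
print(f"{success_rate:.1f}%")  # prints "55.0%"
```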
Model Comparison
| # | Model | Score | Basic File Reading | Basic Skills | Avg turns | Avg tools | Avg tokens | Context avg / max |
|---|---|---|---|---|---|---|---|---|
| 1 | qwen/qwen3.6-35b-a3b | 100.0% | n/a | 100.0% | 4.0 | 3.0 | 6,605 | 5.4% / 5.5% |
| 2 | granite-4.1-8b | 100.0% | n/a | 100.0% | 5.0 | 4.0 | 7,235 | 4.7% / 4.8% |
| 3 | qwen/qwen3.5-9b | 80.0% | n/a | 80.0% | 8.5 | 7.6 | 18,343 | 7.2% / 13.6% |
| 4 | google/gemma-4-e4b | 50.0% | n/a | 50.0% | 4.5 | 3.5 | 8,048 | 3.4% / 5.0% |
| 5 | google/gemma-4-e2b | 0.0% | n/a | 0.0% | 2.0 | 1.0 | 2,806 | 4.6% / 7.2% |
| 6 | lfm2.5-350m | 0.0% | n/a | 0.0% | 1.0 | 0.0 | 1,284 | 3.9% / 4.0% |
Iterations
60 matching iterations
| Iteration | Model | Category | Variant | Status | Duration | Tools | Tokens | Context |
|---|---|---|---|---|---|---|---|---|
| use-skill-with-refs / 001 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 311ms | 0 | 1,295 | 4.0% |
| use-skill-with-refs / 002 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 230ms | 0 | 1,287 | 3.9% |
| use-skill-with-refs / 003 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 202ms | 0 | 1,275 | 3.9% |
| use-skill-with-refs / 004 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 209ms | 0 | 1,285 | 3.9% |
| use-skill-with-refs / 005 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 269ms | 0 | 1,297 | 4.0% |
| use-skill-with-refs / 006 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 217ms | 0 | 1,279 | 3.9% |
| use-skill-with-refs / 007 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 203ms | 0 | 1,281 | 3.9% |
| use-skill-with-refs / 008 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 237ms | 0 | 1,292 | 3.9% |
| use-skill-with-refs / 009 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 196ms | 0 | 1,274 | 3.9% |
| use-skill-with-refs / 010 | lm-studio / lfm2.5-350m | Basic Skills | Baseline (/skills) | failed | 198ms | 0 | 1,276 | 3.9% |
| use-skill-with-refs / 001 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 2,373ms | 4 | 7,219 | 4.7% |
| use-skill-with-refs / 002 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 862ms | 4 | 7,263 | 4.8% |
| use-skill-with-refs / 003 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 1,940ms | 4 | 7,225 | 4.7% |
| use-skill-with-refs / 004 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 1,870ms | 4 | 7,219 | 4.7% |
| use-skill-with-refs / 005 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 1,952ms | 4 | 7,226 | 4.7% |
| use-skill-with-refs / 006 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 2,061ms | 4 | 7,246 | 4.7% |
| use-skill-with-refs / 007 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 2,025ms | 4 | 7,248 | 4.7% |
| use-skill-with-refs / 008 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 2,127ms | 4 | 7,250 | 4.7% |
| use-skill-with-refs / 009 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 1,917ms | 4 | 7,224 | 4.7% |
| use-skill-with-refs / 010 | lm-studio / granite-4.1-8b | Basic Skills | Baseline (/skills) | passed | 2,019ms | 4 | 7,227 | 4.7% |
| use-skill-with-refs / 001 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,910ms | 1 | 2,699 | 4.2% |
| use-skill-with-refs / 002 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 3,106ms | 1 | 2,881 | 4.8% |
| use-skill-with-refs / 003 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,661ms | 1 | 2,680 | 4.2% |
| use-skill-with-refs / 004 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 2,736ms | 1 | 2,799 | 4.6% |
| use-skill-with-refs / 005 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,769ms | 1 | 2,647 | 4.2% |
| use-skill-with-refs / 006 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,271ms | 1 | 2,747 | 4.6% |
| use-skill-with-refs / 007 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,963ms | 1 | 2,722 | 4.3% |
| use-skill-with-refs / 008 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 8,449ms | 1 | 3,604 | 7.2% |
| use-skill-with-refs / 009 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,754ms | 1 | 2,682 | 4.1% |
| use-skill-with-refs / 010 | lm-studio / google/gemma-4-e2b | Basic Skills | Baseline (/skills) | failed | 1,723ms | 1 | 2,598 | 4.2% |
| use-skill-with-refs / 001 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | failed | 10,671ms | 4 | 8,627 | 3.4% |
| use-skill-with-refs / 002 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | failed | 4,508ms | 2 | 4,288 | 2.5% |
| use-skill-with-refs / 003 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | failed | 15,465ms | 5 | 12,722 | 4.3% |
| use-skill-with-refs / 004 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 8,456ms | 3 | 6,848 | 3.1% |
| use-skill-with-refs / 005 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 6,872ms | 3 | 6,474 | 2.9% |
| use-skill-with-refs / 006 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 9,465ms | 4 | 9,383 | 3.5% |
| use-skill-with-refs / 007 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | failed | 20,344ms | 4 | 10,771 | 5.0% |
| use-skill-with-refs / 008 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 5,010ms | 3 | 5,884 | 2.6% |
| use-skill-with-refs / 009 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | passed | 9,695ms | 4 | 8,846 | 3.3% |
| use-skill-with-refs / 010 | lm-studio / google/gemma-4-e4b | Basic Skills | Baseline (/skills) | failed | 10,204ms | 3 | 6,637 | 3.4% |
| use-skill-with-refs / 001 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 10,912ms | 3 | 6,543 | 5.4% |
| use-skill-with-refs / 002 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 11,674ms | 3 | 6,716 | 5.5% |
| use-skill-with-refs / 003 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 11,752ms | 3 | 6,643 | 5.5% |
| use-skill-with-refs / 004 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 12,361ms | 3 | 6,577 | 5.4% |
| use-skill-with-refs / 005 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 11,400ms | 3 | 6,533 | 5.3% |
| use-skill-with-refs / 006 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 11,617ms | 3 | 6,547 | 5.3% |
| use-skill-with-refs / 007 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 12,661ms | 3 | 6,578 | 5.4% |
| use-skill-with-refs / 008 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 12,105ms | 3 | 6,673 | 5.5% |
| use-skill-with-refs / 009 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 12,323ms | 3 | 6,607 | 5.4% |
| use-skill-with-refs / 010 | lm-studio / qwen/qwen3.6-35b-a3b | Basic Skills | Baseline (/skills) | passed | 11,765ms | 3 | 6,637 | 5.5% |
| use-skill-with-refs / 001 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 11,022ms | 5 | 11,835 | 7.1% |
| use-skill-with-refs / 002 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 3,945ms | 3 | 6,924 | 5.7% |
| use-skill-with-refs / 003 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 3,319ms | 3 | 6,742 | 5.5% |
| use-skill-with-refs / 004 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 9,394ms | 8 | 17,069 | 7.1% |
| use-skill-with-refs / 005 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 13,156ms | 13 | 26,554 | 8.3% |
| use-skill-with-refs / 006 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | failed | 2,839ms | 2 | 4,826 | 5.2% |
| use-skill-with-refs / 007 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | failed | 6,726ms | 2 | 5,127 | 6.2% |
| use-skill-with-refs / 008 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 7,218ms | 5 | 11,407 | 6.6% |
| use-skill-with-refs / 009 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 31,931ms | 29 | 80,413 | 13.6% |
| use-skill-with-refs / 010 | lm-studio / qwen/qwen3.5-9b | Basic Skills | Baseline (/skills) | passed | 10,006ms | 6 | 12,536 | 6.5% |
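
Each Model Comparison row appears to be a plain aggregate over that model's ten iterations. The sketch below recomputes the google/gemma-4-e4b row from the iteration table, assuming Score is the pass fraction and the averages are arithmetic means. The `Iteration` tuple and `summarize` function are hypothetical names for illustration, not part of the benchmark tooling. Avg turns is left out because turns are not listed per iteration.

```python
from typing import NamedTuple


class Iteration(NamedTuple):
    passed: bool
    tools: int
    tokens: int
    context_pct: float


# google/gemma-4-e4b iterations 001-010, copied from the table above.
GEMMA_E4B = [
    Iteration(False, 4, 8627, 3.4),
    Iteration(False, 2, 4288, 2.5),
    Iteration(False, 5, 12722, 4.3),
    Iteration(True, 3, 6848, 3.1),
    Iteration(True, 3, 6474, 2.9),
    Iteration(True, 4, 9383, 3.5),
    Iteration(False, 4, 10771, 5.0),
    Iteration(True, 3, 5884, 2.6),
    Iteration(True, 4, 8846, 3.3),
    Iteration(False, 3, 6637, 3.4),
]


def summarize(rows):
    """Aggregate per-iteration rows into one comparison-table row."""
    n = len(rows)
    return {
        "score_pct": 100 * sum(r.passed for r in rows) / n,
        "avg_tools": sum(r.tools for r in rows) / n,
        "avg_tokens": sum(r.tokens for r in rows) / n,
        "context_avg": sum(r.context_pct for r in rows) / n,
        "context_max": max(r.context_pct for r in rows),
    }


summary = summarize(GEMMA_E4B)
print(
    f"{summary['score_pct']:.1f}% | {summary['avg_tools']:.1f} | "
    f"{summary['avg_tokens']:,.0f} | "
    f"{summary['context_avg']:.1f}% / {summary['context_max']:.1f}%"
)
```

Under these assumptions the computed values agree with the reported row (50.0%, 3.5 tools, 8,048 tokens, 3.4% / 5.0% context), which suggests the comparison table uses simple unweighted means.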