Case: read-exact-file
Success rate: 83.3% across all recorded iterations (50 of 60 passed).
- Total: 60
- Passed: 50
- Failed: 10
- Errors: 0
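The headline figure follows directly from the counts above. A minimal sketch of that arithmetic, assuming the success rate is simply passed iterations over total iterations, rounded to one decimal place (the function name `success_rate` is hypothetical, not part of the benchmark tooling):

```python
def success_rate(passed: int, total: int) -> float:
    """Return the pass percentage, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * passed / total, 1)


# Counts copied from the summary above.
total, passed, failed, errors = 60, 50, 10, 0

# The three outcome buckets should account for every iteration.
assert passed + failed + errors == total

print(f"{success_rate(passed, total)}%")  # prints "83.3%"
```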
Model Comparison
| # | Model | Score | Basic File Reading | Basic Skills | Avg turns | Avg tools | Avg tokens | Context avg / max |
|---|---|---|---|---|---|---|---|---|
| 1 | qwen/qwen3.5-9b | 100.0% | 100.0% | n/a | 3.3 | 2.3 | 5,182 | 5.1% / 5.5% |
| 2 | qwen/qwen3.6-35b-a3b | 100.0% | 100.0% | n/a | 3.0 | 2.0 | 4,637 | 5.0% / 5.1% |
| 3 | google/gemma-4-e4b | 100.0% | 100.0% | n/a | 3.1 | 2.1 | 4,072 | 2.2% / 2.3% |
| 4 | google/gemma-4-e2b | 100.0% | 100.0% | n/a | 3.0 | 2.0 | 3,926 | 4.2% / 4.4% |
| 5 | granite-4.1-8b | 100.0% | 100.0% | n/a | 3.0 | 2.0 | 4,081 | 4.3% / 4.3% |
| 6 | lfm2.5-350m | 0.0% | 0.0% | n/a | 2.1 | 1.1 | 2,721 | 4.0% / 4.2% |
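The per-model averages in this table appear to be plain means over that model's ten rows in the Iterations table. A short sketch that reproduces the lfm2.5-350m figures, assuming one decimal place for tool calls and whole numbers for tokens (the variable names are illustrative only):

```python
from statistics import mean

# (tools, tokens) for the ten lfm2.5-350m iterations, copied from the Iterations table.
lfm_rows = [
    (1, 2603), (1, 2591), (1, 2581), (1, 2587), (1, 2588),
    (2, 3923), (1, 2587), (1, 2588), (1, 2590), (1, 2570),
]

avg_tools = round(mean(tools for tools, _ in lfm_rows), 1)
avg_tokens = round(mean(tokens for _, tokens in lfm_rows))

print(avg_tools)          # prints 1.1
print(f"{avg_tokens:,}")  # prints 2,721
```

Both values match the lfm2.5-350m row above; the single two-tool iteration (006) is what lifts the tool average from 1.0 to 1.1.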
Iterations
60 matching iterations
| Iteration | Model | Category | Variant | Status | Duration | Tools | Tokens | Context |
|---|---|---|---|---|---|---|---|---|
| read-exact-file / 001 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 686ms | 1 | 2,603 | 4.1% |
| read-exact-file / 002 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 262ms | 1 | 2,591 | 4.0% |
| read-exact-file / 003 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 262ms | 1 | 2,581 | 4.0% |
| read-exact-file / 004 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 247ms | 1 | 2,587 | 4.0% |
| read-exact-file / 005 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | -1,025ms | 1 | 2,588 | 4.0% |
| read-exact-file / 006 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 339ms | 2 | 3,923 | 4.2% |
| read-exact-file / 007 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 243ms | 1 | 2,587 | 4.0% |
| read-exact-file / 008 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 258ms | 1 | 2,588 | 4.0% |
| read-exact-file / 009 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 242ms | 1 | 2,590 | 4.0% |
| read-exact-file / 010 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 209ms | 1 | 2,570 | 4.0% |
| read-exact-file / 001 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 500ms | 2 | 4,083 | 4.3% |
| read-exact-file / 002 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,255ms | 2 | 4,085 | 4.3% |
| read-exact-file / 003 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,102ms | 2 | 4,076 | 4.3% |
| read-exact-file / 004 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,056ms | 2 | 4,076 | 4.3% |
| read-exact-file / 005 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,255ms | 2 | 4,086 | 4.3% |
| read-exact-file / 006 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,367ms | 2 | 4,096 | 4.3% |
| read-exact-file / 007 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,145ms | 2 | 4,081 | 4.3% |
| read-exact-file / 008 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,080ms | 2 | 4,076 | 4.3% |
| read-exact-file / 009 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,078ms | 2 | 4,076 | 4.3% |
| read-exact-file / 010 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,079ms | 2 | 4,076 | 4.3% |
| read-exact-file / 001 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | passed | 1,565ms | 2 | 3,883 | 4.0% |
| read-exact-file / 002 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | passed | 1,720ms | 2 | 3,921 | 4.2% |
| read-exact-file / 003 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | passed | 1,798ms | 2 | 3,945 | 4.3% |
| read-exact-file / 004 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | passed | 1,695ms | 2 | 3,919 | 4.2% |
| read-exact-file / 005 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | passed | 1,658ms | 2 | 3,881 | 4.2% |
| read-exact-file / 006 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | passed | 297ms | 2 | 3,984 | 4.2% |
| read-exact-file / 007 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | passed | 1,176ms | 2 | 3,829 | 4.0% |
| read-exact-file / 008 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | passed | 1,913ms | 2 | 3,981 | 4.3% |
| read-exact-file / 009 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | passed | 2,155ms | 2 | 4,047 | 4.4% |
| read-exact-file / 010 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | passed | 1,215ms | 2 | 3,868 | 4.0% |
| read-exact-file / 001 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 3,021ms | 2 | 3,877 | 2.1% |
| read-exact-file / 002 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 3,071ms | 2 | 3,960 | 2.2% |
| read-exact-file / 003 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 3,075ms | 2 | 3,990 | 2.2% |
| read-exact-file / 004 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 3,076ms | 3 | 5,340 | 2.2% |
| read-exact-file / 005 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 3,690ms | 2 | 3,995 | 2.3% |
| read-exact-file / 006 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 1,230ms | 2 | 3,868 | 2.1% |
| read-exact-file / 007 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 2,715ms | 2 | 3,915 | 2.2% |
| read-exact-file / 008 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 2,560ms | 2 | 3,870 | 2.1% |
| read-exact-file / 009 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 3,248ms | 2 | 3,960 | 2.2% |
| read-exact-file / 010 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 3,205ms | 2 | 3,940 | 2.2% |
| read-exact-file / 001 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 7,348ms | 2 | 4,598 | 4.9% |
| read-exact-file / 002 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 11,878ms | 2 | 4,603 | 4.9% |
| read-exact-file / 003 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 13,866ms | 2 | 4,650 | 5.0% |
| read-exact-file / 004 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 13,656ms | 2 | 4,625 | 4.9% |
| read-exact-file / 005 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 14,786ms | 2 | 4,613 | 4.9% |
| read-exact-file / 006 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 13,743ms | 2 | 4,613 | 4.9% |
| read-exact-file / 007 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 14,205ms | 2 | 4,654 | 5.0% |
| read-exact-file / 008 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 13,797ms | 2 | 4,669 | 5.0% |
| read-exact-file / 009 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 13,748ms | 2 | 4,656 | 5.0% |
| read-exact-file / 010 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 12,567ms | 2 | 4,690 | 5.1% |
| read-exact-file / 001 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 2,850ms | 2 | 4,663 | 5.0% |
| read-exact-file / 002 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 3,613ms | 4 | 8,261 | 5.5% |
| read-exact-file / 003 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 1,973ms | 2 | 4,620 | 4.9% |
| read-exact-file / 004 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 2,994ms | 2 | 4,631 | 5.0% |
| read-exact-file / 005 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 2,280ms | 2 | 4,651 | 5.0% |
| read-exact-file / 006 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 2,217ms | 2 | 4,657 | 5.0% |
| read-exact-file / 007 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 2,202ms | 2 | 4,646 | 5.0% |
| read-exact-file / 008 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 2,356ms | 2 | 4,649 | 5.0% |
| read-exact-file / 009 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 3,058ms | 3 | 6,409 | 5.3% |
| read-exact-file / 010 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 2,136ms | 2 | 4,630 | 4.9% |
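One row in the table (lfm2.5-350m, iteration 005) records a duration of -1,025ms, which cannot be a real elapsed time. A minimal sketch of a sanity check that flags such rows before they feed into any timing comparison, assuming durations are formatted as in the table (the helper `parse_duration_ms` and the sample rows are illustrative, not part of the benchmark tooling):

```python
def parse_duration_ms(text: str) -> int:
    """Convert a table value such as '1,255ms' or '-1,025ms' to an integer."""
    return int(text.removesuffix("ms").replace(",", ""))


# (model, iteration, duration) for three rows copied from the Iterations table.
rows = [
    ("lfm2.5-350m", "004", "247ms"),
    ("lfm2.5-350m", "005", "-1,025ms"),
    ("lfm2.5-350m", "006", "339ms"),
]

# A negative elapsed time points to a measurement artifact, not a fast run.
suspect = [(model, it) for model, it, dur in rows if parse_duration_ms(dur) < 0]

print(suspect)  # prints [('lfm2.5-350m', '005')]
```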