Case: read-file
Success rate: 63.3% (38 of 60 recorded iterations passed).
- Total: 60
- Passed: 38
- Failed: 22
- Errors: 0
Model Comparison
| # | Model | Score | Basic File Reading | Basic Skills | Avg turns | Avg tools | Avg tokens | Context avg / max |
|---|---|---|---|---|---|---|---|---|
| 1 | qwen/qwen3.5-9b | 100.0% | 100.0% | n/a | 4.9 | 3.9 | 7,502 | 5.2% / 5.4% |
| 2 | qwen/qwen3.6-35b-a3b | 100.0% | 100.0% | n/a | 5.4 | 4.4 | 8,274 | 5.2% / 5.4% |
| 3 | granite-4.1-8b | 100.0% | 100.0% | n/a | 4.5 | 3.5 | 5,786 | 4.2% / 4.3% |
| 4 | google/gemma-4-e4b | 80.0% | 80.0% | n/a | 5.0 | 4.0 | 7,917 | 2.9% / 4.1% |
| 5 | google/gemma-4-e2b | 0.0% | 0.0% | n/a | 2.9 | 1.9 | 4,275 | 5.3% / 6.7% |
| 6 | lfm2.5-350m | 0.0% | 0.0% | n/a | 2.0 | 1.0 | 2,339 | 3.7% / 3.7% |
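
The per-model figures appear to be plain aggregates of the iteration rows listed further down. As a rough sketch, the code below recomputes the granite-4.1-8b row from its ten iterations. The names `rows`, `half_up`, and `summarize` are hypothetical, and the rounding rules (one decimal for percentages and tool counts, round-half-up for tokens) are assumptions inferred from the published numbers rather than documented behavior. Average turns is not recomputed because the iteration table does not list turns.

```python
import math
from statistics import mean

# (status, tools, tokens, context %) for the ten granite-4.1-8b iterations,
# copied from the Iterations table.
rows = [
    ("passed", 3, 5089, 4.1),
    ("passed", 4, 6489, 4.2),
    ("passed", 4, 6488, 4.2),
    ("passed", 4, 6492, 4.3),
    ("passed", 3, 5076, 4.1),
    ("passed", 4, 6487, 4.2),
    ("passed", 3, 5082, 4.1),
    ("passed", 3, 5084, 4.1),
    ("passed", 4, 6486, 4.2),
    ("passed", 3, 5082, 4.1),
]


def half_up(x: float) -> int:
    """Round a non-negative value to the nearest integer, ties going up (assumed rule)."""
    return math.floor(x + 0.5)


def summarize(rows):
    """Aggregate one model's iteration rows into comparison-table columns."""
    passed = sum(1 for status, *_ in rows if status == "passed")
    return {
        "score_pct": round(100.0 * passed / len(rows), 1),
        "avg_tools": round(mean(r[1] for r in rows), 1),
        "avg_tokens": half_up(mean(r[2] for r in rows)),
        "context_avg": round(mean(r[3] for r in rows), 1),
        "context_max": max(r[3] for r in rows),
    }


print(summarize(rows))
```

Under these assumptions the result is a score of 100.0, 3.5 average tools, 5,786 average tokens, and context 4.2% / 4.3%, matching the granite-4.1-8b row above.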
Iterations
60 matching iterations
| Iteration | Model | Category | Variant | Status | Duration | Tools | Tokens | Context |
|---|---|---|---|---|---|---|---|---|
| read-file / 001 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 237ms | 1 | 2,332 | 3.7% |
| read-file / 002 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 254ms | 1 | 2,344 | 3.7% |
| read-file / 003 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 215ms | 1 | 2,316 | 3.6% |
| read-file / 004 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 240ms | 1 | 2,336 | 3.7% |
| read-file / 005 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 285ms | 1 | 2,352 | 3.7% |
| read-file / 006 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 296ms | 1 | 2,362 | 3.7% |
| read-file / 007 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 198ms | 1 | 2,324 | 3.6% |
| read-file / 008 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 234ms | 1 | 2,338 | 3.7% |
| read-file / 009 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 229ms | 1 | 2,333 | 3.7% |
| read-file / 010 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 257ms | 1 | 2,351 | 3.7% |
| read-file / 001 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,675ms | 3 | 5,089 | 4.1% |
| read-file / 002 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,886ms | 4 | 6,489 | 4.2% |
| read-file / 003 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,917ms | 4 | 6,488 | 4.2% |
| read-file / 004 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,930ms | 4 | 6,492 | 4.3% |
| read-file / 005 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,586ms | 3 | 5,076 | 4.1% |
| read-file / 006 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,932ms | 4 | 6,487 | 4.2% |
| read-file / 007 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,565ms | 3 | 5,082 | 4.1% |
| read-file / 008 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,593ms | 3 | 5,084 | 4.1% |
| read-file / 009 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,911ms | 4 | 6,486 | 4.2% |
| read-file / 010 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,462ms | 3 | 5,082 | 4.1% |
| read-file / 001 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | failed | 6,079ms | 2 | 4,764 | 5.8% |
| read-file / 002 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | failed | 7,964ms | 2 | 5,323 | 6.7% |
| read-file / 003 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | failed | 2,512ms | 2 | 3,697 | 4.3% |
| read-file / 004 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | failed | 997ms | 2 | 3,672 | 4.2% |
| read-file / 005 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | failed | 5,621ms | 2 | 4,654 | 5.7% |
| read-file / 006 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | failed | 6,011ms | 2 | 4,748 | 5.8% |
| read-file / 007 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | failed | 2,598ms | 2 | 3,754 | 4.3% |
| read-file / 008 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | failed | 2,804ms | 1 | 2,565 | 4.4% |
| read-file / 009 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | failed | 5,794ms | 2 | 4,888 | 5.8% |
| read-file / 010 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | failed | 4,436ms | 2 | 4,683 | 5.7% |
| read-file / 001 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 5,916ms | 4 | 7,392 | 2.7% |
| read-file / 002 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 6,355ms | 4 | 7,010 | 2.6% |
| read-file / 003 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 9,283ms | 4 | 8,147 | 3.0% |
| read-file / 004 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 5,922ms | 4 | 6,915 | 2.5% |
| read-file / 005 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 8,120ms | 4 | 7,574 | 2.8% |
| read-file / 006 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | failed | 10,726ms | 3 | 6,906 | 3.2% |
| read-file / 007 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 7,304ms | 4 | 7,322 | 2.7% |
| read-file / 008 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 6,488ms | 4 | 7,411 | 2.8% |
| read-file / 009 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 16,214ms | 6 | 14,273 | 4.1% |
| read-file / 010 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | failed | 7,638ms | 3 | 6,215 | 2.9% |
| read-file / 001 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 22,082ms | 5 | 9,314 | 5.3% |
| read-file / 002 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 18,410ms | 4 | 7,568 | 5.1% |
| read-file / 003 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 25,388ms | 5 | 9,263 | 5.3% |
| read-file / 004 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 23,095ms | 4 | 7,615 | 5.2% |
| read-file / 005 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 21,509ms | 5 | 9,335 | 5.4% |
| read-file / 006 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 14,692ms | 3 | 5,902 | 4.9% |
| read-file / 007 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 21,652ms | 5 | 9,258 | 5.3% |
| read-file / 008 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 18,095ms | 4 | 7,614 | 5.2% |
| read-file / 009 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 17,077ms | 4 | 7,617 | 5.2% |
| read-file / 010 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 20,970ms | 5 | 9,251 | 5.3% |
| read-file / 001 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 17,162ms | 4 | 7,704 | 5.3% |
| read-file / 002 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 14,207ms | 4 | 7,642 | 5.2% |
| read-file / 003 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 5,792ms | 4 | 7,651 | 5.1% |
| read-file / 004 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 3,813ms | 4 | 7,634 | 5.2% |
| read-file / 005 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 3,944ms | 4 | 7,618 | 5.2% |
| read-file / 006 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 3,382ms | 4 | 7,568 | 5.1% |
| read-file / 007 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 4,640ms | 4 | 7,920 | 5.4% |
| read-file / 008 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 2,947ms | 3 | 5,931 | 4.9% |
| read-file / 009 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 4,398ms | 4 | 7,720 | 5.3% |
| read-file / 010 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 4,175ms | 4 | 7,627 | 5.2% |