Case: read-file

Success rate 63.3% across all recorded iterations.

Total: 60
Passed: 38
Failed: 22
Errors: 0
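
The headline success rate follows directly from these counts. A minimal sketch of the arithmetic, assuming the rate is passed iterations divided by total iterations, rounded to one decimal place (the function name `success_rate` is hypothetical, not part of the benchmark tooling):

```python
def success_rate(passed: int, total: int) -> float:
    """Return the pass percentage, rounded to one decimal place."""
    if total == 0:
        return 0.0  # assumption: an empty run set reports 0% rather than raising
    return round(100 * passed / total, 1)


# Counts taken from the summary above.
total, passed, failed, errors = 60, 38, 22, 0

# Sanity check: the three outcome buckets should account for every iteration.
assert passed + failed + errors == total

print(f"{success_rate(passed, total)}%")  # 63.3%
```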

Model Comparison

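| # | Model | Score | Basic File Reading | Basic Skills | Avg turns | Avg tools | Avg tokens | Context avg / max |
|---|---|---|---|---|---|---|---|---|
| 1 | qwen/qwen3.5-9b | 100.0% | 100.0% | n/a | 4.9 | 3.9 | 7,502 | 5.2% / 5.4% |
| 2 | qwen/qwen3.6-35b-a3b | 100.0% | 100.0% | n/a | 5.4 | 4.4 | 8,274 | 5.2% / 5.4% |
| 3 | granite-4.1-8b | 100.0% | 100.0% | n/a | 4.5 | 3.5 | 5,786 | 4.2% / 4.3% |
| 4 | google/gemma-4-e4b | 80.0% | 80.0% | n/a | 5.0 | 4.0 | 7,917 | 2.9% / 4.1% |
| 5 | google/gemma-4-e2b | 0.0% | 0.0% | n/a | 2.9 | 1.9 | 4,275 | 5.3% / 6.7% |
| 6 | lfm2.5-350m | 0.0% | 0.0% | n/a | 2.0 | 1.0 | 2,339 | 3.7% / 3.7% |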

Iterations

60 matching iterations

| Iteration | Model | Category | Variant | Status | Duration | Tools | Tokens | Context |
|---|---|---|---|---|---|---|---|---|
| read-file / 001 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 237ms | 1 | 2,332 | 3.7% |
| read-file / 002 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 254ms | 1 | 2,344 | 3.7% |
| read-file / 003 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 215ms | 1 | 2,316 | 3.6% |
| read-file / 004 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 240ms | 1 | 2,336 | 3.7% |
| read-file / 005 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 285ms | 1 | 2,352 | 3.7% |
| read-file / 006 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 296ms | 1 | 2,362 | 3.7% |
| read-file / 007 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 198ms | 1 | 2,324 | 3.6% |
| read-file / 008 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 234ms | 1 | 2,338 | 3.7% |
| read-file / 009 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 229ms | 1 | 2,333 | 3.7% |
| read-file / 010 | lm-studio / lfm2.5-350m | Basic File Reading | Baseline (/skills) | failed | 257ms | 1 | 2,351 | 3.7% |
| read-file / 001 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,675ms | 3 | 5,089 | 4.1% |
| read-file / 002 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,886ms | 4 | 6,489 | 4.2% |
| read-file / 003 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,917ms | 4 | 6,488 | 4.2% |
| read-file / 004 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,930ms | 4 | 6,492 | 4.3% |
| read-file / 005 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,586ms | 3 | 5,076 | 4.1% |
| read-file / 006 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,932ms | 4 | 6,487 | 4.2% |
| read-file / 007 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,565ms | 3 | 5,082 | 4.1% |
| read-file / 008 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,593ms | 3 | 5,084 | 4.1% |
| read-file / 009 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,911ms | 4 | 6,486 | 4.2% |
| read-file / 010 | lm-studio / granite-4.1-8b | Basic File Reading | Baseline (/skills) | passed | 1,462ms | 3 | 5,082 | 4.1% |
| read-file / 001 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | failed | 6,079ms | 2 | 4,764 | 5.8% |
| read-file / 002 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | failed | 7,964ms | 2 | 5,323 | 6.7% |
| read-file / 003 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | failed | 2,512ms | 2 | 3,697 | 4.3% |
| read-file / 004 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | failed | 997ms | 2 | 3,672 | 4.2% |
| read-file / 005 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | failed | 5,621ms | 2 | 4,654 | 5.7% |
| read-file / 006 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | failed | 6,011ms | 2 | 4,748 | 5.8% |
| read-file / 007 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | failed | 2,598ms | 2 | 3,754 | 4.3% |
| read-file / 008 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | failed | 2,804ms | 1 | 2,565 | 4.4% |
| read-file / 009 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | failed | 5,794ms | 2 | 4,888 | 5.8% |
| read-file / 010 | lm-studio / google/gemma-4-e2b | Basic File Reading | Baseline (/skills) | failed | 4,436ms | 2 | 4,683 | 5.7% |
| read-file / 001 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 5,916ms | 4 | 7,392 | 2.7% |
| read-file / 002 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 6,355ms | 4 | 7,010 | 2.6% |
| read-file / 003 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 9,283ms | 4 | 8,147 | 3.0% |
| read-file / 004 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 5,922ms | 4 | 6,915 | 2.5% |
| read-file / 005 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 8,120ms | 4 | 7,574 | 2.8% |
| read-file / 006 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | failed | 10,726ms | 3 | 6,906 | 3.2% |
| read-file / 007 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 7,304ms | 4 | 7,322 | 2.7% |
| read-file / 008 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 6,488ms | 4 | 7,411 | 2.8% |
| read-file / 009 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | passed | 16,214ms | 6 | 14,273 | 4.1% |
| read-file / 010 | lm-studio / google/gemma-4-e4b | Basic File Reading | Baseline (/skills) | failed | 7,638ms | 3 | 6,215 | 2.9% |
| read-file / 001 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 22,082ms | 5 | 9,314 | 5.3% |
| read-file / 002 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 18,410ms | 4 | 7,568 | 5.1% |
| read-file / 003 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 25,388ms | 5 | 9,263 | 5.3% |
| read-file / 004 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 23,095ms | 4 | 7,615 | 5.2% |
| read-file / 005 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 21,509ms | 5 | 9,335 | 5.4% |
| read-file / 006 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 14,692ms | 3 | 5,902 | 4.9% |
| read-file / 007 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 21,652ms | 5 | 9,258 | 5.3% |
| read-file / 008 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 18,095ms | 4 | 7,614 | 5.2% |
| read-file / 009 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 17,077ms | 4 | 7,617 | 5.2% |
| read-file / 010 | lm-studio / qwen/qwen3.6-35b-a3b | Basic File Reading | Baseline (/skills) | passed | 20,970ms | 5 | 9,251 | 5.3% |
| read-file / 001 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 17,162ms | 4 | 7,704 | 5.3% |
| read-file / 002 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 14,207ms | 4 | 7,642 | 5.2% |
| read-file / 003 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 5,792ms | 4 | 7,651 | 5.1% |
| read-file / 004 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 3,813ms | 4 | 7,634 | 5.2% |
| read-file / 005 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 3,944ms | 4 | 7,618 | 5.2% |
| read-file / 006 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 3,382ms | 4 | 7,568 | 5.1% |
| read-file / 007 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 4,640ms | 4 | 7,920 | 5.4% |
| read-file / 008 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 2,947ms | 3 | 5,931 | 4.9% |
| read-file / 009 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 4,398ms | 4 | 7,720 | 5.3% |
| read-file / 010 | lm-studio / qwen/qwen3.5-9b | Basic File Reading | Baseline (/skills) | passed | 4,175ms | 4 | 7,627 | 5.2% |
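
The per-model figures in the Model Comparison table can be cross-checked against the iteration rows. The sketch below recomputes score, average tool calls, and average tokens for google/gemma-4-e4b from its ten iterations listed above. How the dashboard actually aggregates is not stated; the `summarize` helper and the round-half-up rule for tokens are assumptions chosen because they reproduce the published values.

```python
import math
from statistics import mean

# (status, tools, tokens) for the ten google/gemma-4-e4b iterations listed above.
gemma_e4b_runs = [
    ("passed", 4, 7392), ("passed", 4, 7010), ("passed", 4, 8147),
    ("passed", 4, 6915), ("passed", 4, 7574), ("failed", 3, 6906),
    ("passed", 4, 7322), ("passed", 4, 7411), ("passed", 6, 14273),
    ("failed", 3, 6215),
]


def summarize(runs):
    """Aggregate iteration rows into score, average tools, and average tokens."""
    passed = sum(1 for status, _, _ in runs if status == "passed")
    avg_tokens = mean(tokens for _, _, tokens in runs)
    return {
        "score_pct": round(100 * passed / len(runs), 1),
        "avg_tools": round(mean(tools for _, tools, _ in runs), 1),
        # Assumption: half-up rounding. Python's built-in round() rounds
        # halves to the nearest even integer, which would give 7916 here.
        "avg_tokens": math.floor(avg_tokens + 0.5),
    }


print(summarize(gemma_e4b_runs))
# {'score_pct': 80.0, 'avg_tools': 4.0, 'avg_tokens': 7917}
```

The result matches row 4 of the Model Comparison table (80.0%, 4.0 tools, 7,917 tokens). The single 14,273-token run (iteration 009) raises the token average noticeably, so a median may be worth reporting alongside the mean.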