all runs
53 runs · 50 scored · 0 no perf · 3 fail · 0 err · sorted by peak_fraction desc
One row per (model, problem) cell. Click any row to open the full transcript viewer, which shows every tool call, every reasoning step, the model's solution.py, the check.log, and the result.json. It is the same viewer we use locally to audit runs, just themed for the site.