kernelbench.com: Agentic GPU Kernel Benchmark Results and Run Artifacts

Arledge, Elliot

Rankings

Columns are ordered best first; each column links to its cells and audits.

KernelBench Mega

Best decode speedup vs optimized-PyTorch baseline over valid (correct + audited-clean) cells · H100 PCIe

Speedups, best first: 15.50x, 14.82x, 5.62x, 4.01x, 1.75x, 1.38x
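As a rough illustration of the Mega score described above, the sketch below takes the best baseline-over-kernel latency ratio across cells that are both correct and audited clean. The `Cell` record, its field names, and the sample latencies are hypothetical; the actual harness may store and aggregate results differently.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    # Hypothetical record for one agent run; field names are assumptions.
    baseline_ms: float    # optimized-PyTorch decode latency
    kernel_ms: float      # latency of the agent-written kernel
    correct: bool         # passed the correctness check
    audited_clean: bool   # not flagged by the audit


def best_speedup(cells):
    """Return the best baseline/kernel ratio over valid cells, or None."""
    valid = [c for c in cells
             if c.correct and c.audited_clean and c.kernel_ms > 0]
    if not valid:
        return None
    return max(c.baseline_ms / c.kernel_ms for c in valid)


# Made-up latencies, for illustration only.
cells = [
    Cell(10.0, 2.0, True, True),    # valid
    Cell(10.0, 1.0, True, False),   # fastest, but flagged by the audit
    Cell(10.0, 4.0, True, True),    # valid
    Cell(10.0, 0.5, False, True),   # incorrect output
]
best = best_speedup(cells)
print(best)
```

In this sketch the two fastest cells never reach the ranking, because one fails the audit and the other fails correctness.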

KernelBench CUDA

Mean peak fraction of roofline over full deck (fails = 0) · RTX PRO 6000 (CUDA-only deck)

KernelBench Hard

Mean peak fraction of roofline over full deck (fails = 0) · H100 PCIe
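The "fails = 0" rule changes the average noticeably, so a small sketch may help. It assumes each passing problem reports a peak fraction in [0, 1] and that any problem without a passing result contributes zero. The problem names and fractions are invented for illustration.

```python
def deck_mean(deck, peak_fractions):
    """Mean peak fraction over the whole deck, scoring failures as 0.

    deck: list of problem names (hypothetical).
    peak_fractions: dict of problem name -> fraction for passing runs only.
    """
    if not deck:
        raise ValueError("deck must not be empty")
    return sum(peak_fractions.get(name, 0.0) for name in deck) / len(deck)


deck = ["op_a", "op_b", "op_c", "op_d"]      # invented problem names
passing = {"op_a": 0.5, "op_b": 0.25}        # op_c and op_d failed

full_deck = deck_mean(deck, passing)                   # failures count as 0
passes_only = sum(passing.values()) / len(passing)     # for comparison only
print(full_deck, passes_only)
```

Averaging over the full deck means a model cannot raise its score by failing the hard problems and passing only the easy ones.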

The decks

Frozen decks, public harnesses, and traces on Hugging Face, with one board per GPU.

Mega

Whole-block fused megakernels, graded on decode speedup over optimized PyTorch.

H100 · RTX PRO 6000 · B200

GitHub

Traces

CUDA

CUDA-only writing deck — Triton and kernel DSLs fail the language gate.

Six-op CUDA/Triton deck, roofline-graded, one unlimited agent session per cell.

H100 · RTX PRO 6000 · B200

GitHub

Traces

Multi (coming soon)

NVLink collectives rewritten as kernels on 8×H100 SXM, graded on busbw.

8×H100 SXM

GitHub
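The page does not say how busbw is computed for the Multi deck. A common convention, used by NVIDIA's nccl-tests, scales the algorithm bandwidth (bytes divided by time) by a per-collective factor so that results are comparable across rank counts. The sketch below assumes that convention; the function name and the timing numbers are hypothetical.

```python
def bus_bandwidth_gbps(collective, message_bytes, seconds, n_ranks):
    """Bus bandwidth in GB/s under the nccl-tests convention (an assumption here)."""
    factors = {
        "all_reduce": 2 * (n_ranks - 1) / n_ranks,
        "all_gather": (n_ranks - 1) / n_ranks,
        "reduce_scatter": (n_ranks - 1) / n_ranks,
        "broadcast": 1.0,
    }
    algbw = message_bytes / seconds / 1e9   # algorithm bandwidth, GB/s
    return algbw * factors[collective]


# Made-up measurement: 1 GB all-reduce across 8 ranks in 10 ms.
busbw = bus_bandwidth_gbps("all_reduce", 1e9, 0.01, 8)
print(f"{busbw:.1f} GB/s")
```

With 8 ranks the all-reduce factor is 2 × 7 / 8 = 1.75, so an algorithm bandwidth of about 100 GB/s corresponds to roughly 175 GB/s of bus bandwidth.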

Performance vs compute

Does the win just cost more tokens? Output tokens stand in for the compute each model chose to spend. Only models with clean token telemetry are shown.

Legend: on the efficiency frontier (most performance per token) · dominated (spent more, delivered less)
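One way to read the frontier/dominated split is as a Pareto frontier over (output tokens, performance): a model is dominated when some other model spent no more tokens and delivered at least as much, with a strict improvement on one axis. The sketch below implements that reading. The model names and numbers are invented, and the site may draw its frontier differently.

```python
def efficiency_frontier(points):
    """Return names of points not dominated by any other point.

    points: list of (name, output_tokens, performance) tuples.
    Fewer tokens is better; higher performance is better.
    """
    frontier = []
    for name, tokens, perf in points:
        dominated = any(
            t <= tokens and p >= perf and (t < tokens or p > perf)
            for _, t, p in points
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Invented data: (model, output tokens, score).
points = [
    ("model_a", 100, 0.2),
    ("model_b", 200, 0.5),
    ("model_c", 300, 0.4),   # spent more than model_b, delivered less
    ("model_d", 400, 0.6),
]
frontier = efficiency_frontier(points)
print(frontier)
```

Here `model_c` is the only dominated point; `model_d` stays on the frontier because nothing cheaper matches its score.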

Method

Roofline, not speedup

Scores are grounded in hardware ceilings, so baseline quirks can't move them.
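A minimal sketch of a roofline-style score, assuming the standard roofline model: the attainable rate is the smaller of peak compute and arithmetic intensity times memory bandwidth, and the score is the achieved rate divided by that ceiling. The hardware figures below are round placeholders, not the specifications of any GPU named on this page, and the benchmark's exact formula may differ.

```python
def roofline_fraction(flops, bytes_moved, seconds, peak_flops, peak_bandwidth):
    """Fraction of the roofline ceiling achieved by one kernel run."""
    intensity = flops / bytes_moved                      # FLOP per byte
    ceiling = min(peak_flops, intensity * peak_bandwidth)
    achieved = flops / seconds                           # FLOP per second
    return achieved / ceiling


# Placeholder hardware: 100 TFLOP/s peak compute, 2 TB/s memory bandwidth.
PEAK_FLOPS = 100e12
PEAK_BANDWIDTH = 2e12

# Made-up kernel: 2 GFLOP of work, 1 GB of traffic, 1 ms runtime.
# Intensity is 2 FLOP/byte, so the kernel is memory-bound.
fraction = roofline_fraction(2e9, 1e9, 1e-3, PEAK_FLOPS, PEAK_BANDWIDTH)
print(f"{fraction:.2f}")
```

Note that no baseline timing appears among the inputs: a slow or fast reference implementation cannot change the result, which is the property the section title points at.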

Real agent harnesses

Claude Code, Codex, Cursor, Kimi, OpenCode, Grok — the tools labs actually ship.

Public transcripts

Every run — tools, reasoning, diffs — on the run index and Hugging Face.

Judge-assisted audit

Reward hacks and rubric leaks get flagged, published, and linked per cell.

Cite this benchmark suite

Website · Website repository · Mega repository · Hard repository · Mega HF traces · Hard HF traces

@misc{arledge2026kernelbenchcom,
  title        = {kernelbench.com: Agentic GPU Kernel Benchmark Results and Run Artifacts},
  author       = {Arledge, Elliot},
  year         = {2026},
  howpublished = {\url{https://kernelbench.com}},
  note         = {Website, benchmark results, transcript viewers, and citation index}
}

@misc{arledge2026hard,
  title        = {Hard: Agentic CUDA Kernel Result Suite},
  author       = {Arledge, Elliot},
  year         = {2026},
  howpublished = {\url{https://github.com/Infatoshi/kernelbench.com/tree/master/benchmarks/hard}},
  note         = {CUDA benchmark suite, harness, results, and annotations}
}

@misc{arledge2026mega,
  title        = {Mega: Agentic GPU Megakernel Result Suite},
  author       = {Arledge, Elliot},
  year         = {2026},
  howpublished = {\url{https://github.com/Infatoshi/kernelbench.com/tree/master/benchmarks/mega}},
  note         = {Megakernel benchmark suite, sandboxed harness, and result artifacts}
}

@misc{arledge2026hardtraces,
  title        = {KernelBench-Hard Agent Traces},
  author       = {Arledge, Elliot},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/datasets/Infatoshi/kernelbench-hard-traces}},
  note         = {Per-run agent transcripts: messages, tool calls, reasoning}
}

@misc{arledge2026megatraces,
  title        = {KernelBench-Mega Agent Traces},
  author       = {Arledge, Elliot},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/datasets/Infatoshi/kernelbench-mega-traces}},
  note         = {Per-run agent transcripts for the megakernel suite}
}