Tool, Solo

Sampler-Bench

Wanted to understand how sampling strategies affect generation quality. Built a benchmark to find out.

pfreq

Highlights

  • Compares temperature, top-p, top-k, min-p, top-n-sigma
  • LLM-as-judge scoring with single or multi-judge modes
  • Dashboard with leaderboards and score distributions
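
For readers unfamiliar with the strategies being compared, here is a minimal sketch of how each one is commonly defined as a filter over next-token scores. It is an illustration only, not Sampler-Bench's own code: every function name below is hypothetical, and the top-n-sigma rule follows the usual description (keep tokens whose logit is within n standard deviations of the maximum logit).

```python
import math
import statistics


def softmax(logits, temperature=1.0):
    """Temperature scaling: divide logits by T, then normalize to probabilities."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_mask(logits, k):
    """Keep the k highest-scoring tokens (ties at the cutoff are all kept)."""
    cutoff = sorted(logits, reverse=True)[k - 1]
    return [x >= cutoff for x in logits]


def top_p_mask(probs, p):
    """Keep the smallest set of most-probable tokens whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = [False] * len(probs)
    cumulative = 0.0
    for i in order:
        keep[i] = True
        cumulative += probs[i]
        if cumulative >= p:
            break
    return keep


def min_p_mask(probs, min_p):
    """Keep tokens whose probability is at least min_p times the top probability."""
    cutoff = min_p * max(probs)
    return [q >= cutoff for q in probs]


def top_n_sigma_mask(logits, n):
    """Keep tokens whose logit is within n standard deviations of the max logit."""
    cutoff = max(logits) - n * statistics.pstdev(logits)
    return [x >= cutoff for x in logits]


def renormalize(probs, keep):
    """Zero out filtered tokens and rescale the rest to sum to 1."""
    kept = [q if k else 0.0 for q, k in zip(probs, keep)]
    total = sum(kept)
    return [q / total for q in kept]


# Hypothetical 4-token vocabulary, used only to exercise the filters.
example_logits = [2.0, 1.0, 0.0, -1.0]
example_probs = softmax(example_logits)
```

Each mask can be passed to `renormalize` to get the distribution a sampler would actually draw from, which makes it easy to see how the strategies differ in which tokens they keep for the same input.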

Tech Stack

Python, Next.js, Tailwind v4