Tool · Solo
Sampler-Bench
Wanted to understand how sampling strategies affect LLM generation quality, so I built a benchmark to find out.
Highlights
- Compares temperature, top-p, top-k, min-p, top-n-sigma
- LLM-as-judge scoring with single or multi-judge modes
- Dashboard with leaderboards and score distributions
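For readers unfamiliar with the samplers being compared, here is a minimal sketch of how each one reshapes a next-token distribution. It is not taken from the Sampler-Bench code; the function names, the `renormalize` helper, and the use of the population standard deviation in `top_n_sigma` are all assumptions made for illustration, and it uses only the Python standard library.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; temperature < 1 sharpens, > 1 flattens."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def renormalize(probs, keep):
    """Hypothetical helper: zero every token outside `keep`, rescale the rest to sum to 1."""
    total = sum(probs[i] for i in keep)
    return [p / total if i in keep else 0.0 for i, p in enumerate(probs)]

def top_k(probs, k):
    """Keep only the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return renormalize(probs, set(order[:k]))

def top_p(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return renormalize(probs, keep)

def min_p(probs, ratio):
    """Keep tokens whose probability is at least `ratio` times the top probability."""
    threshold = ratio * max(probs)
    return renormalize(probs, {i for i, q in enumerate(probs) if q >= threshold})

def top_n_sigma(logits, n, temperature=1.0):
    """Keep tokens whose logit lies within n standard deviations of the max logit.

    Assumption: population standard deviation over the full vocabulary.
    """
    mean = sum(logits) / len(logits)
    std = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    cutoff = max(logits) - n * std
    keep = {i for i, x in enumerate(logits) if x >= cutoff}
    return renormalize(softmax(logits, temperature), keep)
```

Each filter returns a full-length probability list with the rejected tokens set to zero, so the result can be passed straight to `random.choices(range(len(probs)), weights=probs)` to draw a token. Note that `top_n_sigma` filters in logit space before the softmax, whereas the other three operate on probabilities.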
Tech Stack
Python, Next.js, Tailwind v4