Now that you’ve created some gists, added test cases, and learned about evaluators, we’re ready to run benchmarks to calculate the success rates of the gists.

How it works

  • Click on the benchmark button on the gist variant page
  • Select all the variants you want to benchmark
  • Select the evaluators that apply to your gist and that you want to enable
  • Select the test cases you want to run
  • Choose how many times you want to run the test cases
  • Click on Run
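As a rough illustration of what a benchmark computes, the sketch below runs each test case several times against each variant and counts a run as a success only when every enabled evaluator passes. This is an assumption about how a success rate could be calculated, not the tool's actual implementation: the names `success_rate` and `exact_match`, the test case shape, and the stand-in variant functions are all hypothetical, and no model is called.

```python
from typing import Callable, Dict, List

# Hypothetical types: a variant maps an input string to an output string,
# and an evaluator decides whether an output is acceptable for a test case.
Variant = Callable[[str], str]
Evaluator = Callable[[str, str], bool]  # (output, expected) -> passed?


def exact_match(output: str, expected: str) -> bool:
    """Hypothetical evaluator: passes only when output equals expected."""
    return output == expected


def success_rate(
    variant: Variant,
    test_cases: List[Dict[str, str]],
    evaluators: List[Evaluator],
    runs: int,
) -> float:
    """Fraction of runs in which every evaluator passed."""
    passed = 0
    total = 0
    for case in test_cases:
        for _ in range(runs):
            output = variant(case["input"])
            total += 1
            if all(ev(output, case["expected"]) for ev in evaluators):
                passed += 1
    return passed / total if total else 0.0


# Stand-in variants; a real variant would call a model instead.
variants: Dict[str, Variant] = {
    "uppercase": lambda text: text.upper(),
    "identity": lambda text: text,
}
test_cases = [
    {"input": "hi", "expected": "HI"},
    {"input": "OK", "expected": "OK"},
]

results = {
    name: success_rate(fn, test_cases, [exact_match], runs=3)
    for name, fn in variants.items()
}
print(results)
```

Running each test case more than once matters because model output can vary between calls, so a single run may not reflect how reliable a variant is.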
Gists calls OpenAI while running the benchmarks, which will use your API quota.
It can take a few minutes to finish running and evaluating all the test cases against your variants.
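To get a feel for how much quota and time a benchmark might use, you can estimate the number of generation calls before clicking Run. The sketch below assumes one model call per variant, per test case, per run. That is an assumption, not documented behavior, and evaluators that themselves call a model would add to the total. The function name `estimate_generation_calls` is hypothetical.

```python
def estimate_generation_calls(
    num_variants: int, num_test_cases: int, runs_per_case: int
) -> int:
    """Estimate model calls, assuming one call per variant/test case/run."""
    if min(num_variants, num_test_cases, runs_per_case) < 0:
        raise ValueError("counts must be non-negative")
    return num_variants * num_test_cases * runs_per_case


# Example: 3 variants, 10 test cases, 5 runs each.
calls = estimate_generation_calls(3, 10, 5)
print(calls)  # 150
```

Because the count is a product of the three selections, reducing any one of them shortens the benchmark proportionally.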
Great job! We’re done with the essentials.