Now that you’ve created some gists, added test cases, and learned about evaluators, you’re ready to run benchmarks to calculate the success rates of your gists.

How it works

  • Click on the benchmark button on the gist variant page
  • Select all the variants you want to benchmark
  • Select evaluators you want to enable that are applicable to your gist
  • Select test cases that you want to run
  • Choose how many times you want to run the test cases
  • Click on Run
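
To make "success rate" concrete, here is a minimal sketch of how a per-variant rate could be tallied from benchmark outcomes. It assumes the rate is simply passed runs divided by total runs; the exact formula Gists uses is not stated here, and the function name, variant names, and result data are all hypothetical.

```python
from collections import defaultdict


def success_rates(results):
    """Return the pass fraction per variant.

    `results` is an iterable of (variant_name, passed) pairs, one pair per
    evaluated run. This data shape is an assumption made for illustration,
    not the format Gists itself produces.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for variant, passed in results:
        totals[variant] += 1
        if passed:
            passes[variant] += 1
    return {variant: passes[variant] / totals[variant] for variant in totals}


# Hypothetical outcomes: 2 variants, 2 test cases, each run 2 times.
example_results = [
    ("variant-a", True), ("variant-a", True),
    ("variant-a", False), ("variant-a", True),
    ("variant-b", True), ("variant-b", False),
    ("variant-b", False), ("variant-b", True),
]

rates = success_rates(example_results)
for variant, rate in sorted(rates.items()):
    print(f"{variant}: {rate:.0%}")
```

Running each test case several times matters because model output can vary between calls, so a single run per test case gives a noisier estimate.
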
Gists calls OpenAI while running benchmarks, which uses your API quota.
It can take a few minutes to finish running and evaluating all the test cases against your variants.
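
Before clicking Run, it can help to estimate how much work a benchmark involves. The sketch below assumes every selected variant is run against every selected test case the chosen number of times, with one model call per run. That is an assumption for illustration only: evaluators may make additional calls, so treat the result as a rough lower bound. The function name is hypothetical.

```python
def estimated_runs(num_variants, num_test_cases, runs_per_test_case):
    """Estimate the number of test-case runs in one benchmark.

    Assumes each variant runs against each test case `runs_per_test_case`
    times. Any extra calls made by evaluators are not counted.
    """
    return num_variants * num_test_cases * runs_per_test_case


# Hypothetical selection: 3 variants, 5 test cases, 4 runs each.
total = estimated_runs(3, 5, 4)
print(f"Estimated runs: {total}")
```

If the estimate looks large for your quota, selecting fewer variants or lowering the number of runs reduces it proportionally.
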
Great job! We’re done with the essentials.