Test Cases
So you’ve written your gists and used them to automate tasks. But when you change them, how do you know whether they’re getting better rather than worse?
Gists is the first platform that makes it easy to create test cases for your gists, so you can ensure consistency when you are changing prompts.
This can cut your prompt-engineering time by over 52%.
How it works
A test case consists of values for all the variables used in a gist, as well as an expected output as the reference.
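To make that structure concrete, here is a minimal sketch of a test case as a plain data record. Everything in it is an assumption for illustration: the `TestCase` class, the `render` helper, the `{name}` placeholder syntax, and the translation prompt are hypothetical and are not the platform's actual format or API.

```python
from dataclasses import dataclass


@dataclass
class TestCase:
    """Hypothetical record: one value per gist variable, plus a reference output."""
    variables: dict[str, str]
    expected: str


def render(template: str, case: TestCase) -> str:
    """Fill the template's {name} placeholders with the test case's values.

    Raises KeyError if the test case omits a variable the template uses.
    """
    return template.format(**case.variables)


# Illustrative gist template with two variables (an assumption, not a real gist).
template = "Translate the following text into {language}: {text}"

case = TestCase(
    variables={"language": "French", "text": "Good morning"},
    expected="Bonjour",
)

prompt = render(template, case)
print(prompt)
```

Because a test case must supply a value for every variable, a missing value surfaces immediately as an error instead of silently producing an incomplete prompt.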
For example, if you have the following gist:
We could add the following test cases:
Running test cases
Once you have defined test cases for your gists, we can measure their consistency by running them multiple times and calculating the success rates.
To calculate the success rates, however, we have to first define how the outputs are evaluated for success and failure.
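As a rough sketch of the idea, the success rate is the fraction of runs whose output the evaluator accepts. The names below (`exact_match`, `success_rate`, `fake_run`) are hypothetical, the exact-match rule is only one possible evaluator and not necessarily one of the defaults, and the model call is replaced by a canned sequence of outputs so the example runs on its own.

```python
from itertools import cycle
from typing import Callable


def exact_match(output: str, expected: str) -> bool:
    """Assumed evaluator: pass when output equals the reference, ignoring outer whitespace."""
    return output.strip() == expected.strip()


def success_rate(
    run: Callable[[], str],
    expected: str,
    evaluator: Callable[[str, str], bool],
    runs: int = 5,
) -> float:
    """Call `run` the given number of times and return the fraction of passing outputs."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    passed = sum(evaluator(run(), expected) for _ in range(runs))
    return passed / runs


# Stand-in for a real gist execution: cycles through canned outputs.
canned_outputs = cycle(["Bonjour", "bonjour", "Bonjour ", "Salut"])


def fake_run() -> str:
    return next(canned_outputs)


rate = success_rate(fake_run, expected="Bonjour", evaluator=exact_match, runs=4)
print(f"success rate: {rate:.0%}")
```

Passing the evaluator in as a function keeps the success-rate calculation independent of how success is defined, so a stricter or looser rule can be swapped in without changing the loop.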
Let’s take a look at the default evaluators.