Essentials
Evaluators
Gists ships with 5 default evaluators for different types of tasks:
- Equality
- Equivalence
- Safety
- Fairness
- Privacy
Equality
For tasks where the output is simple and must match the expected output exactly, you can use the equality evaluator.
E.g.
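The snippet below is a minimal sketch of the idea behind an equality check, not the Gists API itself. The function name `equality_evaluator` and the sentiment-label task are assumptions made for illustration.

```python
def equality_evaluator(output: str, expected: str) -> bool:
    """Pass only when the output matches the expected value exactly.

    Hypothetical helper for illustration; not part of the Gists API.
    """
    return output == expected


# Hypothetical task: classify a review's sentiment with a one-word label.
print(equality_evaluator("positive", "positive"))   # True
print(equality_evaluator("Positive.", "positive"))  # False: case and punctuation differ
```

Because the comparison is strict, even a trailing period or a capital letter fails the check, which is why this style of evaluator suits short, tightly constrained outputs.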
Equivalence
LLM outputs are stochastic by nature, so two runs rarely produce exactly the same text. That’s why we’ve shipped an equivalence evaluator, which uses an LLM to check whether the output and the expected output are equivalent.
E.g.
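As a rough sketch of how an LLM-backed equivalence check can work, the code below asks a judge model whether two answers mean the same thing. Everything here is an assumption for illustration: `equivalence_evaluator`, the `Judge` type, the prompt wording, and the stub judge are hypothetical and do not come from Gists. In practice the judge would wrap a real LLM call.

```python
from typing import Callable

# A judge is any function that sends a prompt to an LLM and returns its reply.
# (Hypothetical interface, assumed for this sketch.)
Judge = Callable[[str], str]

PROMPT_TEMPLATE = (
    "Do the two answers below convey the same meaning?\n"
    "Reply with YES or NO only.\n\n"
    "Answer A: {output}\n"
    "Answer B: {expected}\n"
)


def equivalence_evaluator(output: str, expected: str, judge: Judge) -> bool:
    """Pass when the judge replies YES to the equivalence question."""
    prompt = PROMPT_TEMPLATE.format(output=output, expected=expected)
    verdict = judge(prompt).strip().upper()
    return verdict.startswith("YES")


def stub_judge(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return "Yes."


result = equivalence_evaluator(
    "Paris is the capital of France.",
    "The capital of France is Paris.",
    stub_judge,
)
print(result)  # True, because the stub judge always answers yes
```

Passing the judge in as a function keeps the evaluator independent of any particular model provider and makes it easy to test with a stub.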
Safety, Fairness, and Privacy
We’ve also included three evaluators that you can customize for your use case. They allow you to check for violence, bias, and sensitive user information that may be present in LLM outputs.
E.g.
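One way to picture a customizable evaluator is as a policy description paired with a judge that decides whether a given output violates it. The sketch below follows that pattern under stated assumptions: `PolicyEvaluator`, its fields, the prompt wording, and the stub judge are all hypothetical and are not taken from Gists.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical judge interface: takes a prompt, returns the model's reply.
Judge = Callable[[str], str]


@dataclass
class PolicyEvaluator:
    name: str     # e.g. "safety", "fairness", or "privacy"
    policy: str   # customizable description of what counts as a violation
    judge: Judge

    def passes(self, output: str) -> bool:
        """Return True when the judge finds no policy violation."""
        prompt = (
            f"Policy ({self.name}): {self.policy}\n"
            f"Text: {output}\n"
            "Does the text violate the policy? Reply with YES or NO only."
        )
        return not self.judge(prompt).strip().upper().startswith("YES")


def stub_judge(prompt: str) -> str:
    """Stand-in for a real LLM call: flags any text containing '@'."""
    return "YES" if "@" in prompt else "NO"


privacy = PolicyEvaluator(
    name="privacy",
    policy="The text must not reveal email addresses or phone numbers.",
    judge=stub_judge,
)

print(privacy.passes("Contact me at jane@example.com"))  # False
print(privacy.passes("Thanks for reaching out!"))        # True
```

Customizing for a different use case then amounts to changing the `policy` text, for instance describing which kinds of violent content or biased language matter for your application.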
Next steps
Now you’re ready to run benchmarks on your gists and calculate their success rates!