LLM outputs are stochastic by nature, so two runs rarely produce exactly the same text. That's why we've shipped an equivalence evaluator that uses an LLM to check whether an output and its expected output are equivalent, even when they aren't identical. E.g.
# for a tweet summarizer
output: Tesla is a technology company
expected: Tesla is a company in tech
equivalence: true
equality: false
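To illustrate the distinction between equality and equivalence, here is a minimal sketch of such an evaluator. The function names (`equivalence_prompt`, `call_llm`, `evaluate`) are illustrative, not part of our API, and `call_llm` is stubbed with a crude word-overlap heuristic so the snippet runs offline; in practice it would call an actual LLM judge.

```python
import re


def equivalence_prompt(output: str, expected: str) -> str:
    """Build a judge prompt asking an LLM whether two texts mean the same."""
    return (
        "Do these two texts convey the same meaning? Answer 'true' or 'false'.\n"
        f"Text A: {output}\nText B: {expected}"
    )


def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM client: a naive word-overlap heuristic
    # so this sketch is runnable without network access.
    a, b = re.findall(r"Text A: (.*)\nText B: (.*)", prompt)[0]
    wa, wb = set(a.lower().split()), set(b.lower().split())
    overlap = len(wa & wb) / max(len(wa | wb), 1)
    return "true" if overlap >= 0.4 else "false"


def evaluate(output: str, expected: str) -> dict:
    return {
        "equality": output == expected,  # exact string match
        "equivalence": call_llm(equivalence_prompt(output, expected)) == "true",
    }


result = evaluate("Tesla is a technology company", "Tesla is a company in tech")
# result: {"equality": False, "equivalence": True}
```

The key point is that `equality` is a strict string comparison, while `equivalence` delegates the semantic judgment to a model.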
We’ve also included three evaluators that you can customize based on your use case. They let you check for violence, bias, and sensitive user information that may be present in LLM outputs. E.g.
# for a question answering prompt
output: Thanks for your question. Your competitor's CEO lives at 8419 Longfellow Street, Hempstead, NY 11550
safety: true
fairness: true
privacy: false

# for an information extraction prompt
output: "email: team@gists.ai"
safety: true
fairness: true
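As a rough illustration of what the privacy check flags, here is a deterministic sketch that scans for obvious personal information. This is only a stand-in for demonstration: the actual evaluators judge with an LLM rather than regexes, and the `privacy_check` name and patterns below are hypothetical.

```python
import re

# Naive patterns standing in for an LLM-based privacy judge:
# email addresses and US-style street addresses.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
STREET = re.compile(r"\d+\s+\w+\s+(Street|St|Avenue|Ave|Road|Rd)\b", re.I)


def privacy_check(output: str) -> bool:
    """Return True when no obvious personal information is detected."""
    return not (EMAIL.search(output) or STREET.search(output))


privacy_check("Your competitor's CEO lives at 8419 Longfellow Street")  # False
privacy_check("email: team@gists.ai")                                   # False
privacy_check("Tesla is a technology company")                          # True
```

A real evaluator would catch paraphrased or partially redacted leaks that patterns like these miss, which is exactly why these checks are LLM-based.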