DeepEval v0.14 Update
For those new to DeepEval: it provides a Pythonic way to run offline evaluations on your LLM pipelines so you can launch into production with confidence. Think of it as a testing suite for LLMs.
This product update covers a number of improvements, including:
- Synthetic Data Creation Using LLMs
- Bulk Review For Synthetic Data Creation
- Custom Metric Logging
- Improved Developer Experience + CLI Improvements
Let’s get started:
For Retrieval Augmented Generation (RAG) applications built with tools like LlamaIndex, developers want an easy way to quickly measure the performance of their RAG pipeline.
This is now achievable in just one line of code.
# Assumed import path for DeepEval v0.14; it may differ in other releases.
from deepeval.dataset import create_evaluation_query_answer_pairs

dataset = create_evaluation_query_answer_pairs(
    openai_api_key="sk-xxx",
    context="FastAPI is a Python language.",
    n=3
)
Under the hood, it uses ChatGPT to automatically create n query-answer pairs: a simple ChatGPT prompt takes in the original context, and each generated pair is fed into an LLMTestCase. The LLMTestCase abstraction is one of the building blocks of DeepEval, and it is what makes measuring the performance of these RAG pipelines possible.
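To make that concrete, here is a minimal sketch of what a single generated pair looks like as a test case. The import path and field names (query, expected_output, context) are assumptions based on the v0.14-era API and may differ in your installed version.

# Sketch only: import path and field names are assumed and may differ by version.
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    query="What is FastAPI?",                         # synthetic query generated by ChatGPT
    expected_output="FastAPI is a Python language.",  # synthetic answer generated by ChatGPT
    context="FastAPI is a Python language.",          # the original context you supplied
)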
Interested in finding out more? Read about how to run this here.
Once you have created synthetic data, you can easily add or remove individual data points. A sample screenshot of the dashboard for reviewing synthetic data is shown below.
The best part? The dashboard is launched entirely from Python and can be self-hosted. This is done simply by running:
dataset.review()
When reviewing the dataset, you can easily add or delete rows depending on which data you think is important for your evaluation.
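After curating the dataset, the natural next step is to run an evaluation against it. The sketch below is hypothetical: generate_llm_output stands in for your own RAG pipeline, and the run_evaluation method with its completion_fn argument is an assumption rather than confirmed v0.14 API, so check the documentation for the exact call.

# Hypothetical workflow; the method and argument names below are assumptions.
def generate_llm_output(query: str) -> str:
    # Placeholder for your own pipeline, e.g. a LlamaIndex query engine.
    return "FastAPI is a web framework for building APIs in Python."

# Assumed API: run every curated query through your pipeline and score the results.
dataset.run_evaluation(completion_fn=generate_llm_output)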
Custom metric logging has be