Evaluation and benchmarking are crucial concepts in LLM development. To improve the performance of an LLM app (RAG, agents), you must have a way to measure it.
LlamaIndex offers key modules to measure the quality of generated results, as well as modules to measure retrieval quality.
Response Evaluation: Does the response match the retrieved context? Does it also match the query? Does it match the reference answer or guidelines?
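To make the idea concrete, here is a minimal, illustrative sketch of a reference-based response check using token overlap. This is not the LlamaIndex API (the function name and scoring scheme are assumptions for illustration); real response evaluators typically use an LLM as a judge rather than string matching:

```python
def token_overlap_score(response: str, reference: str) -> float:
    """Crude proxy for answer correctness: fraction of reference
    tokens that also appear in the response. Illustrative only."""
    response_tokens = set(response.lower().split())
    reference_tokens = set(reference.lower().split())
    if not reference_tokens:
        return 0.0
    return len(response_tokens & reference_tokens) / len(reference_tokens)


# Hypothetical usage: score a generated answer against a reference answer.
score = token_overlap_score(
    "Paris is the capital of France",
    "The capital of France is Paris",
)
```

In practice, LlamaIndex's evaluation modules handle this kind of comparison for you; see the module guides for the actual evaluator classes and their APIs.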
Retrieval Evaluation: Are the retrieved sources relevant to the query?
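Retrieval quality is commonly scored with metrics such as hit rate and MRR (mean reciprocal rank). A minimal sketch of per-query versions of these metrics (illustrative, not the LlamaIndex implementation):

```python
def hit_rate(retrieved_ids: list[str], expected_ids: set[str]) -> float:
    """1.0 if any expected (relevant) document appears in the
    retrieved list, else 0.0."""
    return 1.0 if any(doc_id in expected_ids for doc_id in retrieved_ids) else 0.0


def reciprocal_rank(retrieved_ids: list[str], expected_ids: set[str]) -> float:
    """Reciprocal rank of the first relevant document in the
    retrieved list; 0.0 if no relevant document was retrieved.
    Averaging this over queries gives MRR."""
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in expected_ids:
            return 1.0 / rank
    return 0.0
```

Averaged over a dataset of queries with labeled relevant documents, these metrics quantify how well the retriever surfaces the right context. LlamaIndex exposes retrieval evaluation through its evaluation modules; see the module guides for the supported metrics and evaluator APIs.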
You can learn more about how evaluation works in LlamaIndex in our module guides.