- Note: This analysis showcases the original Prometheus model. You can re-run it with a quantized version of the model.
- Download Datasets
- Define the Prometheus LLM hosted on Hugging Face.
- Prompt templates.
- Define parser function
- Define Correctness, Faithfulness, and Relevancy Evaluators
- Let's create a function to create and for different datasets.
- Function to run batch evaluations on defined evaluators
- Function to check the distribution of scores
- Function to check correctness, faithfulness and relevancy evaluation score
- Function to compute Hamming distance.
- Evaluation on Paul Graham essay text
- Compute Correctness, Faithfulness and Relevancy Evaluation
- Correctness Evaluation score distribution with Prometheus Evaluator.
- Correctness Evaluation score distribution with GPT-4 Evaluator.
- Feedback comparison between Prometheus and GPT-4.
- Prometheus Faithfulness and Relevancy Evaluation scores.
- GPT-4 Faithfulness and Relevancy Evaluation scores.
- Hamming Distance comparison between Prometheus and GPT-4
- GPT-4 Cost analysis
- Evaluation with Llama2 paper
- Compute Correctness, Faithfulness and Relevancy Evaluation
- Correctness Evaluation score distribution with Prometheus Evaluator.
- Correctness Evaluation score distribution with GPT-4 Evaluator.
- Feedback comparison between Prometheus and GPT-4 for correctness.
- Prometheus Faithfulness and Relevancy Evaluation scores.
- GPT-4 Faithfulness and Relevancy Evaluation scores.
- Hamming Distance comparison between Prometheus and GPT-4
- Feedback comparison between Prometheus and GPT-4 for faithfulness and relevancy
- GPT-4 Cost analysis
- Total Cost Analysis
- Observation: