UI changes 11 Nov
common.py
CHANGED
@@ -1,17 +1,19 @@
 # Page Headers
-MAIN_TITLE = "# Judge Arena
+MAIN_TITLE = "# Judge Arena: Benchmarking LLMs as Evaluators"
 
 # How it works section
 HOW_IT_WORKS = """
-
-- **Evaluate anything:** coding, analysis, creative writing, math, or general knowledge
+Vote to help the community find the best LLM-as-a-judge to use!
 """
 
 BATTLE_RULES = """
-## 🤺
-
-
-
+## 🤺 Choose the winner
+1. Define your scoring criteria in the **Evaluator Prompt**
+2. Add a test case to the **Sample to evaluate**
+3. Test the evaluators & vote for the model that best aligns with your judgement!
+\n
+Variables defined in your prompt with {{double curly braces}} map to input fields under **Sample to evaluate**.
+
 <br>
 """
 
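The new BATTLE_RULES copy tells users that anything written in {{double curly braces}} becomes an input field under **Sample to evaluate**. As a rough illustration of how such a mapping could work, the sketch below pulls variable names out of a prompt with a regex and substitutes the values typed into the matching fields. The helper names (`extract_variables`, `fill_variables`) are hypothetical and not taken from this Space's code.

```python
import re

# Illustrative only: find {{variable}} placeholders in an evaluator prompt so the
# UI can show one input field per variable. Not the Judge Arena implementation.
VARIABLE_PATTERN = re.compile(r"\{\{(\w+)\}\}")

def extract_variables(prompt: str) -> list[str]:
    """Return the unique {{variable}} names in the order they first appear."""
    seen: list[str] = []
    for name in VARIABLE_PATTERN.findall(prompt):
        if name not in seen:
            seen.append(name)
    return seen

def fill_variables(prompt: str, values: dict[str, str]) -> str:
    """Substitute each {{variable}} with the value from its input field."""
    return VARIABLE_PATTERN.sub(lambda m: values.get(m.group(1), m.group(0)), prompt)

# Example with the two variables used by the default prompt:
prompt = "[User Query]: {{input}}\n[Response]: {{response}}"
print(extract_variables(prompt))   # ['input', 'response']
print(fill_variables(prompt, {"input": "Hi", "response": "Hello!"}))
```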
@@ -35,34 +37,25 @@ CSS_STYLES = """
 gap: 8px;
 }
 """
-
+
 # Default Eval Prompt
 EVAL_DESCRIPTION = """
-## 📝
-**Precise evaluation criteria
+## 📝 Tips
+**Precise evaluation criteria leads to more consistent and reliable judgments.** A good evaluation prompt should include the following elements:
 - Evaluation criteria
 - Scoring rubric
-- (Optional)
-
-**Any variables you define in your prompt using {{double curly braces}} will automatically map to the corresponding input fields under the "Sample to evaluate" section on the right.**
-
-<br><br>
+- Examples (Optional)
 """
 
-DEFAULT_EVAL_PROMPT = """You are assessing a chat bot response to a user's input based on [
+DEFAULT_EVAL_PROMPT = """You are assessing a chat bot response to a user's input based on [WRITE CRITERIA HERE]
 
 Score:
 A score of 1 means that the response's answer meets all of the evaluation criteria.
 A score of 0 means that the response's answer does not meet all of the evaluation criteria.
 
-Here is the data:
-[BEGIN DATA]
-***
 [User Query]: {{input}}
-
-[Response]: {{response}}
-***
-[END DATA]"""
+
+[Response]: {{response}}"""
 
 # Default Variable Values
 DEFAULT_INPUT = """Which of these animals is least likely to be found in a rainforest?"
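The trimmed DEFAULT_EVAL_PROMPT asks the judge for a binary verdict: 1 if the response meets all the criteria, 0 if it does not. A minimal sketch of how the template might be filled and the verdict parsed, assuming the judge replies with something like `Score: 1`; the prompt text is copied from the diff, while `build_judge_prompt` and `parse_binary_score` are hypothetical helpers and the model call itself is omitted because it depends on the provider being tested.

```python
import re

# Template mirroring the new DEFAULT_EVAL_PROMPT from this commit.
DEFAULT_EVAL_PROMPT = """You are assessing a chat bot response to a user's input based on [WRITE CRITERIA HERE]

Score:
A score of 1 means that the response's answer meets all of the evaluation criteria.
A score of 0 means that the response's answer does not meet all of the evaluation criteria.

[User Query]: {{input}}

[Response]: {{response}}"""

def build_judge_prompt(user_input: str, response: str) -> str:
    """Fill the two default variables into the evaluator prompt."""
    return (DEFAULT_EVAL_PROMPT
            .replace("{{input}}", user_input)
            .replace("{{response}}", response))

def parse_binary_score(judge_reply: str) -> int | None:
    """Pull a 0/1 verdict out of the judge's reply, if one is present."""
    match = re.search(r"\b([01])\b", judge_reply)
    return int(match.group(1)) if match else None

prompt = build_judge_prompt(
    "Which of these animals is least likely to be found in a rainforest?",
    "A camel, since it is adapted to deserts rather than humid forests.",
)
print(parse_binary_score("Score: 1"))  # 1
```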
@@ -79,7 +72,7 @@ VOTING_HEADER = """
 
 # Acknowledgements
 ACKNOWLEDGEMENTS = """
-<br><br
+<br><br>
 # Acknowledgements
 
 We thank [LMSYS Org](https://lmsys.org/) for their hard work on the Chatbot Arena and fully credit them for the inspiration to build this.
@@ -152,4 +145,4 @@ Atla currently funds this out of our own pocket. We are looking for API credits
 We are training a general-purpose evaluator that you will soon be able to run in this Judge Arena. Our next step will be to open-source a powerful model that the community can use to run fast and accurate evaluations.
 <br><br>
 # Get in touch
-Feel free to email us at [[email protected]](mailto:[email protected]) or leave feedback on our [Github](https://github.com/atla-ai/judge-arena)!"""
+Feel free to email us at [[email protected]](mailto:[email protected]) or leave feedback on our [Github](https://github.com/atla-ai/judge-arena)!"""
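Since the commit is labelled "UI changes", it helps to see how constants like these are typically consumed: markdown strings handed to Gradio components. The wiring below is a speculative sketch of that general shape; it assumes the Space imports the constants from common.py, and the real Judge Arena layout is not part of this diff.

```python
# Speculative sketch of how the reworded constants could be rendered with Gradio.
# Assumes this file sits next to common.py in the Space; layout is illustrative.
import gradio as gr

from common import MAIN_TITLE, HOW_IT_WORKS, BATTLE_RULES, CSS_STYLES

with gr.Blocks(css=CSS_STYLES) as demo:
    gr.Markdown(MAIN_TITLE)      # "# Judge Arena: Benchmarking LLMs as Evaluators"
    gr.Markdown(HOW_IT_WORKS)    # one-line pitch shown under the title
    with gr.Accordion("Battle rules", open=False):
        gr.Markdown(BATTLE_RULES)

if __name__ == "__main__":
    demo.launch()
```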