Update README.md
README.md CHANGED

@@ -1,6 +1,6 @@
 ---
 title: LeaderboardFinder
-emoji:
+emoji: 🔎
 colorFrom: pink
 colorTo: gray
 sdk: gradio
@@ -9,50 +9,4 @@ app_file: app.py
 pinned: false
 ---
 
-If you want your leaderboard to appear, feel free to add relevant information in its metadata, and it will be displayed here.
-
-# Categories
-
-## Submission type
-Arenas are not concerned by this category.
-
-- `submission:automatic`: users can submit their models directly to the leaderboard, and evaluation is run automatically without human intervention
-- `submission:semiautomatic`: the leaderboard requires the model owner to run evaluations on their side and submit the results
-- `submission:manual`: the leaderboard requires the leaderboard owner to run evaluations for new submissions
-- `submission:closed`: the leaderboard does not accept submissions at the moment
-
-## Test set status
-Arenas are not concerned by this category.
-
-- `test:public`: all the test sets used are public, so the evaluations are completely reproducible
-- `test:mix`: some test sets are public and some are private
-- `test:private`: all the test sets used are private, so the evaluations are hard to game
-- `test:rolling`: the test sets change regularly over time and evaluation scores are refreshed
-
-## Judges
-- `judge:auto`: evaluations are run automatically, using an evaluation suite such as `lm_eval` or `lighteval`
-- `judge:model`: evaluations use a model-as-a-judge approach to rate answers
-- `judge:humans`: evaluations are done by humans rating answers - this is an arena
-- `judge:vibe_check`: evaluations are done manually by one human
-
-## Modalities
-Can be any (or several) of the following:
-- `modality:text`
-- `modality:image`
-- `modality:video`
-- `modality:audio`
-A bit outside the usual modalities:
-- `modality:tools`: requires tool usage - mostly for assistant models
-- `modality:artefacts`: the leaderboard evaluates machine learning artefacts themselves, for example the quality of text embeddings.
-
-## Evaluation categories
-Can be any (or several) of the following:
-- `eval:generation`: the evaluation looks specifically at generation capabilities (image generation, text generation, ...)
-- `eval:math`
-- `eval:code`
-- `eval:performance`: model performance (speed, energy consumption, ...)
-- `eval:safety`: safety, toxicity, and bias evaluations
-
-## Language
-You can indicate the languages covered by your benchmark like so: `language:mylanguage`.
-At the moment, we do not support language codes; please use the language name in English.
+If you want your leaderboard to appear, feel free to add relevant information in its metadata, and it will be displayed here (see the About tab).
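The categories removed above are now surfaced in the app's About tab, but they are still meant to be used the same way: as metadata tags on the leaderboard's Space. As a purely illustrative sketch (the Space title, emoji, and exact tag combination below are made up, and it assumes the tags are declared in the `tags:` field of the Space's README front matter), a tagged leaderboard Space could look like this:

```yaml
---
# Hypothetical leaderboard Space metadata - for illustration only.
title: My Code Eval Leaderboard
emoji: 🥇
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
pinned: false
tags:
  - leaderboard              # assumed general-purpose tag marking the Space as a leaderboard
  - submission:automatic     # users submit models; evaluation runs without human intervention
  - test:private             # private test sets, hard to game
  - judge:auto               # scored by an automatic evaluation suite
  - modality:text
  - eval:code
  - eval:generation
  - language:english         # language name in English, not a language code
---
```

Each tag is a plain `category:value` string, so the same mechanism covers every category family described above.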
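On the discovery side, tagged Spaces can be queried from the Hub programmatically. This is a rough sketch rather than the LeaderboardFinder app's actual code, and it assumes a recent version of `huggingface_hub` in which `HfApi.list_spaces` accepts a `filter` argument matching Space tags:

```python
# Rough sketch (not the app's implementation): list Hub Spaces that declare
# one of the category tags in their metadata.
from huggingface_hub import HfApi

api = HfApi()

# Assumption: `filter` matches Space tags; here we look for human-judged
# leaderboards, i.e. arenas tagged `judge:humans`.
for space in api.list_spaces(filter="judge:humans", limit=20):
    # Tag information may be missing depending on how much detail the API returns.
    print(space.id, getattr(space, "tags", None))
```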