Commit 27e5b96
Parent(s): d8147b8

[ADD] Model submission guide and citation

Files changed:
- app.py (+5 / -0)
- src/about.py (+41 / -21)
app.py CHANGED

```diff
@@ -371,8 +371,13 @@ with demo:
         )
 
         with gr.TabItem("🏅 Open Ended Evaluation", elem_id="llm-benchmark-tab-table", id=1):
+            gr.Markdown("# Coming Soon!!!", elem_classes="markdown-text")
             pass
         with gr.TabItem("🏅 Med Safety", elem_id="llm-benchmark-tab-table", id=2):
+            gr.Markdown("# Coming Soon!!!", elem_classes="markdown-text")
+            pass
+        with gr.TabItem("🏅 Med Safety", elem_id="llm-benchmark-tab-table", id=2):
+            gr.Markdown("# Coming Soon!!!", elem_classes="markdown-text")
             pass
 
         with gr.TabItem("📝 About", elem_id="llm-benchmark-tab-table", id=3):
```
src/about.py CHANGED

````diff
@@ -227,41 +227,61 @@ Users are advised to approach the results with an understanding of the inherent
 
 EVALUATION_QUEUE_TEXT = """
 
-Currently, the benchmark supports evaluation for models hosted on the huggingface hub and of type
-If your model needs a custom implementation, follow the steps outlined in the [clinical_ner_benchmark](https://github.com/WadoodAbdul/clinical_ner_benchmark/blob/e66eb566f34e33c4b6c3e5258ac85aba42ec7894/docs/custom_model_implementation.md) repo or reach out to our team!
-
-
-
-
-
-
-  - Decoder: Transformer based autoregressive token generation model.
-  - GLiNER: Architecture outlined in the [GLiNER Paper](https://arxiv.org/abs/2311.08526)
-
-
-
-
+Currently, the benchmark supports evaluation for models hosted on the huggingface hub and of decoder type. It doesn't support adapter models yet, but adapter support will be added soon.
+
+## Submission Guide for the MEDIC Benchmark
+
+## First Steps Before Submitting a Model
+
+### 1. Ensure Your Model Loads with AutoClasses
+Verify that you can load your model and tokenizer using AutoClasses:
+```python
+from transformers import AutoConfig, AutoModel, AutoTokenizer
+config = AutoConfig.from_pretrained("your model name", revision=revision)
+model = AutoModel.from_pretrained("your model name", revision=revision)
+tokenizer = AutoTokenizer.from_pretrained("your model name", revision=revision)
+```
+Note:
+- If this step fails, debug your model before submitting.
+- Ensure your model is public.
+
+### 2. Convert Weights to Safetensors
+[Safetensors](https://huggingface.co/docs/safetensors/index) is a new format for storing weights which is safer and faster to load and use. It will also allow us to add the number of parameters of your model to the `Extended Viewer`!
 
+### 3. Complete Your Model Card
+When we add extra information about models to the leaderboard, it will be automatically taken from the model card.
 
+### 4. Select the Correct Model Type
+Choose the correct model category from the options below:
+- 🟢 : pretrained model: new, base models trained on a given text corpora using masked modelling, or base models continuously trained on further corpora (which may include IFT/chat data) using masked modelling
+- ⭕ : fine-tuned models: pretrained models fine-tuned on more data or tasks.
+- 🟦 : preference-tuned models: chat-like fine-tunes, either using IFT (datasets of task instructions), RLHF, or DPO (changing the model loss slightly with an added policy), etc.
 
+### 5. Select the Correct Precision
+Choose the right precision to avoid evaluation errors:
+- Not all models convert properly from float16 to bfloat16.
+- Incorrect precision can cause issues (e.g., loading a bf16 model in fp16 may generate NaNs).
+- If you have selected auto, the precision mentioned under `torch_dtype` in the model config will be used.
 
+### 6. Medically Oriented Model
+If the model has been specifically built for medical domains, i.e. pretrained/fine-tuned on significant medical data, make sure to check the `Domain specific` checkbox.
+
+### 7. Chat Template
+Select this option if your model uses a chat template. The chat template will be used during evaluation.
+- Before submitting, make sure the chat template is defined in the tokenizer config.
 
 Upon successful submission of your request, your model's results will be updated on the leaderboard within 5 working days!
 """
 
 CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
 CITATION_BUTTON_TEXT = r"""
-@misc{
-      title={
-      author={
+@misc{kanithi2024mediccomprehensiveframeworkevaluating,
+      title={MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications},
+      author={Praveen K Kanithi and Clément Christophe and Marco AF Pimentel and Tathagata Raha and Nada Saadi and Hamza Javed and Svetlana Maslenkova and Nasir Hayat and Ronnie Rajan and Shadab Khan},
       year={2024},
-      eprint={
+      eprint={2409.07314},
       archivePrefix={arXiv},
       primaryClass={cs.CL},
-      url={https://arxiv.org/abs/
-}
-
+      url={https://arxiv.org/abs/2409.07314},
+}
 """
````