Commit 27e5b96
Parent(s): d8147b8

[ADD] Model submission guide and citation

Files changed:
- app.py (+5 / -0)
- src/about.py (+41 / -21)
app.py CHANGED

```diff
@@ -371,8 +371,13 @@ with demo:
         )
 
         with gr.TabItem("🏅 Open Ended Evaluation", elem_id="llm-benchmark-tab-table", id=1):
+            gr.Markdown("# Coming Soon!!!", elem_classes="markdown-text")
             pass
         with gr.TabItem("🏅 Med Safety", elem_id="llm-benchmark-tab-table", id=2):
+            gr.Markdown("# Coming Soon!!!", elem_classes="markdown-text")
+            pass
+        with gr.TabItem("🏅 Med Safety", elem_id="llm-benchmark-tab-table", id=2):
+            gr.Markdown("# Coming Soon!!!", elem_classes="markdown-text")
             pass
 
         with gr.TabItem("📝 About", elem_id="llm-benchmark-tab-table", id=3):
```
src/about.py CHANGED

````diff
@@ -227,41 +227,61 @@ Users are advised to approach the results with an understanding of the inherent
 
 EVALUATION_QUEUE_TEXT = """
 
-Currently, the benchmark supports evaluation for models hosted on the huggingface hub and of type
-If your model needs a custom implementation, follow the steps outlined in the [clinical_ner_benchmark](https://github.com/WadoodAbdul/clinical_ner_benchmark/blob/e66eb566f34e33c4b6c3e5258ac85aba42ec7894/docs/custom_model_implementation.md) repo or reach out to our team!
-
-
-
-
-
-
-  - Decoder: Transformer based autoregressive token generation model.
-  - GLiNER: Architecture outlined in the [GLiNER Paper](https://arxiv.org/abs/2311.08526)
-
-
-
-
+Currently, the benchmark supports evaluation for models hosted on the huggingface hub and of decoder type. It doesn't support adapter models yet, but adapter support will be added soon.
+
+## Submission Guide for the MEDIC Benchmark
+
+## First Steps Before Submitting a Model
+
+### 1. Ensure Your Model Loads with AutoClasses
+Verify that you can load your model and tokenizer using AutoClasses:
+```python
+from transformers import AutoConfig, AutoModel, AutoTokenizer
+config = AutoConfig.from_pretrained("your model name", revision=revision)
+model = AutoModel.from_pretrained("your model name", revision=revision)
+tokenizer = AutoTokenizer.from_pretrained("your model name", revision=revision)
+```
+Note:
+- If this step fails, debug your model before submitting.
+- Ensure your model is public.
+
+### 2. Convert Weights to Safetensors
+[Safetensors](https://huggingface.co/docs/safetensors/index) is a new format for storing weights which is safer and faster to load and use. It will also allow us to add the number of parameters of your model to the `Extended Viewer`!
 
+### 3. Complete Your Model Card
+When we add extra information about models to the leaderboard, it will be automatically taken from the model card.
 
+### 4. Select the Correct Model Type
+Choose the correct model category from the options below:
+- 🟢 : pretrained model: new, base models trained on a given text corpora using masked modelling, or base models continuously trained on further corpora (which may include IFT/chat data) using masked modelling
+- ⭕ : fine-tuned models: pretrained models fine-tuned on more data or tasks.
+- 🟦 : preference-tuned models: chat-like fine-tunes, either using IFT (datasets of task instructions), RLHF, or DPO (changing the model loss slightly with an added policy), etc.
 
+### 5. Select the Correct Precision
+Choose the right precision to avoid evaluation errors:
+- Not all models convert properly from float16 to bfloat16.
+- Incorrect precision can cause issues (e.g., loading a bf16 model in fp16 may generate NaNs).
+- If you have selected auto, the precision mentioned under `torch_dtype` in the model config will be used.
 
+### 6. Medically Oriented Model
+If the model has been specifically built for medical domains, i.e. pretrained/fine-tuned on significant medical data, make sure to check the `Domain specific` checkbox.
+
+### 7. Chat Template
+Select this option if your model uses a chat template. The chat template will be used during evaluation.
+- Before submitting, make sure the chat template is defined in the tokenizer config.
 
 Upon successful submission of your request, your model's results will be updated on the leaderboard within 5 working days!
 """
 
 CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
 CITATION_BUTTON_TEXT = r"""
-@misc{
-      title={
-      author={
+@misc{kanithi2024mediccomprehensiveframeworkevaluating,
+      title={MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications},
+      author={Praveen K Kanithi and Clément Christophe and Marco AF Pimentel and Tathagata Raha and Nada Saadi and Hamza Javed and Svetlana Maslenkova and Nasir Hayat and Ronnie Rajan and Shadab Khan},
       year={2024},
-      eprint={
+      eprint={2409.07314},
       archivePrefix={arXiv},
       primaryClass={cs.CL},
-      url={https://arxiv.org/abs/
-}
-
+      url={https://arxiv.org/abs/2409.07314},
+}
 """
````