bug: fix minor bugs
- .github/workflows/huggingface-sync.yml +13 -2
- README.md +19 -17
.github/workflows/huggingface-sync.yml
CHANGED
|
@@ -18,6 +18,15 @@ jobs:
|
|
| 18 |
git config --global user.email "github-actions[bot]@users.noreply.github.com"
|
| 19 |
git config --global user.name "github-actions[bot]"
|
| 20 |
|
| 21 |
- name: Login to Hugging Face
|
| 22 |
env:
|
| 23 |
HF_TOKEN: ${{ secrets.HUGGINGFACE_TOKEN }}
|
|
@@ -26,5 +35,7 @@ jobs:
|
|
| 26 |
|
| 27 |
- name: Push to Hugging Face Space
|
| 28 |
run: |
|
| 29 |
-
git remote add space https://huggingface.co/spaces/JeffYang52415/LLMEval-Dataset-Parser
|
| 30 |
-
git
|
| 18 |
git config --global user.email "github-actions[bot]@users.noreply.github.com"
|
| 19 |
git config --global user.name "github-actions[bot]"
|
| 20 |
|
| 21 |
+
- name: Set up Python
|
| 22 |
+
uses: actions/setup-python@v4
|
| 23 |
+
with:
|
| 24 |
+
python-version: "3.x"
|
| 25 |
+
|
| 26 |
+
- name: Install Hugging Face CLI
|
| 27 |
+
run: |
|
| 28 |
+
pip install --upgrade huggingface-hub
|
| 29 |
+
|
| 30 |
- name: Login to Hugging Face
|
| 31 |
env:
|
| 32 |
HF_TOKEN: ${{ secrets.HUGGINGFACE_TOKEN }}
|
|
|
|
| 35 |
|
| 36 |
- name: Push to Hugging Face Space
|
| 37 |
run: |
|
| 38 |
+
git remote add space https://huggingface.co/spaces/JeffYang52415/LLMEval-Dataset-Parser || true
|
| 39 |
+
git fetch space || true
|
| 40 |
+
# Force push to ensure sync, use with caution
|
| 41 |
+
git push -f space main:main
|
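The revised push step tolerates failures from `git remote add` and `git fetch` (via `|| true`) but still fails the job if the force push itself fails. The following is a minimal Python sketch of that control flow, not part of the workflow. The names `Step`, `build_sync_steps`, and `run_steps` are hypothetical and introduced only for illustration. Note that `git push -f` overwrites the remote branch history, which is why the workflow comment advises caution.

```python
import subprocess
from typing import List, NamedTuple


class Step(NamedTuple):
    """One git command plus whether a non-zero exit code is tolerated."""

    argv: List[str]
    allow_failure: bool  # mirrors a trailing `|| true` in the shell step


def build_sync_steps(remote_url: str, remote: str = "space", branch: str = "main") -> List[Step]:
    """Return the three commands of the push step, in order."""
    return [
        # Adding the remote fails if it already exists; that is tolerated.
        Step(["git", "remote", "add", remote, remote_url], allow_failure=True),
        # Fetching may fail (for example on an empty remote); also tolerated.
        Step(["git", "fetch", remote], allow_failure=True),
        # The force push must succeed, otherwise the sync did not happen.
        Step(["git", "push", "-f", remote, f"{branch}:{branch}"], allow_failure=False),
    ]


def run_steps(steps: List[Step], runner=subprocess.run) -> None:
    """Run each step; raise only when a required step exits non-zero."""
    for step in steps:
        result = runner(step.argv)
        if result.returncode != 0 and not step.allow_failure:
            raise RuntimeError(f"command failed: {' '.join(step.argv)}")
```

Calling `run_steps(build_sync_steps(url))` inside a git checkout would execute the same three commands as the workflow; the `runner` parameter exists only so the logic can be exercised without touching a real repository.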
README.md
CHANGED
|
@@ -13,10 +13,12 @@ short_description: A collection of parsers for LLM benchmark datasets
|
|
| 13 |
|
| 14 |
**LLMDataParser** is a Python library that provides parsers for benchmark datasets used in evaluating Large Language Models (LLMs). It offers a unified interface for loading and parsing datasets like **MMLU**, **GSM8k**, and others, streamlining dataset preparation for LLM evaluation. The library aims to simplify the process of working with common LLM benchmark datasets through a consistent API.
|
| 15 |
|
| 16 |
## Features
|
| 17 |
|
| 18 |
- **Unified Interface**: Consistent `DatasetParser` for all datasets.
|
| 19 |
-
- **LLM-Agnostic**: Independent of any specific language model.
|
| 20 |
- **Easy to Use**: Simple methods and built-in Python types.
|
| 21 |
- **Extensible**: Easily add support for new datasets.
|
| 22 |
- **Gradio**: Built-in Gradio interface for interactive dataset exploration and testing.
|
|
@@ -78,22 +80,22 @@ Poetry manages the virtual environment and dependencies automatically, so you do
|
|
| 78 |
Here's a simple example demonstrating how to use the library:
|
| 79 |
|
| 80 |
```python
|
| 81 |
-
|
| 82 |
-
|
| 83 |
-
|
| 84 |
-
|
| 85 |
-
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
|
| 89 |
-
|
| 90 |
-
|
| 91 |
-
|
| 92 |
-
|
| 93 |
-
|
| 94 |
-
|
| 95 |
-
|
| 96 |
-
|
| 97 |
```
|
| 98 |
|
| 99 |
We also provide a Gradio demo for interactive testing:
|
|
|
|
| 13 |
|
| 14 |
**LLMDataParser** is a Python library that provides parsers for benchmark datasets used in evaluating Large Language Models (LLMs). It offers a unified interface for loading and parsing datasets like **MMLU**, **GSM8k**, and others, streamlining dataset preparation for LLM evaluation. The library aims to simplify the process of working with common LLM benchmark datasets through a consistent API.
|
| 15 |
|
| 16 |
+
**Spaces**: You can also try out the online demo on Hugging Face Spaces:
|
| 17 |
+
[LLMEval Dataset Parser Demo](https://huggingface.co/spaces/JeffYang52415/LLMEval-Dataset-Parser)
|
| 18 |
+
|
| 19 |
## Features
|
| 20 |
|
| 21 |
- **Unified Interface**: Consistent `DatasetParser` for all datasets.
|
| 22 |
- **Easy to Use**: Simple methods and built-in Python types.
|
| 23 |
- **Extensible**: Easily add support for new datasets.
|
| 24 |
- **Gradio**: Built-in Gradio interface for interactive dataset exploration and testing.
|
|
|
|
| 80 |
Here's a simple example demonstrating how to use the library:
|
| 81 |
|
| 82 |
```python
|
| 83 |
+
from llmdataparser import ParserRegistry
|
| 84 |
+
# list all available parsers
|
| 85 |
+
ParserRegistry.list_parsers()
|
| 86 |
+
# get a parser
|
| 87 |
+
parser = ParserRegistry.get_parser("mmlu")
|
| 88 |
+
# load the parser
|
| 89 |
+
parser.load() # optional: task_name, split
|
| 90 |
+
# parse the parser
|
| 91 |
+
parser.parse() # optional: split_names
|
| 92 |
+
|
| 93 |
+
print(parser.task_names)
|
| 94 |
+
print(parser.split_names)
|
| 95 |
+
print(parser.get_dataset_description)
|
| 96 |
+
print(parser.get_huggingface_link)
|
| 97 |
+
print(parser.total_tasks)
|
| 98 |
+
data = parser.get_parsed_data
|
| 99 |
```
|
| 100 |
|
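To illustrate the registry-style lookup that the README example relies on (`list_parsers`, `get_parser`, then `load` and `parse`), here is a small self-contained sketch. It is not the library's implementation: `SketchRegistry`, `ToyParser`, the `register` method, and the in-memory rows are all hypothetical stand-ins chosen so the example runs without downloading any dataset.

```python
from typing import Dict, List, Type


class ToyParser:
    """Hypothetical stand-in for a dataset parser; not part of llmdataparser."""

    def __init__(self) -> None:
        self.raw: List[str] = []
        self.parsed: List[dict] = []

    def load(self) -> None:
        # A real parser would fetch a benchmark dataset; this uses fixed rows.
        self.raw = ["2+2=?|4", "3*3=?|9"]

    def parse(self) -> None:
        # Split each raw row into a question/answer record.
        self.parsed = [
            {"question": question, "answer": answer}
            for question, answer in (row.split("|") for row in self.raw)
        ]


class SketchRegistry:
    """Hypothetical name-to-class registry illustrating the lookup pattern."""

    _parsers: Dict[str, Type[ToyParser]] = {}

    @classmethod
    def register(cls, name: str, parser_cls: Type[ToyParser]) -> None:
        # Names are stored lowercase so lookups are case-insensitive.
        cls._parsers[name.lower()] = parser_cls

    @classmethod
    def list_parsers(cls) -> List[str]:
        return sorted(cls._parsers)

    @classmethod
    def get_parser(cls, name: str) -> ToyParser:
        try:
            return cls._parsers[name.lower()]()
        except KeyError:
            raise ValueError(f"Unknown parser: {name!r}") from None


SketchRegistry.register("toy", ToyParser)

parser = SketchRegistry.get_parser("toy")
parser.load()
parser.parse()
print(SketchRegistry.list_parsers())
print(parser.parsed[0])
```

Keeping the lookup in one registry means calling code only needs a dataset name; adding support for another dataset amounts to registering one more parser class.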
| 101 |
We also provide a Gradio demo for interactive testing:
|