Update README.md
README.md CHANGED
@@ -14,6 +14,8 @@ The model is trained from [meta-llama/Meta-Llama-3-8B](https://huggingface.co/me
 
 ## Academic Benchmarks
 
+We use the ToRA script to evaluate GSM8K and MATH, EvalPlus for HumanEval, and lm-evaluation-harness for the other benchmarks. The model is evaluated in a zero-shot setting.
+
 | **Model**     | **Size** | **Method** | **LC AlpacaEval** | **MT-Bench** | **GSM-8K** | **MATH** | **MMLU** | **HumanEval** | **TruthfulQA** | **ARC** |
 |---------------|----------|------------|-------------------|--------------|------------|----------|----------|---------------|----------------|---------|
 | LLaMA-3-8B-it | 8B       | RS+DPO+PPO | 22.9              | 8.16         | 79.6       | 26.3     | 66.0     | 61.6          | 43.9           | 59.5    |
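For the benchmarks scored with lm-evaluation-harness, a zero-shot run might look like the sketch below. This is an assumption for illustration, not the authors' exact command: the README does not give the task names, batch size, or dtype they used, so those flags (`mmlu`, `truthfulqa_mc2`, `arc_challenge`, `dtype=bfloat16`, `--batch_size 8`) are placeholders you should adapt.

```shell
# Hypothetical zero-shot evaluation with lm-evaluation-harness (v0.4+ CLI).
# Requires a GPU and will download the model weights from the Hugging Face Hub.
lm_eval --model hf \
  --model_args pretrained=meta-llama/Meta-Llama-3-8B,dtype=bfloat16 \
  --tasks mmlu,truthfulqa_mc2,arc_challenge \
  --num_fewshot 0 \
  --batch_size 8
```

GSM8K/MATH (ToRA) and HumanEval (EvalPlus) use their own evaluation scripts and are not covered by this command.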