Update README.md
README.md
CHANGED
@@ -44,7 +44,7 @@ You can then start chatting with the model, *e.g.* prompt it with something like
 
 ## Evaluation
 We evaluate Mathstral 7B and open-weight models of the similar size on industry-standard benchmarks.
-| Benchmarks | MATH | GSM8K (8-shot) | Odyssey Math | GRE Math | AMC 2023 | AIME 2024
+| Benchmarks | MATH | GSM8K (8-shot) | Odyssey Math maj@16 | GRE Math maj@16 | AMC 2023 maj@16 | AIME 2024 maj@16
 | :--- | :---: | :---: | :---: | :---: | :---: | :---: |
 | Mathstral 7B | **56.6** | 77.1 | **37.2** | 56.9 | **42.4** | **2/30** |
 | DeepSeek Math 7B | 44.4 | **80.6** | 27.6 | 44.6 | 28.0 | 0/30 |
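The only change in this hunk is the `maj@16` suffix on four column headers. The README does not describe how that metric was computed. The sketch below assumes the usual reading: sample 16 answers per problem, take the most frequent one, and score that against the reference. The function names and the string-equality check are illustrative assumptions, not the authors' evaluation code.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def maj_at_k(samples_per_problem, references):
    """Fraction of problems whose majority-voted answer equals the reference.

    samples_per_problem: one list of k sampled final answers per problem.
    references: one ground-truth answer per problem.
    """
    correct = sum(
        majority_vote(samples) == ref
        for samples, ref in zip(samples_per_problem, references)
    )
    return correct / len(references)


# Hypothetical data: two problems, 16 sampled answers each.
samples = [
    ["4"] * 9 + ["5"] * 7,    # majority answer "4" matches the reference
    ["1"] * 5 + ["2"] * 11,   # majority answer "2" does not match
]
references = ["4", "1"]
score = maj_at_k(samples, references)
```

Under this reading, a problem counts as solved only when the plurality of the 16 samples agrees with the reference, which is why such scores are not directly comparable to single-sample accuracy.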