Fix format & Add BigCodeBench (#10)
- Fix format & Add BigCodeBench (a1e04bec5d16ba9ad8e87d0f4f29fd72067deac9)
- Update README.md (7d69a4ab7102ef1affb896d091e6961a283f0e80)
Co-authored-by: Terry Yue Zhuo <[email protected]>
README.md
CHANGED
@@ -41,6 +41,7 @@ BigCode is an open scientific collaboration working on responsible training of l
 - [StarCoder2 Search](https://huggingface.co/spaces/bigcode/search-v2): Full-text search code in the pretraining dataset.
 - [StarCoder2 Membership Test](https://stack-v2.dataportraits.org/): Blazing fast test if code was present in pretraining dataset.
 </details>
+
 ---
 <details>
 <summary>
@@ -52,6 +53,7 @@ BigCode is an open scientific collaboration working on responsible training of l
 - [The Stack v2 dedup](https://huggingface.co/datasets/bigcode/the-stack-v2-dedup): Near deduplicated version of The Stack v2 (recommended for training).
 - [Am I in the Stack](https://huggingface.co/spaces/bigcode/in-the-stack): Check if your data is in The Stack and request opt-out.
 </details>
+
 ---
 <details>
 <summary>
@@ -82,17 +84,34 @@ BigCode is an open scientific collaboration working on responsible training of l
 - [StarCoder Search](https://huggingface.co/spaces/bigcode/search): Full-text search code in the pretraining dataset.
 - [StarCoder Membership Test](https://stack.dataportraits.org/): Blazing fast test if code was present in pretraining dataset.
 </details>
+
 ---
 <details>
 <summary>
 <b><font size="+1">📑The Stack</font></b>
 </summary>
 The Stack v1 is a 6.4TB dataset of source code in 358 programming languages from permissive licenses.
-
+
 - [The Stack](https://huggingface.co/datasets/bigcode/the-stack): Exact deduplicated version of The Stack.
 - [The Stack dedup](https://huggingface.co/datasets/bigcode/the-stack-dedup): Near deduplicated version of The Stack (recommended for training).
 - [Am I in the Stack](https://huggingface.co/spaces/bigcode/in-the-stack): Check if your data is in The Stack and request opt-out.
 </details>
+
+---
+<details>
+<summary>
+<b><font size="+1">🌸BigCodeBench</font></b>
+</summary>
+BigCodeBench is the next generation of HumanEval, benchmarking code generation with diverse function calls and complex instructions.
+
+- [Github](https://github.com/bigcode-project/bigcodebench): Evaluation tool designed for BigCodeBench.
+- [HF Leaderboard](https://huggingface.co/spaces/bigcode/bigcodebench-leaderboard): BigCodeBench leaderboard hosted on Hugging Face.
+- [GP Leaderboard](https://huggingface.co/spaces/bigcode/bigcodebench-leaderboard): BigCodeBench leaderboard hosted on GitHub Pages.
+- [Dataset](https://huggingface.co/datasets/bigcode/bigcodebench): BigCodeBench dataset.
+- [Data Viewer](https://huggingface.co/spaces/bigcode/bigcodebench-viewer): Explore BigCodeBench data in an interactive demo.
+- [Paper](https://github.com/bigcode-bench/bigcode-bench.github.io/blob/main/paper.pdf): Research paper with details about BigCodeBench.
+</details>
+
 ---
 <details>
 <summary>
@@ -111,6 +130,7 @@ BigCode is an open scientific collaboration working on responsible training of l
 - [OctoCoder Demo](https://huggingface.co/spaces/bigcode/OctoCoder-Demo): Play with OctoCoder.
 - [OctoGeeX](https://huggingface.co/bigcode/octogeex): Instruction tuned model of CodeGeeX2 by training on CommitPackFT.
 </details>
+
 ---
 <details>
 <summary>
@@ -126,6 +146,7 @@ BigCode is an open scientific collaboration working on responsible training of l
 - [Astraios-7B](https://huggingface.co/collections/bigcode/astraios-7b-65788b509c5c26f96c08d576): Collection of StarCoderBase-7B models instruction tuned on CommitPackFT + OASST with 7 method.
 - [Astraios-15B](https://huggingface.co/collections/bigcode/astraios-15b-65788b7476b6de79781054cc): Collection of StarCoderBase-15B models instruction tuned on CommitPackFT + OASST with 7 method.
 </details>
+
 ---
 <details>
 <summary>