Update README.md
README.md
CHANGED
@@ -37,7 +37,7 @@ Model capabilities:
 
 **Model Developers** Meta
 
-**Variations** Code Llama comes in three model sizes, and three variants:
+**Variations** Code Llama comes in four model sizes, and three variants:
 
 * Code Llama: base models designed for general code synthesis and understanding
 * Code Llama - Python: designed specifically for Python
@@ -52,8 +52,9 @@ All variants are available in sizes of 7B, 13B, 34B, and 70B parameters.
 **Output** Models generate text only.
 
 **Model Architecture** Code Llama is an auto-regressive language model that uses an optimized transformer architecture.
+**Model Architecture** Code Llama is an auto-regressive language model that uses an optimized transformer architecture. It was fine-tuned with up to 16k tokens. This variant **does not** support long context of up to 100k tokens.
 
-**Model Dates** Code Llama and its variants have been trained between January 2023 and July 2023.
+**Model Dates** Code Llama and its variants have been trained between January 2023 and January 2024.
 
 **Status** This is a static model trained on an offline dataset. Future versions of Code Llama - Instruct will be released as we improve model safety with community feedback.
 
@@ -69,6 +70,8 @@ All variants are available in sizes of 7B, 13B, 34B, and 70B parameters.
 ## Hardware and Software
 **Training Factors** We used custom training libraries. The training and fine-tuning of the released models have been performed on Meta’s Research Super Cluster.
 
+**Carbon Footprint** In aggregate, training all 12 Code Llama models required 1400K GPU hours of computation on hardware of type A100-80GB (TDP of 350-400W). Estimated total emissions were 228.55 tCO2eq, 100% of which were offset by Meta’s sustainability program.
+
 ## Evaluation Results
 
 See evaluations for the main models and detailed ablations in Section 3 and safety evaluations in Section 4 of the research paper.
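
The new **Model Architecture** line states that this 70B variant was fine-tuned with a context of up to 16k tokens and does not support long context of up to 100k tokens. A quick way to see the window a given checkpoint is configured for is to read its config, as in the sketch below; the repository id is an assumed example for illustration, since the commit does not say which 70B repository this README belongs to.

```python
# Minimal sketch: inspect the configured context window of a Code Llama checkpoint.
# The repo id below is an assumed example, not taken from this commit.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("codellama/CodeLlama-70b-hf")

# max_position_embeddings is the context length the checkpoint is configured for;
# rope_theta is the RoPE base frequency stored alongside it.
print("max_position_embeddings:", config.max_position_embeddings)
print("rope_theta:", getattr(config, "rope_theta", None))
```

If the printed window is 16384, that matches the 16k figure quoted in the new line.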
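
The added **Carbon Footprint** paragraph gives GPU hours, TDP, and total emissions; the arithmetic below checks that these figures are mutually consistent. The emission factor is derived from the stated numbers rather than quoted from the model card, so treat this as an illustration of the estimation method, not Meta’s exact methodology.

```python
# Rough consistency check of the Carbon Footprint figures added in this commit.
# The emission factor is *derived* from the stated numbers, not taken from the model card.

gpu_hours = 1_400_000          # "1400K GPU hours" stated in the diff
tdp_kw = 0.400                 # upper bound of the stated 350-400W TDP, in kW
stated_emissions_t = 228.55    # tCO2eq stated in the diff

energy_kwh = gpu_hours * tdp_kw                               # energy at full TDP
implied_intensity = stated_emissions_t * 1000 / energy_kwh    # kg CO2eq per kWh

print(f"Energy at 400W TDP: {energy_kwh:,.0f} kWh")
print(f"Implied carbon intensity: {implied_intensity:.3f} kg CO2eq/kWh")
```

At the 400W end of the stated TDP range this works out to roughly 560 MWh and an implied grid intensity of about 0.41 kg CO2eq per kWh, which is consistent with the 228.55 tCO2eq total.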