Update README.md
README.md
CHANGED
@@ -37,7 +37,7 @@ Model capabilities:
 
 **Model Developers** Meta
 
-**Variations** Code Llama comes in three model sizes, and three variants:
+**Variations** Code Llama comes in four model sizes, and three variants:
 
 * Code Llama: base models designed for general code synthesis and understanding
 * Code Llama - Python: designed specifically for Python
@@ -52,8 +52,9 @@ All variants are available in sizes of 7B, 13B, 34B, and 70B parameters.
 **Output** Models generate text only.
 
 **Model Architecture** Code Llama is an auto-regressive language model that uses an optimized transformer architecture.
+**Model Architecture** Code Llama is an auto-regressive language model that uses an optimized transformer architecture. It was fine-tuned with up to 16k tokens. This variant **does not** support long context of up to 100k tokens.
 
-**Model Dates** Code Llama and its variants have been trained between January 2023 and July 2023.
+**Model Dates** Code Llama and its variants have been trained between January 2023 and January 2024.
 
 **Status** This is a static model trained on an offline dataset. Future versions of Code Llama - Instruct will be released as we improve model safety with community feedback.
 
@@ -69,6 +70,8 @@ All variants are available in sizes of 7B, 13B, 34B, and 70B parameters.
 ## Hardware and Software
 **Training Factors** We used custom training libraries. The training and fine-tuning of the released models have been performed on Meta’s Research Super Cluster.
 
+**Carbon Footprint** In aggregate, training all 12 Code Llama models required 1400K GPU hours of computation on hardware of type A100-80GB (TDP of 350-400W). Estimated total emissions were 228.55 tCO2eq, 100% of which were offset by Meta’s sustainability program.
+
 ## Evaluation Results
 
 See evaluations for the main models and detailed ablations in Section 3 and safety evaluations in Section 4 of the research paper.
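
The new **Model Architecture** line states that this 70B variant was fine-tuned with a context of up to 16k tokens and does not support long context of up to 100k tokens. A quick way to see the window a given checkpoint is configured for is to read its config, as in the sketch below; the repository id is an assumed example for illustration, since the commit does not say which 70B repository this README belongs to.

```python
# Minimal sketch: inspect the configured context window of a Code Llama checkpoint.
# The repo id below is an assumed example, not taken from this commit.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("codellama/CodeLlama-70b-hf")

# max_position_embeddings is the context length the checkpoint is configured for;
# rope_theta is the RoPE base frequency stored alongside it.
print("max_position_embeddings:", config.max_position_embeddings)
print("rope_theta:", getattr(config, "rope_theta", None))
```

If the printed window is 16384, that matches the 16k figure quoted in the new line.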
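
The added **Carbon Footprint** paragraph gives GPU hours, TDP, and total emissions; the arithmetic below checks that these figures are mutually consistent. The emission factor is derived from the stated numbers rather than quoted from the model card, so treat this as an illustration of the estimation method, not Meta’s exact methodology.

```python
# Rough consistency check of the Carbon Footprint figures added in this commit.
# The emission factor is *derived* from the stated numbers, not taken from the model card.

gpu_hours = 1_400_000          # "1400K GPU hours" stated in the diff
tdp_kw = 0.400                 # upper bound of the stated 350-400W TDP, in kW
stated_emissions_t = 228.55    # tCO2eq stated in the diff

energy_kwh = gpu_hours * tdp_kw                               # energy at full TDP
implied_intensity = stated_emissions_t * 1000 / energy_kwh    # kg CO2eq per kWh

print(f"Energy at 400W TDP: {energy_kwh:,.0f} kWh")
print(f"Implied carbon intensity: {implied_intensity:.3f} kg CO2eq/kWh")
```

At the 400W end of the stated TDP range this works out to roughly 560 MWh and an implied grid intensity of about 0.41 kg CO2eq per kWh, which is consistent with the 228.55 tCO2eq total.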