danita committed: Update README.md

README.md CHANGED
@@ -1,11 +1,13 @@
 ---
 library_name: transformers
-tags: []
+datasets:
+- xfordanita/code-summary-java
 ---

 # Model Card for Model ID

-<!-- Provide a quick summary of what the model is/does. -->
+This model is a fine-tuned version of **codellama/CodeLlama-7b-hf**, trained with **QLoRA** using the **PEFT** library.
+


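The new summary line names QLoRA and the PEFT library, but the commit itself carries no setup code. For reference, here is a minimal sketch of what a QLoRA setup for this base model typically looks like, assuming the standard `peft` + `bitsandbytes` stack; the `LoraConfig` values and target modules are illustrative assumptions, not taken from this commit:

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# QLoRA step 1: load the frozen base model with 4-bit NF4 quantized weights.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # matches fp16=True in the training args below
)

model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")

# QLoRA step 2: train small LoRA adapters on top of the quantized weights.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,                                  # assumed rank; not stated in the commit
    lora_alpha=32,                         # assumed scaling; not stated in the commit
    target_modules=["q_proj", "v_proj"],   # assumed target modules
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

With 4-bit weights and adapters on only a few projection matrices, the trainable parameter count drops to a small fraction of the 7B base, which is what makes the 15 GB Kaggle GPUs mentioned below workable.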
@@ -13,7 +15,6 @@ tags: []

 ### Model Description

-<!-- Provide a longer summary of what this model is. -->

 This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

@@ -92,8 +93,29 @@

 #### Training Hyperparameters

-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+Trained on a free Kaggle GPU setup (2 × 15 GB VRAM) with the following parameters:
+
+```py
+training_arguments = TrainingArguments(
+    output_dir='./results',
+    num_train_epochs=8,
+    per_device_train_batch_size=4,
+    gradient_accumulation_steps=2,
+    optim="paged_adamw_32bit",
+    save_steps=0,
+    logging_steps=10,
+    learning_rate=2e-4,
+    weight_decay=0.1,  # higher value for stronger L2 regularization
+    fp16=True,
+    max_grad_norm=1.0,  # cap the gradient norm to avoid exploding gradients
+    max_steps=-1,
+    warmup_ratio=0.1,  # increased warmup ratio
+    group_by_length=True,
+    lr_scheduler_type="constant",  # constant learning rate
+    report_to="tensorboard"
+)
+```

 #### Speeds, Sizes, Times [optional]

 <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
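The diff adds only the `TrainingArguments`; how they attach to the model and to the `xfordanita/code-summary-java` dataset declared in the front matter is not shown. A hedged sketch of the likely wiring, assuming the trl 0.7-era `SFTTrainer` API and a `text` column in the dataset (both assumptions, not confirmed by the commit):

```py
from datasets import load_dataset
from transformers import TrainingArguments  # needed for the block shown in the diff
from trl import SFTTrainer

# Dataset declared in the updated front matter.
dataset = load_dataset("xfordanita/code-summary-java", split="train")

trainer = SFTTrainer(
    model=model,                 # PEFT-wrapped model from the earlier sketch
    args=training_arguments,     # the TrainingArguments added in this commit
    train_dataset=dataset,
    dataset_text_field="text",   # assumed column name; check the dataset schema
    max_seq_length=512,          # assumed sequence cap, not stated in the commit
    tokenizer=tokenizer,
)
trainer.train()
```

At these settings the effective batch size is 4 per device × 2 accumulation steps × 2 GPUs = 16, and `paged_adamw_32bit` pages optimizer state to avoid out-of-memory spikes on the 15 GB cards.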