Update README.md
README.md

@@ -181,7 +181,7 @@ TODO
 #### Training Hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 0.
+- learning_rate: 0.0005
 - train_batch_size: 1024
 - eval_batch_size: 256
 - seed: 42
@@ -190,7 +190,7 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 1000
-- num_epochs:
+- num_epochs: 5
 
 ## Evaluation
 
@@ -204,7 +204,7 @@ The architecture of this model is [Mixtral](https://huggingface.co/docs/transfor
 
 ### Compute Infrastructure
 
-
+Server in a university laboratory
 
 #### Hardware
 
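For readers who want to see the updated hyperparameters in context, the sketch below expresses them as a `transformers.TrainingArguments` object. This is an illustration under stated assumptions, not the card's actual training script: the `output_dir` is hypothetical, and `train_batch_size: 1024` is mapped to `per_device_train_batch_size` even though it may be a global (multi-device) batch size in the original setup.

```python
# Sketch only: maps the card's hyperparameters onto TrainingArguments.
# output_dir is hypothetical; batch-size mapping is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mixtral-training-output",  # hypothetical path
    learning_rate=5e-4,                    # learning_rate: 0.0005
    per_device_train_batch_size=1024,      # train_batch_size: 1024 (possibly global, not per-device)
    per_device_eval_batch_size=256,        # eval_batch_size: 256
    seed=42,                               # seed: 42
    adam_beta1=0.9,                        # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,                     # epsilon=1e-08
    lr_scheduler_type="cosine",            # lr_scheduler_type: cosine
    warmup_steps=1000,                     # lr_scheduler_warmup_steps: 1000
    num_train_epochs=5,                    # num_epochs: 5
)
```

In a typical `Trainer`-based run these arguments would be passed alongside the model and datasets; the card itself does not state which training entry point was used.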