small-2e4

This model is a fine-tuned version of deepseek-ai/deepseek-coder-6.7b-base on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1517
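
Since this repository hosts a PEFT adapter for deepseek-ai/deepseek-coder-6.7b-base, inference requires loading the base model first and then attaching the adapter. A minimal sketch, assuming the adapter id yalhessi/small-2e4 and a GPU with enough memory for fp16 weights (prompt and generation settings are illustrative only):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "deepseek-ai/deepseek-coder-6.7b-base"
adapter_id = "yalhessi/small-2e4"  # this repository

tokenizer = AutoTokenizer.from_pretrained(base_id)
# device_map="auto" requires the accelerate package
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
# Attach the fine-tuned LoRA/PEFT adapter on top of the base weights
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

prompt = "def quicksort(arr):"  # example prompt, not from the original card
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```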

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch reproducing them appears after the list):

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 4
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 12
  • mixed_precision_training: Native AMP
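
A sketch of a Transformers TrainingArguments configuration matching these values (the output_dir is a placeholder, and fp16 is an assumption, since the card only states "Native AMP"; the PEFT/LoRA adapter configuration itself is not documented here):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="small-2e4",          # placeholder path
    learning_rate=2e-4,
    per_device_train_batch_size=4,   # x 2 GPUs x 2 accumulation steps = 16 total
    per_device_eval_batch_size=2,    # x 2 GPUs = 4 total
    gradient_accumulation_steps=2,
    num_train_epochs=12,
    lr_scheduler_type="linear",
    seed=42,
    optim="adamw_torch",             # AdamW defaults: betas=(0.9, 0.999), eps=1e-8
    fp16=True,                       # Native AMP mixed precision (assumed fp16, not bf16)
)
```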

Training results

Training Loss   Epoch    Step   Validation Loss
0.6562          0.2001   720    0.2002
0.409           0.4002   1440   0.1755
0.3405          0.6003   2160   0.1627
0.3213          0.8003   2880   0.1518
0.2939          1.0003   3600   0.1442
0.2628          1.2004   4320   0.1424
0.2527          1.4004   5040   0.1359
0.2495          1.6005   5760   0.1338
0.2456          1.8006   6480   0.1309
0.2363          2.0006   7200   0.1261
0.2111          2.2006   7920   0.1215
0.2042          2.4007   8640   0.1202
0.1993          2.6008   9360   0.1204
0.1988          2.8009   10080  0.1178
0.2018          3.0008   10800  0.1177
0.1678          3.2009   11520  0.1174
0.1722          3.4010   12240  0.1147
0.1695          3.6011   12960  0.1106
0.1677          3.8012   13680  0.1132
0.166           4.0011   14400  0.1129
0.141           4.2012   15120  0.1127
0.1426          4.4013   15840  0.1129
0.1417          4.6014   16560  0.1139
0.1441          4.8014   17280  0.1127
0.1386          5.0014   18000  0.1093
0.12            5.2015   18720  0.1159
0.1209          5.4016   19440  0.1134
0.1232          5.6016   20160  0.1129
0.1243          5.8017   20880  0.1131
0.1213          6.0017   21600  0.1090
0.1063          6.2018   22320  0.1152
0.1058          6.4018   23040  0.1166
0.1063          6.6019   23760  0.1178
0.1052          6.8020   24480  0.1208
0.1071          7.0019   25200  0.1151
0.0938          7.2020   25920  0.1268
0.0898          7.4021   26640  0.1217
0.0908          7.6022   27360  0.1225
0.0915          7.8023   28080  0.1184
0.0924          8.0022   28800  0.1213
0.0789          8.2023   29520  0.1276
0.0765          8.4024   30240  0.1268
0.0776          8.6025   30960  0.1293
0.0794          8.8026   31680  0.1303
0.0788          9.0025   32400  0.1285
0.0655          9.2026   33120  0.1376
0.0682          9.4027   33840  0.1354
0.0684          9.6028   34560  0.1369
0.0703          9.8028   35280  0.1347
0.0669          10.0028  36000  0.1344
0.0595          10.2029  36720  0.1443
0.0581          10.4029  37440  0.1461
0.0596          10.6030  38160  0.1442
0.0595          10.8031  38880  0.1413
0.0614          11.0031  39600  0.1439
0.0542          11.2031  40320  0.1542
0.0537          11.4032  41040  0.1504
0.0533          11.6033  41760  0.1529
0.054           11.8034  42480  0.1517
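
Note that validation loss bottoms out around epoch 6 (0.1090 at step 21600) and drifts upward afterward, so the reported final loss of 0.1517 does not come from the best checkpoint. For a rerun, Transformers can retain the lowest-eval-loss checkpoint automatically; a minimal sketch with hypothetical flags that were not part of the original run:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="small-2e4",            # placeholder path
    eval_strategy="steps",             # "evaluation_strategy" on older versions
    eval_steps=720,                    # matches the logging cadence in the table above
    save_strategy="steps",             # must match eval_strategy for best-model loading
    save_steps=720,
    load_best_model_at_end=True,       # restore the lowest-eval-loss checkpoint at the end
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```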

Framework versions

  • PEFT 0.14.0
  • Transformers 4.47.0
  • PyTorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.1
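
To reproduce this environment, the versions above can be pinned directly (a hypothetical requirements.txt derived from the list; the torch wheel should match your CUDA setup, here CUDA 12.4):

```
peft==0.14.0
transformers==4.47.0
torch==2.5.1
datasets==3.2.0
tokenizers==0.21.1
```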