Se124M100KInfPrompt_endtoken

This model is a fine-tuned version of gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6695
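
The card does not document a usage recipe. As a hedged sketch only: the framework versions below list PEFT, which suggests this repository hosts a PEFT adapter on top of the gpt2 base model, so loading might look like the following. The prompt and generation settings are illustrative, not part of this card.

```python
# Hedged sketch: loading this repo as a PEFT adapter over gpt2.
# Assumes the repo id shown on this card; generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("gpt2")
model = PeftModel.from_pretrained(base, "augustocsc/Se124M100KInfPrompt_endtoken")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

inputs = tokenizer("Your prompt here", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```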

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 200
  • num_epochs: 50
  • mixed_precision_training: Native AMP
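
As a hedged illustration, the hyperparameters above roughly correspond to a Hugging Face `TrainingArguments` configuration like the one below. The dataset, data collator, and PEFT/LoRA configuration are not documented on this card, so they are omitted; the `output_dir` is an assumption.

```python
# Hedged sketch: a TrainingArguments setup matching the listed hyperparameters.
# The dataset and PEFT configuration are not documented on this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Se124M100KInfPrompt_endtoken",  # assumed name
    learning_rate=5e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=200,
    num_train_epochs=50,
    fp16=True,  # "Native AMP" mixed-precision training
)
```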

Training results

| Training Loss | Epoch | Step   | Validation Loss |
|:-------------:|:-----:|:------:|:---------------:|
| 0.7209        | 1.0   | 5717   | 0.7060          |
| 0.7027        | 2.0   | 11434  | 0.6916          |
| 0.7005        | 3.0   | 17151  | 0.6865          |
| 0.7009        | 4.0   | 22868  | 0.6858          |
| 0.6933        | 5.0   | 28585  | 0.6854          |
| 0.6922        | 6.0   | 34302  | 0.6825          |
| 0.6859        | 7.0   | 40019  | 0.6810          |
| 0.6923        | 8.0   | 45736  | 0.6812          |
| 0.6919        | 9.0   | 51453  | 0.6809          |
| 0.6871        | 10.0  | 57170  | 0.6795          |
| 0.6844        | 11.0  | 62887  | 0.6776          |
| 0.6923        | 12.0  | 68604  | 0.6780          |
| 0.6878        | 13.0  | 74321  | 0.6785          |
| 0.6765        | 14.0  | 80038  | 0.6775          |
| 0.6864        | 15.0  | 85755  | 0.6769          |
| 0.6776        | 16.0  | 91472  | 0.6761          |
| 0.6823        | 17.0  | 97189  | 0.6768          |
| 0.6743        | 18.0  | 102906 | 0.6751          |
| 0.682         | 19.0  | 108623 | 0.6776          |
| 0.6902        | 20.0  | 114340 | 0.6762          |
| 0.6774        | 21.0  | 120057 | 0.6751          |
| 0.6748        | 22.0  | 125774 | 0.6747          |
| 0.6864        | 23.0  | 131491 | 0.6745          |
| 0.6819        | 24.0  | 137208 | 0.6756          |
| 0.6818        | 25.0  | 142925 | 0.6745          |
| 0.6757        | 26.0  | 148642 | 0.6737          |
| 0.6801        | 27.0  | 154359 | 0.6734          |
| 0.6717        | 28.0  | 160076 | 0.6724          |
| 0.6717        | 29.0  | 165793 | 0.6722          |
| 0.6802        | 30.0  | 171510 | 0.6723          |
| 0.677         | 31.0  | 177227 | 0.6725          |
| 0.6764        | 32.0  | 182944 | 0.6712          |
| 0.6767        | 33.0  | 188661 | 0.6712          |
| 0.6758        | 34.0  | 194378 | 0.6716          |
| 0.6772        | 35.0  | 200095 | 0.6715          |
| 0.679         | 36.0  | 205812 | 0.6717          |
| 0.6744        | 37.0  | 211529 | 0.6702          |
| 0.6654        | 38.0  | 217246 | 0.6707          |
| 0.6723        | 39.0  | 222963 | 0.6704          |
| 0.6758        | 40.0  | 228680 | 0.6701          |
| 0.6795        | 41.0  | 234397 | 0.6701          |
| 0.6681        | 42.0  | 240114 | 0.6698          |
| 0.6761        | 43.0  | 245831 | 0.6700          |
| 0.673         | 44.0  | 251548 | 0.6697          |
| 0.6736        | 45.0  | 257265 | 0.6698          |
| 0.673         | 46.0  | 262982 | 0.6695          |
| 0.6686        | 47.0  | 268699 | 0.6695          |
| 0.666         | 48.0  | 274416 | 0.6696          |
| 0.663         | 49.0  | 280133 | 0.6695          |
| 0.6667        | 50.0  | 285850 | 0.6695          |

Framework versions

  • PEFT 0.15.1
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu118
  • Datasets 3.5.0
  • Tokenizers 0.21.1