Chat RL

timestamp: 2025-10-14 07:06:07

  • run:
  • source: sft
  • dtype: bfloat16
  • device_batch_size: 8
  • examples_per_step: 16
  • num_samples: 16
  • max_new_tokens: 256
  • temperature: 1.0000
  • top_k: 50
  • unembedding_lr: 0.0040
  • embedding_lr: 0.2000
  • matrix_lr: 0.0200
  • weight_decay: 0.0000
  • init_lr_frac: 0.0500
  • num_epochs: 1
  • save_every: 60
  • eval_every: 60
  • eval_examples: 400
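
The settings above imply a per-step rollout budget and a set of starting learning rates. The sketch below works out both. It is an illustration, not the training script itself, and it rests on assumptions the report does not state: `examples_per_step` counts prompts, `num_samples` counts completions sampled per prompt, `device_batch_size` caps the sequences per forward pass, and `init_lr_frac` multiplies each base learning rate at the start of training. The names `ChatRLConfig`, `rollout_budget`, and `initial_lrs` are hypothetical.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class ChatRLConfig:
    """Subset of the reported hyperparameters (values copied from the list above)."""
    device_batch_size: int = 8
    examples_per_step: int = 16
    num_samples: int = 16
    max_new_tokens: int = 256
    unembedding_lr: float = 0.004
    embedding_lr: float = 0.2
    matrix_lr: float = 0.02
    init_lr_frac: float = 0.05


def rollout_budget(cfg: ChatRLConfig) -> dict:
    """Per-step rollout counts, ASSUMING one prompt yields num_samples completions."""
    sequences = cfg.examples_per_step * cfg.num_samples
    return {
        "sequences_per_step": sequences,
        # Assumed: device_batch_size is the max number of sequences per forward pass.
        "batches_per_step": math.ceil(sequences / cfg.device_batch_size),
        # Upper bound only; completions may stop before max_new_tokens.
        "max_generated_tokens_per_step": sequences * cfg.max_new_tokens,
    }


def initial_lrs(cfg: ChatRLConfig) -> dict:
    """Starting learning rates, ASSUMING init_lr_frac scales every base rate."""
    names = ("unembedding_lr", "embedding_lr", "matrix_lr")
    return {name: getattr(cfg, name) * cfg.init_lr_frac for name in names}


if __name__ == "__main__":
    cfg = ChatRLConfig()
    print(rollout_budget(cfg))
    print(initial_lrs(cfg))
```

Under these assumptions, one step samples 256 completions (16 prompts × 16 samples), processed in 32 batches of 8, with at most 65,536 generated tokens. If the run is split across several devices, the per-device counts would be correspondingly smaller.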