nanochat

nanochat is a 561M-parameter transformer language model trained for conversational AI. It demonstrates that a capable chat model can be trained on a modest hardware budget (roughly $100 of compute on 8x H100 GPUs).

Model Description

  • Developed by: Andrej Karpathy
  • Trained by: Sampath Chanda
  • Model type: Transformer-based causal language model
  • Language(s): English
  • License: MIT
  • Parameters: 560,988,160 (~561M)

Architecture

  • Layers: 20
  • Hidden size: 1280 channels
  • Attention heads: 10
  • Head dimension: 128
  • Vocabulary size: 65,536 tokens
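
These hyperparameters are enough to reconstruct the reported parameter count. Below is a back-of-the-envelope check, assuming a standard nanochat-style layout (untied token embedding and lm_head, rotary positions with no learned position table, bias-free linear layers, 4x MLP expansion); the variable names are illustrative, not taken from the repository.

```python
# Back-of-the-envelope parameter count for the d20 configuration.
# Assumptions (not stated in this card): untied embedding/unembedding,
# rotary positions (no learned position table), no biases, 4x MLP expansion.

n_layer, n_embd, vocab_size = 20, 1280, 65_536

embedding   = vocab_size * n_embd          # token embedding table
unembedding = vocab_size * n_embd          # untied lm_head
attention   = 4 * n_embd * n_embd          # Q, K, V, O projections per layer
mlp         = 2 * n_embd * (4 * n_embd)    # up + down projections per layer
per_layer   = attention + mlp

total = embedding + unembedding + n_layer * per_layer
print(total)  # 560988160, matching the reported ~561M
```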

Training Details

Training Data

nanochat was trained in multiple stages:

  1. Pretraining: ~11.2B tokens drawn from a 100B-token shuffled sample of FineWeb-EDU (roughly a Chinchilla-style budget of ~20 tokens per parameter)
  2. Midtraining: SmolTalk conversations, MMLU multiple choice questions, GSM8K math problems
  3. Supervised Fine-tuning (SFT): conversational adaptation data (a sketch of how such conversations can be flattened into token sequences follows this list)
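
The mid-training and SFT stages consume conversational data, which has to be rendered into a single token stream before training. The sketch below shows one common way to do that; the delimiter tokens and the `render_conversation` helper are illustrative assumptions, not the exact format used by nanochat.

```python
# Hypothetical rendering of a chat conversation into a flat string for tokenization.
# The special tokens below are assumptions for illustration; the actual nanochat
# format is defined by its tokenizer's special tokens.

def render_conversation(messages: list[dict]) -> str:
    parts = ["<|bos|>"]
    for msg in messages:
        role = msg["role"]  # "user" or "assistant"
        parts.append(f"<|{role}_start|>{msg['content']}<|{role}_end|>")
    return "".join(parts)

example = [
    {"role": "user", "content": "What is 7 * 8?"},
    {"role": "assistant", "content": "56"},
]
print(render_conversation(example))
```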

Training Procedure

Tokenization

  • Custom Rust-based tokenizer
  • Vocabulary: 65,536 tokens
  • Compression ratio: 4.8 characters per token
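
The compression ratio can be reproduced by encoding a text sample and dividing character count by token count. A minimal sketch, assuming a tokenizer object that exposes an `encode` method (the loading path and method name are assumptions, not the repository's exact API):

```python
# Measure characters-per-token compression on a sample of texts.
# `tokenizer` is any object exposing encode(text) -> list[int]; how it is
# loaded depends on the nanochat checkpoint layout (assumed here).

def compression_ratio(tokenizer, texts: list[str]) -> float:
    total_chars = sum(len(t) for t in texts)
    total_tokens = sum(len(tokenizer.encode(t)) for t in texts)
    return total_chars / total_tokens

# Example: compression_ratio(tokenizer, validation_texts)
# A value near 4.8 means each token covers ~4.8 characters on average.
```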

Citation

Repository: github.com/karpathy/nanochat

@software{nanochat2025,
  author = {Karpathy, Andrej},
  title = {nanochat: A 561M parameter conversational language model},
  year = {2025},
  url = {https://github.com/karpathy/nanochat}
}

Model Card Author

Sampath Chanda
