nanochat
nanochat is a 561M-parameter transformer language model trained for conversational AI. It demonstrates that a capable chat model can be trained on a modest compute budget (roughly $100 of 8x H100 GPU time).
Model Description
- Developed by: Andrej Karpathy
- Trained by: Sampath Chanda
- Model type: Transformer-based causal language model
- Language(s): English
- License: MIT
- Parameters: 560,988,160 (~561M)
Architecture
- Layers: 20
- Hidden size: 1280 channels
- Attention heads: 10
- Head dimension: 128
- Vocabulary size: 65,536 tokens
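As a rough cross-check, the parameter count listed under Model Description can be reproduced from these dimensions. The sketch below assumes untied input/output embeddings, a 4x-wide MLP, no bias terms, and normalization layers without learnable weights; these are assumptions about the implementation rather than documented facts, but under them the tally matches exactly.

```python
# Rough parameter-count cross-check for the configuration above.
# Assumptions (not confirmed from the repo): untied input/output embeddings,
# a 4x-wide MLP, no bias terms, and normalization without learnable weights.

vocab_size = 65_536
d_model    = 1_280   # hidden size (channels)
n_layers   = 20
mlp_mult   = 4

embed   = vocab_size * d_model   # token embedding
unembed = vocab_size * d_model   # output projection (assumed untied)

attn_per_layer = 3 * d_model * d_model + d_model * d_model   # Q, K, V + output projection
mlp_per_layer  = 2 * d_model * (mlp_mult * d_model)          # up and down projections

total = embed + unembed + n_layers * (attn_per_layer + mlp_per_layer)
print(f"{total:,}")   # 560,988,160 -- matches the figure quoted above
```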
Training Details
Training Data
nanochat was trained in multiple stages:
- Pretraining: 100B-token sample of FineWeb-EDU; ~11.2B tokens processed during training (see the check after this list)
- Midtraining: SmolTalk conversations, MMLU multiple-choice questions, and GSM8K math problems
- Supervised Fine-tuning (SFT): Conversational adaptation data
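The ~11.2B tokens processed during pretraining is consistent with a Chinchilla-style budget of roughly 20 training tokens per parameter. The exact ratio is an assumption not stated in this card; the quick check below only illustrates the arithmetic.

```python
# Sanity check on the pretraining token budget quoted above.
# Assumption: a Chinchilla-style budget of ~20 training tokens per parameter
# (the exact ratio is not stated in this card).

n_params = 560_988_160
tokens_per_param = 20

token_budget = n_params * tokens_per_param
print(f"{token_budget / 1e9:.1f}B tokens")   # ~11.2B, matching the figure above
```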
Training Procedure
Tokenization
- Custom Rust-based tokenizer
- Vocabulary: 65,536 tokens
- Compression ratio: ~4.8 characters per token (measurement illustrated below)
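The compression ratio above is simply characters of raw text per emitted token. A minimal sketch of how such a figure can be measured is shown below; the `encode` callable is a placeholder, since the Rust tokenizer's actual API is not documented here.

```python
# Illustration of how a characters-per-token compression ratio is measured.
# `encode` is a stand-in for whatever interface the Rust tokenizer exposes;
# the actual API is not documented in this card.

def compression_ratio(text: str, encode) -> float:
    """Average number of characters of raw text per token."""
    tokens = encode(text)
    return len(text) / len(tokens)

# Example with a hypothetical tokenizer object:
# ratio = compression_ratio(open("sample.txt").read(), tokenizer.encode)
# print(f"{ratio:.1f} chars/token")   # reported as ~4.8 for nanochat
```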
Citation
Repository: github.com/karpathy/nanochat
@software{nanochat2025,
  author = {Karpathy, Andrej},
  title = {nanochat: A 561M parameter conversational language model},
  year = {2025},
  url = {https://github.com/karpathy/nanochat}
}
Model Card Author
Sampath Chanda