---
language:
- en
tags:
- text-generation
- transformers
- gpt
datasets:
- HuggingFaceFW/fineweb-edu
- cais/mmlu
- HuggingFaceTB/smoltalk
- openai/gsm8k
pipeline_tag: text-generation
---

# nanochat

**nanochat** is a 561M parameter transformer language model trained for conversational AI tasks. The model demonstrates that capable chat models can be trained efficiently on a modest hardware budget (~$100 of compute on 8x H100 GPUs).

## Model Description

- **Developed by:** Andrej Karpathy
- **Trained by:** Sampath Chanda
- **Model type:** Transformer-based causal language model
- **Language(s):** English
- **License:** MIT
- **Parameters:** 560,988,160 (~561M)

### Architecture

- **Layers:** 20
- **Hidden size:** 1280 channels
- **Attention heads:** 10
- **Head dimension:** 128
- **Vocabulary size:** 65,536 tokens

## Training Details

### Training Data

nanochat was trained in multiple stages:

1. **Pretraining:** a 100B-token subset of FineWeb-EDU, of which roughly 11.2B tokens were processed
2. **Midtraining:** SmolTalk conversations, MMLU multiple-choice questions, and GSM8K math problems
3. **Supervised Fine-tuning (SFT):** conversational adaptation data

### Training Procedure

#### Tokenization

- Custom BPE tokenizer implemented in Rust
- Vocabulary: 65,536 tokens
- Compression ratio: ~4.8 characters per token

## Citation

**Repository:** [github.com/karpathy/nanochat](https://github.com/karpathy/nanochat)

```bibtex
@software{nanochat2025,
  author = {Karpathy, Andrej},
  title = {nanochat: A 561M parameter conversational language model},
  year = {2025},
  url = {https://github.com/karpathy/nanochat}
}
```

## Model Card Author

Sampath Chanda
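
## Usage (sketch)

The `transformers` tag and `text-generation` pipeline tag suggest this checkpoint is meant to be loaded with the Hugging Face Transformers library. Assuming the checkpoint is published in a Transformers-compatible format, a minimal generation sketch might look like the following; the repo id `karpathy/nanochat` is a placeholder and is not confirmed by this card.

```python
# Minimal sketch, assuming a Transformers-compatible export of the checkpoint.
# The repo id below is a placeholder, not confirmed by this model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "karpathy/nanochat"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Why is the sky blue?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```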
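
## Parameter Count (sketch)

The 560,988,160 figure can be reproduced from the architecture numbers above under a few assumptions that this card does not state explicitly: untied input/output embedding matrices, a 4x MLP expansion, and no bias or normalization parameters.

```python
# Sketch: reconstructing the 560,988,160 parameter count from the
# architecture figures. Assumes untied embedding/unembedding matrices,
# a 4x MLP expansion, and no bias/normalization parameters; these are
# assumptions, not details confirmed by this card.
vocab, hidden, layers = 65_536, 1280, 20

embeddings = 2 * vocab * hidden         # token embedding + LM head (untied)
attention  = 4 * hidden * hidden        # Q, K, V, O projections per layer
mlp        = 2 * hidden * (4 * hidden)  # up + down projections per layer
per_layer  = attention + mlp

total = embeddings + layers * per_layer
print(total)  # 560988160
```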