---
language:
- en
tags:
- text-generation
- transformers
- gpt
datasets:
- HuggingFaceFW/fineweb-edu
- cais/mmlu
- HuggingFaceTB/smoltalk
- openai/gsm8k
pipeline_tag: text-generation
---

# nanochat

**nanochat** is a 561M parameter transformer language model trained for conversational AI tasks. The model demonstrates that capable chat models can be trained efficiently on a modest hardware budget (~$100 of compute on 8x H100 GPUs).

## Model Description

- **Developed by:** Andrej Karpathy
- **Trained by:** Sampath Chanda
- **Model type:** Transformer-based causal language model
- **Language(s):** English
- **License:** MIT
- **Parameters:** 560,988,160 (~561M)

### Architecture

- **Layers:** 20
- **Hidden size:** 1280 channels
- **Attention heads:** 10
- **Head dimension:** 128
- **Vocabulary size:** 65,536 tokens

## Training Details

### Training Data

nanochat was trained in multiple stages:

1. **Pretraining:** a 100B-token subset of FineWeb-EDU, of which roughly 11.2B tokens were processed
2. **Midtraining:** SmolTalk conversations, MMLU multiple-choice questions, and GSM8K math problems
3. **Supervised Fine-tuning (SFT):** conversational adaptation data

### Training Procedure

#### Tokenization

- Custom BPE tokenizer implemented in Rust
- Vocabulary: 65,536 tokens
- Compression ratio: ~4.8 characters per token

## Citation

**Repository:** [github.com/karpathy/nanochat](https://github.com/karpathy/nanochat)

```bibtex
@software{nanochat2025,
  author = {Karpathy, Andrej},
  title = {nanochat: A 561M parameter conversational language model},
  year = {2025},
  url = {https://github.com/karpathy/nanochat}
}
```

## Model Card Author

Sampath Chanda
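
## Usage (sketch)

The `transformers` tag and `text-generation` pipeline tag suggest this checkpoint is meant to be loaded with the Hugging Face Transformers library. Assuming the checkpoint is published in a Transformers-compatible format, a minimal generation sketch might look like the following; the repo id `karpathy/nanochat` is a placeholder and is not confirmed by this card.

```python
# Minimal sketch, assuming a Transformers-compatible export of the checkpoint.
# The repo id below is a placeholder, not confirmed by this model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "karpathy/nanochat"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Why is the sky blue?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```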
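
## Parameter Count (sketch)

The 560,988,160 figure can be reproduced from the architecture numbers above under a few assumptions that this card does not state explicitly: untied input/output embedding matrices, a 4x MLP expansion, and no bias or normalization parameters.

```python
# Sketch: reconstructing the 560,988,160 parameter count from the
# architecture figures. Assumes untied embedding/unembedding matrices,
# a 4x MLP expansion, and no bias/normalization parameters; these are
# assumptions, not details confirmed by this card.
vocab, hidden, layers = 65_536, 1280, 20

embeddings = 2 * vocab * hidden         # token embedding + LM head (untied)
attention  = 4 * hidden * hidden        # Q, K, V, O projections per layer
mlp        = 2 * hidden * (4 * hidden)  # up + down projections per layer
per_layer  = attention + mlp

total = embeddings + layers * per_layer
print(total)  # 560988160
```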