# Model Card

- Source: [https://arxiv.org/abs/2509.02046](https://arxiv.org/abs/2509.02046)
- Optimizer: `scion`
- Model size: `300m` (~300M parameters)
- Data size: `48B` tokens

## Best configuration

| Hyperparameter | Value |
|---|---|
| beta1 | `0.98` |
| decay | `0.8` |
| learning_rate | `0.004` |
| lr_schedule | `linear` |
| max_grad_norm | `2` |
| min_lr_ratio | `0` |
| momentum | `0.95` |
| scion_epsilon | `1e-05` |
| scion_to_signum_lr | `0.1` |
| train_batch_size | `128` |
| warmup | `0` |
| weight_decay | `0.1` |
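
The sketch below shows how the schedule-related entries in this table could combine into a per-step learning rate. It is a hypothetical illustration, not the training code. Several points are assumptions. `warmup` and `decay` are read as fractions of total training steps, in the style of a warmup/constant/decay schedule. `min_lr_ratio` sets the final LR as a fraction of the peak. `scion_to_signum_lr` is read as a multiplier on the LR for parameters updated with Signum rather than Scion. `BEST_CONFIG`, `learning_rate_at`, and `param_group_lrs` are made-up names.

```python
# Hypothetical sketch: keys mirror the table above, but this dict and these
# helpers are illustrative only, not the actual training code's API.
BEST_CONFIG = {
    "learning_rate": 0.004,
    "lr_schedule": "linear",
    "warmup": 0,               # assumed: fraction of total steps spent warming up
    "decay": 0.8,              # assumed: fraction of total steps spent decaying
    "min_lr_ratio": 0.0,       # final LR as a fraction of the peak LR
    "scion_to_signum_lr": 0.1, # assumed: LR multiplier for Signum-updated params
}


def learning_rate_at(step: int, total_steps: int, cfg: dict = BEST_CONFIG) -> float:
    """LR at `step` under an assumed warmup / constant / linear-decay schedule."""
    peak = cfg["learning_rate"]
    floor = peak * cfg["min_lr_ratio"]
    warmup_steps = round(cfg["warmup"] * total_steps)
    decay_steps = round(cfg["decay"] * total_steps)
    decay_start = total_steps - decay_steps

    if step < warmup_steps:
        # Linear ramp from near 0 up to the peak LR.
        return peak * (step + 1) / warmup_steps
    if step < decay_start:
        # Hold the peak LR until the decay phase begins.
        return peak
    # Linear interpolation from the peak down to `floor` over the decay phase.
    progress = (step - decay_start) / max(decay_steps, 1)
    return peak + (floor - peak) * progress


def param_group_lrs(step: int, total_steps: int, cfg: dict = BEST_CONFIG) -> dict:
    """Assumed split of the scheduled LR between Scion and Signum parameter groups."""
    lr = learning_rate_at(step, total_steps, cfg)
    return {"scion": lr, "signum": lr * cfg["scion_to_signum_lr"]}


if __name__ == "__main__":
    total = 1000  # illustrative step count, not derived from the 48B-token budget
    for s in (0, 199, 200, 600, 999):
        print(s, param_group_lrs(s, total))
```

Under this reading, `warmup = 0` and `decay = 0.8` keep the LR at `0.004` for the first 20% of steps. It then falls linearly toward `0` over the remaining 80%. Signum-updated parameters would run at one tenth of that value.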