Update README.md
README.md CHANGED
@@ -30,11 +30,11 @@ The pretraining data has a cutoff date of September 2024.
## Model Overview

-NVIDIA Nemotron-H-4B-Instruct-128K is a large language model (LLM) developed by NVIDIA, optimized for single and multi-turn chat, instruction following, and tool-calling use-cases. It uses a hybrid model architecture that consists primarily of Mamba-2 and MLP layers combined with just four Attention layers. The model is an aligned version of Nemotron-H-4B-Base-8K, and features a 128K context length. The supported languages include: English, German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, and Chinese.
+NVIDIA Nemotron-H-4B-Instruct-128K is a large language model (LLM) developed by NVIDIA, optimized for single and multi-turn chat, instruction following, and tool-calling use-cases. It uses a hybrid model architecture that consists primarily of Mamba-2 and MLP layers combined with just four Attention layers. The model is an aligned version of [Nemotron-H-4B-Base-8K](https://huggingface.co/nvidia/Nemotron-H-4B-Base-8K), and features a 128K context length. The supported languages include: English, German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, and Chinese.

The model underwent a multi-phase post-training process including multiple supervised fine-tuning stages for math, code, science, and then chat, instruction following, and tool-calling, followed by multiple preference tuning stages using Reward-aware Preference Optimization (RPO) for both chat and instruction-following.

-The model was pruned and distilled from [Nemotron-H-Base-8K](https://huggingface.co/nvidia/Nemotron-H-8B-Base-8K) using our hybrid language model compression technique. For more details, please refer to the [paper](https://arxiv.org/abs/2504.11409).
+The [base model](https://huggingface.co/nvidia/Nemotron-H-4B-Base-8K) was pruned and distilled from [Nemotron-H-Base-8K](https://huggingface.co/nvidia/Nemotron-H-8B-Base-8K) using our hybrid language model compression technique. For more details, please refer to the [paper](https://arxiv.org/abs/2504.11409).

The paper has been accepted for publication at NeurIPS 2025.
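The changed paragraphs above describe a chat and instruction-following model with a 128K context length. As a quick, unofficial illustration, here is a minimal sketch of loading and prompting it with Hugging Face `transformers`; the repository id `nvidia/Nemotron-H-4B-Instruct-128K`, the `trust_remote_code=True` flag, and the dtype/device and generation settings are assumptions inferred from the model name, so defer to the model card's own usage section.

```python
# Minimal sketch -- repo id, trust_remote_code, and dtype/device settings are assumptions;
# see the model card for the official usage example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-H-4B-Instruct-128K"  # assumed Hugging Face repository id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Single-turn chat using the tokenizer's chat template.
messages = [{"role": "user", "content": "Summarize the Nemotron-H hybrid architecture in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```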