DeepSeek-V3 architecture with 4 layers, 8 experts per MoE layer, an MTP module, and BF16 weights, minimally trained on 50k samples generated from Mistral.

Intended for use in CI testing.
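The configuration described above can be summarized as a small config object. This is an illustrative sketch only: the field names below are hypothetical and do not reflect the model's actual config schema, and the values are taken from the card.

```python
from dataclasses import dataclass

@dataclass
class TinyDeepSeekV3Config:
    """Illustrative summary of the card's stated configuration.

    Field names are hypothetical; values come from the card above.
    """
    num_hidden_layers: int = 4      # 4 transformer layers
    n_routed_experts: int = 8       # 8 experts per MoE layer
    has_mtp_module: bool = True     # multi-token-prediction (MTP) module
    torch_dtype: str = "bfloat16"   # BF16 weights
    train_samples: int = 50_000     # minimally trained on 50k Mistral-generated samples

cfg = TinyDeepSeekV3Config()
print(cfg)
```

A fixed-size config like this makes the CI model's shape explicit, so tests can assert on it without loading weights.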

Model size: 4B params · Tensor type: BF16 (Safetensors)