---
license: mit
---

DeepSeek-V3 architecture with 4 layers + 8 experts per MoE + MTP module + BF16 weights, minimally trained with 50k samples generated from Mistral.

To be used in CI testing.
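
A CI job may want to confirm that the checkpoint it downloaded matches the shape described above. The following is a minimal sketch of such a check, not part of this repository. The key names (`num_hidden_layers`, `n_routed_experts`, `num_nextn_predict_layers`, `torch_dtype`) follow the public DeepSeek-V3 `config.json` and are assumed to apply here; reading "MTP module" as exactly one module is also an assumption. `config_mismatches` and `load_config` are hypothetical helper names.

```python
import json

# Values stated in the model card. Key names follow the public DeepSeek-V3
# config.json and are an assumption about this checkpoint's own config.
EXPECTED = {
    "num_hidden_layers": 4,
    "n_routed_experts": 8,
    "num_nextn_predict_layers": 1,  # assumed: "MTP module" read as one module
    "torch_dtype": "bfloat16",
}


def config_mismatches(config: dict) -> dict:
    """Return {key: (expected, actual)} for every expected key that differs."""
    return {
        key: (want, config.get(key))
        for key, want in EXPECTED.items()
        if config.get(key) != want
    }


def load_config(path: str) -> dict:
    """Read a config.json file from a local checkpoint directory."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

In a test, something like `assert not config_mismatches(load_config("config.json"))` would fail with a readable diff if the checkpoint were swapped for a different size, assuming the file sits in the working directory.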