Should `aux_loss_alpha` be 1e-4 instead of 1e-3?
#60, opened by cuichenx

According to Section 4.2 of the DeepSeek-V3 technical report: "For the balance loss, we set 𝛼 to 0.0001".
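
For anyone who wants to check this locally, here is a minimal sketch of the comparison. It assumes the model config is JSON with a top-level `aux_loss_alpha` key. The helper name `alpha_matches_report` and the inline config fragment are hypothetical and only stand in for the real file.

```python
import json
import math

# Value quoted from Section 4.2 of the DeepSeek-V3 technical report.
PAPER_ALPHA = 1e-4


def alpha_matches_report(config: dict, expected: float = PAPER_ALPHA) -> bool:
    """Return True if the config's aux_loss_alpha equals the reported value."""
    value = config.get("aux_loss_alpha")
    # A missing key counts as a mismatch rather than raising an error.
    return value is not None and math.isclose(value, expected, rel_tol=1e-9)


# Hypothetical config fragment (assumption: this mirrors the current setting).
config = json.loads('{"aux_loss_alpha": 0.001}')
print(alpha_matches_report(config))
```

If the config really does contain 1e-3, this prints `False`, which is the mismatch raised here.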