`aux_loss_alpha` should be 1e-4 instead of 1e-3?
According to the DeepSeek-V3 technical report, Section 4.2:
> For the balance loss, we set 𝛼 to 0.0001
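In the report, α is the balance factor of the complementary sequence-wise auxiliary loss, L_Bal = α · Σ_i f_i · P_i. Here f_i = N_r / (K_r · T) times the number of tokens in the sequence whose top-K_r experts include expert i, and P_i is expert i's mean normalized affinity over the T tokens. The following is a minimal pure-Python reading of that formula. It assumes the per-token affinity scores are already computed and non-negative. `sequence_balance_loss` and its arguments are hypothetical names, and this is not the code in `modeling_deepseek.py`:

```python
from typing import List


def sequence_balance_loss(scores: List[List[float]], top_k: int, alpha: float) -> float:
    """Sequence-wise balance loss, sketched from the DeepSeek-V3 report's formula.

    scores[t][i] is the non-negative affinity of token t for routed expert i.
    Hypothetical helper for illustration only.
    """
    if not scores:
        raise ValueError("need at least one token")
    num_tokens = len(scores)          # T
    num_experts = len(scores[0])      # N_r
    if not 0 < top_k <= num_experts:
        raise ValueError("top_k must be in 1..num_experts")

    selected = [0] * num_experts      # tokens whose top-K_r set includes expert i
    affinity = [0.0] * num_experts    # running sum of normalized affinities s'_{i,t}

    for row in scores:
        # Indices of the top_k highest-affinity experts for this token
        top = sorted(range(num_experts), key=lambda i: row[i], reverse=True)[:top_k]
        for i in top:
            selected[i] += 1
        total = sum(row)
        for i, s in enumerate(row):
            affinity[i] += s / total  # s'_{i,t} = s_{i,t} / sum_j s_{j,t}

    # f_i = N_r / (K_r * T) * count_i,  P_i = (1 / T) * sum_t s'_{i,t}
    f = [num_experts / (top_k * num_tokens) * c for c in selected]
    p = [a / num_tokens for a in affinity]
    return alpha * sum(fi * pi for fi, pi in zip(f, p))


balanced = [[0.9, 0.1], [0.1, 0.9]]  # each token prefers a different expert
skewed = [[0.9, 0.1], [0.8, 0.2]]    # both tokens prefer expert 0
for alpha in (1e-3, 1e-4):
    print(alpha, sequence_balance_loss(balanced, 1, alpha), sequence_balance_loss(skewed, 1, alpha))
```

Because α multiplies the whole sum, lowering `aux_loss_alpha` from 1e-3 to 1e-4 shrinks this term tenfold for any routing pattern. The skewed sequence is still penalized more than the balanced one, just with a smaller weight.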
- config.json +1 -1
    	

```diff
@@ -9,7 +9,7 @@
     "AutoModel": "modeling_deepseek.DeepseekV3Model",
     "AutoModelForCausalLM": "modeling_deepseek.DeepseekV3ForCausalLM"
   },
-  "aux_loss_alpha": 0.001,
+  "aux_loss_alpha": 0.0001,
   "bos_token_id": 0,
   "eos_token_id": 1,
   "ep_size": 1,
```
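You can confirm that a checkout carries the report's value by parsing the JSON and comparing `aux_loss_alpha` against 1e-4. The following is a minimal sketch of that check. `aux_loss_alpha_matches_report` is a hypothetical helper, and the inline JSON strings stand in for the full `config.json`:

```python
import json

# Value quoted in the commit message from Section 4.2 of the DeepSeek-V3 report
REPORT_AUX_LOSS_ALPHA = 1e-4


def aux_loss_alpha_matches_report(config_text: str) -> bool:
    """Return True if the config's aux_loss_alpha equals the report's value (hypothetical helper)."""
    config = json.loads(config_text)
    # A small absolute tolerance treats 1e-4 and 0.0001 as the same number
    return abs(config["aux_loss_alpha"] - REPORT_AUX_LOSS_ALPHA) < 1e-12


before = '{"aux_loss_alpha": 0.001}'   # value named in the title (1e-3)
after = '{"aux_loss_alpha": 0.0001}'   # value set by this change
print(aux_loss_alpha_matches_report(before), aux_loss_alpha_matches_report(after))
```

With a local copy of the repository, passing `pathlib.Path("config.json").read_text()` to the same helper checks the real file.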

