|
|
+ deepspeed --master_port 23981 --module safe_rlhf.finetune --train_datasets inverse-json::/home/hansirui_1st/jiayi/resist/imdb_data/train/neg/100/train.json --model_name_or_path /aifs4su/hansirui_1st/jiayi/setting3-imdb/Qwen1.5-0.5B/Qwen1.5-0.5B-s3-Q1-1000 --max_length 512 --trust_remote_code True --epochs 1 --per_device_train_batch_size 1 --per_device_eval_batch_size 4 --gradient_accumulation_steps 8 --gradient_checkpointing --learning_rate 1e-5 --lr_warmup_ratio 0 --weight_decay 0.0 --lr_scheduler_type constant --seed 42 --output_dir /aifs4su/hansirui_1st/jiayi/setting3-imdb/Qwen1.5-0.5B/Qwen1.5-0.5B-s3-Q1-1000-Q2-100 --log_type wandb --log_run_name imdb-Qwen1.5-0.5B-s3-Q1-1000-Q2-100 --log_project Inverse_Alignment_IMDb --zero_stage 3 --offload none --bf16 True --tf32 True --save_16bit
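For reference, a minimal sketch of the ZeRO-3 DeepSpeed configuration these flags imply (field names assumed from DeepSpeed's standard config schema; safe_rlhf assembles its actual config internally):

    # Sketch only: mirrors the CLI flags above, not the exact config safe_rlhf emits.
    ds_config = {
        "train_micro_batch_size_per_gpu": 1,        # --per_device_train_batch_size 1
        "gradient_accumulation_steps": 8,           # --gradient_accumulation_steps 8
        "bf16": {"enabled": True},                  # --bf16 True
        "zero_optimization": {
            "stage": 3,                             # --zero_stage 3
            "offload_param": {"device": "none"},    # --offload none
            "offload_optimizer": {"device": "none"},
        },
    }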
|
|
nvcc warning : incompatible redefinition for option
|
|
[rank0]:[W526 15:11:40.914798928 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
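As the warning itself recommends, the rank-to-device mapping can be pinned up front; a minimal sketch, assuming a recent PyTorch (with device_id support) and the LOCAL_RANK variable set by the launcher:

    import os
    import torch
    import torch.distributed as dist

    local_rank = int(os.environ["LOCAL_RANK"])  # set by deepspeed/torchrun
    torch.cuda.set_device(local_rank)

    # Binding device_id here means barrier() no longer has to guess the GPU.
    dist.init_process_group("nccl", device_id=torch.device(f"cuda:{local_rank}"))
    dist.barrier()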
|
|
loading configuration file /aifs4su/hansirui_1st/jiayi/setting3-imdb/Qwen1.5-0.5B/Qwen1.5-0.5B-s3-Q1-1000/config.json
|
|
Model config Qwen2Config {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 128245,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 2816,
  "max_position_embeddings": 32768,
  "max_window_layers": 21,
  "model_type": "qwen2",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "num_key_value_heads": 16,
  "pad_token_id": 151643,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.52.1",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151646
}
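Loading this checkpoint standalone reproduces the config above; a minimal sketch using the same local path:

    import torch
    from transformers import AutoConfig, AutoModelForCausalLM

    path = "/aifs4su/hansirui_1st/jiayi/setting3-imdb/Qwen1.5-0.5B/Qwen1.5-0.5B-s3-Q1-1000"

    config = AutoConfig.from_pretrained(path, trust_remote_code=True)
    # torch_dtype is "bfloat16" in config.json, matching --bf16 True in the run.
    model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16)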
|
|
|
|
|
|
|
|
|
|
loading weights file /aifs4su/hansirui_1st/jiayi/setting3-imdb/Qwen1.5-0.5B/Qwen1.5-0.5B-s3-Q1-1000/pytorch_model.bin

Will use torch_dtype=torch.bfloat16 as defined in model

Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16.

Detected DeepSpeed ZeRO-3: activating zero.init() for this model
|
|
Generate config GenerationConfig {
  "bos_token_id": 128245,
  "eos_token_id": 151643,
  "pad_token_id": 151643
}
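Since no generation_config.json exists in the checkpoint, the trainer derives one from the model config; an equivalent sketch:

    from transformers import GenerationConfig

    gen_config = GenerationConfig(
        bos_token_id=128245,
        eos_token_id=151643,
        pad_token_id=151643,  # pad falls back to eos for this checkpoint
    )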
|
|
|
|
|
|
|
|
|
|
All model checkpoint weights were used when initializing Qwen2ForCausalLM.

All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at /aifs4su/hansirui_1st/jiayi/setting3-imdb/Qwen1.5-0.5B/Qwen1.5-0.5B-s3-Q1-1000.

If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.

Generation config file not found, using a generation config created from the model config.
|
|
loading file vocab.json

loading file merges.txt

loading file tokenizer.json

loading file added_tokens.json

loading file special_tokens_map.json

loading file tokenizer_config.json

loading file chat_template.jinja
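The seven files listed are what AutoTokenizer reads from a local checkpoint directory; a minimal sketch:

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        "/aifs4su/hansirui_1st/jiayi/setting3-imdb/Qwen1.5-0.5B/Qwen1.5-0.5B-s3-Q1-1000"
    )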
|
|
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
|
|
Using /home/hansirui_1st/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
|
|
Detected CUDA files, patching ldflags |
|
|
Emitting ninja build file /home/hansirui_1st/.cache/torch_extensions/py311_cu124/fused_adam/build.ninja... |
|
|
/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/torch/utils/cpp_extension.py:2059: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.

If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].

  warnings.warn(
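Setting the variable before the JIT build silences this warning and shortens compilation; a sketch, with the arch list itself an assumption (pick the compute capabilities of your actual GPUs):

    import os

    # "8.0" = A100, "9.0" = H100; adjust for the cards actually visible.
    os.environ["TORCH_CUDA_ARCH_LIST"] = "8.0;9.0"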
|
|
Building extension module fused_adam... |
|
|
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) |
|
|
Loading extension module fused_adam...
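fused_adam is the JIT-compiled kernel behind DeepSpeed's FusedAdam optimizer; a minimal sketch of constructing it directly, with a stand-in module since the real model lives inside the trainer:

    import torch
    from deepspeed.ops.adam import FusedAdam

    layer = torch.nn.Linear(1024, 1024).cuda()  # stand-in for the model
    # lr/weight_decay mirror --learning_rate 1e-5 --weight_decay 0.0 above.
    optimizer = FusedAdam(layer.parameters(), lr=1e-5, weight_decay=0.0)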
|
|
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
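The trainer resolves this conflict automatically; doing it by hand looks roughly like this sketch (path reused from this run):

    from transformers import AutoModelForCausalLM

    path = "/aifs4su/hansirui_1st/jiayi/setting3-imdb/Qwen1.5-0.5B/Qwen1.5-0.5B-s3-Q1-1000"
    model = AutoModelForCausalLM.from_pretrained(path)

    # KV caching is useless during training and clashes with checkpointing.
    model.config.use_cache = False
    model.gradient_checkpointing_enable()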
|
|
wandb: Currently logged in as: xtom to https://api.wandb.ai. Use `wandb login --relogin` to force relogin |
|
|
wandb: Tracking run with wandb version 0.19.11 |
|
|
wandb: Run data is saved locally in /aifs4su/hansirui_1st/jiayi/setting3-imdb/Qwen1.5-0.5B/Qwen1.5-0.5B-s3-Q1-1000-Q2-100/wandb/run-20250526_151156-o9zxeujn |
|
|
wandb: Run `wandb offline` to turn off syncing. |
|
|
wandb: Syncing run imdb-Qwen1.5-0.5B-s3-Q1-1000-Q2-100 |
|
|
wandb: ⭐️ View project at https://wandb.ai/xtom/Inverse_Alignment_IMDb
|
|
wandb: 🚀 View run at https://wandb.ai/xtom/Inverse_Alignment_IMDb/runs/o9zxeujn
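This run was created by --log_type wandb --log_project Inverse_Alignment_IMDb --log_run_name imdb-Qwen1.5-0.5B-s3-Q1-1000-Q2-100; the equivalent direct call is roughly:

    import wandb

    run = wandb.init(
        project="Inverse_Alignment_IMDb",
        name="imdb-Qwen1.5-0.5B-s3-Q1-1000-Q2-100",
    )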
|
|
Training 1/1 epoch: 0%| | 0/13 [00:00<?, ?it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`. |
|
|
Training 1/1 epoch (loss 3.1070):   8%|▊         | 1/13 [00:08<01:43,  8.63s/it]
Training 1/1 epoch (loss 3.2699):  15%|█▌        | 2/13 [00:11<01:00,  5.48s/it]
Training 1/1 epoch (loss 3.4265):  23%|██▎       | 3/13 [00:12<00:33,  3.32s/it]
Training 1/1 epoch (loss 3.3756):  31%|███       | 4/13 [00:13<00:21,  2.38s/it]
Training 1/1 epoch (loss 3.1592):  38%|███▊      | 5/13 [00:14<00:14,  1.83s/it]
Training 1/1 epoch (loss 3.0803):  46%|████▌     | 6/13 [00:15<00:10,  1.51s/it]
Training 1/1 epoch (loss 3.3176):  54%|█████▍    | 7/13 [00:15<00:07,  1.17s/it]
Training 1/1 epoch (loss 3.2755):  62%|██████▏   | 8/13 [00:17<00:06,  1.26s/it]
Training 1/1 epoch (loss 3.0529):  69%|██████▉   | 9/13 [00:18<00:04,  1.23s/it]
Training 1/1 epoch (loss 3.0969):  77%|███████▋  | 10/13 [00:18<00:02,  1.02it/s]
Training 1/1 epoch (loss 3.1494):  85%|████████▌ | 11/13 [00:19<00:01,  1.04it/s]
Training 1/1 epoch (loss 3.4086):  92%|█████████▏| 12/13 [00:20<00:00,  1.05it/s]
Training 1/1 epoch (loss 3.1849): 100%|██████████| 13/13 [00:21<00:00,  1.66s/it]
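The 13 iterations per epoch follow directly from the setup: 100 training examples sharded across 8 ranks at micro-batch size 1 (gradient accumulation then groups every 8 of these micro-steps into one optimizer step):

    import math

    num_examples = 100   # train.json under .../train/neg/100/
    world_size = 8       # ranks 0-7 in the NCCL warnings above
    micro_batch = 1      # --per_device_train_batch_size 1

    steps_per_epoch = math.ceil(num_examples / (world_size * micro_batch))
    assert steps_per_epoch == 13  # matches the progress bar and train/step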
|
|
chat template saved in /aifs4su/hansirui_1st/jiayi/setting3-imdb/Qwen1.5-0.5B/Qwen1.5-0.5B-s3-Q1-1000-Q2-100/chat_template.jinja |
|
|
tokenizer config file saved in /aifs4su/hansirui_1st/jiayi/setting3-imdb/Qwen1.5-0.5B/Qwen1.5-0.5B-s3-Q1-1000-Q2-100/tokenizer_config.json |
|
|
Special tokens file saved in /aifs4su/hansirui_1st/jiayi/setting3-imdb/Qwen1.5-0.5B/Qwen1.5-0.5B-s3-Q1-1000-Q2-100/special_tokens_map.json |
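The three "saved" lines above are what tokenizer.save_pretrained() logs when writing a tokenizer to disk; a sketch of the equivalent call, reusing this run's paths:

    from transformers import AutoTokenizer

    src = "/aifs4su/hansirui_1st/jiayi/setting3-imdb/Qwen1.5-0.5B/Qwen1.5-0.5B-s3-Q1-1000"
    dst = "/aifs4su/hansirui_1st/jiayi/setting3-imdb/Qwen1.5-0.5B/Qwen1.5-0.5B-s3-Q1-1000-Q2-100"

    tokenizer = AutoTokenizer.from_pretrained(src)
    # Writes chat_template.jinja, tokenizer_config.json, special_tokens_map.json, etc.
    tokenizer.save_pretrained(dst)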
|
|
wandb: |
|
|
wandb: |
|
|
wandb: Run history: |
|
|
wandb: train/epoch ▁▁▂▂▃▃▄▅▅▆▆▇█


wandb: train/loss ▂▅█▇▂▁▅▅▁▁▂▇▃


wandb: train/lr ▁▁▁▁▁▁▁▁▁▁▁▁▁


wandb: train/step ▁▁▂▂▃▃▄▅▅▆▆▇█
|
|
wandb: |
|
|
wandb: Run summary: |
|
|
wandb: train/epoch 1 |
|
|
wandb: train/loss 3.18486 |
|
|
wandb: train/lr 1e-05 |
|
|
wandb: train/step 13 |
|
|
wandb: |
|
|
wandb: 🚀 View run imdb-Qwen1.5-0.5B-s3-Q1-1000-Q2-100 at: https://wandb.ai/xtom/Inverse_Alignment_IMDb/runs/o9zxeujn
|
|
wandb: ⭐️ View project at: https://wandb.ai/xtom/Inverse_Alignment_IMDb
|
|
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s) |
|
|
wandb: Find logs at: /aifs4su/hansirui_1st/jiayi/setting3-imdb/Qwen1.5-0.5B/Qwen1.5-0.5B-s3-Q1-1000-Q2-100/wandb/run-20250526_151156-o9zxeujn/logs |
|
|
|