I need help. #29
by thebryanalvarado, opened
Hello community, I want to improve this training process:
%%time
import time

from transformers import Trainer, TrainingArguments

if to_train:
    output_dir = f'./sql-training-{int(time.time())}'

    training_args = TrainingArguments(
        output_dir=output_dir,
        learning_rate=5e-3,
        num_train_epochs=2,
        per_device_train_batch_size=16,     # batch size per device during training
        per_device_eval_batch_size=16,      # batch size for evaluation
        weight_decay=0.01,
        logging_steps=50,
        evaluation_strategy='steps',        # evaluation strategy to adopt during training
        eval_steps=500,                     # number of steps between evaluations
    )

    trainer = Trainer(
        model=finetuned_model,
        args=training_args,
        train_dataset=tokenized_datasets['train'],
        eval_dataset=tokenized_datasets['validation'],
    )

    trainer.train()
    finetuned_model.save_pretrained("finetuned_model_2_epoch")
Training can take 40 hours on my laptop with an RTX 4050.
Hi Bryan, your problem is probably already solved by now, but from what I can see in your code you would likely benefit a lot from lowering the floating-point precision to fp16, which should give you the speed-up you are looking for. You might also find this helpful: https://discuss.huggingface.co/t/t5-fp16-issue-is-fixed/3139
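As a minimal sketch, reusing your variables from above: fp16=True is the TrainingArguments flag that turns on mixed-precision training, and it requires a CUDA GPU (your RTX 4050 qualifies).

training_args = TrainingArguments(
    output_dir=output_dir,
    learning_rate=5e-3,
    num_train_epochs=2,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    weight_decay=0.01,
    logging_steps=50,
    evaluation_strategy='steps',
    eval_steps=500,
    fp16=True,                          # run forward/backward passes in fp16 mixed precision
)

Everything else in your Trainer setup stays the same. If your model is T5-based, the thread linked above is worth reading, since it covers the old T5 fp16 issue and when it was fixed.

Best of luck!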

