I have an issue getting the right input shape for fine-tuning Whisper.
I never changed my input shape; it matches common_voice from the Whisper fine-tuning blog (https://huggingface.co/blog/fine-tune-whisper).
The mapping also works fine and looks just like it does for Common Voice, but when I run trainer.train() it reports a dimension mismatch!
Code:
from transformers import Seq2SeqTrainingArguments
training_args = Seq2SeqTrainingArguments(
    output_dir="openai/whisper-small-ar",  # change to a repo name of your choice
    per_device_train_batch_size=20,
    gradient_accumulation_steps=1,  # increase by 2x for every 2x decrease in batch size
    learning_rate=1e-5,
    warmup_steps=10,
    max_steps=10,
    gradient_checkpointing=True,
    fp16=False,
    evaluation_strategy="steps",
    per_device_eval_batch_size=2,
    predict_with_generate=True,
    generation_max_length=225,
    save_steps=1000,
    eval_steps=1000,
    logging_steps=25,
    report_to=["tensorboard"],
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    #greater_is_better=False,
    #push_to_hub=True,
    dataloader_drop_last=True,
)
Note: if one does not want to upload the model checkpoints to the Hub, set push_to_hub=False.
We can forward the training arguments to the 🤗 Trainer along with our model, dataset, data collator and compute_metrics function:
from transformers import Seq2SeqTrainer
trainer = Seq2SeqTrainer(
    args=training_args,
    model=model,
    train_dataset=ar_text["train"],
    eval_dataset=ar_text["test"],
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    tokenizer=processor.feature_extractor,
)
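A quick sanity check on what the data collator will actually feed the model (a minimal sketch; data_collator and ar_text are the objects defined above, and Whisper's encoder expects input_features of shape (batch_size, 80, 3000)):

# Collate two examples exactly as the Trainer would and inspect the shapes.
sample_batch = data_collator([ar_text["train"][i] for i in range(2)])
print(sample_batch["input_features"].shape)  # should be torch.Size([2, 80, 3000])
print(sample_batch["labels"].shape)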
len(ar_text["train"])  # 216
len(ar_text["test"])   # 150
ar_text
DatasetDict({
    train: Dataset({
        features: ['input_features', 'labels'],
        num_rows: 216
    })
    test: Dataset({
        features: ['input_features', 'labels'],
        num_rows: 150
    })
})
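Each stored example should be an (80, 3000) log-mel array, exactly as the blog's prepare_dataset produces. A minimal sketch to confirm that (an extra leading dimension of size 1 here would already hint at the problem):

import numpy as np

# The features are stored as nested lists; convert one example to inspect its shape.
features = np.array(ar_text["train"][0]["input_features"])
print(features.shape)  # expected: (80, 3000)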
We'll save the processor object once before starting training. Since the processor is not trainable, it won't change over the course of training:
processor.save_pretrained(training_args.output_dir)
Training
Training will take approximately 5-10 hours depending on your GPU or the one allocated to this Google Colab. If using this Google Colab directly to fine-tune a Whisper model, you should make sure that training isn't interrupted due to inactivity. A simple workaround to prevent this is to paste the following code into the console of this tab (right mouse click -> inspect -> Console tab -> insert code).
function ConnectButton(){
    console.log("Connect pushed");
    document.querySelector("#top-toolbar > colab-connect-button").shadowRoot.querySelector("#connect").click()
}
setInterval(ConnectButton, 60000);
The peak GPU memory for the given training configuration is approximately 15.8GB. Depending on the GPU allocated to the Google Colab, it is possible that you will encounter a CUDA "out-of-memory" error when you launch training. In this case, you can reduce the per_device_train_batch_size incrementally by factors of 2 and employ gradient_accumulation_steps to compensate.
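For example, halving the batch size and doubling gradient accumulation keeps the effective batch size at 20 while roughly halving peak memory (a minimal sketch of the adjusted arguments, everything else as above):

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="openai/whisper-small-ar",
    per_device_train_batch_size=10,   # halved from 20
    gradient_accumulation_steps=2,    # doubled to compensate
    learning_rate=1e-5,
    warmup_steps=10,
    max_steps=10,
    gradient_checkpointing=True,
    fp16=False,
)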
To launch training, simply execute:
trainer.train()
Error:
RuntimeError                              Traceback (most recent call last)
Cell In[360], line 1
----> 1 trainer.train()
File ~\anaconda3\Lib\site-packages\transformers\trainer.py:1534, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1529     self.model_wrapped = self.model
   1531 inner_training_loop = find_executable_batch_size(
   1532     self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size
   1533 )
-> 1534 return inner_training_loop(
   1535     args=args,
   1536     resume_from_checkpoint=resume_from_checkpoint,
   1537     trial=trial,
   1538     ignore_keys_for_eval=ignore_keys_for_eval,
   1539 )
File ~\anaconda3\Lib\site-packages\transformers\trainer.py:1807, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   1804     self.control = self.callback_handler.on_step_begin(args, self.state, self.control)
   1806 with self.accelerator.accumulate(model):
-> 1807     tr_loss_step = self.training_step(model, inputs)
   1809 if (
   1810     args.logging_nan_inf_filter
   1811     and not is_torch_tpu_available()
   1812     and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
   1813 ):
   1814     # if loss is nan or inf simply add the average of previous logged losses
   1815     tr_loss += tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)
File ~\anaconda3\Lib\site-packages\transformers\trainer.py:2649, in Trainer.training_step(self, model, inputs)
   2646     return loss_mb.reduce_mean().detach().to(self.args.device)
   2648 with self.compute_loss_context_manager():
-> 2649     loss = self.compute_loss(model, inputs)
   2651 if self.args.n_gpu > 1:
   2652     loss = loss.mean()  # mean() to average on multi-gpu parallel training
File ~\anaconda3\Lib\site-packages\transformers\trainer.py:2674, in Trainer.compute_loss(self, model, inputs, return_outputs)
   2672 else:
   2673     labels = None
-> 2674 outputs = model(**inputs)
   2675 # Save past state if it exists
   2676 # TODO: this needs to be fixed and made cleaner later.
   2677 if self.args.past_index >= 0:
File ~\anaconda3\Lib\site-packages\torch\nn\modules\module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~\anaconda3\Lib\site-packages\transformers\models\whisper\modeling_whisper.py:1490, in WhisperForConditionalGeneration.forward(self, input_features, attention_mask, decoder_input_ids, decoder_attention_mask, head_mask, decoder_head_mask, cross_attn_head_mask, encoder_outputs, past_key_values, decoder_inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
   1485     if decoder_input_ids is None and decoder_inputs_embeds is None:
   1486         decoder_input_ids = shift_tokens_right(
   1487             labels, self.config.pad_token_id, self.config.decoder_start_token_id
   1488         )
-> 1490 outputs = self.model(
   1491     input_features,
   1492     attention_mask=attention_mask,
   1493     decoder_input_ids=decoder_input_ids,
   1494     encoder_outputs=encoder_outputs,
   1495     decoder_attention_mask=decoder_attention_mask,
   1496     head_mask=head_mask,
   1497     decoder_head_mask=decoder_head_mask,
   1498     cross_attn_head_mask=cross_attn_head_mask,
   1499     past_key_values=past_key_values,
   1500     decoder_inputs_embeds=decoder_inputs_embeds,
   1501     use_cache=use_cache,
   1502     output_attentions=output_attentions,
   1503     output_hidden_states=output_hidden_states,
   1504     return_dict=return_dict,
   1505 )
   1506 lm_logits = self.proj_out(outputs[0])
   1508 loss = None
File ~\anaconda3\Lib\site-packages\torch\nn\modules\module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~\anaconda3\Lib\site-packages\transformers\models\whisper\modeling_whisper.py:1346, in WhisperModel.forward(self, input_features, attention_mask, decoder_input_ids, decoder_attention_mask, head_mask, decoder_head_mask, cross_attn_head_mask, encoder_outputs, past_key_values, decoder_inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict)
   1343 if encoder_outputs is None:
   1344     input_features = self._mask_input_features(input_features, attention_mask=attention_mask)
-> 1346     encoder_outputs = self.encoder(
   1347         input_features,
   1348         head_mask=head_mask,
   1349         output_attentions=output_attentions,
   1350         output_hidden_states=output_hidden_states,
   1351         return_dict=return_dict,
   1352     )
   1353 # If the user passed a tuple for encoder_outputs, we wrap it in a BaseModelOutput when return_dict=True
   1354 elif return_dict and not isinstance(encoder_outputs, BaseModelOutput):
File ~\anaconda3\Lib\site-packages\torch\nn\modules\module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~\anaconda3\Lib\site-packages\transformers\models\whisper\modeling_whisper.py:896, in WhisperEncoder.forward(self, input_features, attention_mask, head_mask, output_attentions, output_hidden_states, return_dict)
    892 output_hidden_states = (
    893     output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    894 )
    895 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
--> 896 inputs_embeds = nn.functional.gelu(self.conv1(input_features))
    897 inputs_embeds = nn.functional.gelu(self.conv2(inputs_embeds))
    899 inputs_embeds = inputs_embeds.permute(0, 2, 1)
File ~\anaconda3\Lib\site-packages\torch\nn\modules\module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~\anaconda3\Lib\site-packages\torch\nn\modules\conv.py:313, in Conv1d.forward(self, input)
    312 def forward(self, input: Tensor) -> Tensor:
--> 313     return self._conv_forward(input, self.weight, self.bias)
File ~\anaconda3\Lib\site-packages\torch\nn\modules\conv.py:309, in Conv1d._conv_forward(self, input, weight, bias)
    305 if self.padding_mode != 'zeros':
    306     return F.conv1d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
    307                     weight, bias, self.stride,
    308                     _single(0), self.dilation, self.groups)
--> 309 return F.conv1d(input, weight, bias, self.stride,
    310                 self.padding, self.dilation, self.groups)
RuntimeError: Expected 2D (unbatched) or 3D (batched) input to conv1d, but got input of size: [20, 1, 80, 3000]
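The error says the encoder received a 4D input of size [20, 1, 80, 3000], while conv1d expects a 3D batch of shape (batch_size, 80, 3000). With a batch size of 20, that means every stored example carries an extra leading dimension of size 1, i.e. (1, 80, 3000) instead of (80, 3000). My guess (the mapping function isn't shown here, so this is an assumption) is that the feature extractor's batched output is stored as-is instead of taking its first element. A minimal sketch of the prepare function in the style of the blog (feature_extractor, tokenizer, and the "audio"/"sentence" column names are assumed from there), with the crucial [0] marked:

def prepare_dataset(batch):
    # load the (resampled) audio data
    audio = batch["audio"]
    # compute log-mel input features; the trailing [0] drops the batch
    # dimension the feature extractor adds, giving (80, 3000) per example
    batch["input_features"] = feature_extractor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    # encode the transcription to label ids
    batch["labels"] = tokenizer(batch["sentence"]).input_ids
    return batch

Alternatively, squeezing the already-mapped features once (e.g. np.squeeze on axis 0 inside a small .map) should give the same (80, 3000) shape without re-running the feature extraction.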
