Megatron-LM utilities
MegatronLMPlugin
class accelerate.utils.MegatronLMPlugin
< source >( tp_degree: int = None pp_degree: int = None num_micro_batches: int = None gradient_clipping: float = None sequence_parallelism: bool = None recompute_activations: bool = None use_distributed_optimizer: bool = None pipeline_model_parallel_split_rank: int = None num_layers_per_virtual_pipeline_stage: int = None is_train_batch_min: str = True train_iters: int = None train_samples: int = None weight_decay_incr_style: str = 'constant' start_weight_decay: float = None end_weight_decay: float = None lr_decay_style: str = 'linear' lr_decay_iters: int = None lr_decay_samples: int = None lr_warmup_iters: int = None lr_warmup_samples: int = None lr_warmup_fraction: float = None min_lr: float = 0 consumed_samples: list = None no_wd_decay_cond: typing.Optional[typing.Callable] = None scale_lr_cond: typing.Optional[typing.Callable] = None lr_mult: float = 1.0 megatron_dataset_flag: bool = False seq_length: int = None encoder_seq_length: int = None decoder_seq_length: int = None tensorboard_dir: str = None set_all_logging_options: bool = False eval_iters: int = 100 eval_interval: int = 1000 return_logits: bool = False custom_train_step_class: typing.Optional[typing.Any] = None custom_train_step_kwargs: typing.Optional[dict[str, typing.Any]] = None custom_model_provider_function: typing.Optional[typing.Callable] = None custom_prepare_model_function: typing.Optional[typing.Callable] = None custom_megatron_datasets_provider_function: typing.Optional[typing.Callable] = None custom_get_batch_function: typing.Optional[typing.Callable] = None custom_loss_function: typing.Optional[typing.Callable] = None other_megatron_args: typing.Optional[dict[str, typing.Any]] = None )
Parameters
- tp_degree (int, defaults to None) — Tensor parallelism degree.
- pp_degree (int, defaults to None) — Pipeline parallelism degree.
- num_micro_batches (int, defaults to None) — Number of micro-batches.
- gradient_clipping (float, defaults to None) — Gradient clipping value based on global L2 norm (0 to disable).
- sequence_parallelism (bool, defaults to None) — Enable sequence parallelism.
- recompute_activations (bool, defaults to None) — Enable selective activation recomputation.
- use_distributed_optimizer (bool, defaults to None) — Enable distributed optimizer.
- pipeline_model_parallel_split_rank (int, defaults to None) — Rank where the encoder and decoder should be split.
- num_layers_per_virtual_pipeline_stage (int, defaults to None) — Number of layers per virtual pipeline stage.
- is_train_batch_min (str, defaults to True) — If both train and eval dataloaders are specified, this decides the micro_batch_size.
- train_iters (int, defaults to None) — Total number of iterations to train over all training runs. Note that either train_iters or train_samples should be provided when using MegatronLMDummyScheduler.
- train_samples (int, defaults to None) — Total number of samples to train over all training runs. Note that either train_iters or train_samples should be provided when using MegatronLMDummyScheduler.
- weight_decay_incr_style (str, defaults to 'constant') — Weight decay increment function. Choices: ["constant", "linear", "cosine"].
- start_weight_decay (float, defaults to None) — Initial weight decay coefficient for L2 regularization.
- end_weight_decay (float, defaults to None) — End-of-run weight decay coefficient for L2 regularization.
- lr_decay_style (str, defaults to 'linear') — Learning rate decay function. Choices: ["constant", "linear", "cosine"].
- lr_decay_iters (int, defaults to None) — Number of iterations for learning rate decay. If None, defaults to train_iters.
- lr_decay_samples (int, defaults to None) — Number of samples for learning rate decay. If None, defaults to train_samples.
- lr_warmup_iters (int, defaults to None) — Number of iterations to linearly warm up the learning rate over.
- lr_warmup_samples (int, defaults to None) — Number of samples to linearly warm up the learning rate over.
- lr_warmup_fraction (float, defaults to None) — Fraction of lr-warmup-(iters/samples) to linearly warm up the learning rate over.
- min_lr (float, defaults to 0) — Minimum value for the learning rate. The scheduler clips values below this threshold.
- consumed_samples (List, defaults to None) — Number of samples consumed, in the same order as the dataloaders passed to the accelerator.prepare call.
- no_wd_decay_cond (Optional, defaults to None) — Condition to disable weight decay.
- scale_lr_cond (Optional, defaults to None) — Condition to scale the learning rate.
- lr_mult (float, defaults to 1.0) — Learning rate multiplier.
- megatron_dataset_flag (bool, defaults to False) — Whether the dataset format follows the Megatron-LM Indexed/Cached/MemoryMapped format.
- seq_length (int, defaults to None) — Maximum sequence length to process.
- encoder_seq_length (int, defaults to None) — Maximum sequence length to process for the encoder.
- decoder_seq_length (int, defaults to None) — Maximum sequence length to process for the decoder.
- tensorboard_dir (str, defaults to None) — Path to save TensorBoard logs.
- set_all_logging_options (bool, defaults to False) — Whether to set all logging options.
- eval_iters (int, defaults to 100) — Number of iterations to run evaluation for on the validation/test sets.
- eval_interval (int, defaults to 1000) — Interval between runs of evaluation on the validation set.
- return_logits (bool, defaults to False) — Whether to return logits from the model.
- custom_train_step_class (Optional, defaults to None) — Custom train step class.
- custom_train_step_kwargs (Optional, defaults to None) — Custom train step kwargs.
- custom_model_provider_function (Optional, defaults to None) — Custom model provider function.
- custom_prepare_model_function (Optional, defaults to None) — Custom prepare model function.
- custom_megatron_datasets_provider_function (Optional, defaults to None) — Custom Megatron train/valid/test datasets provider function.
- custom_get_batch_function (Optional, defaults to None) — Custom get batch function.
- custom_loss_function (Optional, defaults to None) — Custom loss function.
- other_megatron_args (Optional, defaults to None) — Other Megatron-LM arguments. Please refer to Megatron-LM.
Plugin for Megatron-LM to enable tensor, pipeline, sequence, and data parallelism. It also enables selective activation recomputation and optimized fused kernels.
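A minimal sketch of constructing the plugin and handing it to Accelerator, assuming a training script launched with the Megatron-LM integration enabled; the parallelism degrees and other values below are illustrative:

```python
from accelerate import Accelerator
from accelerate.utils import MegatronLMPlugin

# Illustrative settings: 2-way tensor parallelism, 2-way pipeline parallelism,
# 2 micro-batches per global batch, gradient clipping at a global L2 norm of 1.0.
megatron_lm_plugin = MegatronLMPlugin(
    tp_degree=2,
    pp_degree=2,
    num_micro_batches=2,
    gradient_clipping=1.0,
    sequence_parallelism=True,
    recompute_activations=True,
    use_distributed_optimizer=True,
)

accelerator = Accelerator(megatron_lm_plugin=megatron_lm_plugin)
```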
MegatronLMDummyScheduler
class accelerate.utils.MegatronLMDummyScheduler
< source >( optimizer total_num_steps = None warmup_num_steps = 0 **kwargs )
Dummy scheduler that serves as a placeholder; it is primarily used to follow the conventional training loop when the actual learning rate scheduler is created and managed by Megatron-LM.
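A minimal sketch of how the dummy scheduler stands in for a regular scheduler before accelerator.prepare; model, optimizer, train_dataloader, max_train_steps, and num_warmup_steps are assumed to be defined elsewhere in the training script:

```python
from accelerate.utils import MegatronLMDummyScheduler

# Placeholder step counts; the real scheduler is created by Megatron-LM.
lr_scheduler = MegatronLMDummyScheduler(
    optimizer=optimizer,
    total_num_steps=max_train_steps,
    warmup_num_steps=num_warmup_steps,
)

model, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
    model, optimizer, train_dataloader, lr_scheduler
)
```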
MegatronLMDummyDataLoader
class accelerate.utils.MegatronLMDummyDataLoader
< source >( **dataset_kwargs )
Dummy dataloader that serves as a placeholder; it is primarily used to follow the conventional training loop when the actual dataloaders are built by Megatron-LM from the given dataset arguments.
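A minimal sketch of driving training from Megatron-LM formatted data; the dataset keys below (data_path, splits_string, seq_length, micro_batch_size) follow Megatron-LM's dataset arguments and the values are placeholders, while model, optimizer, lr_scheduler, and accelerator are assumed to be defined elsewhere:

```python
from accelerate.utils import MegatronLMDummyDataLoader

# Placeholder Megatron-LM dataset arguments.
megatron_dataloader_config = {
    "data_path": ["my-gpt2_text_document"],  # hypothetical preprocessed dataset prefix
    "splits_string": "949,50,1",
    "seq_length": 1024,
    "micro_batch_size": 4,
}
megatron_dataloader = MegatronLMDummyDataLoader(**megatron_dataloader_config)
accelerator.state.megatron_lm_plugin.megatron_dataset_flag = True

# The same dummy dataloader is passed once per split; Megatron-LM builds the
# actual train/validation/test dataloaders internally.
model, optimizer, lr_scheduler, train_dl, eval_dl, test_dl = accelerator.prepare(
    model, optimizer, lr_scheduler,
    megatron_dataloader, megatron_dataloader, megatron_dataloader,
)
```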
AbstractTrainStep
class accelerate.utils.AbstractTrainStep
Abstract class for batching, the forward pass, and loss handling.
GPTTrainStep
class accelerate.utils.GPTTrainStep
< source >( accelerator args )
GPT train step class.
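Custom behavior is usually added by subclassing GPTTrainStep and routing the subclass through MegatronLMPlugin via custom_train_step_class and custom_train_step_kwargs. A sketch under the assumption that the subclass constructor mirrors the (accelerator, args) signature above; the extra keyword argument is hypothetical:

```python
from accelerate.utils import GPTTrainStep, MegatronLMPlugin

class GPTTrainStepWithCustomLoss(GPTTrainStep):
    """Illustrative subclass; hooks such as the loss function can be overridden here."""

    def __init__(self, accelerator, args, **kwargs):
        super().__init__(accelerator, args)
        self.kwargs = kwargs  # e.g. extra weighting options for a custom loss

megatron_lm_plugin = MegatronLMPlugin(
    custom_train_step_class=GPTTrainStepWithCustomLoss,
    custom_train_step_kwargs={"alpha": 0.25},  # hypothetical kwargs
)
```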
BertTrainStep
class accelerate.utils.BertTrainStep
< source >( accelerator args )
Bert train step class.
T5TrainStep
class accelerate.utils.T5TrainStep
< source >( accelerator args )
T5 train step class.
avg_losses_across_data_parallel_group
accelerate.utils.avg_losses_across_data_parallel_group
< source >( losses )
Average losses across data parallel group.
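A minimal sketch of reducing a per-rank loss across the data parallel group, for example inside a custom loss function; loss is assumed to be a scalar tensor already computed on the current rank:

```python
from accelerate.utils import avg_losses_across_data_parallel_group

# `loss` is a scalar tensor computed on this data parallel rank (assumed defined).
averaged_loss = avg_losses_across_data_parallel_group([loss])

# A common pattern in Megatron-LM style loss functions is to return the local
# loss for the backward pass and the averaged value for logging:
# return loss, {"lm loss": averaged_loss[0]}
```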