| Before diving in, we should note | |
| that the metric applies specifically to classical language models (sometimes called autoregressive or causal language | |
| models) and is not well defined for masked language models like BERT (see summary of the models). |
| Before diving in, we should note | |
| that the metric applies specifically to classical language models (sometimes called autoregressive or causal language | |
| models) and is not well defined for masked language models like BERT (see summary of the models). |