| In this example, load the FacebookAI/xlm-clm-enfr-1024 checkpoint (Causal language modeling, English-French): | |
| import torch | |
| from transformers import XLMTokenizer, XLMWithLMHeadModel | |
| tokenizer = XLMTokenizer.from_pretrained("FacebookAI/xlm-clm-enfr-1024") | |
| model = XLMWithLMHeadModel.from_pretrained("FacebookAI/xlm-clm-enfr-1024") | |
| The lang2id attribute of the tokenizer maps this model's languages to their ids: | |
| print(tokenizer.lang2id) | |
| {'en': 0, 'fr': 1} | |
| Next, create an example input: | |
| input_ids = torch.tensor([tokenizer.encode("Wikipedia was used to")]) # batch size of 1 | |
| Set the language id to "en" and use it to define the language embedding. |
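As a minimal sketch of that step: the language embedding is a tensor with the same shape as the input ids, filled with the chosen language id. To stay self-contained without downloading the checkpoint, the snippet below hard-codes the `lang2id` mapping printed above and uses placeholder token ids; those values are assumptions for illustration, and real ids would come from `tokenizer.encode`.

```python
import torch

# Language-to-id mapping, copied from the tokenizer.lang2id output above.
lang2id = {"en": 0, "fr": 1}

# Placeholder token ids standing in for the encoded sentence (hypothetical
# values; in practice use torch.tensor([tokenizer.encode(...)])).
input_ids = torch.tensor([[0, 11, 12, 13, 14, 1]])  # shape (1, sequence_length)

# Look up the id for English.
language_id = lang2id["en"]

# One language id per token: same shape and dtype as input_ids.
langs = torch.full_like(input_ids, language_id)

print(langs.shape)
```

With the real tokenizer and model loaded, this tensor would typically be passed alongside the input ids through the model's `langs` argument, for example `model(input_ids, langs=langs)`.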