Add pipeline_tag: feature-extraction, Code link, and Usage section (#1)
- Add pipeline_tag: feature-extraction, Code link, and Usage section (cf1733d2958d9e35886865da9ef712853bcc4302)
Co-authored-by: Niels Rogge <[email protected]>
README.md
CHANGED
@@ -1,21 +1,23 @@
 ---
-license: cc-by-4.0
 language:
 - en
+license: cc-by-4.0
 tags:
 - model_hub_mixin
 - pytorch_model_hub_mixin
+pipeline_tag: feature-extraction
 ---
 
-
+# ARC-Encoder models
 
-This page houses `ARC8-Encoder_multi` from three different versions of pretrained ARC-Encoders. Architectures and methods to train them are described in the paper *ARC-Encoder: learning compressed text representations for large language models* available [here](https://arxiv.org/abs/2510.20535).
+This page houses `ARC8-Encoder_multi` from three different versions of pretrained ARC-Encoders. Architectures and methods to train them are described in the paper *ARC-Encoder: learning compressed text representations for large language models* available [here](https://arxiv.org/abs/2510.20535).
+Code: [ARC-Encoder repository](https://github.com/kyutai-labs/ARC-Encoder)
 
 ## Model Details
 
 All the encoders released here are trained on web crawl data filtered using [Dactory](https://github.com/kyutai-labs/dactory), starting from a [Llama3.2-3B](https://github.com/meta-llama/llama-cookbook) base backbone. The release consists of two ARC-Encoders each trained specifically for one decoder and one trained for two decoders at the same time:
 - `ARC8-Encoder_Llama`, trained on 2.6B tokens specifically for the [Llama3.1-8B](https://github.com/meta-llama/llama-cookbook) base model, with a pooling factor of 8.
-- `ARC8-Encoder_Mistral`, trained on 2.6B tokens on [Mistral-7B](https://
+- `ARC8-Encoder_Mistral`, trained on 2.6B tokens specifically for the [Mistral-7B](https://www.mistralai.com/news/announcing-mistral-7b/) base model, with a pooling factor of 8.
 - `ARC8-Encoder_multi`, trained by sampling between the two decoders, with a pooling factor of 8.
 
 ### Uses
@@ -31,6 +33,19 @@ To reproduce the results presented in the paper, you can use our released fine-t
 
 Terms of use: As the released models are pretrained from the Llama3.2-3B backbone, ARC-Encoders are subject to the Llama Terms of Use found at [Llama license](https://www.llama.com/license/).
 
+## Usage
+
+To load the pre-trained ARC-Encoders, use the following code snippet from the [ARC-Encoder repository](https://github.com/kyutai-labs/ARC-Encoder):
+
+```python
+from embed_llm.models.augmented_model import load_and_save_released_models
+
+# ARC8_Encoder_multi, ARC8_Encoder_Llama or ARC8_Encoder_Mistral
+load_and_save_released_models(ARC8_Encoder_Llama, hf_token=<HF_TOKEN>)
+```
+
+***Remark:*** This code snippet loads the model from Hugging Face and then creates appropriate folders at `<TMP_PATH>` containing the checkpoint and additional necessary files for fine-tuning or evaluation with the `ARC-Encoder` codebase. To reduce occupied memory space, you can then delete the model from your Hugging Face cache.
+
 ## Citations
 
 If you use one of these models, please cite:
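
A note on the "pooling factor of 8" mentioned in the Model Details: it means each output vector of the encoder summarizes 8 input positions, so the compressed sequence is 8x shorter than the input. The toy snippet below only illustrates that compression ratio with plain mean pooling over a dummy hidden-state tensor; it is not the ARC-Encoder pooling mechanism, and the tensor shapes are made up for the example.

```python
import torch

POOLING_FACTOR = 8

# Dummy encoder hidden states: (batch, seq_len, hidden_dim); seq_len divisible by 8 here.
hidden_states = torch.randn(1, 128, 3072)

batch, seq_len, dim = hidden_states.shape

# Group consecutive positions into blocks of 8 and average each block.
pooled = hidden_states.view(batch, seq_len // POOLING_FACTOR, POOLING_FACTOR, dim).mean(dim=2)

print(pooled.shape)  # torch.Size([1, 16, 3072]) -> 8x fewer vectors than the input
```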
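
For readers of the new Usage section, here is a slightly more explicit sketch of the documented call. It assumes the checkpoint name is passed as a string and that the Hugging Face token is read from the `HF_TOKEN` environment variable; neither detail is confirmed by the README snippet, which uses the placeholders `ARC8_Encoder_Llama` and `<HF_TOKEN>`.

```python
import os

# Helper shipped in the ARC-Encoder repository (https://github.com/kyutai-labs/ARC-Encoder).
from embed_llm.models.augmented_model import load_and_save_released_models

# One of the three released checkpoints:
# "ARC8_Encoder_multi", "ARC8_Encoder_Llama" or "ARC8_Encoder_Mistral"
model_name = "ARC8_Encoder_multi"

# Assumption: the token is a standard Hugging Face access token taken from the environment.
load_and_save_released_models(model_name, hf_token=os.environ["HF_TOKEN"])
```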
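
The remark in the new Usage section suggests deleting the downloaded weights from the Hugging Face cache once the folders at `<TMP_PATH>` have been created. A minimal sketch using `huggingface_hub`'s cache utilities is shown below; the repo id is a placeholder and should match whichever ARC-Encoder repository was actually downloaded.

```python
from huggingface_hub import scan_cache_dir

# Placeholder: substitute the repo id of the checkpoint you downloaded.
REPO_ID = "<ARC_ENCODER_REPO_ID>"

cache_info = scan_cache_dir()

# Collect the cached revisions belonging to that repository.
revisions = [
    rev.commit_hash
    for repo in cache_info.repos
    if repo.repo_id == REPO_ID
    for rev in repo.revisions
]

# Build and execute a deletion strategy to free the disk space.
if revisions:
    delete_strategy = cache_info.delete_revisions(*revisions)
    print(f"Will free {delete_strategy.expected_freed_size_str}")
    delete_strategy.execute()
```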
|