Add pipeline_tag: feature-extraction, Code link, and Usage section (#1)
- Add pipeline_tag: feature-extraction, Code link, and Usage section (cf1733d2958d9e35886865da9ef712853bcc4302)
Co-authored-by: Niels Rogge <[email protected]>
README.md
CHANGED
@@ -1,21 +1,23 @@
 ---
-license: cc-by-4.0
 language:
 - en
+license: cc-by-4.0
 tags:
 - model_hub_mixin
 - pytorch_model_hub_mixin
+pipeline_tag: feature-extraction
 ---
 
-
+# ARC-Encoder models
 
-This page houses `ARC8-Encoder_multi` from three different versions of pretrained ARC-Encoders. Architectures and methods to train them are described in the paper *ARC-Encoder: learning compressed text representations for large language models* available [here](https://arxiv.org/abs/2510.20535).
+This page houses `ARC8-Encoder_multi` from three different versions of pretrained ARC-Encoders. Architectures and methods to train them are described in the paper *ARC-Encoder: learning compressed text representations for large language models* available [here](https://arxiv.org/abs/2510.20535).
+Code: [ARC-Encoder repository](https://github.com/kyutai-labs/ARC-Encoder)
 
 ## Model Details
 
 All the encoders released here are trained on web crawl data filtered using [Dactory](https://github.com/kyutai-labs/dactory), starting from a [Llama3.2-3B](https://github.com/meta-llama/llama-cookbook) base backbone. The release consists of two ARC-Encoders each trained specifically for one decoder and one trained for two decoders at the same time:
 - `ARC8-Encoder_Llama`, trained on 2.6B tokens specifically for the [Llama3.1-8B](https://github.com/meta-llama/llama-cookbook) base model, with a pooling factor of 8.
-- `ARC8-Encoder_Mistral`, trained on 2.6B tokens on [Mistral-7B](https://
+- `ARC8-Encoder_Mistral`, trained on 2.6B tokens specifically for the [Mistral-7B](https://www.mistralai.com/news/announcing-mistral-7b/) base model, with a pooling factor of 8.
 - `ARC8-Encoder_multi`, trained by sampling between the two decoders, with a pooling factor of 8.
 
 ### Uses
@@ -31,6 +33,19 @@ To reproduce the results presented in the paper, you can use our released fine-t
 
 Terms of use: As the released models are pretrained from the Llama3.2-3B backbone, ARC-Encoders are subject to the Llama Terms of Use found at [Llama license](https://www.llama.com/license/).
 
+## Usage
+
+To load the pre-trained ARC-Encoders, use the following code snippet from the [ARC-Encoder repository](https://github.com/kyutai-labs/ARC-Encoder):
+
+```python
+from embed_llm.models.augmented_model import load_and_save_released_models
+
+# ARC8_Encoder_multi, ARC8_Encoder_Llama or ARC8_Encoder_Mistral
+load_and_save_released_models(ARC8_Encoder_Llama, hf_token=<HF_TOKEN>)
+```
+
+***Remark:*** This code snippet loads the model from Hugging Face and then creates appropriate folders at `<TMP_PATH>` containing the checkpoint and additional necessary files for fine-tuning or evaluation with the `ARC-Encoder` codebase. To reduce occupied memory space, you can then delete the model from your Hugging Face cache.
+
 ## Citations
 
 If you use one of these models, please cite:
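
A note on the "pooling factor of 8" mentioned in the Model Details: it means each output vector of the encoder summarizes 8 input positions, so the compressed sequence is 8x shorter than the input. The toy snippet below only illustrates that compression ratio with plain mean pooling over a dummy hidden-state tensor; it is not the ARC-Encoder pooling mechanism, and the tensor shapes are made up for the example.

```python
import torch

POOLING_FACTOR = 8

# Dummy encoder hidden states: (batch, seq_len, hidden_dim); seq_len divisible by 8 here.
hidden_states = torch.randn(1, 128, 3072)

batch, seq_len, dim = hidden_states.shape

# Group consecutive positions into blocks of 8 and average each block.
pooled = hidden_states.view(batch, seq_len // POOLING_FACTOR, POOLING_FACTOR, dim).mean(dim=2)

print(pooled.shape)  # torch.Size([1, 16, 3072]) -> 8x fewer vectors than the input
```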
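
For readers of the new Usage section, here is a slightly more explicit sketch of the documented call. It assumes the checkpoint name is passed as a string and that the Hugging Face token is read from the `HF_TOKEN` environment variable; neither detail is confirmed by the README snippet, which uses the placeholders `ARC8_Encoder_Llama` and `<HF_TOKEN>`.

```python
import os

# Helper shipped in the ARC-Encoder repository (https://github.com/kyutai-labs/ARC-Encoder).
from embed_llm.models.augmented_model import load_and_save_released_models

# One of the three released checkpoints:
# "ARC8_Encoder_multi", "ARC8_Encoder_Llama" or "ARC8_Encoder_Mistral"
model_name = "ARC8_Encoder_multi"

# Assumption: the token is a standard Hugging Face access token taken from the environment.
load_and_save_released_models(model_name, hf_token=os.environ["HF_TOKEN"])
```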
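
The remark in the new Usage section suggests deleting the downloaded weights from the Hugging Face cache once the folders at `<TMP_PATH>` have been created. A minimal sketch using `huggingface_hub`'s cache utilities is shown below; the repo id is a placeholder and should match whichever ARC-Encoder repository was actually downloaded.

```python
from huggingface_hub import scan_cache_dir

# Placeholder: substitute the repo id of the checkpoint you downloaded.
REPO_ID = "<ARC_ENCODER_REPO_ID>"

cache_info = scan_cache_dir()

# Collect the cached revisions belonging to that repository.
revisions = [
    rev.commit_hash
    for repo in cache_info.repos
    if repo.repo_id == REPO_ID
    for rev in repo.revisions
]

# Build and execute a deletion strategy to free the disk space.
if revisions:
    delete_strategy = cache_info.delete_revisions(*revisions)
    print(f"Will free {delete_strategy.expected_freed_size_str}")
    delete_strategy.execute()
```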
|