Commit · 020cdb5
Parent(s): 230bf62

Push model using huggingface_hub.

- README.md +6 -38
- config.json +2 -38
- model.safetensors +3 -0
README.md CHANGED

````diff
@@ -1,41 +1,9 @@
 ---
-
-
-
+tags:
+- model_hub_mixin
+- pytorch_model_hub_mixin
 ---
 
-#
-
-
-
-## Model Details
-
-All the encoders released here are trained on web crawl filtered using [Dactory](https://github.com/kyutai-labs/dactory), based on a [Llama3.2-3B](https://github.com/meta-llama/llama-cookbook) base backbone. The release consists of two ARC-Encoders, each trained specifically for one decoder, and one trained for two decoders at the same time:
-- `ARC8-Encoder_Llama`, trained on 2.6B tokens specifically for [Llama3.1-8B](https://github.com/meta-llama/llama-cookbook) base, with a pooling factor of 8.
-- `ARC8-Encoder_Mistral`, trained on 2.6B tokens specifically for [Mistral-7B](https://github.com/mistralai/mistral-finetune?tab=readme-ov-file) base, with a pooling factor of 8.
-- `ARC8-Encoder_multi`, trained by sampling between the two decoders, with a pooling factor of 8.
-
-### Uses
-
-As described in the [paper](https://github.com/kyutai-labs/ARC-Encoder/blob/main/ARC_Encoder_preprint.pdf), the pretrained ARC-Encoders can be fine-tuned to perform various downstream tasks.
-You can also adapt an ARC-Encoder to a new pooling factor (PF) by fine-tuning it on the desired PF.
-For optimal results, we recommend fine-tuning toward a lower PF than the one used during pretraining.
-To reproduce the results presented in the paper, you can use our released fine-tuning dataset, [ARC_finetuning](https://huggingface.co/datasets/kyutai/ARC_finetuning).
-
-### Licensing
-
-ARC-Encoders are licensed under the CC-BY 4.0 license.
-
-Terms of use: as the released models are pretrained from a Llama3.2-3B backbone, ARC-Encoders are subject to the Llama Terms of Use found at [Llama license](https://www.llama.com/license/).
-
-## Citations
-
-If you use one of these models, please cite:
-
-```bibtex
-@techreport{pilchen2025arc_encoder,
-  title={ARC-Encoder: learning compressed text representations for large language models},
-  author={Pilchen, Hippolyte and Grave, Edouard and P{\'e}rez, Patrick},
-  year={2025}
-}
-```
+This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
+- Library: [More Information Needed]
+- Docs: [More Information Needed]
````
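For context, checkpoints pushed with this integration are usually reloaded through the same mixin. Below is a minimal sketch of that round trip; the class name `ARCEncoder`, its constructor arguments, and the repo id are illustrative placeholders, not the actual ARC-Encoder API.

```python
# Sketch of the PyTorchModelHubMixin round trip, under the assumptions stated
# above. `ARCEncoder` and its fields are placeholders, not the real class.
import torch
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

class ARCEncoder(nn.Module, PyTorchModelHubMixin):
    def __init__(self, in_dim: int = 3072, out_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)  # stand-in for the real modules

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)

# push_to_hub() writes the init args to config.json and the weights to
# model.safetensors, which is exactly the pair of files in this commit:
# ARCEncoder(in_dim=3072, out_dim=4096).push_to_hub("your-user/your-repo")
# from_pretrained() restores both:
# model = ARCEncoder.from_pretrained("your-user/your-repo")
```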
config.json CHANGED

```diff
@@ -1,43 +1,7 @@
 {
-    "bridge_args": {
-        "bridge_type": "multi_module",
-        "hidden_dim": 2048,
-        "in_dim": 3072,
-        "out_dim": 4096
-    },
     "embedder": null,
-    "embedder_args": {
-        "causal_embedder": false,
-        "compress_rates": [
-            8
-        ],
-        "cont_tok": true,
-        "memory_tokens": 0,
-        "n_truncated_layers": 2,
-        "pooling_module": {
-            "pool_type": "mean_pooled_queries",
-            "where": "before"
-        },
-        "rec_tok": true,
-        "train_embedding_mtx": true,
-        "trained_layers": 27
-    },
+    "embedder_args": null,
     "empty_init": 2,
     "llms": [],
-    "model_args": {
-        "_sliding_window": null,
-        "dim": 3072,
-        "head_dim": 128,
-        "hidden_dim": 8192,
-        "max_batch_size": 1,
-        "model_type": "transformer",
-        "n_heads": 24,
-        "n_kv_heads": 8,
-        "n_layers": 28,
-        "non_parametric_norm": false,
-        "norm_eps": "1e-05",
-        "rope_theta": 500000.0,
-        "sliding_window": null,
-        "vocab_size": 128256
-    }
+    "model_args": null
 }
```
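The removed `embedder_args` record the pretraining setup: `"compress_rates": [8]` is the pooling factor of 8 that the old README mentions. For intuition only, here is a minimal sketch of contiguous mean pooling by a factor of 8; the actual `mean_pooled_queries` pooling module in ARC-Encoder may work differently, so treat this as an illustration of what the pooling factor means rather than the model's implementation.

```python
# Illustrative only: compress a sequence of hidden states by a factor of 8
# with contiguous mean pooling. Not the actual "mean_pooled_queries" module.
import torch

def mean_pool(h: torch.Tensor, factor: int = 8) -> torch.Tensor:
    """Compress (batch, seq_len, dim) to (batch, seq_len // factor, dim)."""
    b, t, d = h.shape
    t = (t // factor) * factor                       # drop any ragged tail
    return h[:, :t].reshape(b, t // factor, factor, d).mean(dim=2)

h = torch.randn(1, 128, 3072)   # 3072 matches "dim"/"in_dim" in the old config
print(mean_pool(h).shape)       # torch.Size([1, 16, 3072])
```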
model.safetensors ADDED

```diff
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1e8fbebeb7d71d087c70d74c7835fe82abdbe0327fb0f5d7fa3ef825ee82322c
+size 12163149040
```
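The three lines above are a Git LFS pointer, not the weights themselves: the actual ~12.2 GB file is stored in LFS and identified by its SHA-256 hash. A minimal sketch of fetching the real file and checking it against the pointer follows; the repo id is a placeholder to substitute with the actual model repository.

```python
# Sketch: download the real weight file and verify it against the LFS pointer.
# The repo id is a placeholder; substitute the actual model repository.
import hashlib
from huggingface_hub import hf_hub_download
from safetensors import safe_open

path = hf_hub_download(repo_id="your-user/your-repo", filename="model.safetensors")

# Stream the file through SHA-256 and compare with the pointer's "oid" field.
sha = hashlib.sha256()
with open(path, "rb") as fh:
    for chunk in iter(lambda: fh.read(1 << 20), b""):
        sha.update(chunk)
print(sha.hexdigest())  # should equal the oid sha256 value above

# safetensors can list tensor names from the header without loading 12 GB.
with safe_open(path, framework="pt") as f:
    print(list(f.keys())[:5])
```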