cccczshao and nielsr (HF Staff) committed
Commit ef36929 · verified · 1 parent: f0cc088

Update model card: Add pipeline tag (#1)


- Update model card: Add pipeline tag (7cda26f29b25184410e39d65259cdb1c32ef9f88)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
  1. README.md +14 -13
README.md CHANGED
@@ -1,15 +1,16 @@
 ---
-license: mit
 datasets:
 - monology/pile-uncopyrighted
 language:
 - en
 library_name: CALM
+license: mit
+metrics:
+- BrierLM
 tags:
 - large language models
 - language modeling
-metrics:
-- BrierLM
+pipeline_tag: text-generation
 ---
 
 # Continuous Autoregressive Language Models
@@ -25,19 +26,19 @@ Modern Large Language Models (LLMs) are constrained by a fundamental bottleneck:
 
 This is achieved through a two-stage process:
 
-1. **A high-fidelity autoencoder** learns to compress K tokens into a single vector and reconstruct them with near-perfect accuracy.
-2. **A continuous-domain language model** then performs autoregressive prediction in this vector space.
+1. **A high-fidelity autoencoder** learns to compress K tokens into a single vector and reconstruct them with near-perfect accuracy.
+2. **A continuous-domain language model** then performs autoregressive prediction in this vector space.
 
 ### Key Features
 
-* 🚀 **Ultra-Efficient by Design:** Dramatically improves training and inference efficiency by reducing the number of autoregressive steps by a factor of K.
-* 💡 **A New Scaling Axis:** Introduces a new scaling dimension for LLMs—semantic bandwidth (K). Instead of just scaling parameters and data, you can now scale the amount of information processed in a single step.
-* 🛠️ **A Comprehensive Likelihood-Free Toolkit:** Operating in a continuous domain requires new tools. This repository provides the full suite of algorithms that make CALM possible:
+* 🚀 **Ultra-Efficient by Design:** Dramatically improves training and inference efficiency by reducing the number of autoregressive steps by a factor of K.
+* 💡 **A New Scaling Axis:** Introduces a new scaling dimension for LLMs—semantic bandwidth (K). Instead of just scaling parameters and data, you can now scale the amount of information processed in a single step.
+* 🛠️ **A Comprehensive Likelihood-Free Toolkit:** Operating in a continuous domain requires new tools. This repository provides the full suite of algorithms that make CALM possible:
 
-  * **A Robust Autoencoder** to learn high-fidelity continuous representations of token chunks.
-  * **Energy-Based Training**, a principled and likelihood-free method for generative modeling.
-  * **BrierLM**, a new metric for calibrated, likelihood-free evaluation of language models.
-  * **Temperature Sampling** for controlled, high-quality text generation using only a black-box sampler.
+  * **A Robust Autoencoder** to learn high-fidelity continuous representations of token chunks.
+  * **Energy-Based Training**, a principled and likelihood-free method for generative modeling.
+  * **BrierLM**, a new metric for calibrated, likelihood-free evaluation of language models.
+  * **Temperature Sampling** for controlled, high-quality text generation using only a black-box sampler.
 
 ## How to use
 
@@ -45,4 +46,4 @@ See our [GitHub README](https://github.com/shaochenze/calm), where we provide sc
 
 ## Contact
 
-If you have any questions, feel free to submit an issue or contact `[email protected]`.
+If you have any questions, feel free to submit an issue or contact `[email protected]`.
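For readers skimming the card in this diff: below is a minimal sketch of the two-stage design it describes. All module names, sizes, and architecture choices here are hypothetical stand-ins, not the CALM API; the real implementation is in the linked GitHub repository.

```python
# Illustrative sketch of the two-stage CALM idea described in the card.
# Names and sizes are assumptions for illustration, not the CALM API.
import torch
import torch.nn as nn

K, VOCAB, D = 4, 32000, 256  # chunk size, vocab size, vector dim (assumed)

class ChunkAutoencoder(nn.Module):
    """Stage 1: compress K tokens into one continuous vector, and back."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D)
        self.enc = nn.Linear(K * D, D)        # K token embeddings -> 1 vector
        self.dec = nn.Linear(D, K * VOCAB)    # 1 vector -> K sets of logits

    def encode(self, tokens):                 # tokens: (batch, K)
        return self.enc(self.embed(tokens).flatten(1))   # (batch, D)

    def decode(self, z):                      # z: (batch, D)
        return self.dec(z).view(-1, K, VOCAB) # (batch, K, VOCAB)

class ContinuousLM(nn.Module):
    """Stage 2: autoregression over chunk vectors instead of tokens."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.GRU(D, D, batch_first=True)    # stand-in backbone
        self.head = nn.Linear(D, D)           # predicts the next chunk vector

    def forward(self, z_seq):                 # z_seq: (batch, T, D)
        h, _ = self.backbone(z_seq)
        return self.head(h)

ae, lm = ChunkAutoencoder(), ContinuousLM()
tokens = torch.randint(VOCAB, (2, 3 * K))     # 3 chunks of K tokens each
chunks = tokens.split(K, dim=1)
z = torch.stack([ae.encode(c) for c in chunks], dim=1)    # (2, 3, D)
z_next = lm(z)[:, -1]                         # one step predicts chunk 4
next_tokens = ae.decode(z_next).argmax(-1)    # K tokens from a single step
```

The efficiency claim in the card falls out of the last three lines: the autoregressive loop advances once per chunk, so generating N tokens takes roughly N/K steps instead of N.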
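The **Temperature Sampling** bullet is the non-obvious one: a continuous-domain model yields samples but no explicit probabilities, so the usual logit rescaling is unavailable. A classical rejection trick achieves lower temperatures from a black-box sampler alone, sketched here for integer inverse temperatures `n`. This is a generic construction for illustration, not necessarily the paper's exact algorithm.

```python
import random
from collections import Counter

def sharpened_sample(sampler, n, max_tries=10000):
    """Draw from p(x)^n (temperature 1/n) given only a black-box sampler for p.

    Rejection trick: draw n i.i.d. samples and accept when they all agree.
    The accept probability for value x is p(x)^n, so the accepted draw is
    distributed proportionally to p(x)^n.
    """
    for _ in range(max_tries):
        draws = [sampler() for _ in range(n)]
        if all(d == draws[0] for d in draws):
            return draws[0]
    raise RuntimeError("no agreement within max_tries; temperature too low")

# Toy check: a biased coin sharpened with n=3 looks much more biased.
p = {"a": 0.7, "b": 0.3}
base = lambda: random.choices(list(p), weights=p.values())[0]
print(Counter(sharpened_sample(base, 3) for _ in range(10000)))
# expected ratio ~ 0.7**3 : 0.3**3, i.e. about 0.927 : 0.073
```

Conditioned on acceptance, a value x survives with probability proportional to p(x)^n, which is exactly sampling at temperature 1/n; the cost is retries, which grow as the base distribution gets flatter.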
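Similarly, **BrierLM** addresses evaluation without likelihoods. The metric's exact definition is in the paper; the sketch below shows only the standard trick that makes a Brier-style score estimable from samples alone, via an unbiased two-sample estimator. This is my construction for illustration, not BrierLM's formula.

```python
import random

def brier_estimate(sampler, reference, trials=10000):
    """Sample-only unbiased estimate of the Brier score
    sum_x p(x)^2 - 2*p(reference) + 1, using two i.i.d. draws per trial:
    E[1[x1 == x2]] = sum_x p(x)^2 and E[1[x1 == reference]] = p(reference).
    Lower is better; 0 means all mass sits on the reference outcome.
    """
    total = 0.0
    for _ in range(trials):
        x1, x2 = sampler(), sampler()
        collision = float(x1 == x2)
        hit = 0.5 * (float(x1 == reference) + float(x2 == reference))
        total += collision - 2.0 * hit + 1.0
    return total / trials

# Toy check against the closed form for p = {a: 0.7, b: 0.3}, reference "a":
# 0.49 + 0.09 - 2*0.7 + 1 = 0.18
p = {"a": 0.7, "b": 0.3}
sampler = lambda: random.choices(list(p), weights=p.values())[0]
print(brier_estimate(sampler, "a"))  # ~0.18
```

The point is that everything in the estimator is computed from drawn samples and an observed reference token; no access to model probabilities is needed, which is exactly the constraint a continuous-domain model imposes.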