martingenzel committed on
Commit 23d5d3b · verified · 1 Parent(s): fbce8d0

Add README.md

Files changed (1)
  1. README.md +89 -104
README.md CHANGED
@@ -1,104 +1,89 @@
1
- ---
2
- license: apache-2.0
3
- datasets:
4
- - allenai/c4
5
- language:
6
- - zho
7
- - eng
8
- - fra
9
- - spa
10
- - por
11
- - deu
12
- - ita
13
- - rus
14
- - jpn
15
- - kor
16
- - vie
17
- - tha
18
- - ara
19
- metrics:
20
- - perplexity
21
- - accuracy
22
- base_model:
23
- - Qwen/Qwen2.5-7B
24
- pipeline_tag: text-generation
25
- library_name: transformers
26
- ---
27
- <div align="center">
28
- <img width="30%" alt="logo" src="https://imgur.com/A0MCHPq.png">
29
- </div>
30
-
31
- <div align="center">
32
- <a href="https://github.com/merantix-momentum/acip"><img src="https://img.shields.io/badge/GitHub-%23121011.svg?logo=github&logoColor=white.svg" alt="github" style="display: inline-block; vertical-align: middle;"></a>
33
- <a href="https://arxiv.org/abs/2502.01717"><img src="https://img.shields.io/badge/arXiv-2502.01717-b31b1b.svg" alt="arxiv" style="display: inline-block; vertical-align: middle;"></a>
34
- <a href="https://acip.merantix-momentum.com/"><img alt="website" src="https://img.shields.io/website/https/acip.merantix-momentum.com.svg?down_color=red&down_message=offline&up_message=online" style="display: inline-block; vertical-align: middle;"></a>
35
- </div>
36
-
37
- <h2 align="center">
38
- <p> [
39
- <a href="https://github.com/merantix-momentum/acip">🤖 GitHub</a> |
40
- <a href="https://arxiv.org/abs/2502.01717">📄 Paper</a> |
41
- <a href="https://acip.merantix-momentum.com/">🌐 Website</a>
42
- ]
43
- </p>
44
- </h2>
45
-
46
- <h1 align="center">
47
- <p>ACIP applied to Qwen/Qwen2.5-7B</p>
48
- </h1>
49
-
50
- This model repository is part of the ACIP Project and provides a compressible version of [`Qwen/Qwen2.5-7B`](https://huggingface.co/Qwen/Qwen2.5-7B). For more details, please visit our [code repo](https://github.com/merantix-momentum/acip).
51
-
52
- # Quick Start
53
-
54
- Just load the ACIP model via `from_pretrained`:
55
- ```python
56
- from transformers import AutoModel
57
-
58
- model = AutoModel.from_pretrained("MerantixMomentum/acip_qwen25_7b", trust_remote_code=True)
59
- ```
60
- This will download and create a fully parameterized ACIP model that can be pruned to any compression ratio you wish.
61
- For example,
62
- ```python
63
- model.prune_model_by_score(compression_ratio=0.4)
64
- ```
65
- will prune `model` to 40% of its original size measured in number of parameters, i.e., a 60% compression rate.
66
- A unique feature of ACIP is that this operation is revertible in the sense that you can rerun `model.prune_model_by_score` as often as you like to evaluate your model at different sizes. Finally, you can "commit" to a certain ratio and run
67
- ```python
68
- model.compress()
69
- ```
70
- which will discard all pruned mask values of compressible linear layers.
71
- Now the model is actually compressed and you should observe a significant decrease of memory usage (this step is not revertible without reloading the ACIP model).
72
- If you like, you can also run
73
- ```python
74
- model.quantize()
75
- ```
76
- to save even more memory (we have only tested 4bit quantization with `bitsandbytes`, but you could also customize this).
77
-
78
- **🚀 That's it! You can now use your compressed model for inference or fine-tuning as any other Causal Language Model from 🤗 transformers.**
79
-
80
- **Note**: The parameter `compression_ratio` ranges from 1.0 to 0.0, indicating the model size after compression. For example, 0.4 means that the model has only 40% of the original number of parameters and 1.0 means no compression at all.
81
-
82
- # Dependencies
83
-
84
- To run an ACIP model from our hub, you only need minimal dependencies, namely `torch`, `transformers`, `peft`, and optionally, `bitsandbytes` in case you want to quantize your model.
85
- See [requirements.txt](requirements.txt) for pip-installable dependencies with exact version pins (newer version should work as well).
86
-
87
- # License
88
-
89
- This model is released under the apache-2.0 license.
90
-
91
- # Citation
92
-
93
- When using or referring to this model, please cite our [paper](https://arxiv.org/abs/2502.01717):
94
- ```bibtex
95
- @article{mxm2025acip,
96
- title={Choose Your Model Size: Any Compression by a Single Gradient Descent},
97
- author={M. Genzel, P. Putzky, P. Zhao, S. Schulze, M. Mollenhauer, R. Seidel, S. Dietzel, T. Wollmann},
98
- year={2025},
99
- journal={Preprint arXiv:2502.01717}
100
- }
101
- ```
102
-
103
-
104
-
 
1
+ ---
2
+ license: apache-2.0
3
+ datasets: ['allenai/c4']
4
+ language: ['zho', 'eng', 'fra', 'spa', 'por', 'deu', 'ita', 'rus', 'jpn', 'kor', 'vie', 'tha', 'ara']
5
+ metrics: ['perplexity', 'accuracy']
6
+ tags: ['acip', 'pytorch']
7
+ base_model:
8
+ - Qwen/Qwen2.5-7B
9
+ pipeline_tag: text-generation
10
+ library_name: transformers
11
+ ---
12
+ <div align="center">
13
+ <img width="30%" alt="logo" src="https://imgur.com/A0MCHPq.png">
14
+ </div>
15
+
16
+ <div align="center">
17
+ <a href="https://github.com/merantix-momentum/acip"><img src="https://img.shields.io/badge/GitHub-%23121011.svg?logo=github&logoColor=white.svg" alt="github" style="display: inline-block; vertical-align: middle;"></a>
18
+ <a href="https://arxiv.org/abs/2502.01717"><img src="https://img.shields.io/badge/arXiv-2502.01717-b31b1b.svg" alt="arxiv" style="display: inline-block; vertical-align: middle;"></a>
19
+ <a href="https://acip.merantix-momentum.com/"><img alt="website" src="https://img.shields.io/website/https/acip.merantix-momentum.com.svg?down_color=red&down_message=offline&up_message=online" style="display: inline-block; vertical-align: middle;"></a>
20
+ </div>
21
+
22
+ <h2 align="center">
23
+ <p> [
24
+ <a href="https://github.com/merantix-momentum/acip">🤖 GitHub</a> |
25
+ <a href="https://arxiv.org/abs/2502.01717">📄 Paper</a> |
26
+ <a href="https://acip.merantix-momentum.com/">🌐 Website</a>
27
+ ]
28
+ </p>
29
+ </h2>
30
+
31
+ <h1 align="center">
32
+ <p>ACIP applied to Qwen/Qwen2.5-7B</p>
33
+ </h1>
34
+
35
+ This model repository is part of the ACIP Project and provides a compressible version of [`Qwen/Qwen2.5-7B`](https://huggingface.co/Qwen/Qwen2.5-7B). For more details, please visit our [code repo](https://github.com/merantix-momentum/acip).
36
+
37
+ # Quick Start
38
+
39
+ Just load the ACIP model via `from_pretrained`:
40
+ ```python
41
+ from transformers import AutoModel
42
+
43
+ model = AutoModel.from_pretrained("MerantixMomentum/acip_qwen25_7b", trust_remote_code=True)
44
+ ```
45
+ This will download and create a fully parameterized ACIP model that can be pruned to any compression rate you wish.
46
+ For example,
47
+ ```python
48
+ model.prune_model_by_score(size_ratio=0.4)
49
+ ```
50
+ will prune `model` to 40% of its original size measured in number of parameters, i.e., a 60% compression rate.
51
+ A unique feature of ACIP is that this operation is revertible in the sense that you can rerun `model.prune_model_by_score` as often as you like to evaluate your model at different sizes. Finally, you can "commit" to a certain ratio and run
52
+ ```python
53
+ model.compress()
54
+ ```
55
+ which will discard all pruned mask values of compressible linear layers.
56
+ Now the model is actually compressed and you should observe a significant decrease of memory usage (this step is not revertible without reloading the ACIP model).
57
+ If you like, you can also run
58
+ ```python
59
+ model.quantize()
60
+ ```
61
+ to save even more memory (we have only tested 4-bit quantization with `bitsandbytes`, but you could also customize this).
62
+
63
+ **🚀 That's it! You can now use your compressed model for inference or fine-tuning like any other causal language model from 🤗 transformers.**
64
+
65
+ **Note**: The parameter `size_ratio` ranges from 1.0 to 0.0, indicating the model size after compression. For example, 0.4 means that the model has only 40% of the original number of parameters and 1.0 means no compression at all. Alternatively, you can also set `compression_rate` in `prune_model_by_score`, which is equivalent to `size_ratio = 1.0 - compression_rate`.
66
+
67
+ # Dependencies
68
+
69
+ To run an ACIP model from our hub, you only need minimal dependencies, namely `torch`, `transformers`, `peft`, and optionally, `bitsandbytes` in case you want to quantize your model.
70
+ See [requirements.txt](requirements.txt) for pip-installable dependencies with exact version pins (newer versions should work as well).
71
+
72
+ # License
73
+
74
+ This model is released under the apache-2.0 license.
75
+
76
+ # Citation
77
+
78
+ When using or referring to this model, please cite our [paper](https://arxiv.org/abs/2502.01717):
79
+ ```bibtex
80
+ @article{mxm2025acip,
81
+ title={Choose Your Model Size: Any Compression by a Single Gradient Descent},
82
+ author={M. Genzel and P. Putzky and P. Zhao and S. Schulze and M. Mollenhauer and R. Seidel and S. Dietzel and T. Wollmann},
83
+ year={2025},
84
+ journal={Preprint arXiv:2502.01717}
85
+ }
86
+ ```
87
+
88
+
89
+
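
The note in the new README states that `compression_rate` is equivalent to `size_ratio = 1.0 - compression_rate`. The relationship can be sketched in plain Python as follows. The helper `resolve_size_ratio` is hypothetical and not part of the ACIP code; only the two keyword names and the formula are taken from the README, and the range check and the "exactly one argument" rule are assumptions made for illustration.

```python
from typing import Optional


def resolve_size_ratio(
    size_ratio: Optional[float] = None,
    compression_rate: Optional[float] = None,
) -> float:
    """Return the size ratio implied by exactly one of the two arguments.

    Hypothetical helper: it mirrors the README note
    (size_ratio = 1.0 - compression_rate), not ACIP's actual implementation.
    """
    # Assumption: the caller specifies the target size in exactly one way.
    if (size_ratio is None) == (compression_rate is None):
        raise ValueError("Pass exactly one of size_ratio or compression_rate.")
    if size_ratio is None:
        size_ratio = 1.0 - compression_rate
    # The README describes size_ratio as ranging from 1.0 (no compression) to 0.0.
    if not 0.0 <= size_ratio <= 1.0:
        raise ValueError(f"size_ratio must lie in [0.0, 1.0], got {size_ratio}")
    return size_ratio


if __name__ == "__main__":
    # Both calls describe the same target: keep 40% of the parameters.
    print(resolve_size_ratio(size_ratio=0.4))
    print(resolve_size_ratio(compression_rate=0.6))
```

With a loaded ACIP model, the resolved value would then be passed on as `model.prune_model_by_score(size_ratio=...)`, as in the Quick Start of the new README.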