ArtusDev committed 232c8e9 (verified; 1 parent: a0759f4): Create README.md

Files changed (1): README.md, added (+176 -0)
---
language: en
tags:
- exl2
- exl3
- quantization
- requests
- community
---
<style>
.container-dark {
  font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif;
  line-height: 1.6;
  background-color: #1e1e1e;
  color: #d4d4d4;
  padding: 20px;
}
.card-dark {
  background-color: #252526;
  border-radius: 12px;
  padding: 24px;
  margin-bottom: 20px;
  box-shadow: 0 4px 12px rgba(0,0,0,0.3);
  border: 1px solid #3c3c3c;
}
.card-dark h1 {
  font-size: 2.2em;
  color: #ffffff;
  text-align: center;
  margin-bottom: 10px;
}
.card-dark .subtitle {
  text-align: center;
  font-size: 1.1em;
  color: #a0a0a0;
}
.card-dark h2 {
  font-size: 1.5em;
  margin-top: 0;
  padding-bottom: 10px;
  border-bottom: 1px solid #3c3c3c;
  color: #c586c0;
}
.card-dark h3 {
  font-size: 1.2em;
  color: #d4d4d4;
}
.btn-purple {
  display: inline-block;
  background-color: #6A5ACD;
  color: white !important;
  padding: 12px 24px;
  border-radius: 8px;
  text-decoration: none;
  font-weight: 600;
  transition: background-color 0.3s ease, transform 0.2s ease;
  text-align: center;
}
.btn-purple:hover {
  background-color: #7B68EE;
  transform: translateY(-2px);
}
.info-box-dark {
  background-color: rgba(106, 90, 205, 0.1);
  border-left: 5px solid #6A5ACD;
  padding: 16px;
  margin: 20px 0;
  border-radius: 0 8px 8px 0;
}
code.inline-code-dark {
  background-color: #3a3a3a;
  padding: 3px 6px;
  border-radius: 4px;
  font-family: 'Fira Code', 'Courier New', monospace;
  color: #4fc1ff;
}
.code-block-dark {
  background-color: #1e1e1e;
  color: #dcdcdc;
  padding: 16px;
  border-radius: 8px;
  font-family: 'Fira Code', 'Courier New', monospace;
  overflow-x: auto;
  white-space: pre-wrap;
  border: 1px solid #3c3c3c;
}
.code-block-dark .comment {
  color: #6a9955;
}
a {
  color: #569cd6;
  text-decoration: none;
}
a:hover {
  text-decoration: underline;
}
ul {
  padding-left: 20px;
}
li {
  margin-bottom: 8px;
}
</style>

<div class="container-dark">

<div class="card-dark">
  <h1>EXL3 Quantization Requests</h1>
  <p class="subtitle">Community hub for requesting EXL3 quants.</p>
</div>

<div class="card-dark">
  <h2>How to Request a Quant</h2>
  <p>To request a quant of a new model, please follow these steps:</p>
  <ol>
    <li><strong>Check Existing Quants:</strong> Before making a request, please check whether an EXL3 quant already exists, either <a href="https://huggingface.co/models?other=exl3&sort=created" target="_blank">by the exl3 tag</a> or <a href="https://huggingface.co/models?sort=created&search=exl3" target="_blank">by the exl3 name suffix</a>.</li>
    <li><strong>Go to the Community Tab:</strong> Open the Community tab of this repository.</li>
    <li><strong>Create a Model Topic:</strong> Start a new discussion titled with the model name. In the body, provide a direct HF link to the model you want quantized.</li>
  </ol>
  <div style="text-align: center; margin-top: 25px;">
    <a href="https://huggingface.co/ArtusDev/requests-exl/discussions/new?title=[MODEL_NAME_HERE]&description=[MODEL_HF_LINK_HERE]" class="btn-purple" target="_blank">Request EXL3 Quant</a>
  </div>
  <div class="info-box-dark">
    <p>Please note that not all requests can be fulfilled. Whether a model gets quantized depends on available compute, model popularity, technical feasibility, and priority.</p>
    <p>This is a personal, community-driven project. Your patience and understanding are appreciated ❤️.</p>
  </div>
</div>

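The request button above opens a prefilled discussion form through the `title` and `description` query parameters. Below is a minimal Python sketch of building such a link for an arbitrary model, assuming the model is identified by an `org/model` repo id and that the title should be the model name, as in step 3. `build_request_url` and `example-org/Example-Model-7B` are hypothetical names used only for illustration.

```python
from urllib.parse import urlencode

# "New discussion" form of the requests repo, taken from the request button URL.
NEW_DISCUSSION_URL = "https://huggingface.co/ArtusDev/requests-exl/discussions/new"


def build_request_url(model_id: str) -> str:
    """Return a prefilled request URL for a model given as "org/model".

    Hypothetical helper: the title is the model name and the description is
    a direct Hugging Face link to the model, mirroring the steps above.
    """
    if model_id.count("/") != 1:
        raise ValueError(f"expected 'org/model', got {model_id!r}")
    params = {
        "title": model_id.split("/")[1],
        "description": f"https://huggingface.co/{model_id}",
    }
    # urlencode percent-escapes reserved characters such as ':' and '/'.
    return f"{NEW_DISCUSSION_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_request_url("example-org/Example-Model-7B"))
```

Encoding the link this way avoids broken URLs when a model name contains characters that are not safe in a query string.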
<div class="card-dark">
  <h2>Can I Request EXL2 Quants?</h2>
  <p>Since EXL3 outperforms EXL2 in both quantization quality and flexibility, it is the main target format for quantization. If you have a good reason to provide EXL2 quants for a particular model, you can make a request explaining why EXL2 should be considered.</p>
  <p>Keep in mind that EXL2 requests have the lowest priority of all quantization requests.</p>
</div>

<div class="card-dark">
  <h2>About EXL3 Quantization</h2>
  <p><strong>EXL3</strong> is a highly optimized quantization format based on QTIP, designed for LLM inference on consumer GPUs. It is an evolution of the EXL2 format, offering higher quality at lower bitrates.</p>
  <p>If you enjoy EXL quants, feel free to support <a href="https://github.com/turboderp-org/exllamav3" target="_blank"><b>EXL3 development</b></a> and the small cat working tirelessly behind it: <b>turboderp</b> (<a href="https://github.com/turboderp" target="_blank">GitHub</a>, <a href="https://ko-fi.com/turboderp" target="_blank">Ko-Fi</a>).</p>
  <h3>Available Quantization Sizes</h3>
  <p>To use resources efficiently, quants are created in a fixed set of sizes. Custom sizes will only be considered if there is high community demand and/or spare compute. Sizes are given in bits per weight (bpw); the <code class="inline-code-dark">_H6</code>/<code class="inline-code-dark">_H8</code> suffix is the bit width used for the output head layer.</p>
  <ul>
    <li><code class="inline-code-dark">2.5bpw_H6</code></li>
    <li><code class="inline-code-dark">3.0bpw_H6</code></li>
    <li><code class="inline-code-dark">3.5bpw_H6</code></li>
    <li><code class="inline-code-dark">4.0bpw_H6</code></li>
    <li><code class="inline-code-dark">4.5bpw_H6</code> (<code class="inline-code-dark">4.25bpw_H6</code> for 70B and larger models)</li>
    <li><code class="inline-code-dark">5.0bpw_H6</code></li>
    <li><code class="inline-code-dark">6.0bpw_H6</code></li>
    <li><code class="inline-code-dark">8.0bpw_H8</code></li>
  </ul>
</div>

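As a rough guide to what each size means in practice, weight storage scales as parameters * bpw / 8 bytes. The sketch below assumes the branch names follow the size list above exactly (including the 4.25 bpw substitution for 70B+ models) and derives the expected branch names for a model plus a back-of-the-envelope weight size. `branch_names` and `approx_weight_gb` are hypothetical helpers, and the estimate ignores layers stored at higher precision (such as the head), the KV cache, and runtime overhead, so real VRAM use will be higher.

```python
from __future__ import annotations

# Standard sizes from the list above; 4.5 bpw becomes 4.25 bpw for 70B+ models.
STANDARD_BPW = [2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 8.0]


def branch_names(params_billion: float) -> list[str]:
    """Return the quant branch names expected for a model of this size."""
    names = []
    for bpw in STANDARD_BPW:
        if bpw == 4.5 and params_billion >= 70:
            bpw = 4.25
        head_bits = 8 if bpw == 8.0 else 6  # only the 8.0 bpw quant uses H8
        names.append(f"{bpw}bpw_H{head_bits}")
    return names


def approx_weight_gb(params_billion: float, bpw: float) -> float:
    """Rough quantized weight size in GB (10**9 bytes): params * bpw / 8."""
    return params_billion * bpw / 8


if __name__ == "__main__":
    for name in branch_names(70):
        bpw = float(name.split("bpw")[0])
        print(f"{name}: ~{approx_weight_gb(70, bpw):.1f} GB of weights")
```

For a 70B model this yields the `4.25bpw_H6` branch in place of `4.5bpw_H6`, with roughly 37 GB of weights at that size (70 * 4.25 / 8 = 37.1875).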
<div class="card-dark">
  <h2>How to Download and Use EXL Quants</h2>
  <p>Each quantization size of a model is stored in a separate branch of its HF repository, so you download a specific size by selecting its branch.</p>
  <p>For example, to download the <code class="inline-code-dark">4.0bpw_H6</code> quant:</p>
  <div class="code-block-dark">
<span class="comment"># Replace MODEL_NAME with the quant repository name</span>
<span class="comment"># Option 1: clone only the 4.0bpw_H6 branch (requires Git LFS)</span>
git clone --single-branch -b 4.0bpw_H6 https://huggingface.co/ArtusDev/MODEL_NAME
<br>
<span class="comment"># Option 2: download the same branch with the Hugging Face CLI</span>
huggingface-cli download ArtusDev/MODEL_NAME --revision "4.0bpw_H6" --local-dir ./MODEL_NAME
  </div>
  <p style="margin-top: 15px;">These quants can be run with any inference client that supports the EXL3 format, such as <a href="https://github.com/theroyallab/tabbyapi" target="_blank"><b>TabbyAPI</b></a>. Please refer to the <a href="https://github.com/theroyallab/tabbyAPI/wiki/01.-Getting-Started" target="_blank">documentation</a> for setup instructions.</p>
</div>

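The same download can be scripted from Python with `huggingface_hub.snapshot_download`, which accepts `repo_id`, `revision`, and `local_dir` arguments. This is a minimal sketch, assuming `huggingface_hub` is installed and that quants live under `ArtusDev/MODEL_NAME` as in the `huggingface-cli download` command above; `quant_download_args`, `download_quant`, and the target folder layout are hypothetical choices, not part of any official tooling.

```python
def quant_download_args(model_name: str, bpw_branch: str = "4.0bpw_H6") -> dict:
    """Build keyword arguments for huggingface_hub.snapshot_download.

    model_name is the quant repository name (placeholder, as above);
    bpw_branch is one of the size branches, e.g. "4.0bpw_H6".
    """
    return {
        "repo_id": f"ArtusDev/{model_name}",
        "revision": bpw_branch,  # each quant size lives on its own branch
        "local_dir": f"./{model_name}-{bpw_branch}",  # hypothetical folder layout
    }


def download_quant(model_name: str, bpw_branch: str = "4.0bpw_H6") -> str:
    """Download one quant branch and return the local folder path."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**quant_download_args(model_name, bpw_branch))
```

With a real repository name, `download_quant("MODEL_NAME", "4.0bpw_H6")` should fetch only that branch into its own folder, which can then be loaded by an EXL3-capable client such as TabbyAPI.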
<div class="card-dark">
  <h2>Other EXL3 Quanters</h2>
  <p>If you don't find the model quant you're looking for, please check out these other excellent community members who also provide EXL3 quants:</p>
  <ul>
    <li><a href="https://huggingface.co/turboderp" target="_blank"><b>@turboderp</b></a></li>
    <li><a href="https://huggingface.co/bullerwins" target="_blank"><b>@bullerwins</b></a></li>
    <li><a href="https://huggingface.co/MikeRoz" target="_blank"><b>@MikeRoz</b></a></li>
  </ul>
</div>

</div>