---
language: en
tags:
- exl2
- exl3
- quantization
- requests
- community
---
<style>
  .container-dark {
    font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif;
    line-height: 1.6;
    color: #d4d4d4;
  }
  .card-dark {
    background-color: #252526;
    border-radius: 12px;
    padding: 24px;
    margin-bottom: 20px;
    box-shadow: 0 4px 12px rgba(0,0,0,0.3);
    border: 1px solid #3c3c3c;
  }
  .card-dark.card-dark-title h1 {
    font-size: 1.5em;
    color: #ffffff;
    text-align: center;
    margin-bottom: 10px;
  }
  .card-dark h1 {
    font-size: 2.2em;
    color: #ffffff;
    text-align: center;
    margin-bottom: 10px;
  }
  .card-dark .subtitle {
    text-align: center;
    font-size: 1.1em;
    color: #a0a0a0;
  }
  .card-dark h2 {
    font-size: 1.5em;
    margin-top: 0;
    padding-bottom: 10px;
    border-bottom: 1px solid #3c3c3c;
    color: #c586c0;
  }
  .card-dark h3 {
    font-size: 1.2em;
    color: #d4d4d4;
  }
  .btn-purple {
    display: inline-block;
    background-color: #6A5ACD;
    color: white !important;
    padding: 12px 24px;
    border-radius: 8px;
    text-decoration: none;
    font-weight: 600;
    transition: background-color 0.3s ease, transform 0.2s ease;
    text-align: center;
  }
  .btn-purple:hover {
    background-color: #7B68EE;
    transform: translateY(-2px);
  }
  .info-box-dark {
    background-color: rgba(106, 90, 205, 0.1);
    border-left: 5px solid #6A5ACD;
    padding: 16px;
    margin: 20px 0;
    border-radius: 0 8px 8px 0;
  }
  code.inline-code-dark {
    background-color: #3a3a3a;
    padding: 3px 6px;
    border-radius: 4px;
    font-family: 'Fira Code', 'Courier New', monospace;
    color: #c586c0;
  }
  a {
    color: #569cd6;
    text-decoration: none;
    font-weight: 600;
  }
  a:hover {
    text-decoration: underline;
  }
  ul {
    padding-left: 20px;
  }
  li {
    margin-bottom: 8px;
  }
</style>

<div class="container-dark">

  <div class="card-dark card-dark-title">
    <h1>EXL3 Quantization Requests</h1>
    <p class="subtitle">Community hub for requesting EXL3 quants.</p>
  </div>

  <div class="card-dark">
    <h2>How to Request a Quant</h2>
    <p>To request a new model quant, please follow these steps:</p>
    <ol>
      <li><strong>Check Existing Quants:</strong> Before making a request, please check if an EXL3 quant already exists <a href="https://huggingface.co/models?other=exl3&sort=created" target="_blank">by exl3 tag</a> or <a href="https://huggingface.co/models?sort=created&search=exl3" target="_blank">by exl3 suffix</a>.</li>
      <li><strong>Go to the Community Tab:</strong> Navigate to the Community Tab for this repository.</li>
      <li><strong>Create a Model Topic:</strong> Start a new discussion with the model title. In the body, provide a direct HF link to the model you are requesting a quant for.</li>
    </ol>
    <div style="text-align: center; margin-top: 25px;">
      <a href="https://huggingface.co/ArtusDev/requests-exl/discussions/new?title=[MODEL_NAME_HERE]&description=[MODEL_HF_LINK_HERE]" class="btn-purple" target="_blank">Request EXL3 Quant</a>
    </div>
    <div class="info-box-dark">
      <p>Please note that not all requests can be fulfilled. Whether a model gets quantized depends on available computing resources, model popularity, technical feasibility, and current priorities.</p>
      <p>This is a personal, community-driven project. Your patience and understanding are appreciated ❤️.</p>
    </div>
  </div>

  <div class="card-dark">
    <h2>Can I Request EXL2 Quants?</h2>
    <p>EXL3 surpasses EXL2 in both quantization quality and flexibility, so EXL3 is the main target format for quantization. If you see a good reason to provide EXL2 quants for a particular model, you can make a request that explains why EXL2 should be considered.</p>
    <p>Keep in mind that among all quantization requests, EXL2 takes the lowest priority.</p>
  </div>

  <div class="card-dark">
    <h2>About EXL3 Quantization</h2>
    <p><strong>EXL3</strong> is a highly optimized quantization format based on QTIP, designed for LLM inference on consumer GPUs. It is an evolution of the EXL2 format, offering higher quality at lower bitrates.</p>
    <p>If you enjoy EXL quants, feel free to support <a href="https://github.com/turboderp-org/exllamav3" target="_blank"><b>EXL3 development</b></a> and a small cat working tirelessly behind it: <b>turboderp</b> (<a href="https://github.com/turboderp" target="_blank">GitHub</a>, <a href="https://ko-fi.com/turboderp" target="_blank">Ko-Fi</a>).</p>
    <h3>Available Quantization Sizes</h3>
    <p>To use resources optimally, quants are created in a fixed range of sizes. Custom sizes will only be considered if there is high community demand and/or available compute.</p>
    <ul>
      <li><code class="inline-code-dark"><b>2.5bpw_H6</b></code></li>
      <li><code class="inline-code-dark"><b>3.0bpw_H6</b></code></li>
      <li><code class="inline-code-dark"><b>3.5bpw_H6</b></code></li>
      <li><code class="inline-code-dark"><b>4.0bpw_H6</b></code></li>
      <li><code class="inline-code-dark"><b>4.5bpw_H6</b></code> / <code class="inline-code-dark"><b>4.25bpw_H6</b></code> (for 70b and above)</li>
      <li><code class="inline-code-dark"><b>5.0bpw_H6</b></code></li>
      <li><code class="inline-code-dark"><b>6.0bpw_H6</b></code></li>
      <li><code class="inline-code-dark"><b>8.0bpw_H8</b></code></li>
    </ul>
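    <p>As a rough rule of thumb for picking a size, the footprint of the quantized weights scales with parameter count times bits per weight. The sketch below is an approximation only: it ignores KV cache, activations, and per-layer overhead, so actual VRAM usage will be higher.</p>

```python
def approx_weights_gib(params_billions: float, bpw: float) -> float:
    """Approximate size of quantized weights in GiB: parameters * bits-per-weight / 8."""
    total_bytes = params_billions * 1e9 * bpw / 8
    return total_bytes / (1024 ** 3)

# A 70B model at 4.0 bpw needs roughly 33 GiB for the weights alone.
print(round(approx_weights_gib(70, 4.0), 1))  # → 32.6
```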
  </div>

  <div class="card-dark">
    <h2>How to Download and Use EXL Quants</h2>
    <p>Each quantization size for a model is stored in a separate HF repository branch. You can download a specific quant size by its branch.</p>
    <p>For example, to download the <code class="inline-code-dark">4.0bpw_H6</code> quant:</p>
    <p><b>1. Install the huggingface-cli:</b></p>
    <pre><code>pip install -U "huggingface_hub[cli]"</code></pre>
    <p><b>2. Download the quant by targeting the specific quant size (revision):</b></p>
    <pre><code>huggingface-cli download ArtusDev/MODEL_NAME --revision "4.0bpw_H6" --local-dir ./</code></pre>
    <p>EXL3 quants can be run with any inference client that supports the EXL3 format, such as <a href="https://github.com/theroyallab/tabbyapi" target="_blank"><b>TabbyAPI</b></a>. Please refer to the <a href="https://github.com/theroyallab/tabbyAPI/wiki/01.-Getting-Started" target="_blank">documentation</a> for setup instructions.</p>
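    <p>Since each quant size lives on its own branch, switching sizes only means changing the revision string. A small helper sketch (the function name is hypothetical, and <code class="inline-code-dark">ArtusDev/MODEL_NAME</code> is a placeholder for the actual repo) that builds the corresponding CLI command:</p>

```python
def download_cmd(repo_id: str, quant: str, local_dir: str = "./") -> str:
    """Build the huggingface-cli invocation for one quant size (branch name = revision)."""
    return (
        f'huggingface-cli download {repo_id} '
        f'--revision "{quant}" --local-dir {local_dir}'
    )

# Swap the quant string to target a different branch, e.g. "6.0bpw_H6".
print(download_cmd("ArtusDev/MODEL_NAME", "6.0bpw_H6"))
```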
  </div>
  
  <div class="card-dark">
    <h2>Other EXL3 Quanters</h2>
    <p>If you don't find the model quant you're looking for, please check these other excellent community members who also provide EXL3 quants:</p>
    <ul>
      <li><a href="https://huggingface.co/turboderp" target="_blank"><b>@turboderp</b></a></li>
      <li><a href="https://huggingface.co/bullerwins" target="_blank"><b>@bullerwins</b></a></li>
      <li><a href="https://huggingface.co/MikeRoz" target="_blank"><b>@MikeRoz</b></a></li>
    </ul>
  </div>

</div>