---
language: en
tags:
- exl2
- exl3
- quantization
- requests
- community
---
<style>
.container-dark {
font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif;
line-height: 1.6;
color: #d4d4d4;
}
.card-dark {
background-color: #252526;
border-radius: 12px;
padding: 24px;
margin-bottom: 20px;
box-shadow: 0 4px 12px rgba(0,0,0,0.3);
border: 1px solid #3c3c3c;
}
.card-dark.card-dark-title h1 {
font-size: 1.5em;
color: #ffffff;
text-align: center;
margin-bottom: 10px;
}
.card-dark h1 {
font-size: 2.2em;
color: #ffffff;
text-align: center;
margin-bottom: 10px;
}
.card-dark .subtitle {
text-align: center;
font-size: 1.1em;
color: #a0a0a0;
}
.card-dark h2 {
font-size: 1.5em;
margin-top: 0;
padding-bottom: 10px;
border-bottom: 1px solid #3c3c3c;
color: #c586c0;
}
.card-dark h3 {
font-size: 1.2em;
color: #d4d4d4;
}
.btn-purple {
display: inline-block;
background-color: #6A5ACD;
color: white !important;
padding: 12px 24px;
border-radius: 8px;
text-decoration: none;
font-weight: 600;
transition: background-color 0.3s ease, transform 0.2s ease;
text-align: center;
}
.btn-purple:hover {
background-color: #7B68EE;
transform: translateY(-2px);
}
.info-box-dark {
background-color: rgba(106, 90, 205, 0.1);
border-left: 5px solid #6A5ACD;
padding: 16px;
margin: 20px 0;
border-radius: 0 8px 8px 0;
}
code.inline-code-dark {
background-color: #3a3a3a;
padding: 3px 6px;
border-radius: 4px;
font-family: 'Fira Code', 'Courier New', monospace;
color: #c586c0;
}
a {
color: #569cd6;
text-decoration: none;
font-weight: 600;
}
a:hover {
text-decoration: underline;
}
ul {
padding-left: 20px;
}
li {
margin-bottom: 8px;
}
</style>
<div class="container-dark">
<div class="card-dark card-dark-title">
<h1>EXL3 Quantization Requests</h1>
<p class="subtitle">Community hub for requesting EXL3 quants.</p>
</div>
<div class="card-dark">
<h2>How to Request a Quant</h2>
<p>To request a new model quant, please follow these steps:</p>
<ol>
<li><strong>Check Existing Quants:</strong> Before making a request, please check if an EXL3 quant already exists <a href="https://huggingface.co/models?other=exl3&sort=created" target="_blank">by exl3 tag</a> or <a href="https://huggingface.co/models?sort=created&search=exl3" target="_blank">by exl3 suffix</a> (or programmatically, as sketched after this list).</li>
<li><strong>Go to the Community Tab:</strong> Navigate to the Community Tab for this repository.</li>
<li><strong>Create a Model Topic:</strong> Start a new discussion titled with the model name. In the body, provide a direct Hugging Face link to the model you are requesting a quant for.</li>
</ol>
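<p>You can also query the Hub programmatically. Below is a minimal sketch using the <code class="inline-code-dark">huggingface_hub</code> Python library; the search string <code class="inline-code-dark">Llama-3</code> is only a placeholder for the model you are looking for:</p>
<pre><code>from huggingface_hub import HfApi

api = HfApi()

# List repos tagged "exl3" whose id mentions the model name.
# Swap "Llama-3" for the model you actually want.
for model in api.list_models(filter="exl3", search="Llama-3", limit=20):
    print(model.id)</code></pre>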
<div style="text-align: center; margin-top: 25px;">
<a href="https://huggingface.co/ArtusDev/requests-exl/discussions/new?title=[MODEL_NAME_HERE]&description=[MODEL_HF_LINK_HERE]" class="btn-purple" target="_blank">Request EXL3 Quant</a>
</div>
<div class="info-box-dark">
<p>Please note that not all requests can be fulfilled. The decision to quantize a model depends on available computing resources, model popularity, technical feasibility, and priority.</p>
<p>This is a personal, community-driven project. Your patience and understanding are appreciated ❤️.</p>
</div>
</div>
<div class="card-dark">
<h2>Can I Request EXL2 Quants?</h2>
<p>EXL3 is superior to EXL2 in both quantization quality and flexibility, so it is the main target format for quantization. If you see a good reason to provision EXL2 quants for a particular model, you can still make a request that explains why EXL2 should be considered.</p>
<p>Keep in mind that among all quantization requests, EXL2 takes the lowest priority.</p>
</div>
<div class="card-dark">
<h2>About EXL3 Quantization</h2>
<p><strong>EXL3</strong> is a highly optimized quantization format based on QTIP, designed for LLM inference on consumer GPUs. It is an evolution of the EXL2 format, offering higher quality at lower bitrates.</p>
<p>If you enjoy EXL quants, feel free to support <a href="https://github.com/turboderp-org/exllamav3" target="_blank"><b>EXL3 development</b></a> and a small cat working tirelessly behind it: <b>turboderp</b> (<a href="https://github.com/turboderp" target="_blank">GitHub</a>, <a href="https://ko-fi.com/turboderp" target="_blank">Ko-Fi</a>).</p>
<h3>Available Quantization Sizes</h3>
<p>To use resources optimally, quants are created in a fixed range of sizes; a rough way to estimate which size fits your hardware is sketched after this list. Custom sizes will only be considered if there is high community demand and/or available compute.</p>
<ul>
<li><code class="inline-code-dark"><b>2.5bpw_H6</b></code></li>
<li><code class="inline-code-dark"><b>3.0bpw_H6</b></code></li>
<li><code class="inline-code-dark"><b>3.5bpw_H6</b></code></li>
<li><code class="inline-code-dark"><b>4.0bpw_H6</b></code></li>
<li><code class="inline-code-dark"><b>4.5bpw_H6</b></code> / <code class="inline-code-dark"><b>4.25bpw_H6</b></code> (for 70b and above)</li>
<li><code class="inline-code-dark"><b>5.0bpw_H6</b></code></li>
<li><code class="inline-code-dark"><b>6.0bpw_H6</b></code></li>
<li><code class="inline-code-dark"><b>8.0bpw_H8</b></code></li>
</ul>
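<p>A rough rule of thumb for the weight size alone is <code class="inline-code-dark">params × bpw / 8</code> bytes. The sketch below applies that assumption; it ignores the H6/H8 output head precision, format overhead, and KV cache, so treat the numbers as a lower bound:</p>
<pre><code>def estimate_quant_gib(params_billion: float, bpw: float) -> float:
    """Rough weight-only size: params * bits-per-weight / 8, in GiB.
    Ignores head precision (H6/H8), overhead, and KV cache."""
    return params_billion * 1e9 * bpw / 8 / 2**30

for bpw in (2.5, 3.0, 3.5, 4.0, 4.25, 5.0, 6.0, 8.0):
    print(f"70B @ {bpw}bpw ~ {estimate_quant_gib(70, bpw):.1f} GiB")</code></pre>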
</div>
<div class="card-dark">
<h2>How to Download and Use EXL Quants</h2>
<p>Each quantization size for a model is stored in a separate HF repository branch. You can download a specific quant size by its branch.</p>
<p>For example, to download the <code class="inline-code-dark">4.0bpw_H6</code> quant:</p>
<p><b>1. Install huggingface-cli:</b></p>
<pre><code>pip install -U "huggingface_hub[cli]"</code></pre>
<p><b>2. Download the quant by targeting the specific quant size (revision):</b></p>
<pre><code>huggingface-cli download ArtusDev/MODEL_NAME --revision "4.0bpw_H6" --local-dir ./</code></pre>
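<p>If you prefer Python, the same download can be done with <code class="inline-code-dark">snapshot_download</code> from the <code class="inline-code-dark">huggingface_hub</code> library (the repo id below is a placeholder):</p>
<pre><code>from huggingface_hub import snapshot_download

# Download only the 4.0bpw_H6 branch (revision) of the quant repo.
snapshot_download(
    repo_id="ArtusDev/MODEL_NAME",        # placeholder repo id
    revision="4.0bpw_H6",
    local_dir="./MODEL_NAME-4.0bpw_H6",
)</code></pre>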
<p>EXL3 quants can be run with any inference client that supports the EXL3 format, such as <a href="https://github.com/theroyallab/tabbyapi" target="_blank"><b>TabbyAPI</b></a>. Please refer to the <a href="https://github.com/theroyallab/tabbyAPI/wiki/01.-Getting-Started" target="_blank">documentation</a> for setup instructions.</p>
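<p>Once a model is loaded, TabbyAPI exposes an OpenAI-compatible API. The sketch below assumes a local instance on TabbyAPI's default port 5000 with a placeholder API key; check the TabbyAPI documentation for your actual host, port, and key:</p>
<pre><code>from openai import OpenAI

# Point the standard OpenAI client at the local TabbyAPI endpoint.
client = OpenAI(
    base_url="http://localhost:5000/v1",  # assumed default TabbyAPI address
    api_key="YOUR_TABBY_API_KEY",         # placeholder key
)

response = client.chat.completions.create(
    model="MODEL_NAME",  # placeholder
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)</code></pre>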
</div>
<div class="card-dark">
<h2>Other EXL3 Quanters</h2>
<p>If you don't find the model quant you're looking for, please check out these other excellent community members who also provide EXL3 quants:</p>
<ul>
<li><a href="https://huggingface.co/turboderp" target="_blank"><b>@turboderp</b></a></li>
<li><a href="https://huggingface.co/bullerwins" target="_blank"><b>@bullerwins</b></a></li>
<li><a href="https://huggingface.co/MikeRoz" target="_blank"><b>@MikeRoz</b></a></li>
</ul>
</div>
</div> |