---
language: en
tags:
- exl2
- exl3
- quantization
- requests
- community
---
Community hub for requesting EXL3 quants.
When requesting a new model quant, please note that not all requests can be fulfilled. Whether a model gets quantized depends on available computing resources, model popularity, technical feasibility, and priority.
This is a personal, community-driven project. Your patience and understanding are appreciated ❤️.
EXL3 surpasses EXL2 in every way (both quantization quality and flexibility), so it is the main target format for quantization. If you see a good reason to provision EXL2 quants for a particular model, you can make a request that explains why EXL2 should be considered.
Keep in mind that among all quantization requests, EXL2 takes the lowest priority.
EXL3 is a highly optimized quantization format based on QTIP, designed for LLM inference on consumer GPUs. It is an evolution of the EXL2 format, offering higher quality at lower bitrates.
If you enjoy EXL quants, feel free to support EXL3 development and a small cat working tirelessly behind it: turboderp (GitHub, Ko-Fi).
To use resources optimally, quants are created in a fixed range of sizes. Custom sizes will only be considered if there is a high community demand and/or available compute.
- 2.5bpw_H6
- 3.0bpw_H6
- 3.5bpw_H6
- 4.0bpw_H6
- 4.5bpw_H6 / 4.25bpw_H6 (for 70b and above)
- 5.0bpw_H6
- 6.0bpw_H6
- 8.0bpw_H8

Each quantization size for a model is stored in a separate branch of its HF repository. You can download a specific quant size by targeting its branch.
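As a rough rule of thumb for picking a size (an illustrative sketch, not part of the request process), the weight footprint of a quant is parameters × bits-per-weight ÷ 8:

```python
def quant_size_gb(n_params: float, bpw: float) -> float:
    """Approximate weight size in GB: params * bits-per-weight / 8 bits-per-byte.

    Ignores output-head precision (the H6/H8 suffix) and KV-cache overhead,
    so treat the result as a lower bound on the VRAM you will need.
    """
    return n_params * bpw / 8 / 1e9

# A 70B model at 4.25 bpw takes roughly 37 GB for the weights alone.
print(round(quant_size_gb(70e9, 4.25), 1))  # 37.2
```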
For example, to download the 4.0bpw_H6 quant:
1. Install huggingface-cli:

```shell
pip install -U "huggingface_hub[cli]"
```

2. Download the quant by targeting the specific quant size (revision):

```shell
huggingface-cli download ArtusDev/MODEL_NAME --revision "4.0bpw_H6" --local-dir ./
```
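Since every size maps to a branch, the download command is mechanical. A small sketch (the size list and repo naming follow this page; the helper itself is hypothetical):

```python
# Hypothetical helper: build the huggingface-cli invocation for a quant branch.
QUANT_SIZES = [
    "2.5bpw_H6", "3.0bpw_H6", "3.5bpw_H6", "4.0bpw_H6",
    "4.25bpw_H6", "4.5bpw_H6", "5.0bpw_H6", "6.0bpw_H6", "8.0bpw_H8",
]

def download_command(repo_id: str, size: str) -> str:
    """Return the CLI command for one quant size (one branch per size)."""
    if size not in QUANT_SIZES:
        raise ValueError(f"unknown quant size: {size}")
    return f'huggingface-cli download {repo_id} --revision "{size}" --local-dir ./'

print(download_command("ArtusDev/MODEL_NAME", "4.0bpw_H6"))
```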
EXL3 quants can be run with any inference client that supports the EXL3 format, such as TabbyAPI. Please refer to its documentation for setup instructions.
If you don't find the model quant you're looking for, please check these other excellent community members who also provide EXL3 quants: