---
language: en
tags:
- exl2
- exl3
- quantization
- requests
- community
---
Community hub for requesting EXL3 quants.
When requesting a new model quant, please note that not all requests can be fulfilled. Whether a model gets quantized depends on available computing resources, model popularity, technical feasibility, and priority.
This is a personal, community-driven project. Your patience and understanding are appreciated ❤️.
EXL3 surpasses EXL2 in every way (both quantization quality and flexibility), so it is the main target format for quantization. If you see a good reason to provision EXL2 quants for a particular model, you can make a request that explains why EXL2 should be considered.
Keep in mind that among all quantization requests, EXL2 takes the lowest priority.
EXL3 is a highly optimized quantization format based on QTIP, designed for LLM inference on consumer GPUs. It is an evolution of the EXL2 format, offering higher quality at lower bitrates.
If you enjoy EXL quants, feel free to support EXL3 development and a small cat working tirelessly behind it: turboderp (GitHub, Ko-Fi).
To use resources optimally, quants are created in a fixed range of sizes. Custom sizes will only be considered if there is a high community demand and/or available compute.
- 2.5bpw_H6
- 3.0bpw_H6
- 3.5bpw_H6
- 4.0bpw_H6
- 4.5bpw_H6 / 4.25bpw_H6 (for 70b and above)
- 5.0bpw_H6
- 6.0bpw_H6
- 8.0bpw_H8

Each quantization size for a model is stored in a separate branch of its HF repository. You can download a specific quant size by targeting its branch.
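As a rough rule of thumb for picking a size (an illustrative sketch, not part of the request process), the weight footprint of a quant is parameters × bits-per-weight ÷ 8:

```python
def quant_size_gb(n_params: float, bpw: float) -> float:
    """Approximate weight size in GB: params * bits-per-weight / 8 bits-per-byte.

    Ignores output-head precision (the H6/H8 suffix) and KV-cache overhead,
    so treat the result as a lower bound on the VRAM you will need.
    """
    return n_params * bpw / 8 / 1e9

# A 70B model at 4.25 bpw takes roughly 37 GB for the weights alone.
print(round(quant_size_gb(70e9, 4.25), 1))  # 37.2
```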
For example, to download the 4.0bpw_H6 quant:
1. Install huggingface-cli:

```shell
pip install -U "huggingface_hub[cli]"
```

2. Download the quant by targeting the specific quant size (revision):

```shell
huggingface-cli download ArtusDev/MODEL_NAME --revision "4.0bpw_H6" --local-dir ./
```
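Since every size maps to a branch, the download command is mechanical. A small sketch (the size list and repo naming follow this page; the helper itself is hypothetical):

```python
# Hypothetical helper: build the huggingface-cli invocation for a quant branch.
QUANT_SIZES = [
    "2.5bpw_H6", "3.0bpw_H6", "3.5bpw_H6", "4.0bpw_H6",
    "4.25bpw_H6", "4.5bpw_H6", "5.0bpw_H6", "6.0bpw_H6", "8.0bpw_H8",
]

def download_command(repo_id: str, size: str) -> str:
    """Return the CLI command for one quant size (one branch per size)."""
    if size not in QUANT_SIZES:
        raise ValueError(f"unknown quant size: {size}")
    return f'huggingface-cli download {repo_id} --revision "{size}" --local-dir ./'

print(download_command("ArtusDev/MODEL_NAME", "4.0bpw_H6"))
```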
EXL3 quants can be run with any inference client that supports the EXL3 format, such as TabbyAPI. Please refer to its documentation for setup instructions.
If you don't find the model quant you're looking for, please check these other excellent community members who also provide EXL3 quants: