Models: 7,060

Active filters: gptq

Xu-Ouyang/pythia-2.8b-deduped-int3-step129000-GPTQ-wikitext2

Text Generation • 0.5B • Updated Jul 20, 2024

Xu-Ouyang/pythia-12b-deduped-int4-step43000-GPTQ-wikitext2

Text Generation • 2B • Updated Jul 20, 2024

Xu-Ouyang/pythia-12b-deduped-int4-step57000-GPTQ-wikitext2

Text Generation • 2B • Updated Jul 20, 2024

hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4

Text Generation • 59B • Updated Aug 7, 2024 • 68 • 15

Xu-Ouyang/pythia-12b-deduped-int4-step86000-GPTQ-wikitext2

Text Generation • 2B • Updated Jul 20, 2024

Xu-Ouyang/pythia-12b-deduped-int4-step100000-GPTQ-wikitext2

Text Generation • 2B • Updated Jul 20, 2024

Nkumah7/gemma-11-2b-it-ptt-lora-exp-v1-merged-4bit-gptq

Text Generation • 0.8B • Updated Jul 20, 2024

rinna/llama-3-youko-8b-gptq

Text Generation • 2B • Updated Mar 23 • 21

rinna/llama-3-youko-8b-instruct-gptq

Text Generation • 2B • Updated Mar 23 • 3 • 1

rinna/llama-3-youko-70b-gptq

Text Generation • 11B • Updated Mar 23 • 6

rinna/llama-3-youko-70b-instruct-gptq

Text Generation • 11B • Updated Mar 23 • 1

Xu-Ouyang/pythia-1.4b-deduped-int3-step14000-GPTQ-wikitext2

Text Generation • 0.3B • Updated Jul 21, 2024

Xu-Ouyang/pythia-1.4b-deduped-int3-step29000-GPTQ-wikitext2

Text Generation • 0.3B • Updated Jul 21, 2024 • 1

Xu-Ouyang/pythia-1.4b-deduped-int3-step43000-GPTQ-wikitext2

Text Generation • 0.3B • Updated Jul 22, 2024

Xu-Ouyang/pythia-1.4b-deduped-int3-step57000-GPTQ-wikitext2

Text Generation • 0.3B • Updated Jul 22, 2024

ChenMnZ/Llama-2-13b-EfficientQAT-w2g128-GPTQ

Text Generation • 1B • Updated Jul 22, 2024

ChenMnZ/Llama-2-13b-EfficientQAT-w2g128-BitBLAS

Text Generation • 4B • Updated Jul 22, 2024

ChenMnZ/Llama-2-13b-EfficientQAT-w2g64-BitBLAS

Text Generation • 4B • Updated Jul 22, 2024

ChenMnZ/Llama-2-13b-EfficientQAT-w2g64-GPTQ

Text Generation • 1B • Updated Jul 22, 2024

ChenMnZ/Llama-2-13b-EfficientQAT-w4g128-BitBLAS

Text Generation • 7B • Updated Jul 22, 2024

Xu-Ouyang/pythia-2.8b-deduped-int4-step129000-GPTQ-wikitext2

Text Generation • 0.6B • Updated Jul 22, 2024

ChenMnZ/Llama-2-13b-EfficientQAT-w4g128-GPTQ

Text Generation • 2B • Updated Jul 22, 2024

ChenMnZ/Llama-2-70b-EfficientQAT-w2g128-BitBLAS

Text Generation • 18B • Updated Jul 22, 2024 • 10

ChenMnZ/Llama-2-70b-EfficientQAT-w2g128-GPTQ

Text Generation • 5B • Updated Jul 22, 2024

ChenMnZ/Llama-2-70b-EfficientQAT-w2g64-GPTQ

Text Generation • 6B • Updated Jul 22, 2024

ChenMnZ/Llama-2-70b-EfficientQAT-w4g128-BitBLAS

Text Generation • 36B • Updated Jul 22, 2024

ChenMnZ/Llama-2-70b-EfficientQAT-w4g128-GPTQ

Text Generation • 10B • Updated Jul 22, 2024

Xu-Ouyang/pythia-2.8b-deduped-int3-step14000-GPTQ-wikitext2

Text Generation • 0.5B • Updated Jul 22, 2024

Xu-Ouyang/pythia-12b-deduped-int3-step14000-GPTQ-wikitext2

Text Generation • 2B • Updated Jul 22, 2024

ChenMnZ/Llama-2-7b-EfficientQAT-w2g128-GPTQ

Text Generation • 0.7B • Updated Jul 22, 2024