Unsloth 4-bit Dynamic Quants Collection Unsloth's Dynamic 4-bit Quants selectively skip quantizing certain parameters, greatly improving accuracy while using <10% more VRAM than BnB 4-bit • 28 items • Updated 2 days ago • 91
GRMR V3 GGUFs Collection GGUF-quantized versions of the GRMR V3 models • 6 items • Updated Jun 4 • 7
llama.vim Collection Recommended models for the llama.vim and llama.vscode plugins • 10 items • Updated 10 days ago • 60
Gemma 3 QAT Collection Quantization-Aware Trained (QAT) Gemma 3 checkpoints. These models preserve quality similar to half precision while using 3x less memory • 15 items • Updated Jul 10 • 211
MobileLLM Collection Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 50 items • Updated 15 days ago • 136
LayerSkip Collection Models continually pretrained using LayerSkip - https://arxiv.org/abs/2404.16710 • 8 items • Updated Nov 21, 2024 • 48
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 releases • 15 items • Updated Dec 6, 2024 • 649
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 16 items • Updated 2 days ago • 241
Gemma 2 2B Release Collection The 2.6B parameter version of Gemma 2. • 6 items • Updated Jul 10 • 82
LLM Compiler Collection Meta LLM Compiler is a state-of-the-art LLM that builds upon Code Llama with improved performance for code optimization and compiler reasoning. • 4 items • Updated Jun 27, 2024 • 155
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Dec 6, 2024 • 874