Not All Bits Are Equal: Scale-Dependent Memory Optimization Strategies for Reasoning Models Paper • 2510.10964 • Published Oct 13 • 3
Devstral 2 Collection A pair of agentic LLMs for software engineering tasks, excelling at using tools to explore codebases and edit multiple files, and at powering SWE agents. • 3 items • Updated 18 days ago • 37
Nemotron-Post-Training-v3 Collection Collection of datasets used in the post-training phase of Nemotron Nano v3. • 7 items • Updated 4 days ago • 50
Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples Paper • 2510.07192 • Published Oct 8 • 5
Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation Paper • 2408.13586 • Published Aug 24, 2024 • 3
Lingshu MLLMs Collection Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning • 4 items • Updated Oct 9 • 21
Kimi-VL-A3B Collection Moonshot's efficient MoE VLMs, excelling at agentic, long-context, and reasoning tasks. • 7 items • Updated Oct 30 • 77
RpR Models Collection RpR (RolePlay with Reasoning) models which are built on RPMax datasets with properly trained multi-turn reasoning. • 8 items • Updated Jun 25 • 16
GPT-OSS General (4.2B to 20B) Collection Collection of pruned GPT-OSS models spanning 1 to 32 experts, maintaining general capabilities across domains while reducing computational requirements. • 29 items • Updated Aug 13 • 10
Qwen3-Coder Collection The Qwen3-Coder models deliver SOTA performance on agentic coding and code tasks. Includes Qwen3-Coder-480B-A35B. • 9 items • Updated 3 days ago • 16
GLM-4.5 Collection GLM-4.5: An open-source large language model designed for intelligent agents by Z.ai • 11 items • Updated Aug 11 • 250
Qwen 2.5 Coder Collection Complete collection of the code-specific Qwen2.5 model series in bnb 4-bit, 16-bit, and GGUF formats. • 35 items • Updated 3 days ago • 36
NTQ AI LM Collection A collection of language models (LLMs) fine-tuned on diverse datasets. • 4 items • Updated Feb 14 • 3