Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models Paper • 2512.13607 • Published 10 days ago • 26
view article Article Apriel-1.6-15b-Thinker: Cost-efficient Frontier Multimodal Performance 16 days ago • 81
Nemotron-Post-Training-v3 Collection Collection of datasets used in the post-training phase of Nemotron Nano v3. • 7 items • Updated 1 day ago • 48
Tiny-A2D Collection Small diffusion language models adapted from AR models • 4 items • Updated 19 days ago • 11
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published 24 days ago • 93
HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages Paper • 2505.11475 • Published May 16 • 4
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy Paper • 2507.01352 • Published Jul 2 • 56
Apertus LLM Collection Democratizing Open and Compliant LLMs for Global Language Environments: 8B and 70B open-data open-weights models, multilingual in >1000 languages • 4 items • Updated Oct 1 • 315
— Long-context post-training 🧶 — Collection Resources for post-training LLMs with long-context samples • 5 items • Updated Sep 14 • 5
BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining Paper • 2508.10975 • Published Aug 14 • 60
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7 • 151
MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge Paper • 2507.21183 • Published Jul 27 • 14
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published Jan 14 • 300
view article Article Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval +1 Mar 22, 2024 • 104