Pre-training Dataset Samples Collection A collection of pre-training dataset samples in sizes of 10M, 100M, and 1B tokens. Ideal for quick experimentation and ablations. • 19 items • Updated 2 days ago • 16
Article A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons Feb 4 • 28
AERIS: Argonne Earth Systems Model for Reliable and Skillful Predictions Paper • 2509.13523 • Published Sep 16 • 7
Nemotron-Pre-Training-Datasets Collection Large-scale pre-training datasets used in the Nemotron family of models. • 11 items • Updated 3 days ago • 81
SphereDiff: Tuning-free Omnidirectional Panoramic Image and Video Generation via Spherical Latent Representation Paper • 2504.14396 • Published Apr 19 • 27
Article From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub Feb 12 • 81
OmniSVG: A Unified Scalable Vector Graphics Generation Model Paper • 2504.06263 • Published Apr 8 • 182
The Ultra-Scale Playbook The ultimate guide to training LLMs on large GPU clusters. • Running • 3.6k