Aya 23: Open Weight Releases to Further Multilingual Progress Paper • 2405.15032 • Published May 23, 2024 • 32
Cohere Labs Aya Expanse Collection Aya Expanse is an open-weight research release of a model with highly advanced multilingual capabilities. • 4 items • Updated Jul 31 • 42
Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier Paper • 2412.04261 • Published Dec 5, 2024 • 6
Cohere Labs Aya 23 Collection Aya 23 is an open weights research release of an instruction fine-tuned model with highly advanced multilingual capabilities. • 3 items • Updated Jul 31 • 56
Teaching Models to Understand (but not Generate) High-risk Data Paper • 2505.03052 • Published May 5 • 6
Article GaLore: Advancing Large Model Training on Consumer-grade Hardware Mar 20, 2024 • 32
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations Paper • 2412.07626 • Published Dec 10, 2024 • 27
RegMix: Data Mixture as Regression for Language Model Pre-training Paper • 2407.01492 • Published Jul 1, 2024 • 40
Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM Mar 12 • 467
Article From PyTorch DDP to 🤗 Accelerate to 🤗 Trainer, mastery of distributed training with ease Oct 21, 2022 • 40
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text Paper • 2506.05209 • Published Jun 5 • 46
Article How to train a new language model from scratch using Transformers and Tokenizers Feb 14, 2020 • 53
Article Efficient LLM Pretraining: Packed Sequences and Masked Attention By sirluk • Oct 7, 2024 • 55