Article • From Zero to GPU: A Guide to Building and Scaling Production-Ready CUDA Kernels • Aug 18 • 85
Article • You could have designed state of the art positional encoding • Nov 25, 2024 • 387
Article • Understanding Gemma 3n: How MatFormer Gives You Many Models in One • By rishiraj • Jun 26 • 48
Article • How Long Prompts Block Other Requests - Optimizing LLM Performance • By tngtech • Jun 12 • 6
Article • Prefill and Decode for Concurrent Requests - Optimizing LLM Performance • By tngtech • Apr 16 • 50
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning • Paper • 2506.01939 • Published Jun 2 • 185
Article • Enabling Long Context Training with Sequence Parallelism in Axolotl • By axolotl-ai-co and 1 other • Apr 4 • 14
Article • The case for specialized pre-training: ultra-fast foundation models for dedicated tasks • By Pclanglais • Aug 4, 2024 • 30
Scotch & SOTA 🥃 Pt. 7: Human Feedback Datasets 🫣 • Collection • The elusive “human” feedback • 1 item • Updated Sep 13, 2023 • 1
Scotch & SOTA 🥃 Pt. 6: Dialogue Tuning Datasets 💬 • Collection • Conversations, turn-based dialog, and things that can be turned into that. • 4 items • Updated Sep 13, 2023 • 1
Scotch & SOTA 🥃 Pt. 5: Instruction Tuning Datasets 👩‍🏫 • Collection • Question & answer, task completion, general SFT and otherwise finetuney data. • 7 items • Updated Sep 13, 2023 • 1
Article • Can we create pedagogically valuable multi-turn synthetic datasets from Cosmopedia? • By davanstrien • May 7, 2024 • 8