ProxyAttn: Guided Sparse Attention via Representative Heads Paper • 2509.24745 • Published Sep 29 • 1
ERNIE 4.5 Collection A collection of ERNIE 4.5 models. "-Paddle" models use PaddlePaddle weights, while "-PT" models use Transformer-style PyTorch weights. • 26 items • Updated Sep 24 • 174
Article What is test-time compute and how to scale it? By Kseniase and 1 other • Feb 6 • 107
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models Paper • 2503.24235 • Published Mar 31 • 54
Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers Paper • 2404.04925 • Published Apr 7, 2024 • 1