GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization • Paper • 2601.05242 • Published 4 days ago • 154
RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16 • Image-Text-to-Text • 20B • Updated Sep 22, 2025 • 147k • 12
Jzuluaga/accent-id-commonaccent_xlsr-en-english • Audio Classification • Updated Dec 2, 2025 • 607 • 17