ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning Paper • 2512.05111 • Published 24 days ago • 45