From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities Paper • 2401.15071 • Published Jan 26, 2024 • 37
RMB: Comprehensively Benchmarking Reward Models in LLM Alignment Paper • 2410.09893 • Published Oct 13, 2024
Code2Logic: Game-Code-Driven Data Synthesis for Enhancing VLMs General Reasoning Paper • 2505.13886 • Published May 20 • 6
Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning Paper • 2405.06680 • Published May 5, 2024 • 1