VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering Paper • 2503.06492 • Published Mar 9, 2025
Dynamic Scaling of Unit Tests for Code Reward Modeling Paper • 2501.01054 • Published Jan 2, 2025