Controlled Self-Evolution for Algorithmic Code Optimization Paper • 2601.07348 • Published 8 days ago • 109
DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation Paper • 2601.09688 • Published 5 days ago • 114
OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent Paper • 2601.07779 • Published 7 days ago • 25
MMLongCite: A Benchmark for Evaluating Fidelity of Long-Context Vision-Language Models Paper • 2510.13276 • Published Oct 15, 2025
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 211
MAC-SLU: Multi-Intent Automotive Cabin Spoken Language Understanding Benchmark Paper • 2512.01603 • Published Dec 1, 2025
Agent2World: Learning to Generate Symbolic World Models via Adaptive Multi-Agent Feedback Paper • 2512.22336 • Published 24 days ago • 2
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning Paper • 2601.06002 • Published 10 days ago • 48
Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans? Paper • 2512.13281 • Published Dec 15, 2025 • 63
ARISE: An Adaptive Resolution-Aware Metric for Test-Time Scaling Evaluation in Large Reasoning Models Paper • 2510.06014 • Published Oct 7, 2025 • 10