Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding Paper • 2512.05774 • Published about 1 month ago • 6
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models Paper • 2507.12806 • Published Jul 17, 2025 • 20
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Paper • 2505.09568 • Published May 14, 2025 • 98
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published Mar 6, 2025 • 96
Unifying Specialized Visual Encoders for Video Language Models Paper • 2501.01426 • Published Jan 2, 2025 • 20
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding Paper • 2411.04282 • Published Nov 6, 2024 • 37