The Photographer Eye: Teaching Multimodal Large Language Models to See and Critique like Photographers Paper • 2509.18582 • Published Sep 23 • 2
Wan: Open and Advanced Large-Scale Video Generative Models Paper • 2503.20314 • Published Mar 26 • 55
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface Paper • 2503.01342 • Published Mar 3 • 8