RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling • Paper • 2510.20206 • Published 4 days ago • 8
From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model • Paper • 2510.19871 • Published 5 days ago • 11
UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning • Paper • 2510.20286 • Published 4 days ago • 15
Video-As-Prompt: Unified Semantic Control for Video Generation • Paper • 2510.20888 • Published 3 days ago • 17
DeepAgent: A General Reasoning Agent with Scalable Toolsets • Paper • 2510.21618 • Published 3 days ago • 38
LayerComposer: Interactive Personalized T2I via Spatially-Aware Layered Canvas • Paper • 2510.20820 • Published 3 days ago • 7
ARGenSeg: Image Segmentation with Autoregressive Image Generation Model • Paper • 2510.20803 • Published 3 days ago • 7
Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets • Paper • 2510.19944 • Published 4 days ago • 13
Search Self-play: Pushing the Frontier of Agent Capability without Supervision • Paper • 2510.18821 • Published 5 days ago • 14
Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1 • Paper • 2510.19600 • Published 5 days ago • 64
Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values • Paper • 2510.20187 • Published 4 days ago • 17
HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives • Paper • 2510.20822 • Published 3 days ago • 34
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence • Paper • 2510.20579 • Published 4 days ago • 45
Text or Pixels? It Takes Half: On the Token Efficiency of Visual Text Inputs in Multimodal LLMs • Paper • 2510.18279 • Published 6 days ago • 3
Accelerating Vision Transformers with Adaptive Patch Sizes • Paper • 2510.18091 • Published 6 days ago • 4